Teaching Materials for Distributed Statistical Computing (大数据分布式计算教学材料)
Updated Jun 11, 2024 · HTML
Apache Spark is an open-source, general-purpose distributed cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
This construct builds the components you need to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to inspect the output of the EMR Serverless application.
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Intel® End-to-End AI Optimization Kit
One ETL tool to rule them all
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Collaborative Filtering based on Google Analytics 360 data from BigQuery.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, along with APIs
An open protocol for secure data sharing
New generation decentralized data lake and a streaming data pipeline
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Created by Matei Zaharia
Released May 26, 2014