Apache Spark
Apache Spark is an open-source, general-purpose framework for distributed cluster computing. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 8,269 public repositories matching this topic...
🏆 Spark4You Design patterns
Updated May 17, 2024 - Shell
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Updated May 17, 2024 - Python
Official code repository for GATK versions 4 and up
Updated May 17, 2024 - Java
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Updated May 17, 2024 - Scala
Final project for the course 'Architecture for Large Data Volumes', taught in the Bachelor's program in Data Science at ITAM
Updated May 17, 2024
This construct builds the elements you need to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome of the EMR Serverless application.
Updated May 17, 2024 - TypeScript
New generation decentralized data lake and a streaming data pipeline
Updated May 16, 2024 - Rust
Host a Docker container for the Spark history server / Spark UI of AWS Glue jobs
Updated May 16, 2024 - Dockerfile
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Updated May 16, 2024 - C++
Extensible Python SDK for developing Flyte tasks and workflows. Simple to learn and get started with, and highly extensible.
Updated May 16, 2024 - Python
Created by Matei Zaharia
Released May 26, 2014
- Followers: 414
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia