pyspark
Here are 3,394 public repositories matching this topic...
the portable Python dataframe library
Updated Jun 4, 2024 - Python
Stack Overflow response time prediction machine learning modelling
Updated Jun 4, 2024 - Jupyter Notebook
Open Targets python framework for post-GWAS analysis
Updated Jun 4, 2024 - Jupyter Notebook
Simple and Distributed Machine Learning
Updated Jun 4, 2024 - Scala
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Updated Jun 4, 2024 - Python
Calculating Edit Distance with PySpark
Updated Jun 4, 2024 - Jupyter Notebook
Hopsworks - Data-Intensive AI platform with a Feature Store
Updated Jun 4, 2024 - Java
An open source, standard data file format for graph data storage and retrieval.
Updated Jun 4, 2024 - C++
A database-like benchmark of feature generation from time-series data
Updated Jun 4, 2024 - Jupyter Notebook
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Updated Jun 4, 2024 - Python
Dataproc templates and pipelines for solving simple in-cloud data tasks
Updated Jun 4, 2024 - Python
Data Mining Course 2023/24 at AGH UST
Updated Jun 3, 2024 - Jupyter Notebook
A tool for building feature stores.
Updated Jun 4, 2024 - Python
A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Docker
Updated Jun 3, 2024 - Jupyter Notebook