Open source platform for the machine learning lifecycle
Updated Jun 10, 2024 - Python
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
lakeFS - Data version control for your data lake | Git for data
Activities of Data Analysis.
Free High-Quality Financial Data in Azure
Collection of Apache Spark docker images for OKDP
Includes notes on using Apache Spark in general, notes on using Spark for physics, how to run TPC-DS on PySpark, how to create histograms with Spark, tools for performance testing CPUs, Jupyter notebook examples for Spark, and examples for Oracle and other database systems.
An Apache Spark application to analyze word frequencies and compute TF-IDF weights across multiple text file sets using Spark's MLlib library.
REST API for Apache Spark on K8S or YARN
Experiment tracking server focused on speed and scalability
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
In the IBM Advanced Data Science specialization, an interactive real-time web application was developed using LSTM networks in TensorFlow to predict stock market trends for global companies.
Fully managed Apache Parquet implementation
Don't Panic. This guide will help you when it feels like the end of the world.
Data analysis project using Azure, Apache Spark, and Python to process Tokyo Olympic data.
Spark GraphX, which is designed for distributed graph computation, along with Spark SQL, Spark Streaming, and RDD operations.
This project includes both diabetes prediction using machine learning algorithms and graph analysis using Neo4j. See the report for a complete understanding.
The Proxima platform.
Created by Matei Zaharia
Released May 26, 2014