Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Updated May 21, 2024 - Python
An orchestration platform for the development, production, and observation of data assets.
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Lean and mean distributed stream processing system written in Rust and WebAssembly.
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Apache DolphinScheduler is the modern data orchestration platform. Agile to create high-performance workflows with low code.
Bruin is a data pipeline tool that is designed to be easy-to-use. It allows building data pipelines using SQL and Python, and has built-in data quality checks.
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Cloud-native, data onboarding architecture for Google Cloud Datasets
The framework for fast development and deployment of RAG systems.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Framework for standardizing, transforming, and applying quality checks to time series data.
One framework to develop, deploy and operate data workflows with Python and SQL.
Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Performance Observability for Apache Spark
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Conductor OSS SDK for Python programming language