One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
Always know what to expect from your data.
OpenMetadata is a unified platform for discovery, observability, and governance powered by a central metadata repository, in-depth lineage, and seamless team collaboration.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Visualize and compare datasets, target values and associations, with one line of code.
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Automatically find issues in image datasets and practice data-centric computer vision.
Desbordante is a high-performance data profiler that discovers many different patterns in data using various algorithms. It also lets you run data-cleaning scenarios built on these algorithms. Desbordante has a console version and an easy-to-use web application.
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Monitor the stability of a Pandas or Spark dataframe ⚙︎
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Code review for data in dbt
Open-source metadata collector based on ODD Specification
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run the checks daily to detect data quality issues.
Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index
Swiple enables you to easily observe, understand, validate and improve the quality of your data