The open-source tool for building high-quality datasets and computer vision models
Updated May 29, 2024 - Python
Archaeological Map of the Czech Republic (Archeologická mapa České republiky)
Interactively explore unstructured datasets from your dataframe.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
fastdup is a powerful free tool designed to rapidly extract valuable insights from your image and video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data-operations costs at scale.
Python package to make URL extraction, generalization, validation, and filtration easy.
Package that builds a JSON inventory/manifest from public primary or derived datasets
Client interface for all things Cleanlab Studio
A web service for semi-automated conversion of raw imaging data to BIDS
Rebalancing chemical reactions
Repository for the Data Curation Process Ontology
🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors.
A curated, but incomplete, list of data-centric AI resources.
Code and data for "Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation" (EMNLP 2023)
Curation of BIDS (CuBIDS): A sanity-preserving software package for processing BIDS datasets.
Gene Curator is an open-source platform for managing and curating genetic data. It facilitates gene data analysis, entry, and reporting, serving genetics researchers with tools for efficient data handling.
Demo showing how the Trustworthy Language Model adds reliability to LLM outputs and improves RAG, agent, and data-enrichment workflows. It can also be used to improve LLM fine-tuning, the accuracy of LLM outputs, and smart routing for RAG and agents.
Graph-based NLP framework leveraging a curated database and an intuitive CLI for advanced, context-rich language understanding.
Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.
Track model training experiments with MLflow and FiftyOne!