☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Updated Jan 30, 2024 - Python
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Logical Replication extension for PostgreSQL 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
A block-based API for NSValueTransformer, with a growing collection of useful examples.
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.
💄 Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.
Advanced and Fast Data Transformation in R
Like awk but with SQL and table joins
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
📄 Concise selector to extract JSON from HTML.
An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
A simple Spark-powered ETL framework that just works 🍺
A curated list of Clojure resources for dealing with domain-specific languages.
Data transformation and utility functions for R
Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.