lakehouse

StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.

Updated Jun 10, 2024
Java

ByConity / ByConity

ByConity is an open source cloud data warehouse

cloud sql clickhouse s3 snowflake olap kubernets clickhouse-database tiktok bytedance lakehouse

Updated Jun 10, 2024
C++

tspannhw / FLiPStackWeekly

FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...

streaming cloudera apachespark apachekafka timspann apachenifi lakehouse apacheflink apacheiceberg

Updated Jun 10, 2024

datastrato / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

metadata data-catalog datalake stratosphere federated-query lakehouse model-catalog metalake skycomputing ai-catalog opendatacatalog

Updated Jun 9, 2024
Java

prestodb / pbench

Presto/Prestissimo Benchmark Toolset

benchmarking presto lakehouse

Updated Jun 7, 2024
Go

ComputeAI / computeAI-integrations

Supercharge Your Compute for Analytics & AI

sql database analytics data-warehouse olap parquet iceberg hive-table lakehouse external-table

Updated Jun 7, 2024
Jupyter Notebook

apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

bigdata datalake lakehouse

Updated Jun 9, 2024
Java

data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

aws data-science data aws-s3 redshift etl-framework aws-glue aws-lake-formation lakehouse lakeformation

Updated Jun 10, 2024
Python

apache / doris-streamloader

Stream Loader for Apache Doris

bigquery real-time sql database spark hive hadoop etl snowflake olap query-engine redshift dbt elt iceberg hudi delta-lake lakehouse

Updated Jun 6, 2024
Go

lakesoul-io / LakeSoul

LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.

python rust streaming sql big-data spark arrow postgresql pytorch flink datalake vectorized velox huggingface datafusion lakehouse lakesoul

Updated Jun 6, 2024
Java

bmsuisse / lakeapi

API for distributing Data Lake Data

api sql rest-api data-lake columnar fastapi deltalake lakehouse duckdb polars

Updated Jun 10, 2024
Python

databricks / terraform-databricks-examples

Examples of using Terraform to deploy Databricks resources

aws azure terraform gcp databricks terraform-module lakehouse databricks-module

Updated Jun 6, 2024
HCL

fraibacas / lakehouse-poc

Run an open-source data LakeHouse locally using Docker Compose

docker-compose prefect apache-superset apache-iceberg lakehouse

Updated May 31, 2024
Python

paradedb / helm-charts

Helm chart for deploying ParadeDB on Kubernetes

Updated Jun 7, 2024

adidas / lakehouse-engine-docs

The goal of this project is to provide documentation for the Lakehouse Engine framework.

framework big-data spark data-engineering databricks data-quality delta-lake great-expectations lakehouse lakehouse-engine

Updated May 20, 2024
HTML

adidas / lakehouse-engine

The Lakehouse Engine is a configuration-driven Spark framework, written in Python, that serves as a scalable, distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.

framework big-data spark data-engineering databricks data-quality delta-lake great-expectations lakehouse configuration-driven

Updated May 20, 2024
Python

ekote / Build-Your-First-End-to-End-Lakehouse-Solution

Build Your First End-to-End Lakehouse Solution (aka.ms/fabconlake)

data-science machine-learning tutorial workshop apache-spark data-engineering warehouse parquet powerbi data-pipeline microsoft-azure data-factory dataflows delta-lake lakehouse microsoft-fabric

Updated May 29, 2024
Jupyter Notebook

Here are 76 public repositories matching this topic...

ytsaurus / ytsaurus

prestodb / presto

apache / doris

StarRocks / starrocks

ByConity / ByConity

tspannhw / FLiPStackWeekly

datastrato / gravitino

prestodb / pbench

ComputeAI / computeAI-integrations

apache / amoro

data-dot-all / dataall

apache / doris-streamloader

lakesoul-io / LakeSoul

bmsuisse / lakeapi

databricks / terraform-databricks-examples

fraibacas / lakehouse-poc

paradedb / helm-charts

adidas / lakehouse-engine-docs

adidas / lakehouse-engine

ekote / Build-Your-First-End-to-End-Lakehouse-Solution
