
Slow performance of duckdb_stream_fetch_chunk #12105

Open
2 tasks done
Giorgi opened this issue May 17, 2024 · 6 comments
Labels: Needs Documentation · under review

Comments

Giorgi (Contributor) commented May 17, 2024

What happens?

Compared to the Python API, duckdb_stream_fetch_chunk is very slow.

Output of the Python script (the first column is the value used in the WHERE condition):

IGE98q: 1014 0.38 seconds
Wg: 952 0.09 seconds
k: 16939 0.09 seconds
K: 12933 0.09 seconds
wTXW: 982 0.08 seconds
H: 13036 0.08 seconds
N6lCHzQ: 973 0.08 seconds
N8t: 984 0.08 seconds
8MrfK3KC2: 1003 0.09 seconds
y42HuWky: 960 0.08 seconds
36opg6NrR: 939 0.08 seconds
Total time=1.2280151844024658

The time needed for duckdb_execute_prepared_streaming + duckdb_stream_fetch_chunk for one of these keys is 0.559 seconds (8MrfK3KC2, which takes 0.09 seconds in Python):

[screenshot: timing output of the C code fragment]

This was originally raised in the .NET client repo: Giorgi/DuckDB.NET#188

I can either share the parquet file (26 GB), or you can generate it yourself based on the instructions here: Giorgi/DuckDB.NET#188 (comment). (You will get different keys, but it should behave similarly.)

To Reproduce

I use this Python script to measure the Python execution time:

import duckdb
import time

# Read the list of keys to query; slice (e.g. [:10]) to test with fewer keys
keys = open('keys.txt').readlines()

start = time.time()
for g in keys:
    g = g.strip()
    start2 = time.time()
    # Materialize the filtered result as a Pandas DataFrame
    x = duckdb.sql(f"""SELECT col71, col1, col2, col3, col20 FROM read_parquet('test.pq') WHERE col71 = '{g}'""").df()
    print(f"{g}: {len(x)} {time.time()-start2:.2f} seconds")

print(f"Total time={time.time()-start}")

C++ code fragment using the C API (database and connection setup omitted):

#include <chrono>
#include <iostream>
#include "duckdb.h"

// con is an already-open duckdb_connection
duckdb_prepared_statement preparedStatement;
duckdb_result result;

auto query = "SELECT col71, col1, col2, col3, col20 FROM read_parquet('S:\\test.pq') WHERE col71 = '8MrfK3KC2'";
duckdb_state state = duckdb_prepare(con, query, &preparedStatement);

auto start = std::chrono::high_resolution_clock::now();

duckdb_execute_prepared_streaming(preparedStatement, &result);

// Fetch only the first chunk of the streaming result
duckdb_data_chunk chunk = duckdb_stream_fetch_chunk(result);

auto finish = std::chrono::high_resolution_clock::now();
auto microseconds = std::chrono::duration_cast<std::chrono::microseconds>(finish - start);
std::cout << microseconds.count() << "us\n";
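
Note that this fragment times only the first chunk. For completeness, a minimal sketch that drains the whole stream (same setup as above) would look roughly like this:

// Drain the remaining chunks so the measurement covers the full result.
// duckdb_stream_fetch_chunk returns a null chunk once the stream is exhausted.
idx_t total_rows = 0;
while (duckdb_data_chunk next = duckdb_stream_fetch_chunk(result)) {
    total_rows += duckdb_data_chunk_get_size(next);
    duckdb_destroy_data_chunk(&next);
}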

OS: Windows 11 x64

DuckDB Version: 0.10.2

DuckDB Client: C, Python

Full Name: Giorgi Dalakishvili

Affiliation: Space International

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
Tishj (Contributor) commented May 17, 2024

In what way is the Python code streaming?
We're just calling .df here, which materializes directly as a Pandas DataFrame, no?

Giorgi (Contributor, Author) commented May 17, 2024

Same result with the dev version:

[screenshot: same timing with the dev build]

Giorgi (Contributor, Author) commented May 17, 2024

> In what way is the Python code streaming? We're just calling .df here, which materializes directly as a Pandas DataFrame, no?

I didn't say that the Python code is streaming. If Python can materialize the result that fast, why is the C API slow, when it isn't even materializing the result set?

Tishj (Contributor) commented May 17, 2024

Why are you comparing a streaming result set against a materialized one, then? You can materialize in the C API as well.

A more equivalent test would be:

con.execute(...)
con.fetchmany(duckdb.__standard_vector_size__)

It sounds like what you're struggling with is the performance of the streaming result collector, which is a known issue, likely related to #11494.
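
For comparison, a minimal sketch of the materialized path in the C API, assuming the same preparedStatement as in the fragment above (duckdb_execute_prepared buffers the full result before returning):

// Materialized execution: the complete result is buffered before this returns.
duckdb_result result;
if (duckdb_execute_prepared(preparedStatement, &result) == DuckDBSuccess) {
    // Chunks are read from the buffered result; no streaming is involved.
    idx_t chunk_count = duckdb_result_chunk_count(result);
    for (idx_t i = 0; i < chunk_count; i++) {
        duckdb_data_chunk chunk = duckdb_result_get_chunk(result, i);
        // ... process chunk ...
        duckdb_destroy_data_chunk(&chunk);
    }
}
duckdb_destroy_result(&result);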

Giorgi (Contributor, Author) commented May 17, 2024

I switched the .NET implementation to a streaming result set to avoid issues like "JDBC ResultSet consumes unlimited memory", but if that comes with a 10x performance regression, I will switch it back.

Giorgi (Contributor, Author) commented May 17, 2024

I switched back to using a non-streaming result, and here are the results:

[screenshot: timing with the non-streaming result]

duckdb_execute_prepared_streaming and duckdb_stream_fetch_chunk should have a big warning in the docs.

@Mytherin @szarnyasg
