[Feature] Reduce S3 API calls #3252

Closed
1 of 2 tasks
polyzos opened this issue Apr 23, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

polyzos commented Apr 23, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Overall, Apache Paimon offers a low TCO, but it makes far more S3 API calls than Apache Iceberg.
The screenshot below shows this with version 0.7.

On the left are the S3 API calls Apache Iceberg makes, and on the right are Apache Paimon's (~3k/hour, I believe).
Screenshot 2024-04-17 at 2 00 57 PM

I believe @JingsongLi mentioned that some improvements were made in the upcoming 0.8 version, but since this is important for many companies, we need to keep the call count as low as Iceberg's if possible.

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@polyzos polyzos added the enhancement New feature or request label Apr 23, 2024

polyzos commented May 8, 2024

More context on this: it turns out this was the effect of not having compaction running in the background, which wasn't clear at first.

This means that although partition/snapshot/manifest file expiration happens automatically
https://paimon.apache.org/docs/master/maintenance/manage-snapshots/#expire-snapshots
new files keep being generated but are not physically deleted, because compaction is disabled.

Because the generated files keep accumulating and are never removed, Paimon has to access/list more and more metadata files over time in order to find out what's still relevant and what's not.

Turning compaction on will keep the number of requests stable in the long run.
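
To illustrate the effect described above, here is a rough toy model, not Paimon's actual file layout or request pattern. It assumes every commit adds one file, every scan costs one request per live file, and a compaction merges all live files into one. The function name `requests_per_scan` and the `compact_every` parameter are made up for this sketch.

```python
def requests_per_scan(num_commits, compact_every=None):
    """Toy model (assumption, not Paimon internals): each commit adds one
    file and a scan needs one request per live file."""
    live_files = 0
    history = []
    for commit in range(1, num_commits + 1):
        live_files += 1  # every commit writes one new file
        if compact_every and commit % compact_every == 0:
            live_files = 1  # compaction merges all live files into one
        history.append(live_files)  # requests a scan would need right now
    return history


without = requests_per_scan(100)
with_compaction = requests_per_scan(100, compact_every=10)
print(max(without), max(with_compaction))  # 100 10
```

Without compaction the per-scan request count in this model grows linearly with the number of commits. With periodic compaction it stays bounded by the compaction interval.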

It is also recommended to use version 0.8+, which brings many improvements as well.

@polyzos polyzos closed this as completed May 8, 2024