[Feature] Added sharded reading #3288

Open
1 of 2 tasks
scottxing opened this issue Apr 30, 2024 · 0 comments
Labels
enhancement New feature or request


scottxing commented Apr 30, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

When a large amount of data passes through Paimon CDC, about 100 million records land in a Paimon ODS table whose changelog producer is set to `input`. I then start a Flink SQL job (with `consumer-id` set) that reads this table and inserts into another Paimon DWD table (changelog producer set to `lookup`). After this job starts, its checkpoint stays stuck at 0% and never completes, so no snapshot can be committed. As a result, my other Flink SQL job cannot read any data from the Paimon DWD table. In effect, the full backlog of one Paimon table must be written completely to the next Paimon table before anything can move on to the table after that. Data cannot flow smoothly from job to job like a stream.

Solution

Add sharded reading. For large Paimon tables, the reading job would split the data into shards, similar to Flink CDC. Once one shard has been read, the job moves on to the next, so checkpoints can complete between shards and data keeps flowing between the Paimon tables.
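
To make the idea concrete, here is a minimal sketch of shard-by-shard reading with a checkpoint after each shard. It is not Paimon or Flink code: `ShardedReader`, `shard_size`, `next_offset` and `checkpoints` are hypothetical names used only for illustration, and a plain list stands in for the table backlog.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Sequence


@dataclass
class ShardedReader:
    """Hypothetical reader that emits a large backlog in bounded shards."""

    records: Sequence[int]
    shard_size: int
    next_offset: int = 0  # the state a checkpoint would persist
    checkpoints: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.shard_size <= 0:
            raise ValueError("shard_size must be positive")

    def shards(self) -> Iterator[Sequence[int]]:
        """Yield one shard at a time, checkpointing after each shard."""
        while self.next_offset < len(self.records):
            start = self.next_offset
            end = min(start + self.shard_size, len(self.records))
            yield self.records[start:end]
            # Runs only once the consumer asks for the next shard, i.e. after
            # the current shard has been fully processed downstream.
            self.next_offset = end
            self.checkpoints.append(end)


if __name__ == "__main__":
    reader = ShardedReader(records=list(range(10)), shard_size=4)
    for shard in reader.shards():
        print("read shard", list(shard))
    print("checkpointed offsets", reader.checkpoints)
```

Because the only state is the offset of the next unread shard, a checkpoint can complete after every shard instead of waiting for the whole backlog, and a restarted job can resume by constructing the reader with `next_offset` set to the last checkpointed value.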


Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
scottxing added the enhancement label Apr 30, 2024