
[Bug] Hidden dangers in lookup function #3256

Open
1 of 2 tasks
JCJCut opened this issue Apr 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments


JCJCut commented Apr 24, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

Paimon: 0.8
Flink: 1.16

Compute Engine

Flink

Minimal reproduce step

SecondaryIndexLookupTable -> RocksDBSetState.java -> get(K key) -> List<byte[]> valueBytes = cache.getIfPresent(keyBytes);
'lookup.cache' = 'full', 'lookup.cache-rows'='20'
table.exec.state.ttl = 5 min
with massive data

What doesn't meet your expectations?

In my case, the join key differs from the primary key (joinKey <> primaryKey).
Looking at the source code of RocksDBSetState.java, get(K key) only queries the db when the cache cannot read the data, i.e. under the condition valueBytes == null.
This condition hides a danger. After adding logging, I found that valueBytes can be non-null but an empty list, in which case the db is never queried. In some extreme cases the data does exist in the db, yet because of this query condition the indexState returns an empty mapping, indirectly causing the lookup to fail.
Could you test this further and extend the db-query condition to also cover "valueBytes != null && valueBytes.isEmpty()"?
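To make the hazard concrete, here is a minimal, self-contained sketch (not Paimon's actual RocksDBSetState — the cache and db are plain maps, and all names are illustrative) showing why a null-only guard never falls back to the db once an empty list has been cached:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheFallbackSketch {

    // Stand-ins for the Caffeine cache and RocksDB; illustrative only.
    static Map<String, List<byte[]>> cache = new HashMap<>();
    static Map<String, List<byte[]>> db = new HashMap<>();

    // Current behavior: query the db only when the cache returns null.
    // A cached empty list is "present", so the db is never consulted.
    static List<byte[]> getCurrent(String key) {
        List<byte[]> valueBytes = cache.get(key);
        if (valueBytes == null) {
            valueBytes = db.getOrDefault(key, new ArrayList<>());
            cache.put(key, valueBytes);
        }
        return valueBytes;
    }

    // Proposed behavior: also fall back when the cached list is empty.
    static List<byte[]> getProposed(String key) {
        List<byte[]> valueBytes = cache.get(key);
        if (valueBytes == null || valueBytes.isEmpty()) {
            valueBytes = db.getOrDefault(key, new ArrayList<>());
            cache.put(key, valueBytes);
        }
        return valueBytes;
    }

    public static void main(String[] args) {
        db.put("k", List.of("v".getBytes()));
        cache.put("k", new ArrayList<>()); // stale empty entry in the cache

        System.out.println("current:  " + getCurrent("k").size());  // 0 -> lookup misses
        cache.put("k", new ArrayList<>()); // reset the stale entry
        System.out.println("proposed: " + getProposed("k").size()); // 1 -> db is consulted
    }
}
```

One trade-off of the proposed condition worth noting: for keys that are genuinely absent, an empty result would never be served from the cache, so every lookup on such a key would hit the db.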


I have another question. When two identical tasks run at the same time, one task (task 1) can successfully read the lookup table data, but the other (task 2) cannot. In task 2, through logging, I forced IndexState to read both the cache and the db, but both returned empty data.

Then I found that the data read by the two tasks during the last lookup was different.
Task1
2024-04-23 01:27:47.347 INFO Reading ORC rows from data-ed87638b-834b-46a2-a192-37130bd74f8c-0.orc with {include: null, offset: 3, length: 1159, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}
2024-04-23 01:27:47.555 INFO Reading ORC rows from data-ed87638b-834b-46a2-a192-37130bd74f8c-1.orc with {include: null, offset: 3, length: 889, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}
2024-04-23 01:27:47.778 INFO Reading ORC rows from data-ed87638b-834b-46a2-a192-37130bd74f8c-2.orc with {include: null, offset: 3, length: 1052, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}

2024-04-23 02:02:28.815 INFO Reading ORC rows from data-75ec10a0-651d-42a1-87f2-8a7a0df66652-0.orc with {include: null, offset: 3, length: 1015, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}
2024-04-23 02:02:28.965 INFO Reading ORC rows from data-75ec10a0-651d-42a1-87f2-8a7a0df66652-1.orc with {include: null, offset: 3, length: 887, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}
2024-04-23 02:02:29.586 INFO Reading ORC rows from data-75ec10a0-651d-42a1-87f2-8a7a0df66652-2.orc with {include: null, offset: 3, length: 1181, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}

Task2
2024-04-23 02:02:33.165 INFO Reading ORC rows from data-ed87638b-834b-46a2-a192-37130bd74f8c-0.orc with {include: null, offset: 3, length: 1159, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}
2024-04-23 02:02:33.326 INFO Reading ORC rows from data-ed87638b-834b-46a2-a192-37130bd74f8c-1.orc with {include: null, offset: 3, length: 889, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}
2024-04-23 02:02:34.201 INFO Reading ORC rows from data-ed87638b-834b-46a2-a192-37130bd74f8c-2.orc with {include: null, offset: 3, length: 1052, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}

Read records after the last lookup
2024-04-23 02:03:56.862 INFO Reading ORC rows from data-75ec10a0-651d-42a1-87f2-8a7a0df66652-0.orc with {include: null, offset: 3, length: 1015, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}
2024-04-23 02:03:56.978 INFO Reading ORC rows from data-75ec10a0-651d-42a1-87f2-8a7a0df66652-1.orc with {include: null, offset: 3, length: 887, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}
2024-04-23 02:03:57.973 INFO Reading ORC rows from data-75ec10a0-651d-42a1-87f2-8a7a0df66652-2.orc with {include: null, offset: 3, length: 1181, schema: struct<_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,xxx>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}

It can be seen from the logs that in task 2, where the lookup failed, part of the data was only read after the last lookup. In other words, the data read by task 1 is eventually read by task 2 as well, but at different points in time. What causes this divergence, and can it be optimized? I personally suspect the issue occurs in the refresh function.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@JCJCut JCJCut added the bug Something isn't working label Apr 24, 2024
@JingsongLi (Contributor)

Hi @JCJCut, is there some data to reproduce this issue?
