
[Bug]: When an offset value is passed into the queryIterator interface, part of the resulting data is lost. #2017

Open
lentitude2tk opened this issue Apr 2, 2024 · 2 comments
Labels
kind/bug Something isn't working

Comments


Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

1. Build a collection with an ID column and a vector column; the relevant schema is as follows:
fields = [
    FieldSchema(name=USER_ID, dtype=DataType.INT64, is_primary=True,
                auto_id=False),
    FieldSchema(name=AGE, dtype=DataType.INT64),
    FieldSchema(name=DEPOSIT, dtype=DataType.DOUBLE),
    FieldSchema(name=PICTURE, dtype=DataType.FLOAT_VECTOR, dim=DIM)
]

schema = CollectionSchema(fields)
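
For context, the constants and connection setup used by the schema above and the snippets below are not included in the report; the following is a minimal sketch in which every concrete value is a guess (the field names are taken from the output shown later, while DIM, NUM_ENTITIES, the host, and the consistency level are assumptions):

import numpy as np
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

USER_ID, AGE, DEPOSIT, PICTURE = "id", "age", "deposit", "picture"  # keys seen in the output below
DIM = 8                        # assumed vector dimension; not stated in the report
NUM_ENTITIES = 1000            # 5 batches x 1000 rows yields ids 0..4999
CONSISTENCY_LEVEL = "Strong"   # assumed consistency level

connections.connect("default", host="localhost", port="19530")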
2. Insert data so that USER_ID and AGE run from 0 to 4999 (five batches of NUM_ENTITIES rows each), making it easy to track whether the iterator is working correctly:
def insert_data(collection):
    rng = np.random.default_rng(seed=19530)
    batch_count = 5
    for i in range(batch_count):
        # USER_ID and AGE carry the same value, so iterator output is easy to verify.
        entities = [
            [(i * NUM_ENTITIES + ni) for ni in range(NUM_ENTITIES)],  # USER_ID
            [(i * NUM_ENTITIES + ni) for ni in range(NUM_ENTITIES)],  # AGE
            [float(ni) for ni in range(NUM_ENTITIES)],                # DEPOSIT
            rng.random((NUM_ENTITIES, DIM)),                          # PICTURE vectors
        ]
        collection.insert(entities)
        collection.flush()
        print(f"Finish insert batch {i}, number of entities in Milvus: {collection.num_entities}")
3. Querying through queryIterator with an offset loses data at the start of the result window (with offset=10, the first returned entity is id 29 instead of the expected id 20, so ids 20-28 are missing), and the batches returned afterwards do not meet expectations:
def query_iterate_collection_with_offset(collection):
    expr = f"10 <= {AGE} <= 100"
    query_iterator = collection.query_iterator(expr=expr, output_fields=[USER_ID, AGE],
                                               offset=10, batch_size=50, consistency_level=CONSISTENCY_LEVEL)
    page_idx = 0
    while True:
        res = query_iterator.next()
        if len(res) == 0:
            print("query iteration finished, close")
            query_iterator.close()
            break
        for i in range(len(res)):
            print(res[i])
        page_idx += 1
        print(f"page{page_idx}-------------------------")

Expected Behavior

The expected result:
{'age': 20, 'id': 20}
{'age': 21, 'id': 21}
{'age': 22, 'id': 22}
{'age': 23, 'id': 23}
{'age': 24, 'id': 24}
{'age': 25, 'id': 25}
{'age': 26, 'id': 26}
{'age': 27, 'id': 27}
{'age': 28, 'id': 28}
{'age': 29, 'id': 29}
{'age': 30, 'id': 30}
{'age': 31, 'id': 31}
{'age': 32, 'id': 32}
{'age': 33, 'id': 33}
{'age': 34, 'id': 34}
{'age': 35, 'id': 35}
{'age': 36, 'id': 36}
{'age': 37, 'id': 37}
{'age': 38, 'id': 38}
{'age': 39, 'id': 39}
{'age': 40, 'id': 40}
{'age': 41, 'id': 41}
{'age': 42, 'id': 42}
{'age': 43, 'id': 43}
{'age': 44, 'id': 44}
{'age': 45, 'id': 45}
{'age': 46, 'id': 46}
{'age': 47, 'id': 47}
{'age': 48, 'id': 48}
{'age': 49, 'id': 49}
{'age': 50, 'id': 50}
{'age': 51, 'id': 51}
{'age': 52, 'id': 52}
{'age': 53, 'id': 53}
{'age': 54, 'id': 54}
{'age': 55, 'id': 55}
{'age': 56, 'id': 56}
{'age': 57, 'id': 57}
{'age': 58, 'id': 58}
{'age': 59, 'id': 59}
{'age': 60, 'id': 60}
{'age': 61, 'id': 61}
{'age': 62, 'id': 62}
{'age': 63, 'id': 63}
{'age': 64, 'id': 64}
{'age': 65, 'id': 65}
{'age': 66, 'id': 66}
{'age': 67, 'id': 67}
{'age': 68, 'id': 68}
{'age': 69, 'id': 69}
page1-------------------------
{'age': 70, 'id': 70}
{'age': 71, 'id': 71}
{'age': 72, 'id': 72}
{'age': 73, 'id': 73}
{'age': 74, 'id': 74}
{'age': 75, 'id': 75}
{'age': 76, 'id': 76}
{'age': 77, 'id': 77}
{'age': 78, 'id': 78}
{'id': 79, 'age': 79}
{'id': 80, 'age': 80}
{'id': 81, 'age': 81}
{'id': 82, 'age': 82}
{'id': 83, 'age': 83}
{'id': 84, 'age': 84}
{'id': 85, 'age': 85}
{'id': 86, 'age': 86}
{'id': 87, 'age': 87}
{'id': 88, 'age': 88}
{'id': 89, 'age': 89}
{'id': 90, 'age': 90}
{'id': 91, 'age': 91}
{'id': 92, 'age': 92}
{'id': 93, 'age': 93}
{'id': 94, 'age': 94}
{'id': 95, 'age': 95}
{'id': 96, 'age': 96}
{'id': 97, 'age': 97}
{'id': 98, 'age': 98}
{'id': 99, 'age': 99}
{'id': 100, 'age': 100}

The actual result obtained:
{'age': 29, 'id': 29}
{'age': 30, 'id': 30}
{'age': 31, 'id': 31}
{'age': 32, 'id': 32}
{'age': 33, 'id': 33}
{'age': 34, 'id': 34}
{'age': 35, 'id': 35}
{'age': 36, 'id': 36}
{'age': 37, 'id': 37}
{'age': 38, 'id': 38}
{'age': 39, 'id': 39}
{'age': 40, 'id': 40}
{'age': 41, 'id': 41}
{'age': 42, 'id': 42}
{'age': 43, 'id': 43}
{'age': 44, 'id': 44}
{'age': 45, 'id': 45}
{'age': 46, 'id': 46}
{'age': 47, 'id': 47}
{'age': 48, 'id': 48}
{'age': 49, 'id': 49}
{'age': 50, 'id': 50}
{'age': 51, 'id': 51}
{'age': 52, 'id': 52}
{'age': 53, 'id': 53}
{'age': 54, 'id': 54}
{'age': 55, 'id': 55}
{'age': 56, 'id': 56}
{'age': 57, 'id': 57}
{'age': 58, 'id': 58}
{'age': 59, 'id': 59}
{'age': 60, 'id': 60}
{'age': 61, 'id': 61}
{'age': 62, 'id': 62}
{'age': 63, 'id': 63}
{'age': 64, 'id': 64}
{'age': 65, 'id': 65}
{'age': 66, 'id': 66}
{'age': 67, 'id': 67}
{'age': 68, 'id': 68}
{'age': 69, 'id': 69}
{'age': 70, 'id': 70}
{'age': 71, 'id': 71}
{'age': 72, 'id': 72}
{'age': 73, 'id': 73}
{'age': 74, 'id': 74}
{'age': 75, 'id': 75}
{'age': 76, 'id': 76}
{'age': 77, 'id': 77}
{'age': 78, 'id': 78}
page1-------------------------
{'id': 79, 'age': 79}
{'id': 80, 'age': 80}
{'id': 81, 'age': 81}
{'id': 82, 'age': 82}
{'id': 83, 'age': 83}
{'id': 84, 'age': 84}
{'id': 85, 'age': 85}
{'id': 86, 'age': 86}
{'id': 87, 'age': 87}
{'id': 88, 'age': 88}
{'id': 89, 'age': 89}
{'id': 90, 'age': 90}
{'id': 91, 'age': 91}
{'id': 92, 'age': 92}
{'id': 93, 'age': 93}
{'id': 94, 'age': 94}
{'id': 95, 'age': 95}
{'id': 96, 'age': 96}
{'id': 97, 'age': 97}
{'id': 98, 'age': 98}
{'id': 99, 'age': 99}
page2-------------------------
{'id': 100, 'age': 100}
page3-------------------------

Steps/Code To Reproduce behavior

As described in "Describe the bug": when the queryIterator interface is given an offset value, records at the start of the result window are lost (here, ids 20-28), and every batch returned afterwards is shifted, so the batch boundaries do not meet expectations.

Environment details

- Hardware/Software conditions (OS, CPU, GPU, Memory): None
- Method of installation (Docker, or from source): Zilliz
- Milvus version (v0.3.1, or v0.4.0): v2.3.13-91cdada-219
- Milvus configuration (Settings you made in `server_config.yaml`):

Anything else?

No response

lentitude2tk added the kind/bug (Something isn't working) label on Apr 2, 2024
lentitude2tk (Author) commented:

It seems there is a problem with how this code block handles the cursor: it should advance the cursor based on the current offset and the rows actually returned in res, rather than taking the last record of the entire result directly:

self.__update_cursor(res)
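
To illustrate the suspicion, a structural sketch (not the actual pymilvus source; query_for_seek is a hypothetical stand-in for the internal seek query):

class IteratorSketch:
    def __seek(self):
        # The seek query may return more rows than the requested offset.
        res = self.query_for_seek()
        # Suspected bug: the cursor advances to the last row of the entire
        # result, overshooting the position the offset was meant to establish,
        # so the first rows after the offset are skipped on the next page.
        self.__update_cursor(res)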

lentitude2tk (Author) commented on Apr 9, 2024:

During initialization in __seek, the position should instead be updated with self.__update_cursor(res[0: self._kwargs[OFFSET]]).
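
A toy demonstration of why the slice matters, using plain Python lists in place of Milvus results (the numbers are illustrative only):

rows = list(range(10, 40))         # pretend the seek query returned ids 10..39
offset = 10

# Cursor taken from the whole result: iteration would resume after id 39,
# silently skipping everything between the offset position and the end of res.
buggy_cursor = rows[-1]            # 39

# Cursor taken from the first `offset` rows only, as proposed above:
# iteration resumes after id 19, exactly where an offset of 10 should start.
fixed_cursor = rows[:offset][-1]   # 19

print(buggy_cursor, fixed_cursor)  # 39 19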
