
[Bug]: [benchmark][cluster][LRU] search and query failed Error in GetObjectSize in DDL and DQL scene #33046

Open
wangting0128 opened this issue May 14, 2024 · 1 comment
Labels: kind/bug, test/benchmark, triage/accepted
Milestone: 2.4.lru
Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20240513-9e3f3d99
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar   
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc66
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: lru-fouramf-sdtdv

server:

NAME                                                              READY   STATUS                   RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
lru-scene14-etcd-0                                                1/1     Running                  0                16h     10.104.25.147   4am-node30   <none>           <none>
lru-scene14-etcd-1                                                1/1     Running                  0                16h     10.104.26.215   4am-node32   <none>           <none>
lru-scene14-etcd-2                                                1/1     Running                  0                16h     10.104.24.122   4am-node29   <none>           <none>
lru-scene14-milvus-datacoord-55bd489445-2f9dq                     1/1     Running                  3 (16h ago)      16h     10.104.6.221    4am-node13   <none>           <none>
lru-scene14-milvus-datanode-849c58578b-cn52g                      1/1     Running                  3 (16h ago)      16h     10.104.14.77    4am-node18   <none>           <none>
lru-scene14-milvus-indexcoord-d4598bdb-8pxwr                      1/1     Running                  0                16h     10.104.6.220    4am-node13   <none>           <none>
lru-scene14-milvus-indexnode-55664f9d95-gm9vc                     1/1     Running                  3 (16h ago)      16h     10.104.14.75    4am-node18   <none>           <none>
lru-scene14-milvus-indexnode-55664f9d95-hbx44                     1/1     Running                  3 (16h ago)      16h     10.104.6.222    4am-node13   <none>           <none>
lru-scene14-milvus-proxy-66d7877558-vl66v                         1/1     Running                  3 (16h ago)      16h     10.104.6.224    4am-node13   <none>           <none>
lru-scene14-milvus-querycoord-56484bc49-xlxdc                     1/1     Running                  3 (16h ago)      16h     10.104.6.219    4am-node13   <none>           <none>
lru-scene14-milvus-querynode-79b5c75746-zktl6                     1/1     Running                  3 (16h ago)      16h     10.104.6.223    4am-node13   <none>           <none>
lru-scene14-milvus-rootcoord-5dc64df7b5-vcg22                     1/1     Running                  3 (16h ago)      16h     10.104.14.76    4am-node18   <none>           <none>
lru-scene14-minio-0                                               1/1     Running                  0                16h     10.104.25.145   4am-node30   <none>           <none>
lru-scene14-minio-1                                               1/1     Running                  0                16h     10.104.26.214   4am-node32   <none>           <none>
lru-scene14-minio-2                                               1/1     Running                  0                16h     10.104.24.126   4am-node29   <none>           <none>
lru-scene14-minio-3                                               1/1     Running                  0                16h     10.104.27.209   4am-node31   <none>           <none>
lru-scene14-pulsar-bookie-0                                       1/1     Running                  0                16h     10.104.25.146   4am-node30   <none>           <none>
lru-scene14-pulsar-bookie-1                                       1/1     Running                  0                16h     10.104.24.128   4am-node29   <none>           <none>
lru-scene14-pulsar-bookie-2                                       1/1     Running                  0                16h     10.104.27.212   4am-node31   <none>           <none>
lru-scene14-pulsar-bookie-init-c26pm                              0/1     Completed                0                16h     10.104.13.240   4am-node16   <none>           <none>
lru-scene14-pulsar-broker-0                                       1/1     Running                  0                16h     10.104.6.226    4am-node13   <none>           <none>
lru-scene14-pulsar-proxy-0                                        1/1     Running                  0                16h     10.104.13.239   4am-node16   <none>           <none>
lru-scene14-pulsar-pulsar-init-qrg4b                              0/1     Completed                0                16h     10.104.13.241   4am-node16   <none>           <none>
lru-scene14-pulsar-recovery-0                                     1/1     Running                  0                16h     10.104.6.225    4am-node13   <none>           <none>
lru-scene14-pulsar-zookeeper-0                                    1/1     Running                  0                16h     10.104.25.144   4am-node30   <none>           <none>
lru-scene14-pulsar-zookeeper-1                                    1/1     Running                  0                16h     10.104.26.217   4am-node32   <none>           <none>
lru-scene14-pulsar-zookeeper-2                                    1/1     Running                  0                16h     10.104.27.215   4am-node31   <none>           <none>

client pod name: lru-fouramf-sdtdv-3126972081
client log:
(client log screenshot attached in the original issue; not reproduced here)

Expected Behavior

No response

Steps To Reproduce

1. create a collection with 3 fields: id(primaryKey), float_vector(768dim), int64_1(partitionKey=4)
2. build HNSW index
3. prepare 49m data
4. flush collection
5. build index again with the same params
6. load collection
7. run concurrent requests (this step raises the error):
   - search
   - query
   - scene_search_test

Milvus Log

No response

Anything else?

fouramf-client-lazyload-49m-ddl-dql:

    dataset_params:
      dataset_name: laion1b_nolang
      column_name: float32_vector
      dim: 768
      dataset_size: 49m
      ni_per: 10000
      metric_type: L2
      scalars_params:
        int64_1:
          params:
            is_partition_key: true
    collection_params:
      other_fields:
        - int64_1
      num_partitions: 64
    index_params:
      index_type: HNSW
      index_param:
        M: 30
        efConstruction: 360
    concurrent_tasks:
      - type: search
        weight: 1
        params:
          top_k: 1
          nq: 10
          search_param:
            ef: 64
          expr: int64_1 >= 1
          timeout: 3000
          random_data: true
      - type: query
        weight: 1
        params:
          expr: int64_1 >  20000
          timeout: 3000
          offset: 0
          limit: 20
          random_data: true
          random_count: 10
          random_range:
            - 1000
            - 10000
      - type: scene_search_test
        weight: 1
        params:
          index_type: HNSW
          index_param:
            M: 30
            efConstruction: 360
          search_param:
            ef: 64
    concurrent_params:
      interval: 20
      during_time: 12h
      concurrent_number: 30
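The search and query tasks in the config above translate roughly into the request shapes below. This is a sketch under the assumption that the fouramf client issues standard `collection.search` / `collection.query` calls; the random vectors stand in for `random_data: true`:

```python
import random

# Request shapes for the two DQL task types; fouramf client internals are assumed.
dim, nq, top_k = 768, 10, 1
vectors = [[random.random() for _ in range(dim)] for _ in range(nq)]  # random_data: true

search_kwargs = {
    "anns_field": "float32_vector",
    "param": {"metric_type": "L2", "params": {"ef": 64}},
    "limit": top_k,
    "expr": "int64_1 >= 1",
    "timeout": 3000,
}
query_kwargs = {
    "expr": "int64_1 > 20000",
    "offset": 0,
    "limit": 20,
    "timeout": 3000,
}

# Against a loaded collection (not executed here):
# collection.search(vectors, **search_kwargs)
# collection.query(**query_kwargs)
```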

fouramf-server-lazyload-cluster-qn1-2c8g:

    queryNode:
      resources:
        limits:
          cpu: '2'
          memory: 8Gi
          ephemeral-storage: 70Gi
        requests:
          cpu: '2'
          memory: 8Gi
      replicas: 1
      extraEnv:
        - name: LOCAL_STORAGE_SIZE
          value: '70'
    indexNode:
      resources:
        limits:
          cpu: '8.0'
          memory: 8Gi
        requests:
          cpu: '5.0'
          memory: 5Gi
      replicas: 2
    dataNode:
      resources:
        limits:
          cpu: '2.0'
          memory: 8Gi
        requests:
          cpu: '2.0'
          memory: 8Gi
      replicas: 1
    minio:
      metrics:
        podMonitor:
          enabled: true
      persistence:
        size: 320Gi
    etcd:
      metrics:
        enabled: true
        podMonitor:
          enabled: true
    metrics:
      serviceMonitor:
        enabled: true
    log:
      level: debug
    extraConfigFiles:
      user.yaml: |
        queryNode:
          diskCacheCapacityLimit: 51539607552
          mmap:
            mmapEnabled: true
          lazyload:
            enabled: true
            waitTimeout: 300000
          useStreamComputing: true
          cache:
            warmup: off
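For reference, the `diskCacheCapacityLimit` in the user.yaml above works out to exactly 48 GiB, which fits inside the 70 GiB `LOCAL_STORAGE_SIZE` / ephemeral-storage budget given to the single query node:

```python
# Sanity-check the disk cache sizing from the deployment config above.
disk_cache_capacity_limit = 51539607552  # queryNode.diskCacheCapacityLimit (bytes)
local_storage_gib = 70                   # LOCAL_STORAGE_SIZE / ephemeral-storage limit

assert disk_cache_capacity_limit == 48 * 1024**3   # exactly 48 GiB
assert disk_cache_capacity_limit < local_storage_gib * 1024**3

print(disk_cache_capacity_limit / 1024**3)  # → 48.0
```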
@wangting0128 wangting0128 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. test/benchmark benchmark test labels May 14, 2024
chyezh commented May 14, 2024

  • Handoff is slow: the segment is released on the querynode only after it has been GC'd.
  • Segment GC should stay in sync with querycoord (will be fixed in the next version).

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 15, 2024
@yanliang567 yanliang567 added this to the 2.4.lru milestone May 15, 2024
@yanliang567 yanliang567 removed their assignment May 15, 2024
3 participants