[Bug]: [Nightly] e2e cases will fail for timeout with no crashed pod #32974

Closed
1 task done
NicoYuan1986 opened this issue May 11, 2024 · 5 comments
Labels: ci/e2e, kind/bug, triage/accepted

Comments

@NicoYuan1986
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: c0e62e6
- Deployment mode(standalone or cluster): cluster & standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

e2e cases fail with a timeout, even though no pod has crashed.

link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/master/738/pipeline/255'
log: artifacts-milvus-distributed-kafka-nightly-738-pymilvus-e2e-logs.tar.gz

pods :

[2024-05-11T00:06:29.175Z] + ./uninstall_milvus.sh --release-name mdk-738-n
[2024-05-11T00:06:29.177Z] mdk-738-n-etcd-0                                      1/1     Running             0              6h6m    10.105.1.213   ci-node10   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-etcd-1                                      1/1     Running             0              6h6m    10.105.7.246   ci-node12   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-etcd-2                                      1/1     Running             0              6h6m    10.105.5.53    ci-node11   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-kafka-0                                     2/2     Running             2 (6h4m ago)   6h6m    10.105.1.214   ci-node10   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-kafka-1                                     2/2     Running             2 (6h4m ago)   6h6m    10.105.7.249   ci-node12   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-kafka-2                                     2/2     Running             2 (6h4m ago)   6h6m    10.105.5.56    ci-node11   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-kafka-exporter-667d59b7bd-9wflr             1/1     Running             5 (6h4m ago)   6h6m    10.105.1.188   ci-node10   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-datacoord-786554588c-r74hx           1/1     Running             4 (6h4m ago)   6h6m    10.105.1.191   ci-node10   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-datanode-58bc5b79dc-s599b            1/1     Running             4 (6h4m ago)   6h6m    10.105.1.195   ci-node10   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-datanode-58bc5b79dc-zvbrp            1/1     Running             4 (6h4m ago)   6h6m    10.105.7.227   ci-node12   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-indexcoord-5884c8b474-jsmnf          1/1     Running             0              6h6m    10.105.1.190   ci-node10   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-indexnode-6948d797f8-nrb2r           1/1     Running             4 (6h4m ago)   6h6m    10.105.7.225   ci-node12   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-indexnode-6948d797f8-pf5b9           1/1     Running             4 (6h4m ago)   6h6m    10.105.1.192   ci-node10   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-proxy-58f8c74d97-fc668               1/1     Running             4 (6h4m ago)   6h6m    10.105.7.224   ci-node12   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-proxy-58f8c74d97-rt77t               1/1     Running             4 (6h4m ago)   6h6m    10.105.1.193   ci-node10   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-querycoord-dfb44bccc-t6mtf           1/1     Running             4 (6h4m ago)   6h6m    10.105.1.194   ci-node10   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-querynode-679ff94c64-lbx5f           1/1     Running             4 (6h4m ago)   6h6m    10.105.7.228   ci-node12   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-querynode-679ff94c64-vnr9k           1/1     Running             4 (6h4m ago)   6h6m    10.105.1.196   ci-node10   <none>           <none>
[2024-05-11T00:06:29.177Z] mdk-738-n-milvus-rootcoord-6cc578cb87-bdzfz           1/1     Running             4 (6h4m ago)   6h6m    10.105.1.189   ci-node10   <none>           <none>
[2024-05-11T00:06:29.178Z] mdk-738-n-minio-5d4f487d5-9p6mc                       1/1     Running             0              6h6m    10.105.1.212   ci-node10   <none>           <none>
[2024-05-11T00:06:29.178Z] mdk-738-n-zookeeper-0                                 1/1     Running             0              6h6m    10.105.1.211   ci-node10   <none>           <none>
[2024-05-11T00:06:29.178Z] mdk-738-n-zookeeper-1                                 1/1     Running             0              6h6m    10.105.7.247   ci-node12   <none>           <none>
[2024-05-11T00:06:29.178Z] mdk-738-n-zookeeper-2                                 1/1     Running             0              6h6m    10.105.5.55    ci-node11   <none>           <none>

https://jenkins.milvus.io:18080/blue/rest/organizations/jenkins/pipelines/Milvus%20Nightly%20CI/branches/master/runs/738/nodes/255/steps/529/log/?start=0

Expected Behavior

pass

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@NicoYuan1986 added the kind/bug and needs-triage labels May 11, 2024
@NicoYuan1986 added this to the 2.4.2 milestone May 11, 2024
@yanliang567
Contributor

/assign @congqixia
Please help take a look; it reproduces occasionally in CI e2e.
/unassign

@yanliang567 added the triage/accepted and ci/e2e labels and removed the needs-triage label May 11, 2024
@NicoYuan1986
Contributor Author

image

The last log.

@smellthemoon
Contributor

image
The error is "too many operations in txn request".
image
A snapshot save writes two records, so maxTxnNum needs to be divided by 2; otherwise the transaction will exceed the etcd limit.
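
To illustrate the arithmetic in the comment above, here is a minimal Python sketch of the batching idea, not the actual Milvus fix. It assumes an operation limit of 128 per transaction (etcd's documented default for `--max-txn-ops`; the CI deployment may use a different value) and that each saved key expands to two records. The names `batch_keys` and `ops_in_batch` are hypothetical and exist only for this example.

```python
from typing import Dict, Iterator, List, Tuple

# Assumption: 128 is etcd's default --max-txn-ops; the real limit may differ.
ETCD_MAX_TXN_OPS = 128
# Per the comment above: a snapshot save writes two records per key.
RECORDS_PER_SNAPSHOT_KEY = 2


def batch_keys(
    kvs: Dict[str, str],
    max_txn_ops: int = ETCD_MAX_TXN_OPS,
    records_per_key: int = RECORDS_PER_SNAPSHOT_KEY,
) -> Iterator[List[Tuple[str, str]]]:
    """Yield batches of key-value pairs whose expanded op count fits in one txn."""
    # Dividing the op limit by the records written per key gives the number
    # of keys that can safely share a single transaction.
    keys_per_batch = max_txn_ops // records_per_key
    if keys_per_batch < 1:
        raise ValueError("max_txn_ops is too small for even one key")
    items = list(kvs.items())
    for start in range(0, len(items), keys_per_batch):
        yield items[start:start + keys_per_batch]


def ops_in_batch(
    batch: List[Tuple[str, str]],
    records_per_key: int = RECORDS_PER_SNAPSHOT_KEY,
) -> int:
    """Number of etcd operations a batch expands to."""
    return len(batch) * records_per_key


if __name__ == "__main__":
    kvs = {f"key-{i}": "value" for i in range(200)}
    for batch in batch_keys(kvs):
        print(len(batch), "keys ->", ops_in_batch(batch), "ops")
```

Batching by key count alone (128 keys per transaction) would expand to 256 operations here, which is the kind of overflow the comment describes; halving the per-batch key count keeps each transaction at or below the assumed limit.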

@smellthemoon
Contributor

/assign

sre-ci-robot pushed a commit that referenced this issue May 14, 2024
#32974

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
@NicoYuan1986
Contributor Author

Fixed in 3d105fc.

4 participants