[Bug]: cluster can not insert data,and data node will restart when inserting data #33012

Open
1 task done
1271653627 opened this issue May 13, 2024 · 6 comments
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@1271653627

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:v2.3.13
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):pulsar    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

The cluster is unable to insert data, and every time data insertion is attempted, the data node restarts. When checking the logs, the following error is reported. However, the cluster can create collections and load them normally.
milvus_data1.1.ddkurhxaadx8@gp22aitppap92xj | [2024/05/11 17:16:34.629 +00:00] [ERROR] [retry/retry.go:46] ["retry func failed"] ["retry time"=8] [error="server error: ServiceNotReady: Namespace bundle for topic (persistent://public/default/cpic-milvus-rootcoord-dml_3) not served by this instance:broker:8080. Please redo the lookup. Request is denied: namespace=public/default"] [stack="github.com/milvus-io/milvus/pkg/util/retry.Do\n\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:46\ngithub.com/milvus-io/milvus/pkg/mq/msgstream.(*MqTtMsgStream).AsConsumer\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/mq_msgstream.go:586\ngithub.com/milvus-io/milvus/pkg/mq/msgdispatcher.NewDispatcher\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/dispatcher.go:100\ngithub.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*dispatcherManager).Add\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/manager.go:93\ngithub.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*client).Register\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/client.go:77\ngithub.com/milvus-io/milvus/internal/datanode.newDmInputNode\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/flow_graph_dmstream_input_node.go:49\ngithub.com/milvus-io/milvus/internal/datanode.getServiceWithChannel\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/data_sync_service.go:361\ngithub.com/milvus-io/milvus/internal/datanode.newServiceWithEtcdTickler\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/data_sync_service.go:431\ngithub.com/milvus-io/milvus/internal/datanode.(*flowgraphManager).addAndStartWithEtcdTickler\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/flow_graph_manager.go:131\ngithub.com/milvus-io/milvus/internal/datanode.(*DataNode).handlePutEvent\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/event_manager.go:179\ngithub.com/milvus-io/milvus/internal/datanode.(*channelEventManager).Run.func1\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/event_manager.go:268"]
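For anyone triaging similar logs, here is a minimal sketch of how the topic and broker endpoint could be pulled out of a `ServiceNotReady` message like the one above. The function name `parse_service_not_ready` and the regular expressions are illustrative assumptions based only on the wording of this one log line, not on any Milvus or Pulsar API.

```python
import re

# Patterns derived from the single log line quoted above (an assumption:
# other Pulsar versions may word the message differently).
TOPIC_RE = re.compile(r"topic \((persistent://[^)]+)\)")
BROKER_RE = re.compile(r"not served by this instance:([^:\s]+):(\d+)")


def parse_service_not_ready(line):
    """Return topic and broker details from a ServiceNotReady line, or None."""
    topic_match = TOPIC_RE.search(line)
    broker_match = BROKER_RE.search(line)
    if topic_match is None or broker_match is None:
        return None
    topic = topic_match.group(1)
    # A persistent topic name has the form persistent://tenant/namespace/name.
    tenant, namespace, name = topic[len("persistent://"):].split("/", 2)
    return {
        "topic": topic,
        "tenant": tenant,
        "namespace": namespace,
        "name": name,
        "host": broker_match.group(1),
        "port": int(broker_match.group(2)),
    }
```

Running this over the full data node log and counting distinct `(host, port)` pairs would show whether every failed lookup points at the same broker endpoint.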

Expected Behavior

Data is inserted normally.

Steps To Reproduce

1. Deploy the cluster
2. Create a collection
3. Load the collection
4. Insert data
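The steps above could be scripted roughly as follows with pymilvus. This is a sketch under assumptions, not the reporter's actual code: the host, port, collection name, dimension, and the `FLAT` index are all hypothetical choices, and `make_columns` and `run_repro` are illustrative helper names.

```python
import random


def make_columns(count, dim, seed=0):
    """Build column-based insert data: a list of ids and a list of vectors."""
    rng = random.Random(seed)
    ids = list(range(count))
    vectors = [[rng.random() for _ in range(dim)] for _ in range(count)]
    return ids, vectors


def run_repro(host="localhost", port="19530", name="repro_insert", dim=8):
    """Create, index, load, then insert. Needs pymilvus and a reachable cluster."""
    # Imported lazily so the data helper above works without pymilvus installed.
    from pymilvus import (Collection, CollectionSchema, DataType,
                          FieldSchema, connections)

    connections.connect(alias="default", host=host, port=port)
    schema = CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True),
        FieldSchema("vector", DataType.FLOAT_VECTOR, dim=dim),
    ])
    collection = Collection(name, schema)
    # An index is created first because loading expects one on the vector field.
    collection.create_index(
        "vector", {"index_type": "FLAT", "metric_type": "L2", "params": {}})
    collection.load()
    ids, vectors = make_columns(10, dim)
    result = collection.insert([ids, vectors])
    return result.insert_count
```

Calling `run_repro()` against the affected cluster would be expected to stall or fail at the insert step if the behavior described in this issue is present.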

Milvus Log

This is the data node log:
milvus_data.log

Anything else?

No response

@1271653627 1271653627 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 13, 2024
@1271653627
Author

@yanliang567 May I ask: to fix this issue on a Milvus cluster deployed with Docker Swarm, can I stop only the Milvus services (the coordinator nodes, worker nodes, and proxy nodes) and restart them, while leaving MinIO, etcd, and Pulsar running?

@yanliang567
Contributor

@1271653627 You can restart that way, but note that Docker Swarm is not a deployment method tested by the community.

/assign @congqixia
Looks like an MQ issue, please help confirm.
/unassign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 13, 2024
@yanliang567 yanliang567 added this to the 2.4.2 milestone May 13, 2024
@1271653627
Author

@congqixia @yanliang567 Below is the coord node log, which I've added.
coordnode.zip
I noticed issue #25267, and I've encountered a similar situation before, where the number of entities shown in Attu was incorrect after inserting data. Following the method described there (changing the number of datanode replicas to 1 and increasing the rootCoord.dmlChannelNum parameter), would this solve the current problem?
Also, I deployed a Milvus cluster with the same configuration in the test environment, and everything worked fine. However, in the production environment, I couldn't insert data.
The test environment is running on Red Hat Enterprise Linux Server 7.4 Maipo (64-bit), while the production environment is on UOS 20 Fuyu (64-bit). I wonder if it's related to the operating system.
Looking forward to your response.
Thanks for your support.

@congqixia
Contributor

@1271653627 After inspecting the log, it looks like the datanode failed to query the topic from the Pulsar broker for a long period.
The datanode session ID went past 100, so it repeatedly tried to subscribe in order to serve insert data.
Did your Pulsar cluster become abnormal while the problem was occurring?
And could you please provide the mq section of your configuration file? According to @LoveEachDay, port 8080 seems strange here.
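As a rough aid for reviewing the mq section, the sketch below flags port settings that commonly get mixed up. It assumes the Milvus config keys `pulsar.port` and `pulsar.webport` and Pulsar's usual defaults (6650 for the binary protocol, 8080 for the HTTP service). The helper name `check_pulsar_section` is hypothetical, and nothing here is a confirmed diagnosis of this issue.

```python
# Pulsar's usual default ports (assumed; a deployment may override them).
DEFAULT_BINARY_PORT = 6650
DEFAULT_HTTP_PORT = 8080


def check_pulsar_section(pulsar):
    """Return a list of human-readable findings for a parsed `pulsar` section."""
    findings = []
    port = pulsar.get("port")
    webport = pulsar.get("webport")
    if port is None:
        findings.append("pulsar.port is not set")
    if webport is None:
        findings.append("pulsar.webport is not set")
    if port is not None and port == webport:
        findings.append("pulsar.port and pulsar.webport are identical")
    if port == DEFAULT_HTTP_PORT:
        findings.append(
            "pulsar.port is 8080, Pulsar's default HTTP port; "
            "the binary protocol usually listens on 6650")
    return findings
```

For example, `check_pulsar_section({"address": "broker", "port": 6650, "webport": 8080})` returns an empty list, while a section that uses 8080 for both ports yields two findings.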

@1271653627
Author

I set the Pulsar webport to 8080 because of the picture below.
image
This is my Milvus config file. @congqixia
milvus-config.txt

@1271653627
Author

@LoveEachDay Please help check the comments above, thank you.

@yanliang567 yanliang567 modified the milestones: 2.4.2, 2.4.3, 2.4.4 May 24, 2024
3 participants