fail to deploy ARM Milvus cluster on k8s #33098

Open
lijunfeng11 opened this issue May 16, 2024 · 11 comments
Labels: kind/bug (Issues or changes related a bug), triage/needs-information (Indicates an issue needs more information in order to work on it.)

@lijunfeng11

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:v2.4.1
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    pulsar 
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): CentOS (arm64 version)
- CPU/Memory: 128
- GPU: VGA compatible controller: Huawei Technologies Co., Ltd. Hi1710 [iBMC Intelligent Management system chip w/VGA support] (rev 01)
- Others:

Current Behavior

After installing Milvus with Helm, the cluster status it shows has never changed.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

[root@master local-path-provisioner]# kubectl describe pod my-milvus
Name: my-milvus-datacoord-5dff6f95cb-qwfcs
Namespace: default
Priority: 0
Node: slave1/192.168.6.243
Start Time: Thu, 16 May 2024 15:39:05 +0800
Labels: app.kubernetes.io/instance=my-milvus
app.kubernetes.io/name=milvus
component=datacoord
pod-template-hash=5dff6f95cb
Annotations: checksum/config: d0865f30b5f61714d042ab10f2b6b2754cbcfe02d2283124495e7522a7b662bd
cni.projectcalico.org/containerID: 167c4332ccefcf09438fbedc0c7681d9cf1bb6e52f4794d3ff06e85fe0728af2
cni.projectcalico.org/podIP: 10.244.140.193/32
cni.projectcalico.org/podIPs: 10.244.140.193/32
Status: Running
IP: 10.244.140.193
IPs:
IP: 10.244.140.193
Controlled By: ReplicaSet/my-milvus-datacoord-5dff6f95cb
Init Containers:
config:
Container ID: docker://77444548187bf8938cc1ea32d5b4f04099d93e64b06f72b6b511d8c642928d4c
Image: milvusdb/milvus-config-tool:v0.1.2
Image ID: docker-pullable://milvusdb/milvus-config-tool@sha256:c6b78ac8ba1ecd021b28febfd207ca051956599d2381407dd879e74e7e4db612
Port:
Host Port:
Command:
/cp
/run-helm.sh,/merge
/milvus/tools/run-helm.sh,/milvus/tools/merge
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 16 May 2024 15:39:08 +0800
Finished: Thu, 16 May 2024 15:39:08 +0800
Ready: True
Restart Count: 0
Environment:
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
milvus-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: my-milvus
Optional: false
tools:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
kube-api-access-vj8hn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 56m default-scheduler Successfully assigned default/my-milvus-rootcoord-787d8fd6b8-cl9gd to slave1
Normal Pulled 56m kubelet Container image "milvusdb/milvus-config-tool:v0.1.2" already present on machine
Normal Created 56m kubelet Created container config
Normal Started 56m kubelet Started container config
Normal Created 53m (x5 over 56m) kubelet Created container rootcoord
Normal Started 53m (x5 over 56m) kubelet Started container rootcoord
Normal Pulled 21m (x10 over 56m) kubelet Container image "milvusdb/milvus:v2.4.1" already present on machine
Warning BackOff 11m (x63 over 55m) kubelet Back-off restarting failed container
Warning Unhealthy 66s (x143 over 51m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
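A readiness probe that returns HTTP 500 on a Milvus coordinator usually means the pod cannot reach one of its dependencies (etcd, MinIO, or the message queue), so a reasonable first pass is to look at those pods. A minimal sketch, assuming the release name my-milvus used above:

  # All pods in the release; anything not Ready/Running is suspect
  kubectl get pods
  # The Pulsar subchart labels its pods with app=pulsar, so they can be listed on their own
  kubectl get pods -l app=pulsar
  # Recent cluster events, newest last, often name the failing component directly
  kubectl get events --sort-by=.lastTimestamp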

Anything else?

No response

@lijunfeng11 lijunfeng11 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 16, 2024
Contributor

The title and description of this issue contain Chinese. Please use English to describe your issue.

@lijunfeng11
Author

[root@master local-path-provisioner]# kubectl get pods
NAME READY STATUS RESTARTS AGE
my-milvus-datacoord-5dff6f95cb-qwfcs 1/1 Running 4 40m
my-milvus-datanode-7f4559b69f-fs58l 0/1 Running 10 40m
my-milvus-etcd-0 1/1 Running 0 40m
my-milvus-etcd-1 1/1 Running 0 40m
my-milvus-etcd-2 1/1 Running 0 40m
my-milvus-indexcoord-8987ddfb7-22zg2 1/1 Running 0 40m
my-milvus-indexnode-755787b4c4-xjd6s 1/1 Running 4 40m
my-milvus-minio-0 1/1 Running 0 40m
my-milvus-minio-1 1/1 Running 0 40m
my-milvus-minio-2 1/1 Running 0 40m
my-milvus-minio-3 1/1 Running 0 40m
my-milvus-proxy-67c766b8c9-mlwx7 0/1 Running 10 40m
my-milvus-pulsar-bookie-0 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-bookie-1 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-bookie-2 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-bookie-init-4njzx 0/1 Init:Error 0 38m
my-milvus-pulsar-bookie-init-9k26x 0/1 Init:Error 0 30m
my-milvus-pulsar-bookie-init-jsr85 0/1 Init:Error 0 40m
my-milvus-pulsar-bookie-init-ngssg 0/1 Init:Error 0 40m
my-milvus-pulsar-bookie-init-psczv 0/1 Init:Error 0 39m
my-milvus-pulsar-bookie-init-rzlpr 0/1 Init:Error 0 35m
my-milvus-pulsar-bookie-init-xp9d9 0/1 Init:Error 0 40m
my-milvus-pulsar-broker-0 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-proxy-0 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-recovery-0 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-zookeeper-0 0/1 CrashLoopBackOff 12 40m
my-milvus-querycoord-595cfb67bd-pv8wh 0/1 Running 10 40m
my-milvus-querynode-54d446464d-nvk28 1/1 Running 4 40m
my-milvus-rootcoord-787d8fd6b8-cl9gd 0/1 Running 10 40m

@yanliang567
Contributor

Please check why the pulsar pods have all failed.
/assign @lijunfeng11
/unassign
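A sketch of how to gather that, using the pod names from the listing in the previous comment:

  # Events and container states for one of the failing bookie pods
  kubectl describe pod my-milvus-pulsar-bookie-0
  # Output of one of the failed bookie-init job pods
  kubectl logs my-milvus-pulsar-bookie-init-jsr85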

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 16, 2024
@lijunfeng11
Author

lijunfeng11 commented May 16, 2024

[root@master ~]# kubectl logs my-milvus-pulsar-bookie-0
Error from server (BadRequest): container "my-milvus-pulsar-bookie" in pod "my-milvus-pulsar-bookie-0" is waiting to start: PodInitializing

Please check why the pulsar pods have all failed. /assign @lijunfeng11 /unassign

This is the current state now; do I need to make them restart again?
[root@master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
my-milvus-datacoord-5dff6f95cb-phgn4 1/1 Running 3 26m
my-milvus-datanode-7f4559b69f-82gt7 0/1 Running 7 26m
my-milvus-etcd-0 1/1 Running 0 26m
my-milvus-etcd-1 1/1 Running 0 26m
my-milvus-etcd-2 1/1 Running 0 26m
my-milvus-indexcoord-8987ddfb7-822c2 1/1 Running 0 26m
my-milvus-indexnode-755787b4c4-247lw 1/1 Running 3 26m
my-milvus-minio-0 1/1 Running 0 26m
my-milvus-minio-1 1/1 Running 0 26m
my-milvus-minio-2 1/1 Running 0 26m
my-milvus-minio-3 1/1 Running 0 26m
my-milvus-proxy-67c766b8c9-gxkbs 0/1 Running 7 26m
my-milvus-pulsar-bookie-0 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-bookie-1 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-bookie-2 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-broker-0 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-proxy-0 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-recovery-0 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-zookeeper-0 0/1 CrashLoopBackOff 7 14m
my-milvus-querycoord-595cfb67bd-9wf4l 0/1 Running 7 26m
my-milvus-querynode-54d446464d-k9gt8 1/1 Running 3 26m
my-milvus-rootcoord-787d8fd6b8-cjrlx 0/1 Running 7 26m

@yanliang567
Contributor

You need to figure out why the pulsar pods are in CrashLoopBackOff; describe the pod or share more info about it.
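Because the bookie pods are stuck in Init:CrashLoopBackOff, the plain kubectl logs call above fails with PodInitializing; the logs have to be requested from the crashing init container explicitly (its name appears in the describe output below). A sketch:

  # Logs of the init container that keeps crashing
  kubectl logs my-milvus-pulsar-bookie-0 -c pulsar-bookkeeper-verify-clusterid
  # Logs of its previous, already-terminated run
  kubectl logs my-milvus-pulsar-bookie-0 -c pulsar-bookkeeper-verify-clusterid --previous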

@lijunfeng11
Author

kubectl logs my-milvus-pulsar-bookie-0
Because I don't understand k8s very well, how should I get more detailed error messages?

[root@master rbd-eventlog]# kubectl describe pod my-milvus-pulsar-bookie-1
Name: my-milvus-pulsar-bookie-1
Namespace: default
Priority: 0
Node: master/192.168.6.242
Start Time: Thu, 16 May 2024 17:12:56 +0800
Labels: app=pulsar
cluster=my-milvus-pulsar
component=bookie
controller-revision-hash=my-milvus-pulsar-bookie-69ddb44cdd
release=my-milvus
statefulset.kubernetes.io/pod-name=my-milvus-pulsar-bookie-1
Annotations: cni.projectcalico.org/containerID: 81309020e6f72da0ef057d6cfe9617eee99a828834cd813a9ca1da6396f1b1f3
cni.projectcalico.org/podIP: 10.244.219.107/32
cni.projectcalico.org/podIPs: 10.244.219.107/32
prometheus.io/port: 8000
prometheus.io/scrape: true
Status: Pending
IP: 10.244.219.107
IPs:
IP: 10.244.219.107
Controlled By: StatefulSet/my-milvus-pulsar-bookie
Init Containers:
pulsar-bookkeeper-verify-clusterid:
Container ID: docker://ee682c6f895ca9baa3801ca0e0894ad0c6d32ef43b7f10c455dd1950358efc65
Image: apachepulsar/pulsar:2.8.2
Image ID: docker-pullable://apachepulsar/pulsar@sha256:d538416d5afe03360e10d5beb44bdad33d7303d137fc66c264108426875f61c6
Port:
Host Port:
Command:
sh
-c
Args:

  set -e; bin/apply-config-from-env.py conf/bookkeeper.conf;until bin/bookkeeper shell whatisinstanceid; do
    sleep 3;
  done;
  
State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       Error
  Exit Code:    1
  Started:      Thu, 16 May 2024 17:49:27 +0800
  Finished:     Thu, 16 May 2024 17:49:27 +0800
Ready:          False
Restart Count:  12
Environment Variables from:
  my-milvus-pulsar-bookie  ConfigMap  Optional: false
Environment:               <none>
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-b4kq2 (ro)

Containers:
my-milvus-pulsar-bookie:
Container ID:
Image: apachepulsar/pulsar:2.8.2
Image ID:
Ports: 3181/TCP, 8000/TCP
Host Ports: 0/TCP, 0/TCP
Command:
sh
-c
Args:
bin/apply-config-from-env.py conf/bookkeeper.conf;
OPTS="${OPTS} -Dlog4j2.formatMsgNoLookups=true" exec bin/pulsar bookie;

State:          Waiting
  Reason:       PodInitializing
Ready:          False
Restart Count:  0
Requests:
  cpu:      1
  memory:   2Gi
Liveness:   http-get http://:8000/api/v1/bookie/state delay=10s timeout=5s period=30s #success=1 #failure=60
Readiness:  http-get http://:8000/api/v1/bookie/is_ready delay=10s timeout=5s period=30s #success=1 #failure=60
Environment Variables from:
  my-milvus-pulsar-bookie  ConfigMap  Optional: false
Environment:               <none>
Mounts:
  /pulsar/data/bookkeeper/journal from my-milvus-pulsar-bookie-journal (rw)
  /pulsar/data/bookkeeper/ledgers from my-milvus-pulsar-bookie-ledgers (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-b4kq2 (ro)

Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
my-milvus-pulsar-bookie-journal:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: my-milvus-pulsar-bookie-journal-my-milvus-pulsar-bookie-1
ReadOnly: false
my-milvus-pulsar-bookie-ledgers:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: my-milvus-pulsar-bookie-ledgers-my-milvus-pulsar-bookie-1
ReadOnly: false
kube-api-access-b4kq2:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 41m default-scheduler Successfully assigned default/my-milvus-pulsar-bookie-1 to master
Normal Pulled 39m (x5 over 40m) kubelet Container image "apachepulsar/pulsar:2.8.2" already present on machine
Normal Created 39m (x5 over 40m) kubelet Created container pulsar-bookkeeper-verify-clusterid
Normal Started 39m (x5 over 40m) kubelet Started container pulsar-bookkeeper-verify-clusterid
Warning BackOff 44s (x189 over 40m) kubelet Back-off restarting failed container

@yanliang567 yanliang567 changed the title from "Problem deploying the Milvus service on a k8s cluster" to "fail to deploy ARM Milvus cluster on k8s" May 16, 2024
@lijunfeng11
Author

You need to figure out why the pulsar pods are in CrashLoopBackOff; describe the pod or share more info about it.
zookeeper log

[root@master rbd-eventlog]# kubectl logs my-milvus-pulsar-zookeeper-0
exec /usr/bin/sh: exec format error
[root@master rbd-eventlog]# kubectl describe pod my-milvus-pulsar-zookeeper-0
Name:         my-milvus-pulsar-zookeeper-0
Namespace:    default
Priority:     0
Node:         master/192.168.6.242
Start Time:   Thu, 16 May 2024 17:12:57 +0800
Labels:       app=pulsar
              cluster=my-milvus-pulsar
              component=zookeeper
              controller-revision-hash=my-milvus-pulsar-zookeeper-5c6946568d
              release=my-milvus
              statefulset.kubernetes.io/pod-name=my-milvus-pulsar-zookeeper-0
Annotations:  cni.projectcalico.org/containerID: f240c1461b1460008d146f49ca2d751087a7a66795c36516b64b1579fa0b64a2
              cni.projectcalico.org/podIP: 10.244.219.106/32
              cni.projectcalico.org/podIPs: 10.244.219.106/32
              prometheus.io/port: 8000
              prometheus.io/scrape: true
Status:       Running
IP:           10.244.219.106
IPs:
  IP:           10.244.219.106
Controlled By:  StatefulSet/my-milvus-pulsar-zookeeper
Containers:
  my-milvus-pulsar-zookeeper:
    Container ID:  docker://4f6ee2fc2b8668d3aa99c9ee8b2cd24e7aa76987bc1854c0a1958a06d73256ce
    Image:         apachepulsar/pulsar:2.8.2
    Image ID:      docker-pullable://apachepulsar/pulsar@sha256:d538416d5afe03360e10d5beb44bdad33d7303d137fc66c264108426875f61c6
    Ports:         8000/TCP, 2181/TCP, 2888/TCP, 3888/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      sh
      -c
    Args:
      bin/apply-config-from-env.py conf/zookeeper.conf;
      bin/generate-zookeeper-config.sh conf/zookeeper.conf; OPTS="${OPTS} -Dlog4j2.formatMsgNoLookups=true" exec bin/pulsar zookeeper;
      
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 16 May 2024 18:04:38 +0800
      Finished:     Thu, 16 May 2024 18:04:38 +0800
    Ready:          False
    Restart Count:  15
    Requests:
      cpu:      300m
      memory:   1Gi
    Liveness:   exec [bin/pulsar-zookeeper-ruok.sh] delay=10s timeout=5s period=30s #success=1 #failure=10
    Readiness:  exec [bin/pulsar-zookeeper-ruok.sh] delay=10s timeout=5s period=30s #success=1 #failure=10
    Environment Variables from:
      my-milvus-pulsar-zookeeper  ConfigMap  Optional: false
    Environment:
      ZOOKEEPER_SERVERS:  my-milvus-pulsar-zookeeper-0,my-milvus-pulsar-zookeeper-1,my-milvus-pulsar-zookeeper-2
    Mounts:
      /pulsar/data from my-milvus-pulsar-zookeeper-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s8tv8 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  my-milvus-pulsar-zookeeper-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  my-milvus-pulsar-zookeeper-data-my-milvus-pulsar-zookeeper-0
    ReadOnly:   false
  kube-api-access-s8tv8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  54m                    default-scheduler  Successfully assigned default/my-milvus-pulsar-zookeeper-0 to master
  Normal   Started    53m (x4 over 54m)      kubelet            Started container my-milvus-pulsar-zookeeper
  Normal   Pulled     53m (x5 over 54m)      kubelet            Container image "apachepulsar/pulsar:2.8.2" already present on machine
  Normal   Created    53m (x5 over 54m)      kubelet            Created container my-milvus-pulsar-zookeeper
  Warning  BackOff    4m37s (x256 over 54m)  kubelet            Back-off restarting failed container
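"exec /usr/bin/sh: exec format error" almost always means the container image was built for a different CPU architecture than the node it runs on, here amd64 binaries on an arm64 host. A quick way to confirm this, sketched under the assumption that the Docker CLI is available:

  # CPU architecture of each node
  kubectl get nodes -L kubernetes.io/arch
  # Platforms published for the Pulsar image used by the chart;
  # if no arm64 entry appears under "platform", only amd64 is available
  docker manifest inspect apachepulsar/pulsar:2.8.2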

@yanliang567
Contributor

/assign @LoveEachDay
please help to take a look

@lijunfeng11
Author

lijunfeng11 commented May 16, 2024

/assign @LoveEachDay please help to take a look

[root@master k8s-Milvus]# kubectl get pods -n my-milvus-zookeeper-0
No resources found in my-milvus-zookeeper-0 namespace.
I have now switched to Kafka for startup; only zookeeper and kafka fail to start. How should I check them?
[root@master k8s-Milvus]# kubectl describe pod my-milvus-zookeeper-0 -n default
Name: my-milvus-zookeeper-0
Namespace: default
Priority: 0
Node: master/192.168.6.242
Start Time: Thu, 16 May 2024 18:25:46 +0800
Labels: app.kubernetes.io/component=zookeeper
app.kubernetes.io/instance=my-milvus
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=zookeeper
controller-revision-hash=my-milvus-zookeeper-76fd4b8cf7
helm.sh/chart=zookeeper-8.1.2
statefulset.kubernetes.io/pod-name=my-milvus-zookeeper-0
Annotations: cni.projectcalico.org/containerID: 8637b832df7ef4d407e74a2452ff3967b113ed889a848ead27f2209540cf3a78
cni.projectcalico.org/podIP: 10.244.219.105/32
cni.projectcalico.org/podIPs: 10.244.219.105/32
Status: Running
IP: 10.244.219.105
IPs:
IP: 10.244.219.105
Controlled By: StatefulSet/my-milvus-zookeeper
Containers:
zookeeper:
Container ID: docker://c7efe7602de9c2252a3d9ec84e1ee63cb21ec7904dabf08cf95271d43d0a4b8f
Image: docker.io/bitnami/zookeeper:3.7.0-debian-10-r320
Image ID: docker-pullable://bitnami/zookeeper@sha256:c19c5473ef3feb8a0db00b92891c859915d06f7b888be4b3fdb78aaca109cd1f
Ports: 2181/TCP, 2888/TCP, 3888/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
/scripts/setup.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 16 May 2024 19:00:26 +0800
Finished: Thu, 16 May 2024 19:00:26 +0800
Ready: False
Restart Count: 11
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 250m
memory: 256Mi
Liveness: exec [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok] delay=30s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok] delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
BITNAMI_DEBUG: false
ZOO_DATA_LOG_DIR:
ZOO_PORT_NUMBER: 2181
ZOO_TICK_TIME: 2000
ZOO_INIT_LIMIT: 10
ZOO_SYNC_LIMIT: 5
ZOO_PRE_ALLOC_SIZE: 65536
ZOO_SNAPCOUNT: 100000
ZOO_MAX_CLIENT_CNXNS: 60
ZOO_4LW_COMMANDS_WHITELIST: srvr, mntr, ruok
ZOO_LISTEN_ALLIPS_ENABLED: no
ZOO_AUTOPURGE_INTERVAL: 0
ZOO_AUTOPURGE_RETAIN_COUNT: 3
ZOO_MAX_SESSION_TIMEOUT: 40000
ZOO_SERVERS: my-milvus-zookeeper-0.my-milvus-zookeeper-headless.default.svc.cluster.local:2888:3888::1 my-milvus-zookeeper-1.my-milvus-zookeeper-headless.default.svc.cluster.local:2888:3888::2 my-milvus-zookeeper-2.my-milvus-zookeeper-headless.default.svc.cluster.local:2888:3888::3
ZOO_ENABLE_AUTH: no
ZOO_HEAP_SIZE: 1024
ZOO_LOG_LEVEL: ERROR
ALLOW_ANONYMOUS_LOGIN: yes
POD_NAME: my-milvus-zookeeper-0 (v1:metadata.name)
Mounts:
/bitnami/zookeeper from data (rw)
/scripts/setup.sh from scripts (rw,path="setup.sh")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ww4sn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-my-milvus-zookeeper-0
ReadOnly: false
scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: my-milvus-zookeeper-scripts
Optional: false
kube-api-access-ww4sn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 35m default-scheduler Successfully assigned default/my-milvus-zookeeper-0 to master
Normal Pulling 35m kubelet Pulling image "docker.io/bitnami/zookeeper:3.7.0-debian-10-r320"
Normal Pulled 31m kubelet Successfully pulled image "docker.io/bitnami/zookeeper:3.7.0-debian-10-r320" in 3m31.33228944s
Normal Created 30m (x4 over 31m) kubelet Created container zookeeper
Normal Started 30m (x4 over 31m) kubelet Started container zookeeper
Normal Pulled 30m (x4 over 31m) kubelet Container image "docker.io/bitnami/zookeeper:3.7.0-debian-10-r320" already present on machine
Warning BackOff 19s (x163 over 31m) kubelet Back-off restarting failed container
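If the ZooKeeper container here is failing for the same architecture reason, the same checks apply in Kafka mode; a sketch:

  # Output of the last failed run of the zookeeper container
  kubectl logs my-milvus-zookeeper-0 --previous
  # Check whether this tag publishes an arm64 variant
  docker manifest inspect docker.io/bitnami/zookeeper:3.7.0-debian-10-r320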

@lijunfeng11
Author

lijunfeng11 commented May 17, 2024

Now I am trying to connect directly to an external Kafka; the configuration file is:
helmConfigYml.txt

After running it, I get an error:
[error screenshot]
Is this the right way to write the configuration file? I have tried many times without success.
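For reference, the milvus-helm chart has an externalKafka section for pointing the cluster at a Kafka deployment outside the chart. A sketch of such an override, where the broker address is hypothetical and the key names should be verified against the chart's values.yaml:

  # Hypothetical endpoint; replace with the real external Kafka address.
  helm upgrade --install my-milvus milvus/milvus \
    --set cluster.enabled=true \
    --set pulsar.enabled=false \
    --set kafka.enabled=false \
    --set externalKafka.enabled=true \
    --set externalKafka.brokerList=192.168.6.1:9092 \
    --set externalKafka.securityProtocol=PLAINTEXT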

@lijunfeng11
Author

/assign @LoveEachDay please help to take a look
Hello, is there any progress on this issue? When I install zookeeper separately I hit the same problem of it not being able to execute; presumably the image does not contain an ARM build.
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
be5cb546c61c6ea6df39c65b1584a89522dcab6c87acb8a12bdedc72c866e5d7

One more question: if I use an external kafka, does it need to be installed inside Docker? Mine is not installed in Docker, and I have never been able to connect to the external kafka; it also has no authentication enabled.
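An external Kafka does not need to run inside Docker; it only needs to be reachable from the pods over the network. A sketch of a connectivity check from inside the cluster, with a hypothetical broker address and assuming a client image that ships the Kafka CLI tools and provides an arm64 variant:

  # Throwaway client pod; lists topics if the broker is reachable without authentication
  kubectl run kafka-client --rm -it --restart=Never --image=bitnami/kafka:3.6 -- \
    kafka-topics.sh --bootstrap-server 192.168.6.1:9092 --list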
