fail to deploy ARM Milvus cluster on k8s #33098

Open
lijunfeng11 opened this issue May 16, 2024 · 11 comments
Labels: kind/bug (Issues or changes related a bug), triage/needs-information (Indicates an issue needs more information in order to work on it.)

@lijunfeng11

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:v2.4.1
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    pulsar 
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): CentOS (arm64 version)
- CPU/Memory: 128
- GPU: VGA compatible controller: Huawei Technologies Co., Ltd. Hi1710 [iBMC Intelligent Management system chip w/VGA support] (rev 01)
- Others:

Current Behavior

After installing Milvus with Helm, the cluster status it shows has never changed.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

[root@master local-path-provisioner]# kubectl describe pod my-milvus
Name: my-milvus-datacoord-5dff6f95cb-qwfcs
Namespace: default
Priority: 0
Node: slave1/192.168.6.243
Start Time: Thu, 16 May 2024 15:39:05 +0800
Labels: app.kubernetes.io/instance=my-milvus
app.kubernetes.io/name=milvus
component=datacoord
pod-template-hash=5dff6f95cb
Annotations: checksum/config: d0865f30b5f61714d042ab10f2b6b2754cbcfe02d2283124495e7522a7b662bd
cni.projectcalico.org/containerID: 167c4332ccefcf09438fbedc0c7681d9cf1bb6e52f4794d3ff06e85fe0728af2
cni.projectcalico.org/podIP: 10.244.140.193/32
cni.projectcalico.org/podIPs: 10.244.140.193/32
Status: Running
IP: 10.244.140.193
IPs:
IP: 10.244.140.193
Controlled By: ReplicaSet/my-milvus-datacoord-5dff6f95cb
Init Containers:
config:
Container ID: docker://77444548187bf8938cc1ea32d5b4f04099d93e64b06f72b6b511d8c642928d4c
Image: milvusdb/milvus-config-tool:v0.1.2
Image ID: docker-pullable://milvusdb/milvus-config-tool@sha256:c6b78ac8ba1ecd021b28febfd207ca051956599d2381407dd879e74e7e4db612
Port:
Host Port:
Command:
/cp
/run-helm.sh,/merge
/milvus/tools/run-helm.sh,/milvus/tools/merge
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 16 May 2024 15:39:08 +0800
Finished: Thu, 16 May 2024 15:39:08 +0800
Ready: True
Restart Count: 0
Environment:
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
milvus-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: my-milvus
Optional: false
tools:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
kube-api-access-vj8hn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 56m default-scheduler Successfully assigned default/my-milvus-rootcoord-787d8fd6b8-cl9gd to slave1
Normal Pulled 56m kubelet Container image "milvusdb/milvus-config-tool:v0.1.2" already present on machine
Normal Created 56m kubelet Created container config
Normal Started 56m kubelet Started container config
Normal Created 53m (x5 over 56m) kubelet Created container rootcoord
Normal Started 53m (x5 over 56m) kubelet Started container rootcoord
Normal Pulled 21m (x10 over 56m) kubelet Container image "milvusdb/milvus:v2.4.1" already present on machine
Warning BackOff 11m (x63 over 55m) kubelet Back-off restarting failed container
Warning Unhealthy 66s (x143 over 51m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
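A readiness probe that returns HTTP 500 on a Milvus coordinator usually means the pod cannot reach one of its dependencies (etcd, MinIO, or the message queue), so a reasonable first pass is to look at those pods. A minimal sketch, assuming the release name my-milvus used above:

  # All pods in the release; anything not Ready/Running is suspect
  kubectl get pods
  # The Pulsar subchart labels its pods with app=pulsar, so they can be listed on their own
  kubectl get pods -l app=pulsar
  # Recent cluster events, newest last, often name the failing component directly
  kubectl get events --sort-by=.lastTimestamp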

Anything else?

No response

@lijunfeng11 lijunfeng11 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 16, 2024
Contributor

The title and description of this issue contain Chinese. Please use English to describe your issue.

@lijunfeng11
Author

[root@master local-path-provisioner]# kubectl get pods
NAME READY STATUS RESTARTS AGE
my-milvus-datacoord-5dff6f95cb-qwfcs 1/1 Running 4 40m
my-milvus-datanode-7f4559b69f-fs58l 0/1 Running 10 40m
my-milvus-etcd-0 1/1 Running 0 40m
my-milvus-etcd-1 1/1 Running 0 40m
my-milvus-etcd-2 1/1 Running 0 40m
my-milvus-indexcoord-8987ddfb7-22zg2 1/1 Running 0 40m
my-milvus-indexnode-755787b4c4-xjd6s 1/1 Running 4 40m
my-milvus-minio-0 1/1 Running 0 40m
my-milvus-minio-1 1/1 Running 0 40m
my-milvus-minio-2 1/1 Running 0 40m
my-milvus-minio-3 1/1 Running 0 40m
my-milvus-proxy-67c766b8c9-mlwx7 0/1 Running 10 40m
my-milvus-pulsar-bookie-0 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-bookie-1 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-bookie-2 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-bookie-init-4njzx 0/1 Init:Error 0 38m
my-milvus-pulsar-bookie-init-9k26x 0/1 Init:Error 0 30m
my-milvus-pulsar-bookie-init-jsr85 0/1 Init:Error 0 40m
my-milvus-pulsar-bookie-init-ngssg 0/1 Init:Error 0 40m
my-milvus-pulsar-bookie-init-psczv 0/1 Init:Error 0 39m
my-milvus-pulsar-bookie-init-rzlpr 0/1 Init:Error 0 35m
my-milvus-pulsar-bookie-init-xp9d9 0/1 Init:Error 0 40m
my-milvus-pulsar-broker-0 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-proxy-0 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-recovery-0 0/1 Init:CrashLoopBackOff 12 40m
my-milvus-pulsar-zookeeper-0 0/1 CrashLoopBackOff 12 40m
my-milvus-querycoord-595cfb67bd-pv8wh 0/1 Running 10 40m
my-milvus-querynode-54d446464d-nvk28 1/1 Running 4 40m
my-milvus-rootcoord-787d8fd6b8-cl9gd 0/1 Running 10 40m

@yanliang567
Contributor

Please check why the pulsar pods have all failed.
/assign @lijunfeng11
/unassign
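A sketch of how to gather that, using the pod names from the listing in the previous comment:

  # Events and container states for one of the failing bookie pods
  kubectl describe pod my-milvus-pulsar-bookie-0
  # Output of one of the failed bookie-init job pods
  kubectl logs my-milvus-pulsar-bookie-init-jsr85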

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 16, 2024
@lijunfeng11
Author

lijunfeng11 commented May 16, 2024

[root@master ~]# kubectl logs my-milvus-pulsar-bookie-0
Error from server (BadRequest): container "my-milvus-pulsar-bookie" in pod "my-milvus-pulsar-bookie-0" is waiting to start: PodInitializing

Please check why the pulsar pods have all failed. /assign @lijunfeng11 /unassign

This is the current state now; do I need to make them restart again?
[root@master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
my-milvus-datacoord-5dff6f95cb-phgn4 1/1 Running 3 26m
my-milvus-datanode-7f4559b69f-82gt7 0/1 Running 7 26m
my-milvus-etcd-0 1/1 Running 0 26m
my-milvus-etcd-1 1/1 Running 0 26m
my-milvus-etcd-2 1/1 Running 0 26m
my-milvus-indexcoord-8987ddfb7-822c2 1/1 Running 0 26m
my-milvus-indexnode-755787b4c4-247lw 1/1 Running 3 26m
my-milvus-minio-0 1/1 Running 0 26m
my-milvus-minio-1 1/1 Running 0 26m
my-milvus-minio-2 1/1 Running 0 26m
my-milvus-minio-3 1/1 Running 0 26m
my-milvus-proxy-67c766b8c9-gxkbs 0/1 Running 7 26m
my-milvus-pulsar-bookie-0 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-bookie-1 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-bookie-2 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-broker-0 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-proxy-0 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-recovery-0 0/1 Init:CrashLoopBackOff 7 14m
my-milvus-pulsar-zookeeper-0 0/1 CrashLoopBackOff 7 14m
my-milvus-querycoord-595cfb67bd-9wf4l 0/1 Running 7 26m
my-milvus-querynode-54d446464d-k9gt8 1/1 Running 3 26m
my-milvus-rootcoord-787d8fd6b8-cjrlx 0/1 Running 7 26m

@yanliang567
Contributor

You need to figure out why the pulsar pods are in CrashLoopBackOff; describe the pod or share more info about it.
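Because the bookie pods are stuck in Init:CrashLoopBackOff, the plain kubectl logs call above fails with PodInitializing; the logs have to be requested from the crashing init container explicitly (its name appears in the describe output below). A sketch:

  # Logs of the init container that keeps crashing
  kubectl logs my-milvus-pulsar-bookie-0 -c pulsar-bookkeeper-verify-clusterid
  # Logs of its previous, already-terminated run
  kubectl logs my-milvus-pulsar-bookie-0 -c pulsar-bookkeeper-verify-clusterid --previous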

@lijunfeng11
Author

kubectl logs my-milvus-pulsar-bookie-0
Because I don't understand k8s very well, how should I get more detailed error messages?

[root@master rbd-eventlog]# kubectl describe pod my-milvus-pulsar-bookie-1
Name: my-milvus-pulsar-bookie-1
Namespace: default
Priority: 0
Node: master/192.168.6.242
Start Time: Thu, 16 May 2024 17:12:56 +0800
Labels: app=pulsar
cluster=my-milvus-pulsar
component=bookie
controller-revision-hash=my-milvus-pulsar-bookie-69ddb44cdd
release=my-milvus
statefulset.kubernetes.io/pod-name=my-milvus-pulsar-bookie-1
Annotations: cni.projectcalico.org/containerID: 81309020e6f72da0ef057d6cfe9617eee99a828834cd813a9ca1da6396f1b1f3
cni.projectcalico.org/podIP: 10.244.219.107/32
cni.projectcalico.org/podIPs: 10.244.219.107/32
prometheus.io/port: 8000
prometheus.io/scrape: true
Status: Pending
IP: 10.244.219.107
IPs:
IP: 10.244.219.107
Controlled By: StatefulSet/my-milvus-pulsar-bookie
Init Containers:
pulsar-bookkeeper-verify-clusterid:
Container ID: docker://ee682c6f895ca9baa3801ca0e0894ad0c6d32ef43b7f10c455dd1950358efc65
Image: apachepulsar/pulsar:2.8.2
Image ID: docker-pullable://apachepulsar/pulsar@sha256:d538416d5afe03360e10d5beb44bdad33d7303d137fc66c264108426875f61c6
Port:
Host Port:
Command:
sh
-c
Args:

  set -e; bin/apply-config-from-env.py conf/bookkeeper.conf;until bin/bookkeeper shell whatisinstanceid; do
    sleep 3;
  done;
  
State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       Error
  Exit Code:    1
  Started:      Thu, 16 May 2024 17:49:27 +0800
  Finished:     Thu, 16 May 2024 17:49:27 +0800
Ready:          False
Restart Count:  12
Environment Variables from:
  my-milvus-pulsar-bookie  ConfigMap  Optional: false
Environment:               <none>
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-b4kq2 (ro)

Containers:
my-milvus-pulsar-bookie:
Container ID:
Image: apachepulsar/pulsar:2.8.2
Image ID:
Ports: 3181/TCP, 8000/TCP
Host Ports: 0/TCP, 0/TCP
Command:
sh
-c
Args:
bin/apply-config-from-env.py conf/bookkeeper.conf;
OPTS="${OPTS} -Dlog4j2.formatMsgNoLookups=true" exec bin/pulsar bookie;

State:          Waiting
  Reason:       PodInitializing
Ready:          False
Restart Count:  0
Requests:
  cpu:      1
  memory:   2Gi
Liveness:   http-get http://:8000/api/v1/bookie/state delay=10s timeout=5s period=30s #success=1 #failure=60
Readiness:  http-get http://:8000/api/v1/bookie/is_ready delay=10s timeout=5s period=30s #success=1 #failure=60
Environment Variables from:
  my-milvus-pulsar-bookie  ConfigMap  Optional: false
Environment:               <none>
Mounts:
  /pulsar/data/bookkeeper/journal from my-milvus-pulsar-bookie-journal (rw)
  /pulsar/data/bookkeeper/ledgers from my-milvus-pulsar-bookie-ledgers (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-b4kq2 (ro)

Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
my-milvus-pulsar-bookie-journal:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: my-milvus-pulsar-bookie-journal-my-milvus-pulsar-bookie-1
ReadOnly: false
my-milvus-pulsar-bookie-ledgers:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: my-milvus-pulsar-bookie-ledgers-my-milvus-pulsar-bookie-1
ReadOnly: false
kube-api-access-b4kq2:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 41m default-scheduler Successfully assigned default/my-milvus-pulsar-bookie-1 to master
Normal Pulled 39m (x5 over 40m) kubelet Container image "apachepulsar/pulsar:2.8.2" already present on machine
Normal Created 39m (x5 over 40m) kubelet Created container pulsar-bookkeeper-verify-clusterid
Normal Started 39m (x5 over 40m) kubelet Started container pulsar-bookkeeper-verify-clusterid
Warning BackOff 44s (x189 over 40m) kubelet Back-off restarting failed container

@yanliang567 yanliang567 changed the title from "Problem deploying the Milvus service on a k8s cluster" to "fail to deploy ARM Milvus cluster on k8s" May 16, 2024
@lijunfeng11
Author

You need to figure out why the pulsar pods are in CrashLoopBackOff; describe the pod or share more info about it.
zookeeper log

[root@master rbd-eventlog]# kubectl logs my-milvus-pulsar-zookeeper-0
exec /usr/bin/sh: exec format error
[root@master rbd-eventlog]# kubectl describe pod my-milvus-pulsar-zookeeper-0
Name:         my-milvus-pulsar-zookeeper-0
Namespace:    default
Priority:     0
Node:         master/192.168.6.242
Start Time:   Thu, 16 May 2024 17:12:57 +0800
Labels:       app=pulsar
              cluster=my-milvus-pulsar
              component=zookeeper
              controller-revision-hash=my-milvus-pulsar-zookeeper-5c6946568d
              release=my-milvus
              statefulset.kubernetes.io/pod-name=my-milvus-pulsar-zookeeper-0
Annotations:  cni.projectcalico.org/containerID: f240c1461b1460008d146f49ca2d751087a7a66795c36516b64b1579fa0b64a2
              cni.projectcalico.org/podIP: 10.244.219.106/32
              cni.projectcalico.org/podIPs: 10.244.219.106/32
              prometheus.io/port: 8000
              prometheus.io/scrape: true
Status:       Running
IP:           10.244.219.106
IPs:
  IP:           10.244.219.106
Controlled By:  StatefulSet/my-milvus-pulsar-zookeeper
Containers:
  my-milvus-pulsar-zookeeper:
    Container ID:  docker://4f6ee2fc2b8668d3aa99c9ee8b2cd24e7aa76987bc1854c0a1958a06d73256ce
    Image:         apachepulsar/pulsar:2.8.2
    Image ID:      docker-pullable://apachepulsar/pulsar@sha256:d538416d5afe03360e10d5beb44bdad33d7303d137fc66c264108426875f61c6
    Ports:         8000/TCP, 2181/TCP, 2888/TCP, 3888/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      sh
      -c
    Args:
      bin/apply-config-from-env.py conf/zookeeper.conf;
      bin/generate-zookeeper-config.sh conf/zookeeper.conf; OPTS="${OPTS} -Dlog4j2.formatMsgNoLookups=true" exec bin/pulsar zookeeper;
      
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 16 May 2024 18:04:38 +0800
      Finished:     Thu, 16 May 2024 18:04:38 +0800
    Ready:          False
    Restart Count:  15
    Requests:
      cpu:      300m
      memory:   1Gi
    Liveness:   exec [bin/pulsar-zookeeper-ruok.sh] delay=10s timeout=5s period=30s #success=1 #failure=10
    Readiness:  exec [bin/pulsar-zookeeper-ruok.sh] delay=10s timeout=5s period=30s #success=1 #failure=10
    Environment Variables from:
      my-milvus-pulsar-zookeeper  ConfigMap  Optional: false
    Environment:
      ZOOKEEPER_SERVERS:  my-milvus-pulsar-zookeeper-0,my-milvus-pulsar-zookeeper-1,my-milvus-pulsar-zookeeper-2
    Mounts:
      /pulsar/data from my-milvus-pulsar-zookeeper-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s8tv8 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  my-milvus-pulsar-zookeeper-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  my-milvus-pulsar-zookeeper-data-my-milvus-pulsar-zookeeper-0
    ReadOnly:   false
  kube-api-access-s8tv8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  54m                    default-scheduler  Successfully assigned default/my-milvus-pulsar-zookeeper-0 to master
  Normal   Started    53m (x4 over 54m)      kubelet            Started container my-milvus-pulsar-zookeeper
  Normal   Pulled     53m (x5 over 54m)      kubelet            Container image "apachepulsar/pulsar:2.8.2" already present on machine
  Normal   Created    53m (x5 over 54m)      kubelet            Created container my-milvus-pulsar-zookeeper
  Warning  BackOff    4m37s (x256 over 54m)  kubelet            Back-off restarting failed container
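"exec /usr/bin/sh: exec format error" almost always means the container image was built for a different CPU architecture than the node it runs on, here amd64 binaries on an arm64 host. A quick way to confirm this, sketched under the assumption that the Docker CLI is available:

  # CPU architecture of each node
  kubectl get nodes -L kubernetes.io/arch
  # Platforms published for the Pulsar image used by the chart;
  # if no arm64 entry appears under "platform", only amd64 is available
  docker manifest inspect apachepulsar/pulsar:2.8.2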

@yanliang567
Contributor

/assign @LoveEachDay
please help to take a look

@lijunfeng11
Author

lijunfeng11 commented May 16, 2024

/assign @LoveEachDay please help to take a look

[root@master k8s-Milvus]# kubectl get pods -n my-milvus-zookeeper-0
No resources found in my-milvus-zookeeper-0 namespace.
I have now switched to Kafka for startup; only zookeeper and kafka fail to start. How should I check them?
[root@master k8s-Milvus]# kubectl describe pod my-milvus-zookeeper-0 -n default
Name: my-milvus-zookeeper-0
Namespace: default
Priority: 0
Node: master/192.168.6.242
Start Time: Thu, 16 May 2024 18:25:46 +0800
Labels: app.kubernetes.io/component=zookeeper
app.kubernetes.io/instance=my-milvus
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=zookeeper
controller-revision-hash=my-milvus-zookeeper-76fd4b8cf7
helm.sh/chart=zookeeper-8.1.2
statefulset.kubernetes.io/pod-name=my-milvus-zookeeper-0
Annotations: cni.projectcalico.org/containerID: 8637b832df7ef4d407e74a2452ff3967b113ed889a848ead27f2209540cf3a78
cni.projectcalico.org/podIP: 10.244.219.105/32
cni.projectcalico.org/podIPs: 10.244.219.105/32
Status: Running
IP: 10.244.219.105
IPs:
IP: 10.244.219.105
Controlled By: StatefulSet/my-milvus-zookeeper
Containers:
zookeeper:
Container ID: docker://c7efe7602de9c2252a3d9ec84e1ee63cb21ec7904dabf08cf95271d43d0a4b8f
Image: docker.io/bitnami/zookeeper:3.7.0-debian-10-r320
Image ID: docker-pullable://bitnami/zookeeper@sha256:c19c5473ef3feb8a0db00b92891c859915d06f7b888be4b3fdb78aaca109cd1f
Ports: 2181/TCP, 2888/TCP, 3888/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
/scripts/setup.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 16 May 2024 19:00:26 +0800
Finished: Thu, 16 May 2024 19:00:26 +0800
Ready: False
Restart Count: 11
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 250m
memory: 256Mi
Liveness: exec [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok] delay=30s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok] delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
BITNAMI_DEBUG: false
ZOO_DATA_LOG_DIR:
ZOO_PORT_NUMBER: 2181
ZOO_TICK_TIME: 2000
ZOO_INIT_LIMIT: 10
ZOO_SYNC_LIMIT: 5
ZOO_PRE_ALLOC_SIZE: 65536
ZOO_SNAPCOUNT: 100000
ZOO_MAX_CLIENT_CNXNS: 60
ZOO_4LW_COMMANDS_WHITELIST: srvr, mntr, ruok
ZOO_LISTEN_ALLIPS_ENABLED: no
ZOO_AUTOPURGE_INTERVAL: 0
ZOO_AUTOPURGE_RETAIN_COUNT: 3
ZOO_MAX_SESSION_TIMEOUT: 40000
ZOO_SERVERS: my-milvus-zookeeper-0.my-milvus-zookeeper-headless.default.svc.cluster.local:2888:3888::1 my-milvus-zookeeper-1.my-milvus-zookeeper-headless.default.svc.cluster.local:2888:3888::2 my-milvus-zookeeper-2.my-milvus-zookeeper-headless.default.svc.cluster.local:2888:3888::3
ZOO_ENABLE_AUTH: no
ZOO_HEAP_SIZE: 1024
ZOO_LOG_LEVEL: ERROR
ALLOW_ANONYMOUS_LOGIN: yes
POD_NAME: my-milvus-zookeeper-0 (v1:metadata.name)
Mounts:
/bitnami/zookeeper from data (rw)
/scripts/setup.sh from scripts (rw,path="setup.sh")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ww4sn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-my-milvus-zookeeper-0
ReadOnly: false
scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: my-milvus-zookeeper-scripts
Optional: false
kube-api-access-ww4sn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 35m default-scheduler Successfully assigned default/my-milvus-zookeeper-0 to master
Normal Pulling 35m kubelet Pulling image "docker.io/bitnami/zookeeper:3.7.0-debian-10-r320"
Normal Pulled 31m kubelet Successfully pulled image "docker.io/bitnami/zookeeper:3.7.0-debian-10-r320" in 3m31.33228944s
Normal Created 30m (x4 over 31m) kubelet Created container zookeeper
Normal Started 30m (x4 over 31m) kubelet Started container zookeeper
Normal Pulled 30m (x4 over 31m) kubelet Container image "docker.io/bitnami/zookeeper:3.7.0-debian-10-r320" already present on machine
Warning BackOff 19s (x163 over 31m) kubelet Back-off restarting failed container
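If the ZooKeeper container here is failing for the same architecture reason, the same checks apply in Kafka mode; a sketch:

  # Output of the last failed run of the zookeeper container
  kubectl logs my-milvus-zookeeper-0 --previous
  # Check whether this tag publishes an arm64 variant
  docker manifest inspect docker.io/bitnami/zookeeper:3.7.0-debian-10-r320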

@lijunfeng11
Author

lijunfeng11 commented May 17, 2024

Now I am trying to connect directly to an external Kafka; the configuration file is:
helmConfigYml.txt

After running it, I get an error:
[error screenshot]
Is this the right way to write the configuration file? I have tried many times without success.
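For reference, the milvus-helm chart has an externalKafka section for pointing the cluster at a Kafka deployment outside the chart. A sketch of such an override, where the broker address is hypothetical and the key names should be verified against the chart's values.yaml:

  # Hypothetical endpoint; replace with the real external Kafka address.
  helm upgrade --install my-milvus milvus/milvus \
    --set cluster.enabled=true \
    --set pulsar.enabled=false \
    --set kafka.enabled=false \
    --set externalKafka.enabled=true \
    --set externalKafka.brokerList=192.168.6.1:9092 \
    --set externalKafka.securityProtocol=PLAINTEXT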

@lijunfeng11
Author

/assign @LoveEachDay please help to take a look
Hello, is there any progress on this issue? When I install zookeeper separately I hit the same problem of it not being able to execute; presumably the image does not contain an ARM build.
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
be5cb546c61c6ea6df39c65b1584a89522dcab6c87acb8a12bdedc72c866e5d7

One more question: if I use an external kafka, does it need to be installed inside Docker? Mine is not installed in Docker, and I have never been able to connect to the external kafka; it also has no authentication enabled.
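An external Kafka does not need to run inside Docker; it only needs to be reachable from the pods over the network. A sketch of a connectivity check from inside the cluster, with a hypothetical broker address and assuming a client image that ships the Kafka CLI tools and provides an arm64 variant:

  # Throwaway client pod; lists topics if the broker is reachable without authentication
  kubectl run kafka-client --rm -it --restart=Never --image=bitnami/kafka:3.6 -- \
    kafka-topics.sh --bootstrap-server 192.168.6.1:9092 --list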
