Upgrading the Cilium Network Plugin on Kubernetes

Author: 张亚飞 | Reading time: 4 minutes | Published: 2019-10-28
Categories: Kubernetes | Tags: Kubernetes, Cilium

After upgrading Kubernetes to the latest version, v1.16.2, the nodes stay in the NotReady state

After the two-node cluster starts, both nodes remain NotReady:

Mon Oct 28 15:14:32 coam@a.us.1:~$ kubectl get nodes
NAME     STATUS     ROLES    AGE     VERSION
a.us.0   NotReady   <none>   2m29s   v1.16.2
a.us.1   NotReady   master   3m12s   v1.16.2

Check the status of the Pods, including the network plugin and coredns:

Mon Oct 28 15:14:40 coam@a.us.1:~$ kubectl get pods --all-namespaces -owide
NAMESPACE     NAME                                    READY   STATUS             RESTARTS   AGE     IP              NODE     NOMINATED NODE   READINESS GATES
kube-system   cilium-8tkcw                            0/1     Running            0          2m58s   172.31.141.97   a.us.1   <none>           <none>
kube-system   cilium-bplvz                            0/1     Running            0          2m58s   172.31.141.98   a.us.0   <none>           <none>
kube-system   cilium-etcd-operator-8597d48bc6-s4ffg   0/1     CrashLoopBackOff   4          2m58s   172.31.141.98   a.us.0   <none>           <none>
kube-system   cilium-operator-57cc5d7dfb-bdmsm        0/1     Pending            0          2m57s   <none>          <none>   <none>           <none>
kube-system   coredns-5644d7b6d9-dff4s                0/1     Pending            0          3m25s   <none>          <none>   <none>           <none>
kube-system   coredns-5644d7b6d9-qf2bt                0/1     Pending            0          3m25s   <none>          <none>   <none>           <none>
kube-system   etcd-a.us.1                             1/1     Running            0          2m22s   172.31.141.97   a.us.1   <none>           <none>
kube-system   kube-apiserver-a.us.1                   1/1     Running            0          2m37s   172.31.141.97   a.us.1   <none>           <none>
kube-system   kube-controller-manager-a.us.1          1/1     Running            0          2m41s   172.31.141.97   a.us.1   <none>           <none>
kube-system   kube-proxy-jtz6z                        1/1     Running            0          3m24s   172.31.141.97   a.us.1   <none>           <none>
kube-system   kube-proxy-kfr9h                        1/1     Running            0          3m      172.31.141.98   a.us.0   <none>           <none>
kube-system   kube-scheduler-a.us.1                   1/1     Running            0          2m44s   172.31.141.97   a.us.1   <none>           <none>
kube-system   kubernetes-dashboard-86f47f5775-cxkgr   0/1     Pending            0          2m59s   <none>          <none>   <none>           <none>

The coredns Pods stay in the Pending state. Inspect one of them:

Mon Oct 28 15:15:11 coam@a.us.1:~$ kubectl describe pod coredns-5644d7b6d9-dff4s -n kube-system
Name:                 coredns-5644d7b6d9-dff4s
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 <none>
Labels:               k8s-app=kube-dns
                      pod-template-hash=5644d7b6d9
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-5644d7b6d9
Containers:
  coredns:
    Image:       k8s.gcr.io/coredns:1.6.2
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-m4thq (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-m4thq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-m4thq
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.

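The events show the error "0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate." My first suspicion was that the master node did not allow scheduling, but inspecting the nodes rules that out: the node-role.kubernetes.io/master taint is not present on either node. The only taint is node.kubernetes.io/not-ready:NoSchedule, which is there because the nodes are NotReady, and coredns tolerates not-ready only for the NoExecute effect.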

$ kubectl describe nodes --all-namespaces | grep Taints -A2
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Conditions:
--
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Conditions:
$ kubectl taint nodes --all node-role.kubernetes.io/master-
taint "node-role.kubernetes.io/master" not found
taint "node-role.kubernetes.io/master" not found
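
To make the scheduling failure concrete, here is a minimal sketch of taint and toleration matching, applied to the taint and tolerations shown in the output above. It is a simplified model, not the scheduler's actual code: the class and function names are hypothetical, taint values are ignored because none appear here, and the operator for every toleration is assumed to be Exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""          # an empty key with operator Exists matches any key
    operator: str = "Exists"
    effect: str = ""       # an empty effect matches any effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Simplified match: the key must match, and the effect must match or be unset."""
    key_ok = (toleration.key == "" and toleration.operator == "Exists") \
        or toleration.key == taint.key
    effect_ok = toleration.effect == "" or toleration.effect == taint.effect
    return key_ok and effect_ok


# Tolerations listed in the `kubectl describe pod` output for coredns.
coredns_tolerations = [
    Toleration(key="CriticalAddonsOnly"),
    Toleration(key="node-role.kubernetes.io/master", effect="NoSchedule"),
    Toleration(key="node.kubernetes.io/not-ready", effect="NoExecute"),
    Toleration(key="node.kubernetes.io/unreachable", effect="NoExecute"),
]

# Taint reported on both nodes by `kubectl describe nodes`.
node_taint = Taint(key="node.kubernetes.io/not-ready", effect="NoSchedule")

schedulable = any(tolerates(t, node_taint) for t in coredns_tolerations)
print(schedulable)
```

Under this model no toleration matches: the not-ready toleration has the right key but the wrong effect (NoExecute rather than NoSchedule). The Pods therefore stay Pending until the node becomes Ready and the taint is removed.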

The system log shows a large number of "Unable to update cni config: no valid networks found in /etc/cni/net.d" errors:

$ sudo journalctl -f
Oct 28 15:12:23 a.us.1 kubelet[2269]: W1028 15:12:23.163168    2269 cni.go:237] Unable to update cni config: no valid networks found in /etc/cni/net.d
Oct 28 15:12:23 a.us.1 kubelet[2269]: E1028 15:12:23.821706    2269 kubelet.go:2187] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Oct 28 15:12:28 a.us.1 kubelet[2269]: W1028 15:12:28.194305    2269 cni.go:202] Error validating CNI config &{cilium  false [0xc0009ddd20] [123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 34 44 34 110 97 109 101 34 58 34 99 105 108 105 117 109 34 44 34 112 108 117 103 105 110 115 34 58 91 123 34 110 97 109 101 34 58 34 99 105 108 105 117 109 34 44 34 116 121 112 101 34 58 34 99 105 108 105 117 109 45 99 110 105 34 125 93 125]}: [plugin cilium-cni does not support config version ""]

Note one warning in particular: plugin cilium-cni does not support config version ""
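
The long list of numbers in that log line is the CNI config printed as raw bytes. A short sketch that decodes it shows the config kubelet was validating; the byte values are copied from the log line above, and the variable names are only illustrative.

```python
import json

# Byte values copied from the "Error validating CNI config" log line.
raw = [
    123, 34, 99, 110, 105, 86, 101, 114, 115, 105, 111, 110, 34, 58, 34, 34, 44, 34,
    110, 97, 109, 101, 34, 58, 34, 99, 105, 108, 105, 117, 109, 34, 44, 34,
    112, 108, 117, 103, 105, 110, 115, 34, 58, 91, 123, 34,
    110, 97, 109, 101, 34, 58, 34, 99, 105, 108, 105, 117, 109, 34, 44, 34,
    116, 121, 112, 101, 34, 58, 34, 99, 105, 108, 105, 117, 109, 45, 99, 110, 105, 34,
    125, 93, 125,
]

config_text = bytes(raw).decode("utf-8")
config = json.loads(config_text)

print(config_text)
# {"cniVersion":"","name":"cilium","plugins":[{"name":"cilium","type":"cilium-cni"}]}
print(repr(config["cniVersion"]))
# ''
```

The decoded config has an empty "cniVersion" field, which matches the version "" that the warning complains about.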

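Other users have reported a similar problem with the flannel network plugin after upgrading to v1.16.2. My cluster uses cilium, so I applied the same fix: edit /etc/cni/net.d/05-cilium.conf and add "cniVersion": "0.2.0":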

vim /etc/cni/net.d/05-cilium.conf

{
     "name": "cilium",
     "cniVersion": "0.2.0",
     "type": "cilium-cni"
}
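
The same edit can be sketched programmatically. The snippet below is illustrative only: the helper name with_cni_version is hypothetical, and the starting config (just "name" and "type") is an assumption inferred from the decoded kubelet log, not a copy of the original file. It works on a JSON string rather than touching /etc/cni/net.d directly.

```python
import json


def with_cni_version(config_text: str, version: str = "0.2.0") -> str:
    """Return the CNI config with cniVersion set when it is missing or empty."""
    config = json.loads(config_text)
    if not config.get("cniVersion"):
        config["cniVersion"] = version
    return json.dumps(config, indent=2)


# Assumed pre-fix content of 05-cilium.conf (no cniVersion field).
original = '{"name": "cilium", "type": "cilium-cni"}'
patched = with_cni_version(original)
print(patched)
```

A config that already declares a version, such as the "0.3.1" installed by newer cilium releases, passes through with its version unchanged.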

The master node a.us.1 is now back to Ready (a.us.0 still shows NotReady in this output):

$ kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
a.us.0   NotReady   <none>   16m   v1.16.2
a.us.1   Ready      master   17m   v1.16.2

Checking the Pods again, coredns is back in the Running state:

Mon Oct 28 15:31:12 coam@a.us.1:~/Server/Run/run_s/kubernetes/run$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY   STATUS              RESTARTS   AGE
kube-system   cilium-8tkcw                            0/1     Running             5          19m
kube-system   cilium-bplvz                            1/1     Running             5          19m
kube-system   cilium-etcd-operator-8597d48bc6-s4ffg   0/1     CrashLoopBackOff    8          19m
kube-system   cilium-operator-57cc5d7dfb-bdmsm        1/1     Running             2          19m
kube-system   coredns-5644d7b6d9-dff4s                1/1     Running             1          19m
kube-system   coredns-5644d7b6d9-qf2bt                1/1     Running             1          19m
kube-system   etcd-a.us.1                             1/1     Running             0          18m
kube-system   kube-apiserver-a.us.1                   1/1     Running             0          18m
kube-system   kube-controller-manager-a.us.1          1/1     Running             0          18m
kube-system   kube-proxy-jtz6z                        1/1     Running             0          19m
kube-system   kube-proxy-kfr9h                        1/1     Running             0          19m
kube-system   kube-scheduler-a.us.1                   1/1     Running             0          18m
kube-system   kubernetes-dashboard-86f47f5775-cxkgr   0/1     ContainerCreating   0          19m

Note: the newer cilium:v1.6.3 already supports this setting. After upgrading cilium, the config it installs by default is:

$ cat /etc/cni/net.d/05-cilium.conf
{
  "cniVersion": "0.3.1",
  "name": "cilium",
  "type": "cilium-cni"
}
