K8s Troubleshooting


March 3, 2020 11:07     admin

New master cannot join an HA cluster

The original master node failed and could not be recovered; a new VM was brought up as a replacement master, but it could not join the cluster.


Fix: on the healthy etcd cluster, remove the member entry of the old failed node.
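A sketch of that removal, assuming etcdctl v3 and kubeadm's default certificate paths (adjust the endpoint and paths to your cluster; `<MEMBER_ID>` is a placeholder for the ID printed by `member list`):

```shell
# On a healthy etcd node: list members to find the ID of the dead master
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list

# Remove the stale member by the ID printed above
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member remove <MEMBER_ID>
```

Once the stale member is gone, the replacement master can join with the usual `kubeadm join --control-plane` command.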


Nacos vulnerability:

  1. https://help.aliyun.com/zh/mse/product-overview/nacos-security-risk-description-about-nacos-default-token-secret-key-risk-description

Installation issues

Error:

error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to decode cluster configuration data: no kind "ClusterConfiguration" is registered for version "kubeadm.k8s.io/v1beta2"

Fix:

The kubeadm version running `join` does not match the version that ran `init`; upgrade kubeadm on the joining node to match.
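A quick way to confirm the mismatch before upgrading (the 1.19.0 version string below is only an example; pin whatever the control plane reports, and note the yum syntax assumes an RPM-based distro):

```shell
# Run on both the master and the joining node; the versions should match
kubeadm version -o short

# On the joining node, install the matching version (example version string)
yum install -y kubeadm-1.19.0 --disableexcludes=kubernetes
```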

Error:

When a node joins the master:

error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition

Fix:

  1. swapoff -a # turn off swap
  2. kubeadm reset # undo the partial join
  3. systemctl daemon-reload
  4. systemctl restart kubelet
  5. iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X # flush leftover rules

Error:

Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

Fix:

  1. mkdir -p $HOME/.kube
  2. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  3. sudo chown $(id -u):$(id -g) $HOME/.kube/config

Error:

failed to update node lease, error: Operation cannot be fulfilled

Cause: the node cannot be operated on after joining because its hostname is identical to the master's hostname.

Fix: change the hostname on the node, then rejoin:

  1. hostname work1 # use hostnamectl set-hostname work1 to make it persistent
  2. # reset kubeadm state
  3. kubeadm reset
  4. # rejoin the cluster
  5. kubeadm join <masterIP>:6443 --token <TOKEN> --discovery-token-ca-cert-hash <discovery-token-ca-cert-hash>
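If the original bootstrap token has expired by the time the node rejoins, a fresh join command can be printed on the master:

```shell
# On the master: create a new token and print the complete join command,
# including the current discovery-token-ca-cert-hash
kubeadm token create --print-join-command
```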

Error:

failed to set bridge addr: "cni0" already has an IP address different from 10.244.1.1/24

Fix: delete the misconfigured cni0 bridge; it will be recreated automatically:

  1. sudo ifconfig cni0 down
  2. sudo ip link delete cni0

Error:

In-cluster DNS names fail to resolve and the coredns pods stay in CrashLoopBackOff.

  1. vim /etc/resolv.conf
  2. # temporarily change nameserver to 114.114.114.114; this is only a workaround.
  3. # For the root cause and a permanent fix, see the notes on systemd-resolved
  4. # writing 127.0.0.53 into /etc/resolv.conf on Ubuntu.
  5. kubectl edit deployment coredns -n kube-system
  6. # set replicas to 0 to stop the running coredns pods
  7. kubectl edit deployment coredns -n kube-system
  8. # set replicas back to 2 so coredns re-reads the host configuration
  9. kubectl get pods -n kube-system
  10. # verify the pods are Running
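The same stop/start cycle can be done with `kubectl scale` instead of editing the Deployment twice (the replica count of 2 matches the default kubeadm setup):

```shell
# Scale coredns to zero and back, forcing it to re-read /etc/resolv.conf
kubectl scale deployment coredns -n kube-system --replicas=0
kubectl scale deployment coredns -n kube-system --replicas=2

# Block until the new pods are Running
kubectl rollout status deployment coredns -n kube-system
```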

Pod-to-pod traffic across nodes fails even though flannel is deployed

  1. systemctl stop kubelet
  2. systemctl stop docker
  3. iptables --flush
  4. iptables -t nat --flush
  5. systemctl restart kubelet
  6. systemctl restart docker

Pod network unreachable

On clusters whose nodes have multiple NICs, pin the calico/flannel component to a specific interface as follows.


1. Calico: if any node has multiple NICs, specify the internal NIC via an environment variable on the calico-node DaemonSet:


spec:
  containers:
  - env:
    - name: DATASTORE_TYPE
      value: kubernetes
    - name: IP_AUTODETECTION_METHOD    # add this variable to the DaemonSet
      value: "interface=ens33,eth0"    # name(s) of the internal NIC
    - name: WAIT_FOR_DATASTORE
      value: "true"

2. Flannel: if any node has multiple NICs, specify the interface in the container args:

containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.10.0-amd64
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=ens33
  - --iface=eth0

Pressing Tab after kubectl produces the following error:

  [root@master bin]# kubectl c
  -bash: _get_comp_words_by_ref: command not found

Fix:

  1. Install bash-completion:
  2. [root@master bin]# yum install bash-completion -y
  3. Load bash_completion:
  4. [root@master bin]# source /usr/share/bash-completion/bash_completion
  5. Reload kubectl completion:
  6. [root@master bin]# source <(kubectl completion bash)
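Those two `source` lines only affect the current shell; to make completion available in every new session, a common approach (a convention assumed here, not something kubectl requires) is to append them to ~/.bashrc:

```shell
# Persist bash completion for kubectl across shell sessions
echo 'source /usr/share/bash-completion/bash_completion' >> ~/.bashrc
echo 'source <(kubectl completion bash)' >> ~/.bashrc
```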