Deploying a Kubernetes Cluster with kubeadm


About kubeadm

How a kubeadm deployment works

Official reference: step-by-step breakdown of the kubeadm deployment process

Kubernetes cluster deployment methods

Common ways to deploy a Kubernetes cluster:

  • minikube: a single-node pseudo-distributed cluster, good for testing and getting started
  • kubeadm: more or less a one-command deployment tool for Kubernetes
  • Binary deployment: manually run each Kubernetes component as a daemon process on the nodes; a series of certificates and other authentication material has to be defined by hand, which is tedious. The related Ansible playbooks on GitHub can be used for one-click deployment.

Kubernetes cluster run modes

  • Standalone daemon processes
  • Static pods
  • Self-hosted pods

Standalone daemon processes:

The master components (kube-apiserver, kube-scheduler, kube-controller-manager, etcd) run as daemon processes on the node; the master is just an ordinary Linux host.

Static pods

The master components (kube-apiserver, kube-scheduler, kube-controller-manager, etcd) run as containers in static pods; the master needs kubelet and Docker in order to manage the apiserver and the other containers.

Self-hosted pods

The master components (kube-apiserver, kube-scheduler, kube-controller-manager, etcd) still run as pods, but are managed by a DaemonSet controller; the master also needs kubelet and Docker to manage the apiserver and the other containers.

kubeadm can run the control plane either as static pods or as self-hosted pods; static pods are the default. Passing --feature-gates=SelfHosting=true to kubeadm init selects the self-hosted mode.

kubeadm lab environment

Summary of the kubeadm deployment steps

  1. Prepare the host base environment
    1. Time synchronization
    2. Hostname resolution
    3. Disable the firewall
    4. Disable SELinux
    5. Disable swap devices on the hosts, or configure kubeadm later to ignore the "swap not disabled" error
    6. (Optional) Load the ipvs modules with a script (needed when newer Kubernetes versions use ipvs for Service rules)
  2. Initialize the cluster
    1. Install docker, kubelet, and kubeadm on the master/node hosts
      1. (Configure the Docker registry mirror, proxy, FORWARD chain policy, etc. as needed)
      2. (Make kubelet ignore the swap error)
    2. Run kubeadm init on the master node
      1. Command line or YAML file
      2. Breakdown of the kubeadm init process
    3. Run kubeadm join on the node hosts
      1. Breakdown of the node join process
    4. Pull and deploy the network plugin on the master node
    5. Install and configure kubectl
  3. Verify the deployment
    1. View cluster information
    2. How to remove a node

Installing with kubeadm

Official kubeadm cluster deployment reference

Cluster topology

In the diagram below, the node network is the VMs' shared subnet: 192.168.80.0/24

(cluster topology diagram)

Base environment and versions

The lab machines are VMs on VMware Workstation 15: 1 master node, 3 worker nodes, and 1 standalone kubectl client host, 5 VMs in total:

master:192.168.80.101

node1:192.168.80.106
node2:192.168.80.107
node3:192.168.80.108

client:192.168.80.102
# The VMs sit on VMware's vmnet8 NAT network and reach the Internet through the lab PC;
  • OS:CentOS Linux release 7.5.1804 (Core) | 3.10.0-862.el7.x86_64
  • docker:Docker version 18.03.1-ce, build 9ee9f40
  • kubernetes:1.12

Cluster deployment walkthrough

Host environment preparation

Time synchronization

[root@master ~]# systemctl start chronyd
[root@master ~]# systemctl enable chronyd

# Install the chrony package and simply start it; by default it syncs time from the CentOS pool servers, which can also be swapped for a domestic NTP server
# A private network usually has its own NTP servers
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 0.centos.pool.ntp.org iburst
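
If a domestic NTP source is preferred, the pool entries in /etc/chrony.conf can be swapped out; a minimal sketch, assuming ntp.aliyun.com is reachable from the lab network:

# Comment out the default pool servers and point chrony at a single domestic server (hypothetical choice)
[root@master ~]# sed -i 's/^server /#server /' /etc/chrony.conf
[root@master ~]# echo 'server ntp.aliyun.com iburst' >> /etc/chrony.conf
[root@master ~]# systemctl restart chronyd
[root@master ~]# chronyc sources -v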

Hostname resolution

# Note that the IP comes first;
[root@master ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.80.101 master
192.168.80.106 node1
192.168.80.107 node2
192.168.80.108 node3
192.168.80.102 client

[root@master ~]# scp /etc/hosts root@192.168.80.106:/etc/hosts
[root@master ~]# scp /etc/hosts root@192.168.80.107:/etc/hosts
...

# With only a few lab hosts, editing the hosts file directly is enough;
# In production, with many hosts, a DNS service is usually needed

Disable the firewall

[root@master ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

Disable SELinux

[root@master ~]# getenforce 
Disabled
        
[root@master ~]# grep dis /etc/selinux/config 
#     disabled - No SELinux policy is loaded.
SELINUX=disabled

Disable swap

[root@master ~]# swapoff -a

# Disable the swap device.
# kubeadm checks whether swap is disabled on each node and by default fails if it is not; swap hurts performance and generally should not be enabled.
# On nodes with a swap device, run swapoff -a and comment out the swap entry in /etc/fstab (see the sketch below).
# Alternatively, configure kubelet, kubeadm init, and kubeadm join to ignore the "swap not disabled" error.
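
To keep swap disabled across reboots, the swap entry in /etc/fstab should be commented out as well; a rough sketch (the exact device or UUID in fstab differs per host):

[root@master ~]# swapoff -a
# Comment out every active swap line in /etc/fstab (GNU sed; verify the result before rebooting)
[root@master ~]# sed -ri 's/^([^#].*[[:space:]]swap[[:space:]].*)$/#\1/' /etc/fstab
[root@master ~]# grep swap /etc/fstab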

Load the ipvs modules

Since Kubernetes 1.11, ipvs can be used to implement Service rules, so the related kernel modules need to be loaded. ipvs only covers the load-balancing part; SNAT and similar functions still rely on iptables.

The loader script:

[root@master ~]# vim /etc/sysconfig/modules/ipvs.modules
[root@master ~]# sh -n !$
sh -n /etc/sysconfig/modules/ipvs.modules
[root@master ~]# cat !$
cat /etc/sysconfig/modules/ipvs.modules
#!/bin/bash
# Directory holding the ipvs kernel modules for the running kernel
ipvs_module_dir="/usr/lib/modules/$(uname -r)/kernel/net/netfilter/ipvs"

# Strip the .ko(.xz) suffix to get the module names, then load every module that modinfo can resolve
for i in $(ls $ipvs_module_dir | grep -o "^[^.]*"); do
	/sbin/modinfo -F filename $i &> /dev/null
	if [ $? -eq 0 ];then
		/sbin/modprobe $i
	fi
done


[root@master ~]# chmod +x /etc/sysconfig/modules/ipvs.modules 
[root@master ~]# sh  /etc/sysconfig/modules/ipvs.modules 

# Copy the script to the other master/node hosts as well and load the modules there;
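
With passwordless SSH between the hosts, the script can be pushed to and run on the remaining nodes, and the loaded modules verified; a minimal sketch:

# Distribute the loader script and run it on each node (assumes SSH key access)
[root@master ~]# for n in node1 node2 node3; do scp /etc/sysconfig/modules/ipvs.modules $n:/etc/sysconfig/modules/; ssh $n 'sh /etc/sysconfig/modules/ipvs.modules'; done

# Confirm the ipvs modules are loaded
[root@master ~]# lsmod | grep -E 'ip_vs|nf_conntrack'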

Deploying the cluster

The kubeadm deployment flow is roughly:

  1. Install Docker, kubelet, and the kubeadm package on the master and node hosts;
    1. Configure the Docker registry mirror, proxy, and FORWARD chain ACCEPT policy;
    2. Configure kubelet to ignore the swap error; enable the service
  2. Run kubeadm init on the master node to initialize the cluster;
  3. Run kubeadm join on each node to join the cluster initialized on the master;
  4. Pull and deploy the network plugin containers (e.g. flannel) on the master node to provide the pod and Service networks;
  5. Deploy the kubectl client (optionally on a separate client host)

Install docker, kubelet, and kubeadm

0. Configuring the VMs to reach the "outside" Internet

This setup lets the VMs reach the otherwise blocked parts of the Internet through the lab PC. However, if the goal is only to pull the Kubernetes component images, the proxy should be configured in Docker's unit file only; this system-wide setting is what kept making kubeadm init fail later!

# Enable SSR on the PC and allow connections from the LAN

# On the VMs, use the PC's IP as the proxy; the default port is 1080, for example:

[root@node3 ~]# cat /etc/profile.d/http_proxy.sh 
http_proxy=http://192.168.31.107:1080
https_proxy=http://192.168.31.107:1080
export http_proxy
export https_proxy
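
If a system-wide proxy has to stay, the node, pod, and Service networks should at least be excluded from it, since kubeadm otherwise routes apiserver traffic through the proxy (the exact failure described in the init errors below). A hedged sketch; note that not every tool honours CIDR entries in no_proxy:

# Extra lines for /etc/profile.d/http_proxy.sh (CIDRs taken from this lab's networks)
no_proxy=127.0.0.1,localhost,192.168.80.0/24,10.96.0.0/12,10.244.0.0/16
export no_proxy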

1. Install docker-ce 18.03 on the master and all 3 nodes

1. Install the yum-config-manager utility
yum install -y yum-utils

2. Add the docker-ce repository
yum-config-manager \
    --add-repo \
    https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

3. List all available docker-ce versions
yum list docker-ce --showduplicates | sort -r

4. Install a specific docker-ce version
[root@host2 ~]# yum install docker-ce-18.03.1.ce-1.el7.centos docker-ce-cli-18.03.1.ce-1.el7.centos containerd.io

# docker-ce 18.03 is now installed on the master and all 3 nodes;
[root@master ~]# docker --version
Docker version 18.03.1-ce, build 9ee9f40

[root@node1 ~]# docker --version
Docker version 18.03.1-ce, build 9ee9f40

[root@node2 ~]# docker --version
Docker version 18.03.1-ce, build 9ee9f40

[root@node3 ~]# docker --version
Docker version 18.03.1-ce, build 9ee9f40

2. Configure the Aliyun Kubernetes repository and install kubeadm, kubelet, kubectl (optional)

# Add the Kubernetes repository, using the Aliyun mirror

[root@master ~]# cat /etc/yum.repos.d/k8s.repo 
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg

# List all available versions
[root@master ~]# yum list kubelet kubeadm kubectl  --showduplicates|sort -r | tail
kubeadm.x86_64                       1.10.2-0                         kubernetes
kubeadm.x86_64                       1.10.13-0                        kubernetes
kubeadm.x86_64                       1.10.12-0                        kubernetes
kubeadm.x86_64                       1.10.11-0                        kubernetes
kubeadm.x86_64                       1.10.1-0                         kubernetes
kubeadm.x86_64                       1.10.10-0                        kubernetes
kubeadm.x86_64                       1.10.0-0                         kubernetes
...

# Install a specific version; here kubeadm, kubelet, and kubectl are all 1.12;
# On the master node:
[root@master ~]# yum install -y kubeadm-1.12.9 kubelet-1.12.9 kubectl-1.12.9 kubernetes-cni-0.7.5-0.x86_64

# On the 3 node hosts, kubectl is left out:
# With the proxy configured, the Aliyun repository cannot be reached, so unset the proxy for now
[root@node1 ~]# unset http_proxy
[root@node1 ~]# unset https_proxy
[root@node1 ~]# yum install -y kubeadm-1.12.9  kubelet-1.12.9 kubernetes-cni-0.7.5-0.x86_64

kubelet configuration

[root@master ~]# vim /etc/sysconfig/kubelet 
[root@master ~]# cat !$
cat /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
[root@master ~]# systemctl daemon-reload
[root@master ~]# systemctl enable kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.

# Enable at boot and configure it to ignore the "swap not disabled" error; there is no need to start it by hand: kubeadm init generates the certificates and other files kubelet needs and then starts it.

3. Configure a Docker registry mirror

Use the personal accelerator address provided by Aliyun here

[root@master ~]# cat /etc/docker/daemon.json 
{
  "registry-mirrors": ["https://****.mirror.aliyuncs.com"]
}
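
The mirror only takes effect after Docker is restarted; a quick check:

[root@master ~]# systemctl restart docker
[root@master ~]# docker info | grep -A1 'Registry Mirrors'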

4. Configure a Docker proxy (optional)

Add an Environment= variable to Docker's systemd unit file; environments that cannot reach the real Internet directly need this

[root@master ~]# grep Environ /usr/lib/systemd/system/docker.service 
Environment="HTTP_PROXY=http://192.168.31.107:1080"

# Just add it in the [Service] section

# Check that it took effect; 192.168.31.107 is the DHCP address the lab PC got on the LAN, note that it may change or expire;
[root@master ~]# docker info |grep Proxy
HTTP Proxy: http://192.168.31.107:1080
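
When the proxy is only meant for pulling the k8s.gcr.io images, it also helps to tell Docker which addresses must bypass it; a hedged sketch of the [Service] section:

# /usr/lib/systemd/system/docker.service, [Service] section (sketch)
Environment="HTTP_PROXY=http://192.168.31.107:1080"
Environment="NO_PROXY=127.0.0.1,localhost,192.168.80.0/24,.mirror.aliyuncs.com"

# Reload and verify
[root@master ~]# systemctl daemon-reload && systemctl restart docker
[root@master ~]# docker info | grep -i proxy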

5. Set Docker's default FORWARD chain policy

Since version 1.13, Docker sets the default policy of the FORWARD chain to DROP, which can break Kubernetes packet forwarding;

Modify the startup commands in Docker's systemd unit file as follows:

[root@master ~]# iptables -vnL |grep FORWARD
Chain FORWARD (policy DROP 0 packets, 0 bytes)
[root@master ~]# vim /usr/lib/systemd/system/docker.service 
[root@master ~]# grep StartPost !$
grep StartPost /usr/lib/systemd/system/docker.service
ExecStartPost=/usr/sbin/iptables -P FORWARD ACCEPT
# Add an ExecStartPost line after ExecStart to change Docker's default policy for the FORWARD chain
# After the change the default policy is ACCEPT
[root@master ~]# iptables -vnL |grep FORWARD
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)


[root@master ~]# systemctl daemon-reload
[root@master ~]# systemctl restart docker

Packages on each host after installation:

[root@master ~]# rpm -qa|grep kube
kubelet-1.12.9-0.x86_64
kubernetes-cni-0.7.5-0.x86_64
kubectl-1.12.9-0.x86_64
kubeadm-1.12.9-0.x86_64
# Master node

[root@client ~]# rpm -qa|grep kube
kubectl-1.12.9-0.x86_64
# Client


[root@node1 ~]# rpm -qa |grep kube
kubelet-1.12.9-0.x86_64
kubernetes-cni-0.7.5-0.x86_64
kubectl-1.19.3-0.x86_64
kubeadm-1.12.9-0.x86_64
# Node hosts; kubectl was not explicitly installed on the nodes, but it gets pulled in as a dependency, and at the latest version, 1.19

kubeadm init

[root@master ~]# kubeadm  init --kubernetes-version=v1.12.9 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --apiserver-advertise-address=0.0.0.0 --ignore-preflight-errors=Swap

# Specify the Kubernetes version to install
# Specify the pod and Service network CIDRs
# Specify the address the apiserver advertises and listens on; 0.0.0.0 means all local IPs, a single NIC address can also be given
# Ignore the "swap not disabled" error in the preflight checks
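
The same options can also be supplied as a configuration file, the "YAML file" alternative mentioned in the summary above; a rough sketch, assuming the v1alpha3 kubeadm config API shipped with the 1.12 series (field names should be verified against the kubeadm documentation for the exact version in use):

# kubeadm-config.yaml (sketch)
apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: v1.12.9
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12

# Initialize from the file instead of the long command line
[root@master ~]# kubeadm init --config kubeadm-config.yaml --ignore-preflight-errors=Swap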

Final successful output

[root@master ~]# kubeadm  init --kubernetes-version=v1.12.9 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --apiserver-advertise-address=0.0.0.0 --ignore-preflight-errors=Swap
[init] using Kubernetes version: v1.12.9
[preflight] running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.03.1-ce. Latest validated version: 18.06
[preflight/images] Pulling images required for setting up a Kubernetes cluster
[preflight/images] This might take a minute or two, depending on the speed of your internet connection
[preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[preflight] Activating the kubelet service
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [master localhost] and IPs [192.168.80.101 127.0.0.1 ::1]
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [master localhost] and IPs [127.0.0.1 ::1]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.80.101]
[certificates] valid certificates and keys now exist in "/etc/kubernetes/pki"
[certificates] Generated sa key and public key.
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests" 
[init] this might take a minute or longer if the control plane images have to be pulled
[apiclient] All control plane components are healthy after 21.002522 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.12" in namespace kube-system with the configuration for the kubelets in the cluster
[markmaster] Marking the node master as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node master as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master" as an annotation
[bootstraptoken] using token: rdif9c.h8084b6polru2bde
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.80.101:6443 --token rdif9c.h8084b6polru2bde --discovery-token-ca-cert-hash sha256:e64e4335f619b04dc17a004c626876451f96c80005aea0e05ed73269033bddbd

Configure kubectl on the master node

Tripped up by the system-wide proxy settings again: after setting things up, kubectl reported "unknown host"; deleting the proxy profile file and logging out fixed it;

[root@master ~]# rm -rf /etc/profile.d/http_proxy.sh 
[root@master ~]# logout
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
  
  
[root@master ~]# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok                   
scheduler            Healthy   ok                   
etcd-0               Healthy   {"health": "true"}   
[root@master ~]# kubectl get nodes
NAME     STATUS     ROLES    AGE    VERSION
master   NotReady   master   106m   v1.12.9
[root@master ~]# 

Pods running on each node

master:

[root@master ~]# docker ps -a  |grep kube |grep -v pause
97709d637ee8        295526df163c           "/usr/local/bin/kube…"   2 hours ago         Up 2 hours                              k8s_kube-proxy_kube-proxy-hqh4m_kube-system_0cb936d7-2321-11eb-8d73-000c292d5d7c_0
705848fe4797        c79506ccc1bc           "kube-scheduler --ad…"   2 hours ago         Up 2 hours                              k8s_kube-scheduler_kube-scheduler-master_kube-system_63688448321beb5bd69c28a75cff89a4_0
82cc49978e77        f473e8452c8e           "kube-controller-man…"   2 hours ago         Up 2 hours                              k8s_kube-controller-manager_kube-controller-manager-master_kube-system_45e2f4eec8cacc28e2e3a79e0ef42efc_0
b761d3dc1486        8ea704c2d4a7           "kube-apiserver --au…"   2 hours ago         Up 2 hours                              k8s_kube-apiserver_kube-apiserver-master_kube-system_f6b13d90320c48e622e8a00e1ce786e9_0
a5fb6d7e2943        3cab8e1b9802           "etcd --advertise-cl…"   2 hours ago         Up 2 hours                              k8s_etcd_etcd-master_kube-system_d2f4eef271e0e5c9c74810d2ad56adce_0

# There are currently 5 pods: kube-apiserver, kube-scheduler, kube-controller-manager, etcd, and kube-proxy, plus their 5 pause (sandbox) containers

Error 1

Kernel parameter not set;

[root@master ~]# kubeadm  init --kubernetes-version=v1.12.1 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --apiserver-advertise-address=0.0.0.0 --ignore-preflight-errors=Swap
[init] using Kubernetes version: v1.12.1
[preflight] running pre-flight checks
	[WARNING HTTPProxy]: Connection to "https://192.168.80.101" uses proxy "http://192.168.31.107:1080". If that is not intended, adjust your proxy settings
	[WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://192.168.31.107:1080". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
	[WARNING HTTPProxyCIDR]: connection to "10.244.0.0/16" uses proxy "http://192.168.31.107:1080". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.03.1-ce. Latest validated version: 18.06
[preflight] Some fatal errors occurred:
	[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`


# Fix:
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
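
On some hosts the net.bridge.* keys only appear after the br_netfilter module is loaded, so it may also be necessary to load that module and make it persistent; a small sketch:

[root@master ~]# modprobe br_netfilter
[root@master ~]# echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
[root@master ~]# sysctl --system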

Error 2

Malformed proxy setting

[preflight] Some fatal errors occurred:
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.12.1: output: Error response from daemon: Get https://k8s.gcr.io/v2/: invalid proxy URL port "1080,HTTPS_PROXY=https:"

# Fix: removed the HTTPS proxy entry
[root@master ~]# grep Environ /usr/lib/systemd/system/docker.service 
Environment="HTTP_PROXY=http://192.168.31.107:1080"

Error 3

The system-wide HTTP proxy broke access to the apiserver;

[root@master ~]# kubeadm  init --kubernetes-version=v1.12.1 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --apiserver-advertise-address=0.0.0.0 --ignore-preflight-errors=Swap
[init] using Kubernetes version: v1.12.1
[preflight] running pre-flight checks
	[WARNING HTTPProxy]: Connection to "https://192.168.80.101" uses proxy "http://192.168.31.107:1080". If that is not intended, adjust your proxy settings
	[WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://192.168.31.107:1080". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
	[WARNING HTTPProxyCIDR]: connection to "10.244.0.0/16" uses proxy "http://192.168.31.107:1080". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.03.1-ce. Latest validated version: 18.06
[preflight/images] Pulling images required for setting up a Kubernetes cluster
[preflight/images] This might take a minute or two, depending on the speed of your internet connection
[preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[preflight] Activating the kubelet service
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [master localhost] and IPs [192.168.80.101 127.0.0.1 ::1]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [master localhost] and IPs [127.0.0.1 ::1]
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.80.101]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] valid certificates and keys now exist in "/etc/kubernetes/pki"
[certificates] Generated sa key and public key.
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests" 
[init] this might take a minute or longer if the control plane images have to be pulled

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
	- 'docker ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'docker logs CONTAINERID'
couldn't initialize a Kubernetes cluster


# Fix:
# Troubleshooting steps:
1. kubelet and Docker using different cgroup drivers: changed the kubelet unit file to match;
	did not help
2. A version bug:
	neither upgrading nor downgrading helped
3. Name resolution problem:
	resolution was fine, not the issue
4. The pods were all running, the port was listening, and telnet to it worked

5. The real problem: network access. The blog post https://blog.fleeto.us/post/kubeadm-traps/ lists it as its second point:
"During kubeadm init, the tool first checks for a proxy server to determine how it will reach kube-apiserver over https, and warns if a proxy is set."
# Only then did the warnings at the start of init stand out: [WARNING HTTPProxy]: Connection to "https://192.168.80.101" uses proxy "http://192.168.31.107:1080". If that is not intended, adjust your proxy settings
	[WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://192.168.31.107:1080". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
	[WARNING HTTPProxyCIDR]: connection to "10.244.0.0/16" uses proxy 
# Combined with the error in the kubelet log: Unable to register node "master" with API server: Post https://192.168.80.101:6443/api/v1/nodes: dial tcp 192.168.80.101:6443: connect: connection refused

# The suspicion: the lab PC's ip:1080 had earlier been exported via the http(s)_proxy variables to pull the Google container images; per the warnings, traffic to the Service and pod networks then also goes through that proxy, and the proxy cannot reach port 6443 on the master, which caused the later failure.
# printenv showed the variables; after unsetting the two proxy variables, cleaning up the previous attempt, and re-running the same init command, it succeeded!
[root@master ~]# unset http_proxy
[root@master ~]# unset https_proxy
[root@master ~]# kubeadm reset
[root@master ~]# kubeadm  init --kubernetes-version=v1.12.9 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --apiserver-advertise-address=0.0.0.0 --ignore-preflight-errors=Swap

Common kubeadm init errors

https://blog.fleeto.us/post/kubeadm-traps/

This blog post lists several errors commonly hit with kubeadm:

  • Proxy problems (in this lab, the proxy settings broke access to port 6443 on the master)
    • (When a proxy is needed to pull images, set it in Docker's unit file, not as a system-wide environment variable on the node)
  • cgroup driver mismatch between kubelet and Docker
  • The kernel parameter net.bridge.bridge-nf-call-iptables not set to 1
  • Problems with the containers inside the pods; check the logs: the container logs, the kubelet log, and the system log /var/log/messages (see the sketch below)
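
The checks kubeadm itself suggests in its failure message are usually enough to find the failing piece; roughly:

# kubelet side
[root@master ~]# systemctl status kubelet
[root@master ~]# journalctl -xeu kubelet

# control plane containers started by kubelet
[root@master ~]# docker ps -a | grep kube | grep -v pause
[root@master ~]# docker logs CONTAINERID

# system log
[root@master ~]# tail /var/log/messages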

Deploy flannel

First attempt (failed):

Download the manifest:
https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

[root@master ~]# kubectl apply -f kube-flannel.yml 
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

The flannel deployment had problems: no flannel pod was running and the cluster nodes stayed NotReady; the flannel DaemonSet showed a desired pod count of 0, and no flannel pod appeared in the kube-system namespace;

[root@master ~]# kubectl get nodes
NAME     STATUS     ROLES    AGE     VERSION
master   NotReady   master   3h22m   v1.12.9
node2    NotReady   <none>   4m30s   v1.12.9

[root@master ~]# kubectl get ds -n kube-system
NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-flannel-ds   0         0         0       0            0           <none>          17h
kube-proxy        2         2         1       2            1           <none>          19h

After the fix:

Cause: the latest flannel manifest does not match Kubernetes 1.12.9. The flannel docs note that Kubernetes 1.6 through 1.15 needs the legacy yml file:

kube-flannel.yaml has some features that aren't compatible with older versions of Kubernetes, though flanneld itself should work with any version of Kubernetes.

# Legacy manifest:
https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-old.yaml

[root@master ~]# vim kube-flannel-old.yml 
[root@master ~]# docker pull quay.io/coreos/flannel:v0.11.0-amd64
# With the legacy yaml file everything came up normally;

[root@master ~]# kubectl get pods -n kube-system -o wide
NAME                             READY   STATUS    RESTARTS   AGE    IP               NODE     NOMINATED NODE
coredns-576cbf47c7-8fpsn         1/1     Running   0          20h    10.244.0.2       master   <none>
coredns-576cbf47c7-tzmmt         1/1     Running   0          20h    10.244.0.3       master   <none>
etcd-master                      1/1     Running   3          16h    192.168.80.101   master   <none>
kube-apiserver-master            1/1     Running   4          16h    192.168.80.101   master   <none>
kube-controller-manager-master   1/1     Running   9          16h    192.168.80.101   master   <none>
kube-flannel-ds-amd64-lnwmf      1/1     Running   0          106s   192.168.80.107   node2    <none>
kube-flannel-ds-amd64-lqkv5      1/1     Running   0          106s   192.168.80.101   master   <none>
kube-proxy-hqh4m                 1/1     Running   3          20h    192.168.80.101   master   <none>
kube-proxy-szvvw                 1/1     Running   0          16h    192.168.80.107   node2    <none>
kube-scheduler-master            1/1     Running   8          16h    192.168.80.101   master   <none>
[root@master ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   20h   v1.12.9
node2    Ready    <none>   16h   v1.12.9

kubeadm join

1. First join attempt reports errors

[root@node2 ~]#  kubeadm join 192.168.80.101:6443 --token rdif9c.h8084b6polru2bde --discovery-token-ca-cert-hash sha256:e64e4335f619b04dc17a004c626876451f96c80005aea0e05ed73269033bddbd
[preflight] running pre-flight checks
	[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_sh ip_vs ip_vs_rr ip_vs_wrr] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.03.1-ce. Latest validated version: 18.06
	[WARNING Hostname]: hostname "node2" could not be reached
	[WARNING Hostname]: hostname "node2" lookup node2 on 192.168.80.2:53: no such host
	[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Some fatal errors occurred:
	[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

# Fixes for each reported problem (see the sketch below):
1. Load the ipvs modules
2. Enable the kubelet service
3. Set the kernel parameter
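
The three fixes on node2, roughly (reusing the ipvs loader script written on the master earlier):

# 1. Load the ipvs modules
[root@node2 ~]# sh /etc/sysconfig/modules/ipvs.modules

# 2. Enable the kubelet service
[root@node2 ~]# systemctl enable kubelet.service

# 3. Set the bridge kernel parameters
[root@node2 ~]# cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
[root@node2 ~]# sysctl --system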

2. Successful join

[root@node2 ~]#  kubeadm join 192.168.80.101:6443 --token rdif9c.h8084b6polru2bde --discovery-token-ca-cert-hash sha256:e64e4335f619b04dc17a004c626876451f96c80005aea0e05ed73269033bddbd
[preflight] running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.03.1-ce. Latest validated version: 18.06
	[WARNING Hostname]: hostname "node2" could not be reached
	[WARNING Hostname]: hostname "node2" lookup node2 on 192.168.80.2:53: no such host
[discovery] Trying to connect to API Server "192.168.80.101:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.80.101:6443"
[discovery] Requesting info from "https://192.168.80.101:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.80.101:6443"
[discovery] Successfully established connection with API Server "192.168.80.101:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "node2" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

3. Final cluster nodes

[root@master ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE     VERSION
master   Ready    master   20h     v1.12.9
node1    Ready    <none>   3m46s   v1.12.9
node2    Ready    <none>   17h     v1.12.9
node3    Ready    <none>   3m39s   v1.12.9

Deploy kubectl

Configure the Aliyun Kubernetes repository and install the matching kubectl version;

[root@client ~]# cat /etc/yum.repos.d/k8s.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg

[root@client ~]# yum install -y kubectl-1.12.9

Set up the kubectl configuration file

[root@master ~]# scp /etc/kubernetes/admin.conf 192.168.80.102:/root/.kube/config
# Copied to the standalone kubectl client;

[root@client ~]# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok                   
scheduler            Healthy   ok                   
etcd-0               Healthy   {"health": "true"}   

Cluster-related directories

1. Main directory

[root@master ~]# ll /etc/kubernetes/
total 36
-rw------- 1 root root 5454 Nov 10 14:50 admin.conf
# needed by kubectl; the kubeconfig for the client used to administer the whole cluster,
-rw------- 1 root root 5490 Nov 10 14:50 controller-manager.conf
-rw------- 1 root root 5462 Nov 10 14:50 kubelet.conf
# kubeconfig files for kubelet and controller-manager
drwxr-xr-x 2 root root  113 Nov 10 14:50 manifests
drwxr-xr-x 3 root root 4096 Nov 10 14:50 pki
-rw------- 1 root root 5438 Nov 10 14:50 scheduler.conf
# scheduler kubeconfig

2. Certificate directory

The certificates and private keys used by the components for secure HTTPS communication with each other;

[root@master ~]# ll /etc/kubernetes/pki/
total 56
-rw-r--r-- 1 root root 1216 Nov 10 14:50 apiserver.crt
-rw-r--r-- 1 root root 1090 Nov 10 14:50 apiserver-etcd-client.crt
-rw------- 1 root root 1679 Nov 10 14:50 apiserver-etcd-client.key
-rw------- 1 root root 1675 Nov 10 14:50 apiserver.key
-rw-r--r-- 1 root root 1099 Nov 10 14:50 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Nov 10 14:50 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1025 Nov 10 14:50 ca.crt
-rw------- 1 root root 1675 Nov 10 14:50 ca.key
drwxr-xr-x 2 root root  162 Nov 10 14:50 etcd
-rw-r--r-- 1 root root 1038 Nov 10 14:50 front-proxy-ca.crt
-rw------- 1 root root 1675 Nov 10 14:50 front-proxy-ca.key
-rw-r--r-- 1 root root 1058 Nov 10 14:50 front-proxy-client.crt
-rw------- 1 root root 1679 Nov 10 14:50 front-proxy-client.key
-rw------- 1 root root 1679 Nov 10 14:50 sa.key
-rw------- 1 root root  451 Nov 10 14:50 sa.pub

3. Static pod manifests for the components

[root@master ~]# ll /etc/kubernetes/manifests/
total 16
-rw------- 1 root root 1933 Nov 10 14:50 etcd.yaml
-rw------- 1 root root 2674 Nov 10 14:50 kube-apiserver.yaml
-rw------- 1 root root 2547 Nov 10 14:50 kube-controller-manager.yaml
-rw------- 1 root root 1051 Nov 10 14:50 kube-scheduler.yaml

Files installed by the packages

# Installed kube* packages
[root@master ~]# rpm -qa |grep kube
kubelet-1.12.9-0.x86_64
kubernetes-cni-0.7.5-0.x86_64
kubectl-1.12.9-0.x86_64
kubeadm-1.12.9-0.x86_64

# Files owned by each package
[root@master ~]# rpm -ql kubelet
/etc/kubernetes/manifests
/etc/sysconfig/kubelet
/usr/bin/kubelet
/usr/lib/systemd/system/kubelet.service

[root@master ~]# rpm -ql kubeadm
/usr/bin/kubeadm
/usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf

[root@master ~]# rpm -ql kubectl
/usr/bin/kubectl

[root@master ~]# rpm -ql kubernetes-cni
/opt/cni
/opt/cni/bin
/opt/cni/bin/bridge
/opt/cni/bin/dhcp
/opt/cni/bin/flannel
/opt/cni/bin/host-device
/opt/cni/bin/host-local
/opt/cni/bin/ipvlan
/opt/cni/bin/loopback
/opt/cni/bin/macvlan
/opt/cni/bin/portmap
/opt/cni/bin/ptp
/opt/cni/bin/sample
/opt/cni/bin/tuning
/opt/cni/bin/vlan

Post-deployment verification

View cluster information

[root@client ~]# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok                   
controller-manager   Healthy   ok                   
etcd-0               Healthy   {"health": "true"}   
[root@client ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE     VERSION
master   Ready    master   20h     v1.12.9
node1    Ready    <none>   4m32s   v1.12.9
node2    Ready    <none>   17h     v1.12.9
node3    Ready    <none>   4m25s   v1.12.9
[root@client ~]# kubectl cluster-info
Kubernetes master is running at https://192.168.80.101:6443
KubeDNS is running at https://192.168.80.101:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@client ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.9", GitCommit:"e09f5c40b55c91f681a46ee17f9bc447eeacee57", GitTreeState:"clean", BuildDate:"2019-05-27T16:08:57Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.9", GitCommit:"e09f5c40b55c91f681a46ee17f9bc447eeacee57", GitTreeState:"clean", BuildDate:"2019-05-27T15:58:45Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

How to remove a node

1. Drain the node first, i.e. migrate away or stop the pods and other resources running on it

[root@client ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE     VERSION
master   Ready    master   20h     v1.12.9
node1    Ready    <none>   8m45s   v1.12.9
node2    Ready    <none>   17h     v1.12.9
node3    Ready    <none>   8m38s   v1.12.9
[root@client ~]# kubectl drain node3 --delete-local-data --force --ignore-daemonsets 
node/node3 cordoned
WARNING: Ignoring DaemonSet-managed pods: kube-flannel-ds-amd64-bww5t, kube-proxy-szcnx
[root@client ~]# kubectl get nodes
NAME     STATUS                     ROLES    AGE     VERSION
master   Ready                      master   20h     v1.12.9
node1    Ready                      <none>   9m22s   v1.12.9
node2    Ready                      <none>   17h     v1.12.9
node3    Ready,SchedulingDisabled   <none>   9m15s   v1.12.9

2. Delete the node object

[root@client ~]# kubectl delete node node3
node "node3" deleted
[root@client ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE     VERSION
master   Ready    master   20h     v1.12.9
node1    Ready    <none>   9m33s   v1.12.9
node2    Ready    <none>   17h     v1.12.9

3. Reset the node: log in to the node being removed and run kubeadm reset

[root@node3 ~]# kubeadm reset
[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] are you sure you want to proceed? [y/N]: y

4. To rejoin, simply run kubeadm join on the node again (see the note below)
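
Note that the bootstrap token printed by kubeadm init expires after 24 hours by default; if the original join command no longer works, a new one can be generated on the master, roughly:

[root@master ~]# kubeadm token create --print-join-command
kubeadm join 192.168.80.101:6443 --token <new-token> --discovery-token-ca-cert-hash sha256:<hash>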
