Preface: Over the past year I have been learning K8s on and off. Along the way I stepped into plenty of pitfalls and ran into plenty of problems, and the whole learning phase felt like twice the effort for half the result. Often I would get halfway through a guide and be unable to go any further. The road ahead is long, but we must press on.
-----------------------------------------------------------------------------------------------------------------------
1. Lab Environment:
Pay attention to the K8s and Docker versions.
Hostname            IP (NAT)    OS          K8s version   Docker version   Description
LB&NFS.host.com     10.x.x.19   CentOS 7.6  -             -                LB, NFS, Harbor
Master01.host.com   10.x.x.20   CentOS 7.6  v1.18.20      v20.10.12        Kubernetes master / etcd node
Node01.host.com     10.x.x.21   CentOS 7.6  v1.18.20      v20.10.12        Kubernetes worker node
Node02.host.com     10.x.x.22   CentOS 7.6  v1.18.20      v20.10.12        Kubernetes worker node
Node03.host.com     10.x.x.23   CentOS 7.6  v1.18.20      v20.10.12        Kubernetes worker node

Service subnet: 10.1.0.0/16
Pod subnet:     10.2.0.0/16
2. System Initialization:
Note: perform the same operations on all hosts.
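Since every step in this section is repeated on each host, a small loop can save typing. This is a hypothetical helper, not part of the original procedure; it assumes passwordless SSH as root to each node, and the masked 10.x.x.* addresses from the table above must be replaced with the real ones:
# hypothetical helper: run the same command on every host
for ip in 10.x.x.20 10.x.x.21 10.x.x.22 10.x.x.23; do
  ssh root@$ip "swapoff -a"   # substitute whichever step you are on
done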
2.1 Set the hostname on each host
hostnamectl set-hostname node01
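The command above shows node01; each host gets its own name to match the table in section 1. The short names are assumed here; the FQDNs such as master01.host.com would also work:
hostnamectl set-hostname master01   # on 10.x.x.20
hostnamectl set-hostname node01     # on 10.x.x.21
hostnamectl set-hostname node02     # on 10.x.x.22
hostnamectl set-hostname node03     # on 10.x.x.23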
2.2 Disable SELinux
setenforce 0; getenforce
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/sysconfig/selinux
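A quick check that both the runtime switch and the persisted setting took effect (/etc/sysconfig/selinux is a symlink to /etc/selinux/config):
getenforce                            # expect: Permissive until the next reboot
grep '^SELINUX=' /etc/selinux/config  # expect: SELINUX=disabled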
2.3 Disable swap
sudo sed -i '/swap/d' /etc/fstab
sudo swapoff -a
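To verify swap is off now and will stay off after a reboot:
swapon --show            # no output means no active swap
free -h | grep -i swap   # the Swap line should show all zeros
grep swap /etc/fstab     # no output means the fstab entry was removed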
2.4 Turn off the firewall
systemctl stop firewalld.service; systemctl disable firewalld.service; systemctl status firewalld.service
2.5 Edit the hosts file on each host for local name resolution
cat > /etc/hosts << EOF
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.x.x.19   lb_nfs.host.com lb_nfs
10.x.x.20   master01.host.com master01
10.x.x.21   node01.host.com node01
10.x.x.22   node02.host.com node02
10.x.x.23   node03.host.com node03
EOF
2.6 Set the DNS server, restart the network, and ping Baidu to confirm internet access
echo "nameserver 10.x.x.1" >> /etc/resolv.conf
systemctl restart network
ping -c 3 www.baidu.com
2.7 Configure the YUM repositories
wget -O /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
yum clean all; yum repolist all
2.8 Install the required base packages
yum install -y wget net-tools vim ftp lrzsz tree screen lsof tcpdump nc mtr nmap git epel-release.noarch
2.9 Set up time synchronization
yum install -y ntpdate
ntpdate time.windows.com
2.10 Install the pinned Docker version
yum list docker-ce.x86_64 --showduplicates | sort -r
yum install -y containerd.io-1.2.13 docker-ce-20.10.12 docker-ce-cli-20.10.12
2.11 Create the Docker daemon.json file
mkdir /etc/docker
cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": ["https://dx5z2hy7.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
Start Docker:
systemctl enable docker; systemctl start docker; systemctl status docker
Check the installed version:
docker --version
3. Install the Software Packages:
Note: perform the same operations on all hosts.
3.1 Install the packages
yum install -y kubelet-1.18.20 kubeadm-1.18.20 kubectl-1.18.20 ipvsadm
3.2 Configure the kubelet
By default the kubelet refuses to start on a host that has an active swap partition. Going forward, plan system installs without a swap partition; on hosts that already have one, configure the kubelet to ignore the swap restriction, otherwise it will not start.
cat > /etc/sysconfig/kubelet << EOF
KUBELET_CGROUP_ARGS="--cgroup-driver=systemd"
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
EOF
3.3 Configure kernel parameters
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
Apply the settings:
sysctl --system
3.4 Start the kubelet and enable it at boot
systemctl enable kubelet && systemctl start kubelet
3.5 Enable the ipvs modules on all nodes
In a Kubernetes cluster, the kube-proxy component provides load balancing for Services. It defaults to iptables; for production, ipvs is the recommended mode. Enable the ipvs kernel modules on all nodes:
cat > /etc/sysconfig/modules/ipvs.modules << EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
Make the script executable:
chmod +x /etc/sysconfig/modules/ipvs.modules
Load the modules:
source /etc/sysconfig/modules/ipvs.modules
Check that the modules loaded:
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
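Before moving on to kubeadm init, it is worth confirming that Docker's cgroup driver matches the systemd driver set for the kubelet in 3.2, since a mismatch keeps the node from coming up. A quick sanity check:
docker info 2>/dev/null | grep -i 'cgroup driver'   # expect: Cgroup Driver: systemd
sysctl net.bridge.bridge-nf-call-iptables           # expect: = 1 (needs the br_netfilter module loaded)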
4. Initialize the Cluster and Deploy the Master
Note: run the following on the master node.
4.1 Generate a default configuration file
[root@master01 ~]# kubeadm config print init-defaults > kubeadm.yaml
W0119 16:05:27.447289   30160 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
The warning says kubeadm cannot validate the component configs for those API groups; it can be ignored. The command writes a kubeadm configuration file with default values, which is then edited as follows:
[root@master01 ~]# cat kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.x.x.20        # change to the API server address
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master01.host.com
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers   # change to the Aliyun mirror (the default registry also works)
kind: ClusterConfiguration
kubernetesVersion: v1.18.20          # change to the exact version
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.1.0.0/16         # change the Service subnet
  podSubnet: 10.2.0.0/16             # add the Pod subnet
scheduler: {}
---
# the three lines below are added to make kube-proxy use LVS (ipvs)
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
4.2 Run the initialization
kubeadm init --config kubeadm.yaml
Note: if image pull errors are reported, switch imageRepository back to the original default and run the install again.
4.3 On success, be sure to record the kubeadm join line at the end of the output (the token and CA certificate hash).
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.x.x.20:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:233da03cbfe0cc8c2027b1183b2a4638ec4a40b05498566a554d1c3fde805cec
4.4 Prepare the kubeconfig file for kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
4.5 Check component status
kubectl get cs
4.6 Fix the unhealthy controller-manager and scheduler
[root@master01 ~]# kubectl get cs
NAME                 STATUS      MESSAGE                                                                                     ERROR
controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler            Unhealthy   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0               Healthy     {"health":"true"}
Apart from the kubelet, the other components run as static pods; their manifests live in /etc/kubernetes/manifests, and edits to those files take effect automatically. Comment out the --port=0 flag in the container arguments of kube-scheduler.yaml and kube-controller-manager.yaml. --port=0 disables the insecure HTTP endpoint (and renders the insecure --bind-address parameter ineffective), and that endpoint is exactly what the health check above queries.
[root@master01 ~]# sed -i 's/- --port=0/#&/' /etc/kubernetes/manifests/kube-scheduler.yaml
[root@master01 ~]# sed -i 's/- --port=0/#&/' /etc/kubernetes/manifests/kube-controller-manager.yaml
[root@master01 ~]# systemctl restart kubelet
5. Deploy the Network Plugin (Flannel)
git clone --depth 1 https://github.com/flannel-io/flannel.git
Note: if the clone fails, browse GitHub to flannel/Documentation/kube-flannel.yml, copy the file contents, create and upload a new file, and save it with a .yml extension.
cd flannel/Documentation/
vim kube-flannel.yml
# change "Network": "10.244.0.0/16" to "Network": "10.2.0.0/16"
  net-conf.json: |
    {
      "Network": "10.2.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
(Note: the image path does not have to be changed.)
# Note: pulling the Flannel image can be slow; a mirror inside China can be substituted
# image: quay.io/coreos/flannel:v0.10.0-amd64
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-amd64
kubectl apply -f kube-flannel.yml
Check the pods and nodes:
kubectl get pod -n kube-system
kubectl get node
6. Join the Worker Nodes
Note: run the following on each worker node (node01 through node03).
[root@node01 ~]# kubeadm join 10.x.x.20:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:233da03cbfe0cc8c2027b1183b2a4638ec4a40b05498566a554d1c3fde805cec
Back on the master, run the following to confirm the nodes have been added to the cluster:
[root@master01 ~]# kubectl get daemonset --all-namespaces
kubectl get pod --all-namespaces
kubectl get nodes --show-labels
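Once the workers have joined, a few quick checks can confirm the cluster is healthy and that the ipvs mode requested in kubeadm.yaml actually took effect (ipvsadm was installed in 3.1; the k8s-app=kube-proxy label is set by kubeadm):
kubectl get nodes -o wide                                          # all nodes should be Ready once the Flannel pods are running
kubectl -n kube-system logs -l k8s-app=kube-proxy | grep -i ipvs   # kube-proxy should log that it is using the ipvs proxier
ipvsadm -Ln                                                        # on any node: virtual servers for the 10.1.0.0/16 Service subnet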