Building a k8s cluster with Vagrant

Three virtual machines

Node    OS        IP
master  CentOS-8  192.168.10.90
node1   CentOS-8  192.168.10.91
node2   CentOS-8  192.168.10.92

Building the base image

cd k8sbase
vagrant box add centos8 ../vagrant_package/CentOS-8-generic.box
vagrant init centos8
# Start the VM
vagrant up
# Log in to the VM for the basic configuration
vagrant ssh
# Switch to root
sudo su

Basic configuration

# Disable the firewall
systemctl stop firewalld && systemctl disable firewalld

# Disable SELinux
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config

# Disable the swap partition (comments out the swap entries in /etc/fstab)
sed -ri 's/.*swap.*/#&/' /etc/fstab
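The sed command above prepends a # to every line that mentions swap, including lines that are already commented out, so running it twice leaves ## behind. Below is a minimal sketch of a variant that is safe to re-run. The name comment_out_swap is my own, and GNU sed is assumed. Editing /etc/fstab only affects the next boot; swapoff -a turns swap off on the running system.

```shell
# Comment out swap entries in an fstab-style file, skipping lines that are
# already comments. comment_out_swap is a hypothetical helper name.
comment_out_swap() {
  # $1: file to edit in place (defaults to /etc/fstab)
  sed -ri '/^[[:space:]]*#/! s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "${1:-/etc/fstab}"
}

# On the VM, as root:
#   comment_out_swap /etc/fstab && swapoff -a
```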

# Pass bridged IPv4 traffic to the iptables chains
cat <<EOF >  /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
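The two net.bridge.* keys only exist once the br_netfilter kernel module is loaded, so sysctl --system may complain about them on a fresh image. The sketch below checks for the module in a /proc/modules-style listing. module_loaded is a name I made up, and on kernels where the bridge netfilter code is built in, the module will not be listed even though the keys work.

```shell
# Succeed if the named kernel module appears in a /proc/modules-style listing.
module_loaded() {
  # $1: module name, $2: listing to read (defaults to /proc/modules)
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# On the VM, as root:
#   module_loaded br_netfilter || modprobe br_netfilter
```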

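# Synchronize the time
# 1. Check whether the time is correct; if it is not, carry out the steps below
date
# 2-3. Replace /etc/localtime with the Asia/Shanghai time zone
rm -rf /etc/localtime
ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
# 4. Set the time zone
tzselect
# 5. Synchronize the time
yum install -y ntpdate
ntpdate cn.pool.ntp.org
# 6. Check the time again
date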

# yum repositories
vi /etc/yum.repos.d/docker-ce.repo
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg

vi /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
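If you would rather script the repo files than type them in vi, keep in mind that $basearch and $releasever are yum variables and must reach the file literally. Here is a sketch that uses a quoted heredoc delimiter so the shell leaves them alone. write_docker_repo and its directory argument are names of my own.

```shell
# Write docker-ce.repo into the given directory (defaults to /etc/yum.repos.d).
# The quoted 'EOF' stops the shell from expanding $basearch and $releasever.
write_docker_repo() {
  repo_dir=${1:-/etc/yum.repos.d}
  cat > "$repo_dir/docker-ce.repo" <<'EOF'
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg
EOF
}

# On the VM, as root:
#   write_docker_repo
```

With an unquoted EOF, as in the sysctl step, both variables would be expanded by the shell (most likely to empty strings) and the baseurl would be broken.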

# Install
yum install docker-ce kubelet kubeadm kubectl
systemctl enable docker
systemctl enable kubelet

# Note: I hit a problem here. Docker's cgroup driver and kubelet's cgroup driver were different and have to match, so change Docker's cgroup driver to systemd.
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

Packaging the image

vagrant halt
vagrant package --output k8s.box
vagrant box add k8s k8s.box

Creating the virtual machines in bulk from the base image

# Vagrantfile
# vim: set ft=ruby ts=2 :

Vagrant.configure("2") do |config|
  config.vm.box = "k8s"
  config.vm.provider "virtualbox" do |v|
    v.memory = 2048
    v.cpus = 2
  end

  config.vm.define :master do |cfg|
    cfg.vm.hostname = "master"
    cfg.vm.network :public_network, ip: "192.168.10.90"
  end

  config.vm.define :node1 do |cfg|
    cfg.vm.hostname = "node1"
    cfg.vm.network :public_network, ip: "192.168.10.91"
  end

  config.vm.define :node2 do |cfg|
    cfg.vm.hostname = "node2"
    cfg.vm.network :public_network, ip: "192.168.10.92"
  end

  config.vm.synced_folder "./", "/vagrant"
end
cd k8s
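# Start the three VMs
vagrant up master
vagrant up node1 node2
# Open a new terminal window
vagrant ssh master
# Open a new terminal window for each worker (node1 shown; repeat for node2)
vagrant ssh node1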

Initializing the master node

# vi /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--node-ip=192.168.10.90 --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5"

service kubelet restart

# Pre-pull the base images with kubeadm
kubeadm config images pull --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers

# Initialize
kubeadm init \
--kubernetes-version v1.22.0 \
--apiserver-advertise-address=192.168.10.90 \
--ignore-preflight-errors=all \
--image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
--service-cidr=10.1.0.0/16 \
--pod-network-cidr=10.244.0.0/16

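The Pod network range is 10.244.0.0/16, and the API server address is the master's own IP.

This step is critical: by default kubeadm pulls the images it needs from the official registry k8s.gcr.io, which is not reachable from mainland China, so --image-repository must point at the Alibaba Cloud image registry. 1️⃣

This step can also fail because kubelet failed to start. 2️⃣

Parameters:

--kubernetes-version: the k8s version to install;
--apiserver-advertise-address: the IP address kube-apiserver advertises that it is listening on, which is the master's own IP address;
--pod-network-cidr: the Pod network range, 10.244.0.0/16;
--service-cidr: the Service network range;
--image-repository: the address of the Alibaba Cloud image registry.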
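The Pod range, the Service range, and the VMs' own network should not overlap. The sketch below checks that with plain shell arithmetic. The helper names are hypothetical, and treating the host network as 192.168.10.0/24 is my assumption based on the node IPs.

```shell
# Convert a dotted IPv4 address to an integer. The body runs in a subshell
# so the IFS change and the positional parameters stay local.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Print the first and last address of a CIDR block as integers.
cidr_range() (
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))
  first=$(( base / size * size ))
  echo "$first $(( first + size - 1 ))"
)

# Succeed if the two CIDR blocks share at least one address.
cidrs_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Pod range against Service range, as passed to kubeadm init above.
cidrs_overlap 10.244.0.0/16 10.1.0.0/16 && echo "overlap" || echo "disjoint"
# Pod range against the assumed host network.
cidrs_overlap 10.244.0.0/16 192.168.10.0/24 && echo "overlap" || echo "disjoint"
```

Both checks print disjoint for the values used in this post.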

If it succeeds, you will see:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.10.90:6443 --token 9vh4kc.5lcth0nqbhx7mugj \
    --discovery-token-ca-cert-hash sha256:eb606e1c5f634d7a861fe09644dfdb12f9bfeb02534012445cc9712e4e8caede 

# Run the three commands above in order
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

[root@master vagrant]# kubectl get nodes
NAME     STATUS     ROLES                  AGE     VERSION
master   NotReady   control-plane,master   6m54s   v1.22.0

Worker nodes

# vi /etc/sysconfig/kubelet (node1 shown; on node2 use --node-ip=192.168.10.92)
KUBELET_EXTRA_ARGS="--node-ip=192.168.10.91 --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5"

service kubelet restart
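Each node needs its own --node-ip, so the line above changes from VM to VM. One way to avoid editing it by hand is to generate it. The function below is a sketch with a name of my own, and it assumes the pause image from the master section is used on every node.

```shell
# Print the /etc/sysconfig/kubelet line for one node.
kubelet_extra_args() {
  # $1: node IP, $2: pause image (defaults to the one used in this post)
  printf 'KUBELET_EXTRA_ARGS="--node-ip=%s --pod-infra-container-image=%s"\n' \
    "$1" "${2:-registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5}"
}

# Example for node2, as root:
#   kubelet_extra_args 192.168.10.92 > /etc/sysconfig/kubelet
#   service kubelet restart
```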

# Join this node to the k8s cluster. If you no longer have the join command, print a new one on the master with kubeadm token create --print-join-command
[root@node vagrant]# kubeadm join 192.168.10.90:6443 --token 9vh4kc.5lcth0nqbhx7mugj \
>     --discovery-token-ca-cert-hash sha256:eb606e1c5f634d7a861fe09644dfdb12f9bfeb02534012445cc9712e4e8caede
[preflight] Running pre-flight checks
    [WARNING FileExisting-tc]: tc not found in system path
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
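If you saved the output of kubeadm init to a file, the join command can be pulled back out of it instead of being copied by hand. This is a sketch: it assumes the output was captured with something like kubeadm init ... | tee kubeadm-init.log (that file name is my own choice) and that the command starts at the beginning of a line, as in the output shown earlier.

```shell
# Print the "kubeadm join ..." command from saved kubeadm init output,
# folded onto a single line. extract_join_command is a hypothetical helper.
extract_join_command() {
  awk '
    /^kubeadm join / { grab = 1 }
    grab {
      line = $0
      more = sub(/\\[ \t]*$/, "", line)   # 1 if the line ended with a backslash
      out = out " " line
      if (!more) grab = 0
    }
    END {
      gsub(/[ \t]+/, " ", out)            # collapse runs of whitespace
      sub(/^ /, "", out); sub(/ $/, "", out)
      print out
    }
  ' "$1"
}

# On the master:
#   extract_join_command kubeadm-init.log
```

The token in that command expires after 24 hours by default, which is when kubeadm token create --print-join-command becomes necessary.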

# Back on the master: INTERNAL-IP now shows the IPs we wanted (the status is still NotReady because the network add-on has not been installed yet)
[root@master vagrant]# kubectl get nodes -o wide
NAME     STATUS     ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                 CONTAINER-RUNTIME
master   NotReady   control-plane,master   13m   v1.22.0   192.168.10.90   <none>        CentOS Linux 8   4.18.0-240.15.1.el8_3.x86_64   docker://20.10.8
node1    NotReady   <none>                 8s    v1.22.0   192.168.10.91   <none>        CentOS Linux 8   4.18.0-240.15.1.el8_3.x86_64   docker://20.10.8
node2    NotReady   <none>                 16s   v1.22.0   192.168.10.92   <none>        CentOS Linux 8   4.18.0-240.15.1.el8_3.x86_64   docker://20.10.8

The flannel network add-on

Run on the master:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
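# The command above may fail because of network problems; in that case download the whole repository and apply the file locally
https://github.com/flannel-io/flannel
# Edit kube-flannel.yml to specify eth1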
...
      containers:
      - name: kube-flannel
        image: registry.cn-hangzhou.aliyuncs.com/google-containers/flannel:v0.9.0
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=eth1
...
# Apply
kubectl apply -f ./Documentation/kube-flannel.yml
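The --iface=eth1 argument matters because, with the VirtualBox provider, Vagrant usually keeps eth0 as a NAT adapter and attaches the public_network address to the next interface. The interface name can differ on other boxes, so it is worth checking which interface actually carries the node IP. Here is a sketch that reads ip -o -4 addr show output; iface_for_ip is a made-up name.

```shell
# Read `ip -o -4 addr show` output on stdin and print the interface that
# owns the given IPv4 address.
iface_for_ip() {
  awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) { print $2; exit } }'
}

# On the master:
#   ip -o -4 addr show | iface_for_ip 192.168.10.90
```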

# List all pods; as expected, every STATUS is Running
[root@master vagrant]# kubectl get pods -n kube-system -o wide
NAME                             READY   STATUS    RESTARTS        AGE     IP              NODE     NOMINATED NODE   READINESS GATES
coredns-7d89d9b6b8-fcs2v         1/1     Running   0               42m     10.244.0.3      master   <none>           <none>
coredns-7d89d9b6b8-kb8f8         1/1     Running   0               42m     10.244.0.2      master   <none>           <none>
etcd-master                      1/1     Running   1 (2m51s ago)   51m     192.168.10.90   master   <none>           <none>
kube-apiserver-master            1/1     Running   2 (2m30s ago)   51m     192.168.10.90   master   <none>           <none>
kube-controller-manager-master   1/1     Running   9 (45s ago)     51m     192.168.10.90   master   <none>           <none>
kube-flannel-ds-4g9k2            1/1     Running   0               4m28s   192.168.10.92   node2    <none>           <none>
kube-flannel-ds-bzw9q            1/1     Running   0               5m39s   192.168.10.90   master   <none>           <none>
kube-flannel-ds-c5x4l            1/1     Running   0               5m42s   192.168.10.91   node1    <none>           <none>
kube-proxy-bfl8p                 1/1     Running   1 (2m8s ago)    38m     192.168.10.92   node2    <none>           <none>
kube-proxy-pf8r9                 1/1     Running   1 (2m10s ago)   38m     192.168.10.91   node1    <none>           <none>
kube-proxy-vj85t                 1/1     Running   1 (2m51s ago)   42m     192.168.10.90   master   <none>           <none>
kube-scheduler-master            0/1     Running   4 (46s ago)     49m     192.168.10.90   master   <none>           <none>
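Running is not quite the same as ready: in the listing above kube-scheduler-master shows READY 0/1, and several control-plane pods have restarted. A small filter makes that easier to spot. It is a sketch that reads the kubectl get pods table on stdin, and unready_pods is a name I chose.

```shell
# Print the names of pods whose READY column is not n/n.
unready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# On the master:
#   kubectl get pods -n kube-system | unready_pods
```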

# List all nodes; as expected, every STATUS is Ready
[root@master vagrant]# kubectl get nodes -o wide
NAME     STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                 CONTAINER-RUNTIME
master   Ready    control-plane,master   51m   v1.22.0   192.168.10.90   <none>        CentOS Linux 8   4.18.0-240.15.1.el8_3.x86_64   docker://20.10.8
node1    Ready    <none>                 38m   v1.22.0   192.168.10.91   <none>        CentOS Linux 8   4.18.0-240.15.1.el8_3.x86_64   docker://20.10.8
node2    Ready    <none>                 38m   v1.22.0   192.168.10.92   <none>        CentOS Linux 8   4.18.0-240.15.1.el8_3.x86_64   docker://20.10.8
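The Ready check on the node listing above can also be scripted. The sketch below reads kubectl get nodes output on stdin and succeeds only when every node is Ready; all_nodes_ready is a hypothetical helper name.

```shell
# Succeed only if at least one node is listed and every STATUS is exactly "Ready".
all_nodes_ready() {
  awk 'NR > 1 { n++; if ($2 != "Ready") bad++ } END { exit !(n > 0 && bad == 0) }'
}

# Poll until the cluster settles (run on the master):
#   until kubectl get nodes | all_nodes_ready; do sleep 5; done
```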

dashboard

Installing the dashboard on k8s

ingress-nginx

Two ways to deploy k8s ingress: nodePort and hostNetwork

Downloads

https://github.com/chudaozhe/k8s-demo

References

https://developer.aliyun.com/mirror/kubernetes

1️⃣ For details on registry.aliyuncs.com/google_containers, see the Alibaba Cloud image registry marketplace.

Problems

# Problem 1
[WARNING ImagePull]: failed to pull image registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.4: output: Error response from daemon: manifest for registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.4 not found: manifest unknown: manifest unknown

# Fix (must be run on every node)
# Pull the image under the tag that exists
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.8.4
# Re-tag it with the name kubeadm expects
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.8.4 registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.4
# Remove the original tag
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.8.4
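Since the same three commands have to be run on every node, they can be wrapped in a function. This is only a sketch: retag_image and the DOCKER override (handy for a dry run with DOCKER=echo) are my own additions, not part of Docker.

```shell
# Pull an image under one tag, re-tag it, then drop the original tag.
retag_image() {
  # $1: repository prefix, $2: image name, $3: tag that exists, $4: tag kubeadm expects
  docker_cmd=${DOCKER:-docker}
  "$docker_cmd" pull "$1/$2:$3" &&
    "$docker_cmd" tag "$1/$2:$3" "$1/$2:$4" &&
    "$docker_cmd" rmi "$1/$2:$3"
}

# On each node:
#   retag_image registry.cn-hangzhou.aliyuncs.com/google_containers coredns 1.8.4 v1.8.4
```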

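2️⃣ Problem 2

If kubeadm init fails, first check whether kubelet is running normally:
service kubelet status
If kubelet is not running normally, use the following command to investigate:
journalctl -xefu kubelet
Suppose you see an error like this: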

 "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""

Roughly, it means that Docker and kubelet are using different cgroup drivers.

# Check Docker's cgroup driver
docker info
# Check kubelet's cgroup driver
cat /var/lib/kubelet/config.yaml
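Both outputs are long, so it helps to pull out just the two values and compare them. This is a sketch, assuming docker info prints a Cgroup Driver: line and the kubelet config file contains a cgroupDriver: key; the function names are made up.

```shell
# Read `docker info` output on stdin and print Docker's cgroup driver.
docker_cgroup_driver() {
  awk -F': *' '/Cgroup Driver/ { print $2; exit }'
}

# Print the cgroupDriver value from a kubelet config file.
kubelet_cgroup_driver() {
  awk '$1 == "cgroupDriver:" { print $2; exit }' "${1:-/var/lib/kubelet/config.yaml}"
}

# On a node, as root:
#   [ "$(docker info 2>/dev/null | docker_cgroup_driver)" = "$(kubelet_cgroup_driver)" ] \
#     || echo "cgroup drivers differ"
```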

# Set Docker's cgroup driver to systemd
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
# Restart Docker
systemctl restart docker

# Running kubeadm init more than once may report ports already in use; reset first
kubeadm reset
