We now need to set up a CNI plugin that will give us east-west traffic between pods.
The worker nodes need to allow IP forwarding:
sudo sysctl net.ipv4.conf.all.forwarding=1
echo "net.ipv4.conf.all.forwarding=1" | sudo tee -a /etc/sysctl.conf
We'll download an auto-generated configuration from Weave for our specific version of Kubernetes and for a cluster CIDR of 10.200.0.0/16.
cloud_user@ctl01:~$ curl "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=10.200.0.0/16" -Lo weave.conf
cloud_user@ctl01:~$ grep kind weave.conf
kind: List
kind: ServiceAccount
kind: ClusterRole
kind: ClusterRoleBinding
kind: ClusterRole
- kind: ServiceAccount
kind: Role
kind: RoleBinding
kind: Role
- kind: ServiceAccount
kind: DaemonSet
The file is of kind: List and creates a new role for Weave. The role is added to the kube-system namespace:
cloud_user@ctl01:~$ kubectl get ns
NAME STATUS AGE
default Active 14d
kube-node-lease Active 14d
kube-public Active 14d
kube-system Active 14d
The config file then launches a DaemonSet - A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
The DaemonSet will download and install two containers on each worker node:
kind: DaemonSet
...
  labels:
    name: weave-net
  namespace: kube-system
spec:
  ...
      containers:
        - name: weave
          command:
            - /home/weave/launch.sh
          ...
            - name: IPALLOC_RANGE
              value: 10.200.0.0/16
          image: 'docker.io/weaveworks/weave-kube:2.6.5'
        ...
          image: 'docker.io/weaveworks/weave-npc:2.6.5'
          resources:
            requests:
              cpu: 10m
To apply the configuration:
cloud_user@ctl01:~$ kubectl apply -f weave.conf
serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.apps/weave-net created
Verify that the new pods were created with:
cloud_user@ctl01:~$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
weave-net-979r7 2/2 Running 0 6m14s
weave-net-xfnbz 2/2 Running 0 6m15s
Each pod was created on a different worker node, and each runs two containers. For example, on wrk01:
cloud_user@wrk01:~$ sudo ls -l /var/log/pods/kube-system_weave-net-xfnbz_9*/
total 8
drwxr-xr-x 2 root root 4096 Aug 2 20:44 weave
drwxr-xr-x 2 root root 4096 Aug 2 20:44 weave-npc
Now that the pods have been created, new network interfaces have been added to the workers:
cloud_user@wrk02:~$ ip -h link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 0a:fa:ab:9d:5b:14 brd ff:ff:ff:ff:ff:ff
3: datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ether f2:80:55:b3:75:5f brd ff:ff:ff:ff:ff:ff
5: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 26:ca:30:44:3b:74 brd ff:ff:ff:ff:ff:ff
6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 92:35:4a:ab:ba:38 brd ff:ff:ff:ff:ff:ff
8: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master datapath state UP mode DEFAULT group default
link/ether 9e:ea:ca:e5:23:fa brd ff:ff:ff:ff:ff:ff
9: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP mode DEFAULT group default
link/ether 82:cf:0d:a5:8b:aa brd ff:ff:ff:ff:ff:ff
10: vxlan-6784: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master datapath state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 66:6f:b4:6d:b9:d1 brd ff:ff:ff:ff:ff:ff
cloud_user@wrk02:~$ ip -h -4 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
inet 172.31.26.138/20 brd 172.31.31.255 scope global eth0
valid_lft forever preferred_lft forever
5: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default qlen 1000
inet 10.200.0.1/16 brd 10.200.255.255 scope global weave
valid_lft forever preferred_lft forever
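Note that the weave bridge on wrk02 took 10.200.0.1/16 from the cluster CIDR: Weave's IPAM splits the 10.200.0.0/16 range among the peer nodes, which is why pods on different workers get addresses from different parts of the range. A quick size check of the range, plus a hedged way to inspect the split (the pod name below is from this cluster; the weave CLI path inside the weave-kube image is an assumption):

```shell
# 10.200.0.0/16 gives 2^(32-16) = 65536 addresses for the whole cluster;
# Weave's IPAM divides this pool among the peers (the worker nodes).
echo $(( 1 << (32 - 16) ))   # → 65536

# To see how the range was split per node, you can run something like
# (assumption: the weave CLI ships inside the weave container):
#   kubectl exec -n kube-system weave-net-xfnbz -c weave -- \
#       /home/weave/weave --local status ipam
```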
We can now create a Deployment of two nginx pods to confirm that a pod IP address is automatically assigned to each pod:
cloud_user@ctl01:~$ cat nginx.conf
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      run: nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80
cloud_user@ctl01:~$ kubectl apply -f nginx.conf
deployment.apps/nginx created
cloud_user@ctl01:~$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-7866ff8b79-ktvrs 1/1 Running 0 6m57s 10.200.0.2 wrk02.kube.com <none> <none>
nginx-7866ff8b79-v2n4l 1/1 Running 0 6m57s 10.200.192.1 wrk01.kube.com <none> <none>
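Both pod IPs land inside the 10.200.0.0/16 range we passed as IPALLOC_RANGE. A minimal shell sketch of that check (the helper function is ours, for illustration):

```shell
# Quick check that the pod IPs assigned by Weave fall inside the
# 10.200.0.0/16 cluster CIDR, by matching the /16 network prefix.
in_cluster_cidr() {
  case "$1" in
    10.200.*) echo "$1 is inside 10.200.0.0/16" ;;
    *)        echo "$1 is outside 10.200.0.0/16" ;;
  esac
}
in_cluster_cidr 10.200.0.2     # pod on wrk02
in_cluster_cidr 10.200.192.1   # pod on wrk01
```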
The Weave logs on the worker nodes show that the two new pod IP addresses were registered:
2020-08-02T21:06:44.554513018Z stderr F INFO: 2020/08/02 21:06:44.554368 adding entry 10.200.0.2 to weave-k?Z;25^M}|1s7P3|H9i;*;MhG of 064e9bf5-8a47-4c21-8ae9-35557edbdc9a
...
2020-08-02T21:06:45.129688044Z stderr F INFO: 2020/08/02 21:06:45.129574 adding entry 10.200.192.1 to weave-k?Z;25^M}|1s7P3|H9i;*;MhG of a2cb5dee-88a7-474c-9aa4-5bf573dda302
The VXLAN set up by Weave allows a client running on wrk01 to reach the nginx pod running on wrk02. The packets are encapsulated inside UDP, and a header includes the unique VXLAN identifier.
source: https://www.juniper.net/documentation/en_US/junos/topics/topic-map/sdn-vxlan.html
171 15.191593 172.31.26.138 → 172.31.29.196 UDP 126 58287 → 6784 Len=82
172 15.191720 172.31.29.196 → 172.31.26.138 UDP 118 44751 → 6784 Len=74
173 15.191731 172.31.29.196 → 172.31.26.138 UDP 192 44751 → 6784 Len=148
174 15.191735 10.200.192.0 → 10.200.0.2 TCP 68 37224 → 80 [ACK] Seq=1 Ack=1 Win=26752 Len=0 TSval=298244 TSecr=297810
175 15.191737 10.200.192.0 → 10.200.0.2 TCP 68 [TCP Dup ACK 174#1] 37224 → 80 [ACK] Seq=1 Ack=1 Win=26752 Len=0 TSval=298244 TSecr=297810
176 15.191739 10.200.192.0 → 10.200.0.2 HTTP 142 GET / HTTP/1.1
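The trace above can be reproduced on a worker node with tcpdump, assuming it is installed; Weave's fast datapath sends its encapsulated traffic over UDP port 6784, which is the destination port seen in the capture:

```shell
# Capture Weave's encapsulated pod-to-pod traffic on the node's primary
# interface (the interface name and sudo access are environment-specific).
sudo tcpdump -ni eth0 'udp port 6784'
```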
Now we can expose the nginx deployment as a Kubernetes service:
cloud_user@client:~$ kubectl get deployment -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
nginx 2/2 2 2 6d23h my-nginx nginx run=nginx
Run the expose command:
cloud_user@client:~$ kubectl expose deployment/nginx
service/nginx exposed
cloud_user@client:~$ kubectl get service -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.32.0.1 <none> 443/TCP 21d <none>
nginx ClusterIP 10.32.0.65 <none> 80/TCP 31s run=nginx
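Under the hood, kubectl expose deployment/nginx creates a ClusterIP Service that copies the deployment's selector and port. A sketch of the equivalent manifest (the file path is ours):

```shell
# Declarative equivalent of `kubectl expose deployment/nginx`: a ClusterIP
# service that selects the deployment's pods (run=nginx) on port 80.
cat > /tmp/nginx-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: ClusterIP
  selector:
    run: nginx
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
EOF
# Applying it would have the same effect as the expose command:
#   kubectl apply -f /tmp/nginx-service.yaml
grep 'run: nginx' /tmp/nginx-service.yaml
```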
To verify that we can connect to the service, we'll launch a new pod running BusyBox (BusyBox combines tiny versions of many common UNIX utilities into a single small executable). In this example we'll run a modified version of BusyBox from radial that includes curl:
cloud_user@client:~$ kubectl run busybox --image=radial/busyboxplus:curl --command -- sleep 3600
pod/busybox created
cloud_user@client:~$ kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox 1/1 Running 0 23s 10.200.0.3 wrk02.kube.com <none> <none>
nginx-7866ff8b79-ktvrs 1/1 Running 1 6d23h 10.200.0.2 wrk02.kube.com <none> <none>
nginx-7866ff8b79-v2n4l 1/1 Running 1 6d23h 10.200.192.1 wrk01.kube.com <none> <none>
The first attempt to run curl on that pod returns an error:
cloud_user@ctl01:~$ kubectl exec busybox -- curl 10.32.0.65
error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)
The problem is that the kubelet doesn't allow the apiserver (with user CN=kubernetes) to use the kubelet API. https://github.com/kubernetes/kubernetes/issues/65939#issuecomment-403218465
To fix this, we need to create a new clusterrolebinding between the existing clusterrole system:kubelet-api-admin and the kubernetes user:
cloud_user@ctl01:~$ kubectl create clusterrolebinding apiserver-kubelet-api-admin --clusterrole system:kubelet-api-admin --user kubernetes
clusterrolebinding.rbac.authorization.k8s.io/apiserver-kubelet-api-admin created
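The one-liner above is the imperative form; a sketch of the equivalent declarative manifest (the file path is ours):

```shell
# Declarative equivalent of the `kubectl create clusterrolebinding` command:
# bind the built-in system:kubelet-api-admin role to the `kubernetes` user
# that the apiserver presents to the kubelet.
cat > /tmp/apiserver-kubelet-api-admin.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: apiserver-kubelet-api-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubelet-api-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kubernetes
EOF
# kubectl apply -f /tmp/apiserver-kubelet-api-admin.yaml
grep 'system:kubelet-api-admin' /tmp/apiserver-kubelet-api-admin.yaml
```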
cloud_user@ctl01:~$ kubectl get clusterrole | grep kubelet-api-admin
system:kubelet-api-admin 2020-07-19T00:20:21Z
cloud_user@ctl01:~$ kubectl get clusterrolebinding | grep kubelet-api-admin
apiserver-kubelet-api-admin ClusterRole/system:kubelet-api-admin 18m
Then:
cloud_user@ctl01:~$ kubectl exec busybox -- curl 10.32.0.65 -sI
HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Sun, 09 Aug 2020 21:18:52 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive
ETag: "5f049a39-264"
Accept-Ranges: bytes
We can now remove the busybox pod, the nginx service, and the nginx deployment we created to test the CNI:
cloud_user@client:~$ kubectl delete pod busybox
pod "busybox" deleted
cloud_user@client:~$ kubectl delete svc nginx
service "nginx" deleted
cloud_user@client:~$ kubectl delete deployment nginx
deployment.apps "nginx" deleted