KTHW - Set up networking with Weave Net.

August 2, 2020 - Reading time: 6 minutes

We now need to set up a CNI plugin that will allow east-west traffic between pods.

The worker nodes need to allow IP forwarding:

sudo sysctl net.ipv4.conf.all.forwarding=1
echo "net.ipv4.conf.all.forwarding=1" | sudo tee -a /etc/sysctl.conf

We'll download an auto-generated configuration from Weave for our specific version of Kubernetes and for a Cluster CIDR of 10.200.0.0/16.

cloud_user@ctl01:~$ curl "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=10.200.0.0/16" -Lo weave.conf
cloud_user@ctl01:~$ grep kind weave.conf
kind: List
    kind: ServiceAccount
    kind: ClusterRole
    kind: ClusterRoleBinding
      kind: ClusterRole
      - kind: ServiceAccount
    kind: Role
    kind: RoleBinding
      kind: Role
      - kind: ServiceAccount
    kind: DaemonSet

The file is of kind: List and creates a new role for Weave. The role is added to the kube-system namespace:

cloud_user@ctl01:~$ kubectl  get  ns
NAME              STATUS   AGE
default           Active   14d
kube-node-lease   Active   14d
kube-public       Active   14d
kube-system       Active   14d

The config file then launches a DaemonSet - A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.

The DaemonSet will download and run two containers on each worker node:

    kind: DaemonSet
...
      labels:
        name: weave-net
      namespace: kube-system
    spec:
...
          containers:
            - name: weave
              command:
                - /home/weave/launch.sh
...
                - name: IPALLOC_RANGE
                  value: 10.200.0.0/16
              image: 'docker.io/weaveworks/weave-kube:2.6.5'

...
              image: 'docker.io/weaveworks/weave-npc:2.6.5'
              resources:
                requests:
                  cpu: 10m

To apply the configuration:

cloud_user@ctl01:~$ kubectl apply -f weave.conf
serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.apps/weave-net created

Verify that the new pods were created with:

cloud_user@ctl01:~$ kubectl get pods -n kube-system
NAME              READY   STATUS    RESTARTS   AGE
weave-net-979r7   2/2     Running   0          6m14s
weave-net-xfnbz   2/2     Running   0          6m15s
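
The DaemonSet itself can also be queried; it should report one desired and one ready pod per worker node:

cloud_user@ctl01:~$ kubectl get daemonset weave-net -n kube-system
## expect DESIRED and READY to both be 2 in this two-worker cluster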

Each pod was created on a different worker node, and each runs two containers. For example, on wrk01:

cloud_user@wrk01:~$ sudo ls -l /var/log/pods/kube-system_weave-net-xfnbz_9*/
total 8
drwxr-xr-x 2 root root 4096 Aug  2 20:44 weave
drwxr-xr-x 2 root root 4096 Aug  2 20:44 weave-npc

Now that the pods have been created, new network interfaces have been added to the workers:

cloud_user@wrk02:~$ ip -h link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0a:fa:ab:9d:5b:14 brd ff:ff:ff:ff:ff:ff
3: datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether f2:80:55:b3:75:5f brd ff:ff:ff:ff:ff:ff
5: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 26:ca:30:44:3b:74 brd ff:ff:ff:ff:ff:ff
6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 92:35:4a:ab:ba:38 brd ff:ff:ff:ff:ff:ff
8: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master datapath state UP mode DEFAULT group default
    link/ether 9e:ea:ca:e5:23:fa brd ff:ff:ff:ff:ff:ff
9: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP mode DEFAULT group default
    link/ether 82:cf:0d:a5:8b:aa brd ff:ff:ff:ff:ff:ff
10: vxlan-6784: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master datapath state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 66:6f:b4:6d:b9:d1 brd ff:ff:ff:ff:ff:ff
cloud_user@wrk02:~$ ip -h -4 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    inet 172.31.26.138/20 brd 172.31.31.255 scope global eth0
       valid_lft forever preferred_lft forever
5: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default qlen 1000
    inet 10.200.0.1/16 brd 10.200.255.255 scope global weave
       valid_lft forever preferred_lft forever
  • wrk02 has 10.200.0.1/16
  • wrk01 has 10.200.192.0/16
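
Weave also adds a route for the cluster CIDR through the new bridge. A quick check (exact output may differ):

cloud_user@wrk02:~$ ip route | grep weave
## expect something like: 10.200.0.0/16 dev weave proto kernel scope link src 10.200.0.1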

We can now create a Deployment of two nginx pods to confirm that a pod IP address is automatically assigned to each one:

cloud_user@ctl01:~$ cat nginx.conf
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      run: nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80

cloud_user@ctl01:~$ kubectl apply -f nginx.conf
deployment.apps/nginx created

cloud_user@ctl01:~$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP             NODE             NOMINATED NODE   READINESS GATES
nginx-7866ff8b79-ktvrs   1/1     Running   0          6m57s   10.200.0.2     wrk02.kube.com   <none>           <none>
nginx-7866ff8b79-v2n4l   1/1     Running   0          6m57s   10.200.192.1   wrk01.kube.com   <none>           <none>

The Weave logs on the worker nodes show that two new cluster IPs were associated with the pods:

2020-08-02T21:06:44.554513018Z stderr F INFO: 2020/08/02 21:06:44.554368 adding entry 10.200.0.2 to weave-k?Z;25^M}|1s7P3|H9i;*;MhG of 064e9bf5-8a47-4c21-8ae9-35557edbdc9a
...
2020-08-02T21:06:45.129688044Z stderr F INFO: 2020/08/02 21:06:45.129574 adding entry 10.200.192.1 to weave-k?Z;25^M}|1s7P3|H9i;*;MhG of a2cb5dee-88a7-474c-9aa4-5bf573dda302
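
As a final check of east-west traffic, we can fetch the nginx welcome page from a throwaway pod, using one of the pod IPs from the output above (10.200.0.2 here; adjust to your own):

cloud_user@ctl01:~$ kubectl run tmp --rm -it --image=busybox --restart=Never -- wget -qO- http://10.200.0.2
## expect the "Welcome to nginx!" HTML regardless of which worker the tmp pod lands on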

KTHW - Create a kubeconfig file for remote access

August 2, 2020 - Reading time: 2 minutes

By default, kubectl stores the user's configuration under ~/.kube/config. To create the file, we just need to run kubectl with the config option and set the name of the cluster:

cloud_user@client:~$ kubectl config set-cluster kubernetes-the-hard-way
Cluster "kubernetes-the-hard-way" set.
cloud_user@client:~$ cat ~/.kube/config
apiVersion: v1
clusters:
- cluster:
    server: ""
  name: kubernetes-the-hard-way
contexts: null
current-context: ""
kind: Config
preferences: {}
users: null

We can then add the rest of the settings, like the IP address of the API server, and the certificates signed by the CA.

cloud_user@client:~$ kubectl config set clusters.kubernetes-the-hard-way.server https://172.31.23.61:6443
cloud_user@client:~$ kubectl config set-cluster kubernetes-the-hard-way --embed-certs=true --certificate-authority kthw/ca.pem

Then create the user and the context. A context is a group of access parameters. Each context contains a Kubernetes cluster, a user, and a namespace. The current context is the cluster that is currently the default for kubectl.

cloud_user@client:~$ kubectl config set-credentials admin --client-certificate=kthw/admin.pem  --client-key=kthw/admin-key.pem
cloud_user@client:~$ kubectl config set-context kubernetes-the-hard-way --cluster=kubernetes-the-hard-way --user=admin
cloud_user@client:~$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://172.31.23.61:6443
  name: kubernetes-the-hard-way
contexts:
- context:
    cluster: kubernetes-the-hard-way
    user: admin
  name: kubernetes-the-hard-way
current-context: ""
kind: Config
preferences: {}
users:
- name: admin
  user:
    client-certificate: /home/cloud_user/kthw/admin.pem
    client-key: /home/cloud_user/kthw/admin-key.pem

The current-context is still empty. So the last thing we need to do is specify that we want to use the newly created context.

cloud_user@client:~$ kubectl config use-context kubernetes-the-hard-way
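
To confirm the switch:

cloud_user@client:~$ kubectl config current-context
kubernetes-the-hard-way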

Now we should be able to get details about our cluster:

cloud_user@client:~$ kubectl get nodes
NAME             STATUS     ROLES    AGE    VERSION
wrk01.kube.com   NotReady   <none>   4d5h   v1.18.6
wrk02.kube.com   NotReady   <none>   4d5h   v1.18.6

KTHW - Kubernetes and networking, the basics

July 30, 2020 - Reading time: 5 minutes

The networking model helps us deal with the following problems:

  • Communication between containers.
  • Reaching containers on different working nodes.
  • How to reach services
  • What IP address / port will be assigned to a container

The Kubernetes model was designed to overcome some of the limitations of the Docker model. With Docker, each host creates a virtual network bridge that allows containers on the same host to communicate with each other and to initiate outbound connections. For containers on different hosts, the administrator needs to create a proxy on the host to expose a port to the container.

All this proxying of services can become very complicated when dealing with multiple containers.

The Kubernetes solution is to create one virtual network for the whole cluster.

  • Each pod has a unique IP address
  • Each service has a unique IP address (on a different range than pods)

Cluster CIDR

The IP range used to assign IP addresses to pods in the cluster. The kube-proxy service running on the worker nodes specifies clusterCIDR: "10.200.0.0/16", and the kube-controller-manager also includes the --cluster-cidr=10.200.0.0/16 flag.
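
A quick way to double-check that both components agree on the range (the file paths are the ones used in this series; adjust if yours differ):

## on a controller
grep cluster-cidr /etc/systemd/system/kube-controller-manager.service
## on a worker
grep clusterCIDR /var/lib/kube-proxy/kube-proxy-config.yaml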

Service cluster

IP range used for services in the cluster. This range MUST NOT overlap with the cluster CIDR range.

Two of the parameters we set when we created the systemd unit for kube-apiserver were --service-cluster-ip-range=10.32.0.0/24 and --service-node-port-range=30000-32767.

The node port range is used when providing access to services via kube-proxy in NodePort mode. In this mode, a port is opened on the worker node and the traffic is redirected from there to the service (using iptables or IPVS).
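
As an example, exposing the nginx Deployment from the Weave Net post as a NodePort service allocates a port from that range on every worker:

cloud_user@ctl01:~$ kubectl expose deployment nginx --type=NodePort --port=80
cloud_user@ctl01:~$ kubectl get svc nginx
## the PORT(S) column shows something like 80:3XXXX/TCP, reachable on any worker's IP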

The kube-controller-manager also has a --service-cluster-ip-range=10.32.0.0/24 flag.

One of the SANs on the kubernetes.pem certificate was IP Address:10.32.0.1.
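
That is the first address of the service range, which is claimed by the built-in kubernetes service:

cloud_user@client:~$ kubectl get svc kubernetes
## the CLUSTER-IP column should show 10.32.0.1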

Pod CIDR

The specific IP range for pods on one worker node. This range shouldn't overlap between worker nodes, for example 10.200.1.0/24 and 10.200.2.0/24. Some network plugins handle this automatically.

Types of networking and requirements

  • Communication between containers in a pod (handled by the container runtime) - Docker uses a virtual bridge named docker0. Each container creates a virtual Ethernet device that is attached to the bridge. Containers inside a pod can also communicate via localhost or standard inter-process communication.
  • Communication between pods (across nodes) - Known as East-west traffic - Implemented by the CNI plugin
  • Communication between pods happens without NAT
  • Exposure of services to external clients - Known as North-south traffic
  • Service discovery and load balancing
  • Segmenting networks for pod security

CNI plugins

Used to implement pod-to-pod communication (Calico, Weave, Flannel). Currently there are three types of networking:

  • L2 (switching)
  • L3 (routing)
  • Overlay (tunneling)

L2

The easiest type of communication. All the pods and nodes are in the same L2 domain, and pod-to-pod communication happens through ARP. Bridge plugin example:

{
    "name":"kubenet",
    "type":"bridge",
    "bridge":"kube-bridge",
    "isDefatultGateway": true,
    "ipam" : {
                "type": "host-local",
                "subnet": "10.1.0.0./16" 
            }
}

L2 is not scalable.

L3

Flannel is an example of an L3 plugin.

Overlay configuration

It's a software-defined network using tunnels. Common encapsulation mechanisms such as VXLAN and GRE are available.

Services

Used to expose functionality externally. A service refers to a set of pods selected by labels, and gets an accessible IP address of its own.


KTHW - Load balance requests to controller nodes

July 26, 2020 - Reading time: ~1 minute

The load balancer will be used to access both controllers from a single point. In this example we'll use nginx with a stream load balancer for ports 443 and 6443.

sudo apt-get install -y nginx
sudo systemctl enable nginx
sudo mkdir -p /etc/nginx/tcpconf.d
sudo vi /etc/nginx/nginx.conf
## Add the line: 
## include /etc/nginx/tcpconf.d/*;
## create the kubernetes config: 
cloud_user@kubelb:~$ cat /etc/nginx/tcpconf.d/kubernetes.conf
stream {
    upstream kubernetes {
        server 172.31.19.77:6443;
        server 172.31.24.213:6443;
    }

    server {
        listen 6443;
        listen 443;
        proxy_pass kubernetes;
    }
}
sudo nginx -s reload
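
Before reloading it's worth validating the configuration, and afterwards checking that the API server answers through the load balancer (with the defaults used in this series, /version should answer without client authentication):

sudo nginx -t
curl -k https://localhost:6443/version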

KTHW - The workers

July 26, 2020 - Reading time: 7 minutes

The worker nodes are responsible for the actual work of running the container applications managed by Kubernetes. Components:

  • Kubelet: the agent running on the node. Provides the API to access the node.
  • Kube-proxy: manages iptables rules to provide virtual network access to pods.
  • Container runtime: downloads images and runs containers (e.g. Docker, containerd).

OS dependencies

  • Socat: multipurpose relay (SOcket CAT). Socat is a command-line utility that establishes two bidirectional byte streams and transfers data between them. It enables support for the kubectl port-forward command (see the example after this list).

  • Conntrack: command-line interface for netfilter connection tracking. Using conntrack, you can dump a list of all (or a filtered selection of) currently tracked connections, delete connections from the state table, and even add new ones.

  • Ipset: administration tool for IP sets, a netfilter project. Some of its uses are: storing multiple IP addresses or port numbers and matching against the collection with iptables in one swoop; dynamically updating iptables rules against IP addresses or ports without a performance penalty; expressing complex IP address and port based rulesets with one single iptables rule and benefiting from the speed of IP sets.
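
As a quick illustration of what socat enables, once the cluster is up and a pod is running (the nginx Deployment from the Weave Net post, for example), kubectl port-forward relays a local port to the pod:

kubectl port-forward deployment/nginx 8080:80
## in another terminal: curl http://127.0.0.1:8080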

Worker binaries

  • cri-tools: introduced in Kubernetes 1.5, the Container Runtime Interface (CRI) is a plugin interface which enables the kubelet to use a wide variety of container runtimes without the need to recompile. https://github.com/kubernetes-sigs/cri-tools

  • runc: a CLI tool for spawning and running containers according to the OCI specification. The Open Container Initiative is an open governance structure for the express purpose of creating open industry standards around container formats and runtimes. It currently contains two specifications: the Runtime Specification (runtime-spec) and the Image Specification (image-spec). The Runtime Specification outlines how to run a “filesystem bundle” that is unpacked on disk. https://github.com/opencontainers/runc

  • cni: the Container Network Interface project consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers, along with a number of supported plugins. We'll use the cni-plugins project. This is a Cloud Native Computing Foundation (CNCF) project currently in the incubation phase (known incubation projects: etcd, cni; known graduated CNCF projects: Kubernetes, Prometheus, CoreDNS, containerd, fluentd). https://github.com/containernetworking

  • containerd: an industry-standard container runtime with an emphasis on simplicity, robustness, and portability. Graduated from the Cloud Native Computing Foundation in 2019.

  • kubectl

  • kube-proxy

  • kubelet

Install the binaries and OS dependencies

sudo apt-get -y install socat conntrack ipset
wget -q --show-progress --https-only --timestamping \
  https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.18.0/crictl-v1.18.0-linux-amd64.tar.gz \
  https://github.com/opencontainers/runc/releases/download/v1.0.0-rc91/runc.amd64 \
  https://github.com/containernetworking/plugins/releases/download/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz \
  https://github.com/containerd/containerd/releases/download/v1.3.6/containerd-1.3.6-linux-amd64.tar.gz \
  https://storage.googleapis.com/kubernetes-release/release/v1.18.6/bin/linux/amd64/kubectl \
  https://storage.googleapis.com/kubernetes-release/release/v1.18.6/bin/linux/amd64/kube-proxy \
  https://storage.googleapis.com/kubernetes-release/release/v1.18.6/bin/linux/amd64/kubelet
sudo mkdir -p \
  /etc/cni/net.d \
  /opt/cni/bin \
  /var/lib/kubelet \
  /var/lib/kube-proxy \
  /var/lib/kubernetes \
  /var/run/kubernetes
mkdir containerd
tar -xvf crictl-v1.18.0-linux-amd64.tar.gz
tar -xvf containerd-1.3.6-linux-amd64.tar.gz -C containerd
sudo tar -xvf cni-plugins-linux-amd64-v0.8.6.tgz -C /opt/cni/bin/
sudo mv runc.amd64 runc
chmod +x crictl kubectl kube-proxy kubelet runc 
sudo mv crictl kubectl kube-proxy kubelet runc /usr/local/bin/
sudo mv containerd/bin/* /bin/ 
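
A quick sanity check that the binaries are in place and executable:

runc --version
crictl --version
containerd --version
kubectl version --client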

Configure containerd

sudo mkdir -p /etc/containerd/
cat << EOF | sudo tee /etc/containerd/config.toml
[plugins]
  [plugins.cri.containerd]
    snapshotter = "overlayfs"
    [plugins.cri.containerd.default_runtime]
      runtime_type = "io.containerd.runtime.v1.linux"
      runtime_engine = "/usr/local/bin/runc"
      runtime_root = ""
EOF
cat <<EOF | sudo tee /etc/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity

[Install]
WantedBy=multi-user.target
EOF

Configure the kubelet

HOSTNAME=$(hostname)
sudo mv ${HOSTNAME}-key.pem ${HOSTNAME}.pem /var/lib/kubelet/
sudo mv ${HOSTNAME}.kubeconfig /var/lib/kubelet/kubeconfig
sudo mv ca.pem /var/lib/kubernetes/
cat << EOF | sudo tee /var/lib/kubelet/kubelet-config.yaml
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: "/var/lib/kubernetes/ca.pem"
authorization:
  mode: Webhook
clusterDomain: "cluster.local"
clusterDNS: 
  - "10.32.0.10"
runtimeRequestTimeout: "15m"
tlsCertFile: "/var/lib/kubelet/${HOSTNAME}.pem"
tlsPrivateKeyFile: "/var/lib/kubelet/${HOSTNAME}-key.pem"
EOF
cat << EOF | sudo tee /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=containerd.service
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \\
  --config=/var/lib/kubelet/kubelet-config.yaml \\
  --container-runtime=remote \\
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \\
  --image-pull-progress-deadline=2m \\
  --kubeconfig=/var/lib/kubelet/kubeconfig \\
  --network-plugin=cni \\
  --register-node=true \\
  --v=2 \\
  --hostname-override=${HOSTNAME}
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Configure the Kubernetes proxy

sudo mv kube-proxy.kubeconfig /var/lib/kube-proxy/kubeconfig
cat << EOF | sudo tee /var/lib/kube-proxy/kube-proxy-config.yaml
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
clientConnection:
  kubeconfig: "/var/lib/kube-proxy/kubeconfig"
mode: "iptables"
clusterCIDR: "10.200.0.0/16"
EOF
cat << EOF | sudo tee /etc/systemd/system/kube-proxy.service
[Unit]
Description=Kubernetes Kube Proxy
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-proxy \\
  --config=/var/lib/kube-proxy/kube-proxy-config.yaml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
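
With the unit files in place, reload systemd and enable and start the three services on each worker:

sudo systemctl daemon-reload
sudo systemctl enable containerd kubelet kube-proxy
sudo systemctl start containerd kubelet kube-proxy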

Verify that the workers have registered with the controllers

We can check this from one of the controllers:

cloud_user@ctl01:~$ kubectl get nodes --kubeconfig admin.kubeconfig
NAME             STATUS     ROLES    AGE     VERSION
wrk01.kube.com   NotReady   <none>   10m     v1.18.6
wrk02.kube.com   NotReady   <none>   3m38s   v1.18.6