Intro

November 11, 2021 - Reading time: 3 minutes

Running your first container:

test@localhost: sudo docker run dockerinaction/hello_world
Unable to find image 'dockerinaction/hello_world:latest' locally
latest: Pulling from dockerinaction/hello_world
a3ed95caeb02: Pull complete
1db09adb5ddd: Pull complete
Digest: sha256:cfebf86139a3b21797765a3960e13dee000bcf332be0be529858fca840c00d7f
Status: Downloaded newer image for dockerinaction/hello_world:latest
hello world

Once the program has printed its message and exited, the container is marked as stopped (Exited).

The container still exists after it exits; it is just not running:

test@localhost: sudo docker container ls --all
CONTAINER ID   IMAGE                        COMMAND                CREATED          STATUS                      PORTS     NAMES
d873f76c1a3c   dockerinaction/hello_world   "echo 'hello world'"   19 minutes ago   Exited (0) 19 minutes ago             modest_sutherland
test@localhost:

Running the container multiple times creates multiple instances

test@localhost: sudo docker container ls --all
CONTAINER ID   IMAGE                        COMMAND                CREATED          STATUS                      PORTS     NAMES
4658622f1268   dockerinaction/hello_world   "echo 'hello world'"   10 seconds ago   Exited (0) 9 seconds ago              relaxed_chandrasekhar
4156ce5490cc   dockerinaction/hello_world   "echo 'hello world'"   11 seconds ago   Exited (0) 10 seconds ago             upbeat_merkle
31c858003d28   dockerinaction/hello_world   "echo 'hello world'"   12 seconds ago   Exited (0) 11 seconds ago             kind_cartwright
df3224150c4b   dockerinaction/hello_world   "echo 'hello world'"   14 seconds ago   Exited (0) 14 seconds ago             zen_satoshi
d873f76c1a3c   dockerinaction/hello_world   "echo 'hello world'"   49 minutes ago   Exited (0) 49 minutes ago             modest_sutherland

Image used to run the container:

test@localhost: sudo docker images
REPOSITORY                   TAG       IMAGE ID       CREATED       SIZE
dockerinaction/hello_world   latest    a1a9a5ed65e9   6 years ago   2.43MB

History / details of the image:

test@localhost: sudo docker image history a1a9a5ed65e9
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
a1a9a5ed65e9   6 years ago   /bin/sh -c #(nop) CMD ["echo" "hello world"]    0B
<missing>      6 years ago   /bin/sh -c #(nop) CMD ["/bin/sh"]               0B
<missing>      6 years ago   /bin/sh -c #(nop) ADD file:f398243fc9cb933fa…   2.43MB
<missing>      6 years ago   /bin/sh -c #(nop) MAINTAINER Jérôme Petazzon…   0B

Image details:

test@localhost: sudo docker image inspect a1a9a5ed65e9 | jq '.[].Config'
{
  "Hostname": "04b9901f18ea",
  "Domainname": "",
  "User": "",
  "AttachStdin": false,
  "AttachStdout": false,
  "AttachStderr": false,
  "Tty": false,
  "OpenStdin": false,
  "StdinOnce": false,
  "Env": [
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
  ],
  "Cmd": [
    "echo",
    "hello world"
  ],
  "Image": "0f864637f229eee9da53fd5591fb58138b6bfb0196f0ee4fd9417d3655fb3d28",
  "Volumes": null,
  "WorkingDir": "",
  "Entrypoint": null,
  "OnBuild": [],
  "Labels": null
}

Detached and interactive containers

November 11, 2021 - Reading time: 4 minutes

Detached containers run in the background (not attached to input/output stream)

To launch a detached container:

test@localhost: sudo docker run --detach --name web nginx:latest
Unable to find image 'nginx:latest' locally
latest: Pulling from library/nginx
7d63c13d9b9b: Pull complete
15641ef07d80: Pull complete
392f7fc44052: Pull complete
8765c7b04ad8: Pull complete
8ddffa52b5c7: Pull complete
353f1054328a: Pull complete
Digest: sha256:dfef797ddddfc01645503cef9036369f03ae920cac82d344d58b637ee861fda1
Status: Downloaded newer image for nginx:latest
096112267c5a0d3e4df38cac68cef1aaaa17caa70ca06db8d8985f4aebea6a43

To run an interactive container, we can specify --interactive (-i).
An interactive container won't have a tty by default:

test@localhost: sudo docker container run --name web_test --interactive busybox
ls
bin
dev
etc
home
proc
root
sys
tmp
usr
var                          #### <---- Exited  with ctrl + d 
test@localhost: sudo docker container ls --all
CONTAINER ID   IMAGE                       COMMAND                  CREATED          STATUS                      PORTS       NAMES
cc5dd5327298   busybox                     "sh"                     25 seconds ago   Exited (0) 16 seconds ago               web_test
8652860736fa   dockerinaction/ch2_mailer   "/mailer/mailer.sh"      12 minutes ago   Up 12 minutes               33333/tcp   mailer
096112267c5a   nginx:latest                "/docker-entrypoint.…"   15 minutes ago   Up 15 minutes               80/tcp      web

To get a tty we use --tty

test@localhost: sudo docker container run --name web_test --interactive --tty busybox
/ # ls
bin   dev   etc   home  proc  root  sys   tmp   usr   var
/ #   ### <--- Exited with ctrl + d 

We can use --link to create a link between containers running on the same system (note that --link is a legacy Docker feature). For example, to let the busybox container reach the nginx container by its name or by an alias:

test@localhost: sudo docker container run --name web_test --interactive --tty --link web:theWebServer busybox
/ # ping theWebServer
PING theWebServer (172.17.0.2): 56 data bytes
64 bytes from 172.17.0.2: seq=0 ttl=64 time=0.086 ms
[...]
/ # ping web
PING web (172.17.0.2): 56 data bytes
64 bytes from 172.17.0.2: seq=0 ttl=64 time=0.054 ms
[...]
/ # ip -4 addr show dev eth0
28: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    inet 172.17.0.4/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

To detach from an interactive container without stopping its process, press Ctrl + p followed by Ctrl + q:

test@localhost: sudo docker run --interactive --tty --name agent --link web:insideweb --link mailer:insidemailer dockerinaction/ch2_agent
[...]
System up.
System up.
System up.    ## <--- detached with ctrl +p ctrl + q 
test@localhost:

Check all containers running in the system with ps

test@localhost: sudo docker ps
CONTAINER ID   IMAGE                       COMMAND                  CREATED          STATUS          PORTS       NAMES
5723dbb7f8c8   dockerinaction/ch2_agent    "/watcher/watcher.sh"    4 minutes ago    Up 4 minutes                agent
8652860736fa   dockerinaction/ch2_mailer   "/mailer/mailer.sh"      39 minutes ago   Up 39 minutes   33333/tcp   mailer
096112267c5a   nginx:latest                "/docker-entrypoint.…"   42 minutes ago   Up 42 minutes   80/tcp      web

Check the stdout / stderr of a container using the logs command. Optionally specify --tail and --follow (like tail -f):

test@localhost: sudo docker logs web --follow --tail 1
172.17.0.4 - - [12/Nov/2021:00:28:31 +0000] "GET / HTTP/1.0" 200 615 "-" "-" "-"
172.17.0.4 - - [12/Nov/2021:00:28:32 +0000] "GET / HTTP/1.0" 200 615 "-" "-" "-"
172.17.0.4 - - [12/Nov/2021:00:28:33 +0000] "GET / HTTP/1.0" 200 615 "-" "-" "-"
172.17.0.4 - - [12/Nov/2021:00:28:34 +0000] "GET / HTTP/1.0" 200 615 "-" "-" "-"
^C

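Now we can stop the containers. docker stop sends SIGTERM and, if the process has not exited after a grace period (10 seconds by default), follows up with SIGKILL.
docker kill sends SIGKILL by default; a different signal can be chosen with --signal, for example docker kill --signal 11 (SIGSEGV on Linux).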

test@localhost: sudo docker stop web
web
test@localhost: sudo docker stop agent
agent
test@localhost: sudo docker stop agent mailer
agent
mailer
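
The value passed to --signal can be a plain signal number. As a small sketch, the shell's kill -l builtin translates numbers into names; the helper name signal_name is made up for illustration, and the numbers assume Linux numbering:

```shell
# Translate the signal numbers mentioned above into names.
# Assumption: Linux numbering (15 = TERM, 9 = KILL, 11 = SEGV).
signal_name() {
    echo "SIG$(kill -l "$1")"
}

for n in 15 9 11; do
    echo "$n -> $(signal_name "$n")"
done
```

This runs on the host and does not need Docker; it only helps to double-check which signal a number refers to before passing it to docker kill.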

Docker containers and namespaces

November 12, 2021 - Reading time: 2 minutes

Each container, identified by its name or ID, gets its own set of namespaces. This includes the PID namespace, so each container only sees its own processes:

test@localhost: sudo docker run -d --name ns1 busybox /bin/sh -c "sleep 50000" 
c697ca05f26d149ae2f8d4cd3f69d337eca9780bf3b6fa7966a2cada9e38db02
test@localhost: sudo docker run -d --name ns2 busybox /bin/sh -c "sleep 90000" 
3d47c92028f861ecbba3fa079c0d7009aee558f953ce8ee534fdd349fb0bd403
# List processes inside each container:
test@localhost: sudo docker exec ns1 ps 
PID   USER     TIME  COMMAND
    1 root      0:00 sleep 50000
   14 root      0:00 ps
test@localhost: sudo docker exec ns2 ps 
PID   USER     TIME  COMMAND
    1 root      0:00 sleep 90000
    7 root      0:00 ps

We can specify a different PID namespace using --pid. To share the same namespace as the host:

test@localhost: sudo docker run --pid host busybox ps | grep sleep
 6206 root      0:00 sleep 50000
 6290 root      0:00 sleep 90000
 6708 1000      0:00 grep sleep

The first 12 characters of the container ID (a 64-character hexadecimal string, i.e. a 256-bit number) can be used interchangeably with the container name:

test@localhost: sudo docker ps --no-trunc
CONTAINER ID                                                       IMAGE     COMMAND                      CREATED          STATUS          PORTS     NAMES
3d47c92028f861ecbba3fa079c0d7009aee558f953ce8ee534fdd349fb0bd403   busybox   "/bin/sh -c 'sleep 90000'"   13 minutes ago   Up 13 minutes             ns2
c697ca05f26d149ae2f8d4cd3f69d337eca9780bf3b6fa7966a2cada9e38db02   busybox   "/bin/sh -c 'sleep 50000'"   13 minutes ago   Up 13 minutes             ns1
test@localhost: sudo docker exec 3d47c92028f8 ps
PID   USER     TIME  COMMAND
    1 root      0:00 sleep 90000
   30 root      0:00 ps
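
As a quick sketch of how the short form relates to the full ID, the snippet below cuts the first 12 characters from the ns2 ID shown above and counts the bits the full ID encodes. The helper names short_id and id_bits are hypothetical, not Docker commands:

```shell
# Full container ID of ns2, copied from the docker ps --no-trunc output.
full_id=3d47c92028f861ecbba3fa079c0d7009aee558f953ce8ee534fdd349fb0bd403

# short_id ID: print the first 12 characters of a container ID.
short_id() {
    printf '%s\n' "$1" | cut -c1-12
}

# id_bits ID: each hexadecimal character encodes 4 bits.
id_bits() {
    echo $(( ${#1} * 4 ))
}

short_id "$full_id"
id_bits "$full_id"
```

The first call prints the same 12-character prefix used in the docker exec command, and the second shows that 64 hex characters correspond to 256 bits.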

The CID can be written to a file during create or run

test@localhost: sudo docker create --cidfile /var/tmp/web.cid nginx 
2f7863c7fc126df665b43913bbe93685cb18733d5f9912c144f96905e4ad630d
test@localhost: cat /var/tmp/web.cid 
2f7863c7fc126df665b43913bbe93685cb18733d5f9912c144f96905e4ad630d

Env var Read Only and Volumes

November 12, 2021 - Reading time: 2 minutes

Find out what files are normally modified in a container by doing a diff

test@localhost: sudo docker container run --name wp -d wordpress:php8.0-apache 
ca879747fbe643b3ebf138a93ddaf5a2e8938598434f093227b69cc18a632db8
test@localhost: sudo docker diff wp
C /run
C /run/apache2                           #<--- Changed directory
A /run/apache2/apache2.pid       #<--- Added new file 
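
docker diff prefixes each path with A (added), C (changed) or D (deleted). A minimal sketch of a filter that spells the letters out; describe_diff is a made-up helper name, and the sample input is simply the output shown above:

```shell
# describe_diff: read "KIND PATH" lines (as printed by docker diff) from stdin
# and replace the one-letter kind with a word.
describe_diff() {
    while read -r kind path; do
        case $kind in
            A) echo "added:   $path" ;;
            C) echo "changed: $path" ;;
            D) echo "deleted: $path" ;;
        esac
    done
}

# Sample input copied from the transcript above, so the sketch runs without Docker.
printf 'C /run\nC /run/apache2\nA /run/apache2/apache2.pid\n' | describe_diff
```

Against a real container the same filter could be used as: sudo docker diff wp | describe_diff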

Now I can create a read-only container that has write access only to specific directories. Read-only containers are safer because processes (and users) inside them cannot change the image's filesystem:

test@localhost: sudo docker container run -d --name wp --read-only --volume /run/apache2 --tmpfs /tmp wordpress:php8.0-apache 

The volumes are part of the Mounts configuration in the container.

test@localhost: sudo docker inspect wp | jq '.[].Mounts[]|{Destination,Source,Driver}'
{
  "Destination": "/run/apache2",
  "Source": "/var/lib/docker/volumes/96d17544b776275f28cc2e83c59f13c174f82b4c6339957816c4949158ab9a00/_data",
  "Driver": "local"
}
{
  "Destination": "/var/www/html",
  "Source": "/var/lib/docker/volumes/95c1df6097627ef6fa2e5c5bab769d14963ab50ac05969a9a754062d1e106e76/_data",
  "Driver": "local"
}

tmpfs mounts are only available on Linux; they create an in-memory filesystem that exists only while the container is running.

We can now launch a mysql container first, passing an env var to set the root password, and then link the wordpress container to it:

test@localhost: sudo docker run -d --name wpdb -e MYSQL_ROOT_PASSWORD=ch2demo mysql
test@localhost: sudo docker container run -d --name wp --link wpdb:mysql -p 8080:80 --read-only --volume /run/apache2 --tmpfs /tmp wordpress:php8.0-apache 

List all the environment variables in a container with an exec command:

test@localhost: sudo docker container exec wpdb env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=eb2bd55ca0ee
MYSQL_ROOT_PASSWORD=ch2demo
GOSU_VERSION=1.12
MYSQL_MAJOR=8.0
MYSQL_VERSION=8.0.27-1debian10
HOME=/root

Container startup system / maintenance and cleaning

November 12, 2021 - Reading time: 2 minutes

Docker can be instructed to automatically restart a failing container (failing in this context means that the last process in the container finished / exited)

test@localhost: sudo docker run -d --restart 
always          no              on-failure      on-failure:     unless-stopped 

The policies are:

always: restart every time the container exits, with an exponential back-off
no: never restart (the default)
on-failure: restart only when the container exits with a non-zero status, with an optional :max-retries limit
unless-stopped: like always, except for containers that were explicitly stopped
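
To make the exponential back-off concrete, here is a sketch of the delay schedule. It assumes the values described in Docker's documentation (a 100 ms initial delay that doubles before each restart, capped at one minute); the helper name backoff_schedule is made up for illustration:

```shell
# backoff_schedule N: print the assumed delay before each of the first N restarts.
# Assumptions: initial delay 100 ms, doubled each time, capped at 60000 ms.
backoff_schedule() {
    delay=100
    i=1
    while [ "$i" -le "$1" ]; do
        echo "restart $i: wait $delay ms"
        delay=$(( delay * 2 ))
        if [ "$delay" -gt 60000 ]; then
            delay=60000
        fi
        i=$(( i + 1 ))
    done
}

backoff_schedule 5
```

The point of the doubling is that a container stuck in a crash loop quickly stops consuming host resources on restarts.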

A more elegant way to control the container, particularly when dealing with multiple processes inside it, is to use an image that includes an init system (tini, supervisord, runit):

test@localhost: sudo docker run -d -p 80:80 --name lamp tutum/lamp
test@localhost: sudo docker exec lamp ps
    PID TTY          TIME CMD
      1 ?        00:00:00 supervisord
    434 ?        00:00:00 mysqld_safe
    435 ?        00:00:00 apache2
    816 ?        00:00:00 ps
test@localhost: sudo docker top lamp
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                12215               12193               0                   17:30               ?                   00:00:00            /usr/bin/python /usr/bin/supervisord -n
root                12675               12215               0                   17:30               ?                   00:00:00            /bin/sh /usr/bin/mysqld_safe
root                12676               12215               0                   17:30               ?                   00:00:00            apache2 -D FOREGROUND
www-data            12774               12676               0                   17:30               ?                   00:00:00            apache2 -D FOREGROUND
www-data            12776               12676               0                   17:30               ?                   00:00:00            apache2 -D FOREGROUND
www-data            12779               12676               0                   17:30               ?                   00:00:00            apache2 -D FOREGROUND
www-data            12780               12676               0                   17:30               ?                   00:00:00            apache2 -D FOREGROUND
www-data            12781               12676               0                   17:30               ?                   00:00:00            apache2 -D FOREGROUND
systemd+            13039               12675               0                   17:30               ?                   00:00:00            /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306

The entrypoint is the program Docker runs when the container starts; the container command is passed to it as arguments. In this example the entrypoint was a bash script, which I replaced with "cat" and then passed a parameter (the script's own file name) to print it:

test@localhost: sudo docker run --entrypoint="cat" wordpress:php8.0-apache /usr/local/bin/docker-entrypoint.sh
#!/usr/bin/env bash
set -Eeuo pipefail

if [[ "$1" == apache2* ]] || [ "$1" = 'php-fpm' ]; then
    uid="$(id -u)"
    gid="$(id -g)"
    if [ "$uid" = '0' ]; then

Repositories Images tags and Dockerfile

November 14, 2021 - Reading time: 8 minutes

Docker will pull images from Docker Hub by default, but we can specify the location of the image using: [REGISTRYHOST:PORT/][USERNAME/]NAME[:TAG]
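
A rough sketch of how such a reference breaks down into its parts. The helper parse_ref is hypothetical, and the rule it uses to spot a registry host (the first component contains "." or ":" or is "localhost") is my understanding of Docker's behaviour rather than a full implementation; digests (@sha256:...) are not handled:

```shell
# parse_ref REF: split [REGISTRYHOST:PORT/][USERNAME/]NAME[:TAG] into its parts.
parse_ref() {
    ref=$1
    registry=""
    tag="latest"                      # default tag when none is given

    # A tag is a ":suffix" after the last "/", so a registry port is not mistaken for it.
    last=${ref##*/}
    case $last in
        *:*) tag=${last##*:}; ref=${ref%:*} ;;
    esac

    # Assumed rule: the first component is a registry host if it
    # contains "." or ":" or is exactly "localhost".
    first=${ref%%/*}
    case $ref in
        */*)
            case $first in
                *.*|*:*|localhost) registry=$first; ref=${ref#*/} ;;
            esac ;;
    esac

    echo "registry=${registry:-docker.io} name=$ref tag=$tag"
}

parse_ref dockerinaction/hello_world
parse_ref localhost:5000/team/app:1.2
```

The second reference (localhost:5000/team/app:1.2) is an invented example of a private registry, used only to exercise the REGISTRYHOST:PORT part of the format.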

Running your own registry could create a single point of failure!

Images can also be saved into a file (exported) and copied / distributed to other devices:

test@localhost: sudo docker pull busybox:latest
latest: Pulling from library/busybox
Digest: sha256:e7157b6d7ebbe2cce5eaa8cfe8aa4fa82d173999b9f90a9ec42e57323546c353
Status: Image is up to date for busybox:latest
docker.io/library/busybox:latest
##
test@localhost: sudo docker save -o /var/tmp/busy.img busybox:latest
##
test@localhost: sudo file /var/tmp/busy.img
/var/tmp/busy.img: POSIX tar archive

The resulting tar file contains:

test@localhost: sudo tar -tf /var/tmp/busy.img
7138284460ffa3bb6ee087344f5b051468b3f8697e2d1427bac1a20c8d168b14.json
c827bfcc430983f33f0807d0539883ecdaf786e41b20a77876e30f9cb2014e92/
c827bfcc430983f33f0807d0539883ecdaf786e41b20a77876e30f9cb2014e92/VERSION
c827bfcc430983f33f0807d0539883ecdaf786e41b20a77876e30f9cb2014e92/json
c827bfcc430983f33f0807d0539883ecdaf786e41b20a77876e30f9cb2014e92/layer.tar
manifest.json
repositories

We can now delete the image, and load it again from the saved file:

test@localhost: sudo docker rmi  busybox
Untagged: busybox:latest
Untagged: busybox@sha256:e7157b6d7ebbe2cce5eaa8cfe8aa4fa82d173999b9f90a9ec42e57323546c353
Deleted: sha256:7138284460ffa3bb6ee087344f5b051468b3f8697e2d1427bac1a20c8d168b14
Deleted: sha256:d94c78be13527d00673093f9677f9b43d7e3a02ae6fa0ec74d3d98243b5b40e4
##
test@localhost: sudo docker load -i /var/tmp/busy.img
d94c78be1352: Loading layer [==================================================>]  1.459MB/1.459MB
Loaded image: busybox:latest

An alternative way to build an image is to specify the base image and the commands in a Dockerfile:

test@localhost: cat test.sh
#!/bin/sh
echo "Hello this is ${HOSTNAME}"
test@localhost: chmod 777 test.sh
##
test@localhost: cat Dockerfile
FROM busybox:latest
ADD test.sh /var/tmp/
WORKDIR /var/tmp/
CMD ./test.sh
##
test@localhost: sudo docker build -f ./Dockerfile -t mytest:first ./
Sending build context to Docker daemon  321.5MB
Step 1/4 : FROM busybox:latest
 ---> 7138284460ff
Step 2/4 : ADD test.sh /var/tmp/
 ---> bba4f2a8d9bd
Step 3/4 : WORKDIR /var/tmp/
 ---> Running in 8ac9a959ddec
Removing intermediate container 8ac9a959ddec
 ---> 3ab3a1ddf92f
Step 4/4 : CMD ./test.sh
 ---> Running in f3ca9c11adf3
Removing intermediate container f3ca9c11adf3
 ---> 2de1580be977
Successfully built 2de1580be977
Successfully tagged mytest:first
##
test@localhost: sudo docker image ls mytest
REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
mytest       first     2de1580be977   32 seconds ago   1.24MB
##
test@localhost: sudo docker run mytest:first
Hello this is 5d8087899746

A layer is a set of files and file metadata that is packaged and distributed as an atomic unit.
Each layer is treated internally as an image. A layer can be "promoted" to an image by tagging it.
Most layers build on existing layers by adding new files to the FS.

When pulling images, existing layers can be re-used. Layers are NOT modified; a new layer is created if a file needs to be modified. For example, here we're pulling two apps that rely on the same base layers. The shared layers are downloaded only for the first image:

test@localhost: sudo docker pull dockerinaction/ch3_myapp
Using default tag: latest
latest: Pulling from dockerinaction/ch3_myapp
f5d23c7fed46: Pull complete 
eaa7ca9a16a1: Pull complete 
d7d34b884c95: Pull complete 
d0f024ff373b: Pull complete 
9384c9efb97d: Pull complete 
a7e74b426681: Pull complete 
6f1c51bc28c2: Pull complete 
ce0e70589db8: Pull complete 
df420ec9fa4c: Pull complete 
Digest: sha256:2e492fedd50d9d4ef5e8ea92c32795c3f53836199322cb85eafb93c2e139b3f1
Status: Downloaded newer image for dockerinaction/ch3_myapp:latest

test@localhost: sudo docker image ls dockerinaction/ch3_myapp
REPOSITORY                 TAG       IMAGE ID       CREATED       SIZE
dockerinaction/ch3_myapp   latest    0858f7736a46   2 years ago   401MB

test@localhost: sudo docker pull dockerinaction/ch3_myotherapp
Using default tag: latest
latest: Pulling from dockerinaction/ch3_myotherapp
f5d23c7fed46: Already exists    ## <--- no need to download again 
eaa7ca9a16a1: Already exists 
d7d34b884c95: Already exists 
d0f024ff373b: Already exists      ## <--- no need to download again 
b739d2c7836e: Pull complete 
79f97461601b: Pull complete 
1c2b86e90a51: Pull complete 
57ebdb20c65a: Pull complete 
1558a979f442: Pull complete 
Digest: sha256:5ec2875ca4b24ad5df22b03b4cf45181ad544cdc8b22dc85d27960e28131433e
Status: Downloaded newer image for dockerinaction/ch3_myotherapp:latest
docker.io/dockerinaction/ch3_myotherapp:latest

Images are referenced by ID only, until they're tagged and published. In this example, the first shared layer is Debian, the next three are the openjdk layers, and the remaining layers are specific to each app.

Docker uses the MNT namespace to mount the filesystems specific to the image, and then limits the folder structure with chroot.

With the two applications running, we can see the mounts on the host os:

test@localhost: sudo mount  | tail -n4
overlay on /var/lib/docker/overlay2/d7a4bcb57c946ee97e153dbefe72582f2eba3edcbd7aa44609edaf2e404a607e/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/GBJ577KE4P36ABUP4E7AIJFO66:/var/lib/docker/overlay2/l/6URWUWCRMTSWR6RH72UZ2BWFWG:/var/lib/docker/overlay2/l/P4PI4JNJVXP3OGSOGHPHDY57HY:/var/lib/docker/overlay2/l/5KKCT7GQZFAN7DH3EJW3E6Y7FJ:/var/lib/docker/overlay2/l/OIEYT3DV5JPNTQG5BBNYOIV4PU:/var/lib/docker/overlay2/l/JSX77UNYPC3NILHOVBLSQ2WCQS:/var/lib/docker/overlay2/l/2U3BRBSWUJN3MOUTXXVEI3TFNL:/var/lib/docker/overlay2/l/TZ62SNSFUUKQ74ROE3RJ3I7R3H:/var/lib/docker/overlay2/l/DL65Y7L6TM2GMOOJHAAID6X5UL:/var/lib/docker/overlay2/l/ZJ4RTWLGDGBBNSYHSMUDSUP6LQ,upperdir=/var/lib/docker/overlay2/d7a4bcb57c946ee97e153dbefe72582f2eba3edcbd7aa44609edaf2e404a607e/diff,workdir=/var/lib/docker/overlay2/d7a4bcb57c946ee97e153dbefe72582f2eba3edcbd7aa44609edaf2e404a607e/work)
nsfs on /run/docker/netns/8f576aa794e3 type nsfs (rw)
overlay on /var/lib/docker/overlay2/8ee9f0042e38f75538eac9e765b31c3a178923346eb6ca907361ee5e7634e863/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/7W2ODADHDIMIZ2GWOSGXLU2LIY:/var/lib/docker/overlay2/l/7ILMY6OC6HYYMJZTZ23W6N2HAU:/var/lib/docker/overlay2/l/K433PZWEGYQ7GCBSKSNLGJ3V4V:/var/lib/docker/overlay2/l/VF55NXTGQXQV5JARHLCSEMTDNG:/var/lib/docker/overlay2/l/H5DPD3UM3R2JM4U7E4Z73NUUYN:/var/lib/docker/overlay2/l/TZCY2MCB3JYYWZIQDYYXEFDPJU:/var/lib/docker/overlay2/l/2U3BRBSWUJN3MOUTXXVEI3TFNL:/var/lib/docker/overlay2/l/TZ62SNSFUUKQ74ROE3RJ3I7R3H:/var/lib/docker/overlay2/l/DL65Y7L6TM2GMOOJHAAID6X5UL:/var/lib/docker/overlay2/l/ZJ4RTWLGDGBBNSYHSMUDSUP6LQ,upperdir=/var/lib/docker/overlay2/8ee9f0042e38f75538eac9e765b31c3a178923346eb6ca907361ee5e7634e863/diff,workdir=/var/lib/docker/overlay2/8ee9f0042e38f75538eac9e765b31c3a178923346eb6ca907361ee5e7634e863/work)
nsfs on /run/docker/netns/570a28936e82 type nsfs (rw)

Each storage driver has its pros and cons, and they can be replaced (like plugins). Docker uses overlay2 by default:

test@localhost: sudo docker info | grep Stora
 Storage Driver: overlay2

https://docs.docker.com/storage/storagedriver/select-storage-driver/


Volumes tmpfs and binds

November 15, 2021 - Reading time: 4 minutes

The union file system (UnionFS, the idea behind storage drivers such as overlay2 and aufs) shouldn't be used to write files that need to outlive the container (like databases or logs), or data that needs to be shared with other containers or with the host.

A container can have access at the same time to all types of storage (mounted with --mount). For example:

Image FS --> overlay driver --> Maps to /var/lib/docker/overlay2/l/... in the host
In-Memory --> tmpfs --> volatile /tmp on the container
Bind-mount --> /etc/httpd/test.conf --> Maps to /home/test/nginxconf.txt in the host (ro or rw)
Volume --> /var/logs --> Maps to /var/lib/docker/volumes/some-location/_data (read/write)

Bind

I can link a specific file or folder on the host system to the container. This is useful for logs, config files, or anything else that needs to be shared between the host and the container. src is the path on the host and dst is the path inside the container:

# Both source files must already exist on the host
CONF_SRC=~/example.conf
CONF_DST=/etc/nginx/conf.d/default.conf

LOG_SRC=~/example.log
LOG_DST=/var/log/nginx/custom.host.access.log

docker run -d --name diaweb \
  --mount type=bind,src="${CONF_SRC}",dst="${CONF_DST}" \
  --mount type=bind,src="${LOG_SRC}",dst="${LOG_DST}" \
  -p 80:80 \
  nginx:latest

The bind replaces the original content of the image at /etc/nginx/conf.d/default.conf. Optionally, the bind can be made read-only by adding readonly=true to the --mount options.

In-mem

Useful for private keys, API keys, passwords or any private information.

docker run --rm \
    --mount type=tmpfs,dst=/tmp \
    --entrypoint mount \
    alpine:latest -v
[...]
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime)

We can add more parameters to limit the size (tmpfs-size) and to set the file permissions (tmpfs-mode).

Volumes

Useful to share locations between different containers. The volumes could be local to the host, or remote (specified by the driver option). Volumes can be created on their own, or as part of a container, listed and inspected.

test@localhost: sudo docker volume create --driver local --label test=first myvolume
myvolume
test@localhost: sudo docker volume inspect myvolume 
[
    {
        "CreatedAt": "2021-11-15T12:06:32Z",
        "Driver": "local",
        "Labels": {
            "test": "first"
        },
        "Mountpoint": "/var/lib/docker/volumes/myvolume/_data",
        "Name": "myvolume",
        "Options": {},
        "Scope": "local"
    }
]

Volumes can be passed to containers with --volume or with --mount (type=volume).

test@localhost: sudo docker run -d --volume myvolume:/var/lib/cassandra/data --name cass1 cassandra:2.2 
[...]
test@localhost: sudo docker exec -it cass1 /bin/sh
# cqlsh 
cqlsh> create keyspace docker_hello_world
   ... with replication = {
   ...     'class' : 'SimpleStrategy',
   ...     'replication_factor': 1
   ... };

The content is now saved in a volume. Even if the container is deleted, a new cassandra container can be launched with the same volume settings to access the data.
Containers can use the --volumes-from flag to mount the volumes already defined on other containers.


Single-host network and DNS

November 15, 2021 - Reading time: 6 minutes

Docker creates three networks by default:

test@localhost: sudo docker network ls 
NETWORK ID     NAME      DRIVER    SCOPE
87b1b48fc040   bridge    bridge    local
bc4e37844d7a   host      host      local
6a689a237adc   none      null      local

By default, containers are attached to bridge for inter-container connectivity. The host network is used to place containers in the host's network namespace. Possible scopes are local, global (part of a swarm, but not routed) and swarm.

Each container has a loopback interface and a private interface that's attached to the docker0 virtual interface.

Create a new bridge network with a different subnet, make it attachable so that containers can be attached/removed at any point:

test@localhost: sudo docker network create --driver bridge --label test=first --attachable --scope local --subnet 10.55.11.0/24 --ip-range 10.55.11.128/25 mynetwork
bf0d64f6d3e10f04b87ae5d33606e0c34302b8fbcde8d6851612062e2bc02ccf

Networks can be added to containers:

test@localhost: sudo docker run --interactive --tty --network mynetwork --name test alpine sh
[...]
/ # ip -f inet -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
61: eth0    inet 10.55.11.129/24 brd 10.55.11.255 scope global eth0\       valid_lft forever preferred_lft forever
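
The --subnet and --ip-range values determine how many addresses are available. As a sketch of the arithmetic (pool_size is a made-up helper name), a block with prefix length P holds 2^(32-P) IPv4 addresses:

```shell
# pool_size PREFIX: number of IPv4 addresses in a block with the given prefix length.
pool_size() {
    echo $(( 1 << (32 - $1) ))
}

pool_size 24   # the --subnet 10.55.11.0/24
pool_size 25   # the --ip-range 10.55.11.128/25
```

The /25 range covers 10.55.11.128 to 10.55.11.255, which is consistent with the container above receiving 10.55.11.129.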

Create a new network, and connect our existing container:

test@localhost: sudo docker network create --driver bridge --label test=first --attachable --scope local --subnet 10.66.11.0/24 --ip-range 10.66.11.128/25 mynetwork2
71b6284bdc851e5a6f4d3296952ecaa93428f18935b946385017594f4948f6bb
test@localhost: sudo docker network connect mynetwork2 test 
test@localhost: sudo docker attach test
/ # ip -f inet -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
61: eth0    inet 10.55.11.129/24 brd 10.55.11.255 scope global eth0\       valid_lft forever preferred_lft forever
64: eth1    inet 10.66.11.129/24 brd 10.66.11.255 scope global eth1\       valid_lft forever preferred_lft forever

If I now create another container, and attach it to mynetwork2, I can do a reverse DNS lookup to get the hostname:

test@localhost: sudo docker attach test
/ # dig -x 10.66.11.130 +short
test2.mynetwork2.
test@localhost: sudo docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED          STATUS          PORTS     NAMES
9829476f60f8   alpine    "sh"      8 minutes ago    Up 8 minutes              test2
dd071d4ee496   alpine    "sh"      16 minutes ago   Up 16 minutes             test

Swarm

When working with multiple nodes in a swarm, we need to use the overlay driver to create a logical switch that spans all nodes in the cluster. This allows communication between containers on different nodes.

Host network

The container and the host share the same network namespace:

test@localhost: sudo docker run --interactive --tty --network host --name testhost alpine /bin/sh
/ # ip -f inet -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: ens33    inet 192.168.0.12/24 brd 192.168.0.255 scope global dynamic ens33\       valid_lft 75444sec preferred_lft 75444sec
3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
60: br-bf0d64f6d3e1    inet 10.55.11.128/24 brd 10.55.11.255 scope global br-bf0d64f6d3e1\       valid_lft forever preferred_lft forever
63: br-71b6284bdc85    inet 10.66.11.128/24 brd 10.66.11.255 scope global br-71b6284bdc85\       valid_lft forever preferred_lft forever

NodePort publishing

Used to provide in-bound connectivity to the container on a specific port.
The --publish list includes tuples of host:container ports. It can only be specified during creation: create or run
-p 8080 -> forwards random port in the host to 8080 in the container
-p 8080:8080/udp -> forwards UDP:8080 in the host to the same in the container
All the published ports can be listed with:

test@localhost: sudo docker run --detach --name list01 --publish 8080 alpine sleep 1d
6451adc9b7daccc6df93f0c0cb55d571b4a682b11194bbbfc5f3e3033232cb07
test@localhost: sudo docker run --detach --name list02 --publish 8080:8080 alpine sleep 1d
732cb05042ee2d2e037802ab1fce7ee53a9afa75b611a2eea214bcbca21001f1
test@localhost: sudo docker port list01
8080/tcp -> 0.0.0.0:49153
8080/tcp -> :::49153
test@localhost: sudo docker port list02
8080/tcp -> 0.0.0.0:8080
8080/tcp -> :::8080

DNS

Names assigned to containers via --name or --hostname can be used by other containers in the same network to resolve the IP Address.

test@localhost: sudo docker run --interactive --tty --hostname myhostname.cool.uk --network mynetwork --name anyname alpine sh 
/ # cat /etc/hosts 
[...]
10.55.11.130    myhostname.cool.uk myhostname
test@localhost: sudo docker attach test
/ # dig +short myhostname.cool.uk 
10.55.11.130
/ # dig +short anyname.mynetwork
10.55.11.130

The container's DNS settings can be overridden with --dns and --dns-search, and static hosts can be added with --add-host.


Container restrictions

November 15, 2021 - Reading time: 5 minutes

Memory

Memory limits are specified when a container is created or run, for example --memory 256m. They don't specify the amount of memory that needs to be reserved, but the maximum available.

test@localhost: sudo docker run --name memtest --detach --memory 7m busybox sleep 1d
b76fa66d685e2ce5a6b143f8b4a1fc0b9953d4598f9be0376a5eff73f6cebc1f
test@localhost: sudo docker stats --no-stream
CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT   MEM %     NET I/O       BLOCK I/O        PIDS
b76fa66d685e   memtest   0.00%     196KiB / 7MiB       2.73%     2.65kB / 0B   201kB / 1.67MB   1
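
The MEM % column is simply usage divided by the limit. A small sketch that reproduces the figure for the numbers shown above; mem_percent is a hypothetical helper and it assumes the usage is given in KiB and the limit in MiB:

```shell
# mem_percent USED_KIB LIMIT_MIB: memory usage as a percentage of the limit.
mem_percent() {
    awk -v used="$1" -v limit="$2" \
        'BEGIN { printf "%.2f%%\n", 100 * used / (limit * 1024) }'
}

mem_percent 196 7   # 196KiB used out of a 7MiB limit
```

For 196KiB out of 7MiB (7168KiB) this gives the same 2.73% reported by docker stats.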

CPU

There are different ways to restrict CPU availability. With --cpu-shares the container uses as much CPU as is available, but in case of contention the processes get scheduled based on the container's share of the total. If new containers are created, the percentages are recalculated.
Initial state: C1: 512, C2: 512 (CPU assigned 50-50)
New state: C1: 512, C2: 512, C3: 1024 (CPU assigned 25-25-50)
With --cpus we can specify exactly the number of CPUs assigned to the container.
Containers can also be pinned to specific CPUs with --cpuset-cpus:

test@localhost: sudo docker create --help | grep cpu
      --cpu-period int                 Limit CPU CFS (Completely Fair Scheduler) period
      --cpu-quota int                  Limit CPU CFS (Completely Fair Scheduler) quota
      --cpu-rt-period int              Limit CPU real-time period in microseconds
      --cpu-rt-runtime int             Limit CPU real-time runtime in microseconds
  -c, --cpu-shares int                 CPU shares (relative weight)
      --cpus decimal                   Number of CPUs
      --cpuset-cpus string             CPUs in which to allow execution (0-3, 0,1)
      --cpuset-mems string             MEMs in which to allow execution (0-3, 0,1)
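
The --cpu-shares arithmetic can be checked with a small shell function. This is only a sketch of the proportional rule under full contention (each container's shares divided by the sum of all shares); cpu_share_percent is a hypothetical helper, not a Docker command, and it rounds down to whole percentages.

```shell
# cpu_share_percent SHARES...
# Print the CPU percentage each container gets when all of them compete.
cpu_share_percent() {
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    out=""
    for s in "$@"; do
        out="$out$((s * 100 / total)) "
    done
    echo "${out% }"
}

cpu_share_percent 512 512        # prints: 50 50
cpu_share_percent 512 512 1024   # prints: 25 25 50
```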

Shared mem

Some processes do IPC via shared memory (faster than named pipes or network IPC).
To allow this between containers, one container must create its IPC namespace as shareable (--ipc shareable), and the other containers must join it by reference with --ipc container:<name>.

test@localhost: sudo docker run --ipc
container:  host        none        private     shareable
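
A minimal sketch of the two sides. The container names producer and consumer are made up for this example, and the DOCKER variable is an assumption of the sketch: by default the commands are only printed, and setting DOCKER="sudo docker" runs them for real.

```shell
# Dry run by default: print the commands instead of executing them.
DOCKER="${DOCKER:-echo docker}"

# Producer: owns an IPC namespace that other containers are allowed to join.
$DOCKER run --detach --name producer --ipc shareable busybox sleep 1d

# Consumer: joins the producer's IPC namespace instead of getting its own.
$DOCKER run --rm --name consumer --ipc container:producer busybox ls /dev/shm
```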

Users

By default (if not specified in the image) the container will run as the root user:

test@localhost: sudo docker image inspect nginx | gron | grep -i user
json[0].Config.User = "";
json[0].ContainerConfig.User = "";
test@localhost: sudo docker run --entrypoint whoami --rm nginx
root

The user can be overridden with --user, using any user that exists in the image:

test@localhost: sudo docker container run --user nginx --rm --entrypoint whoami nginx
nginx

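File ownership and permissions are preserved between host and container when using bind mounts (or volumes), so a file readable only by root on the host cannot be read by a non-root user in the container: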

test@localhost: echo hi > hi.txt
test@localhost: chmod 400 hi.txt ; sudo chown root:root hi.txt
test@localhost: sudo docker container run --rm --user www-data --volume $(pwd)/hi.txt:/container/hi.txt debian bash -c 'whoami ; cat /container/hi.txt ; ls -l /container'
cat: /container/hi.txt: Permission denied
www-data
total 4
-r-------- 1 root root 3 Nov 15 23:00 hi.txt

Capabilities

Specific capabilities can be added with --cap-add or dropped with --cap-drop:

test@localhost: sudo docker container run --cap-add
ALL                     CAP_BLOCK_SUSPEND       CAP_LINUX_IMMUTABLE     CAP_SYS_BOOT            CAP_SYS_RESOURCE        IPC_OWNER               PERFMON                 SYS_PACCT
AUDIT_CONTROL           CAP_BPF                 CAP_MAC_ADMIN           CAP_SYSLOG              CAP_SYS_TIME            LEASE                   RESET                   SYS_PTRACE
AUDIT_READ              CAP_CHECKPOINT_RESTORE  CAP_MAC_OVERRIDE        CAP_SYS_MODULE          CAP_SYS_TTY_CONFIG      LINUX_IMMUTABLE         SYS_ADMIN               SYS_RAWIO
BLOCK_SUSPEND           CAP_DAC_READ_SEARCH     CAP_NET_ADMIN           CAP_SYS_NICE            CAP_WAKE_ALARM          MAC_ADMIN               SYS_BOOT                SYS_RESOURCE
BPF                     CAP_IPC_LOCK            CAP_NET_BROADCAST       CAP_SYS_PACCT           CHECKPOINT_RESTORE      MAC_OVERRIDE            SYSLOG                  SYS_TIME
CAP_AUDIT_CONTROL       CAP_IPC_OWNER           CAP_PERFMON             CAP_SYS_PTRACE          DAC_READ_SEARCH         NET_ADMIN               SYS_MODULE              SYS_TTY_CONFIG
CAP_AUDIT_READ          CAP_LEASE               CAP_SYS_ADMIN           CAP_SYS_RAWIO           IPC_LOCK                NET_BROADCAST           SYS_NICE                WAKE_ALARM

Another way to grant all capabilities is --privileged, which also lifts other restrictions (such as access to host devices).
Docker also allows loading seccomp and LSM profiles with --security-opt:

test@localhost: sudo docker container run --security-opt
apparmor=               label=                  no-new-privileges       seccomp=                systempaths=unconfined
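
A minimal sketch of --cap-add and --cap-drop working together: drop everything, then add back a single capability. The chown of /tmp is just an arbitrary operation that needs CAP_CHOWN. The DOCKER variable is an assumption of this sketch: by default the commands are only printed, and setting DOCKER="sudo docker" runs them for real.

```shell
# Dry run by default: print the commands instead of executing them.
DOCKER="${DOCKER:-echo docker}"

# Drop every capability, then add back only CHOWN: the chown should succeed.
$DOCKER run --rm --cap-drop ALL --cap-add CHOWN alpine chown nobody /tmp

# Same command without CHOWN: the chown is expected to fail with a permission error.
$DOCKER run --rm --cap-drop ALL alpine chown nobody /tmp
```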

Dockerfile

November 16, 2021 - Reading time: 6 minutes

Basic example

A Dockerfile is a list of instructions to run when building an image.

FROM debian:latest
LABEL location="DC1"
LABEL maintainer="Alan <a@b.com>"
RUN apt update && apt install -y git
ENTRYPOINT  ["git"]

The name of the file is Dockerfile and we can build a new image from it with:

test@localhost: sudo docker build --tag debian:with-git .
[...]
test@localhost: sudo docker image history debian:with-git
IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
3ecca0c1966a   2 minutes ago   /bin/sh -c #(nop)  ENTRYPOINT ["git"]           0B        
2d12420a2edb   2 minutes ago   /bin/sh -c apt update && apt install -y git     119MB     
05954652b3a9   3 minutes ago   /bin/sh -c #(nop)  LABEL maintainer=Alan <a@…   0B        
e85ee52ff06f   3 minutes ago   /bin/sh -c #(nop)  LABEL location=DC1           0B        
f776cfb21b5e   5 weeks ago     /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      5 weeks ago     /bin/sh -c #(nop) ADD file:aea313ae50ce6474a…   124MB 

The build process adds intermediate layers to the cache and relies on the cache for future builds, unless instructed otherwise (for example with --no-cache).

Ignore file

docker build reads the contents of .dockerignore to find out which files to exclude from the build context, so they can never be added to the new image.

test@localhost: cat .dockerignore 
.dockerignore 
secrets
users
loop.df

In this example, we ignore loop.df (the Dockerfile) and other files. The content of loop.df is:

FROM debian:latest

ENV APPROOT="/app" \
    APP="loop.sh"  \
    VER="1.0"

LABEL com.test.auth="pepe <pepe@test.com>" \
      com.test.loc="east" \
      com.test.ver="${VER}"

WORKDIR ${APPROOT}
ADD . ${APPROOT}
ENTRYPOINT [ "/app/loop.sh" ]
EXPOSE 3333

Combining multiple labels or environment variables in a single LABEL or ENV instruction reduces the number of layers.
The directory only contains the bash script, the Dockerfile and the .dockerignore file:

test@localhost: ls -a
.  ..  .dockerignore  loop.df  loop.sh

Then we can build the image with:

test@localhost: sudo docker build --file loop.df --tag debian:loop .
Sending build context to Docker daemon  4.096kB
[...]
Step 4/7 : WORKDIR ${APPROOT}
 ---> Running in 1f348ea58afd
Removing intermediate container 1f348ea58afd
 ---> f1188008d0a0
Step 5/7 : ADD . ${APPROOT}
 ---> 3b8d4f431e58
Step 6/7 : ENTRYPOINT [ "/app/loop.sh" ]
 ---> Running in 8670ec90d355
Removing intermediate container 8670ec90d355
 ---> d6c0afd193dc
Step 7/7 : EXPOSE 3333
 ---> Running in e84f35d0dae4
Removing intermediate container e84f35d0dae4
 ---> b3925800617d

All the intermediate images used in the build process are left behind untagged:

test@localhost: sudo docker image ls --all
REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
<none>       <none>    d6c0afd193dc   2 minutes ago   124MB
debian       loop      b3925800617d   2 minutes ago   124MB
<none>       <none>    26a9cad0f0c7   2 minutes ago   124MB
<none>       <none>    f1188008d0a0   2 minutes ago   124MB
<none>       <none>    3b8d4f431e58   2 minutes ago   124MB
<none>       <none>    b2cb3d542a4c   2 minutes ago   124MB
debian       latest    f776cfb21b5e   5 weeks ago     124MB

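The ENTRYPOINT instruction accepts two formats: exec form (a JSON array) and shell form (a plain string). With the shell form, the command runs through /bin/sh -c and anything passed as CMD or as arguments to docker container run is ignored. With the exec form, those values are appended to the entrypoint as arguments. The preferred way is the exec form (array).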
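
A minimal sketch comparing the two ENTRYPOINT forms. The file names exec.df and shell.df and the image tags are made up for this example. The script only writes the two Dockerfiles; the build and run commands are left as comments because they need a Docker daemon.

```shell
# Exec form: CMD and run-time arguments are appended to the entrypoint.
cat > exec.df <<'EOF'
FROM debian:latest
ENTRYPOINT ["echo", "hello"]
CMD ["world"]
EOF

# Shell form: runs through /bin/sh -c; CMD and run-time arguments are ignored.
cat > shell.df <<'EOF'
FROM debian:latest
ENTRYPOINT echo hello
CMD ["world"]
EOF

# Build and run (needs a Docker daemon):
#   sudo docker build --file exec.df --tag debian:exec .
#   sudo docker run --rm debian:exec           # hello world
#   sudo docker run --rm debian:exec there     # hello there
#   sudo docker build --file shell.df --tag debian:shell .
#   sudo docker run --rm debian:shell there    # hello
```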

File system specific commands

  • COPY: copy files or folders from the build context into the image
  • ADD: like COPY, but it can also fetch remote URLs and unpack local tar archives
  • VOLUME: declare one or more mount points as volumes

Build related commands

  • ONBUILD <instruction> (for example ONBUILD RUN ...): instructions that are only run when the image is used as the base ("FROM") of another build

Arguments

  • ARG VERSION=none: declares the build argument VERSION with the default value none. The value can be set with --build-arg VERSION=1.0 during docker image build
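
A minimal sketch of ARG in use. The file name arg.df, the label key and the image tag are made up for this example. The script only writes the Dockerfile; the build commands are left as comments because they need a Docker daemon.

```shell
cat > arg.df <<'EOF'
FROM debian:latest
# Default value, used when no --build-arg is given
ARG VERSION=none
LABEL com.test.ver="${VERSION}"
EOF

# Build with an explicit value, then read the label back (needs a Docker daemon):
#   sudo docker image build --file arg.df --build-arg VERSION=1.0 --tag debian:arg .
#   sudo docker image inspect --format '{{ index .Config.Labels "com.test.ver" }}' debian:arg
```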

Multistage docker file

A file that has more than one FROM stanza. Each FROM marks a new stage, and the files produced by a stage can be used by later stages. A stage is named with FROM <image> AS <identifier>, and the identifier can be used by other stages in FROM or in COPY --from. At the end we end up with a single image, built from the last stage.
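
A minimal sketch of a two-stage file. The file name multistage.df, the stage name builder, the paths and the image tag are made up for this example. The script only writes the Dockerfile and counts its stages; the build command is left as a comment because it needs a Docker daemon.

```shell
cat > multistage.df <<'EOF'
# Stage 1: git is installed here and never reaches the final image
FROM debian:latest AS builder
RUN apt update && apt install -y git
RUN git init /src && echo "built" > /src/artifact.txt

# Stage 2: only the artifact is copied over from the first stage
FROM debian:latest
COPY --from=builder /src/artifact.txt /app/artifact.txt
CMD ["cat", "/app/artifact.txt"]
EOF

# Count the stages in the file
grep -c '^FROM' multistage.df    # prints: 2

# Build (needs a Docker daemon):
#   sudo docker build --file multistage.df --tag debian:multistage .
```

Only the last stage ends up in the tagged image, so the git installation adds nothing to its size.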