Container restrictions

November 15, 2021

Memory

Memory limits are specified when a container is created or run. They do not reserve memory for the container; they set the maximum amount it may use. For example, --memory 256m caps the container at 256 MiB.

test@localhost: sudo docker run --name memtest --detach --memory 7m busybox sleep 1d
b76fa66d685e2ce5a6b143f8b4a1fc0b9953d4598f9be0376a5eff73f6cebc1f
test@localhost: sudo docker stats --no-stream
CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT   MEM %     NET I/O       BLOCK I/O        PIDS
b76fa66d685e   memtest   0.00%     196KiB / 7MiB       2.73%     2.65kB / 0B   201kB / 1.67MB   1
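
The limit is given with a size suffix (b, k, m, g) and, as the 7MiB in the stats output suggests, the units are binary. A minimal sketch of that conversion, assuming a hypothetical helper named to_bytes that is not part of Docker:

```shell
#!/bin/sh
# to_bytes SIZE: convert a size such as 7m or 256m to bytes.
# Hypothetical helper for illustration; suffixes are treated as binary units.
to_bytes() {
    value=$1
    number=${value%[bkmgBKMG]}   # strip one trailing unit letter, if any
    case $value in
        *[kK]) echo $((number * 1024)) ;;
        *[mM]) echo $((number * 1024 * 1024)) ;;
        *[gG]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$number" ;;  # plain bytes, with or without a "b"
    esac
}

to_bytes 7m      # prints 7340032
to_bytes 256m    # prints 268435456
```

The first value should match what sudo docker inspect --format '{{.HostConfig.Memory}}' memtest reports for the container above, since the inspect output stores the limit in bytes.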

CPU

There are different ways to restrict CPU availability. With --cpu-shares, the container can use as much CPU as is available, but under contention its processes are scheduled in proportion to the shares assigned to the container, relative to the shares of all other running containers. When new containers are created, the proportions are recalculated.
Initial state: C1: 512, C2: 512 (cpu assigned 50-50)
New state: C1: 512, C2: 512, C3: 1024 (cpu assigned 25-25-50)
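
The arithmetic behind those two states can be sketched in a few lines of shell. The function name share_percent is made up for this example, and the result is only meaningful when every container is competing for CPU at the same time:

```shell
#!/bin/sh
# share_percent MINE ALL_SHARES...
# Print the CPU percentage one container gets under full contention,
# given its own --cpu-shares value followed by the values of every container
# (its own included). Hypothetical helper, integer arithmetic only.
share_percent() {
    mine=$1
    shift
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    echo $((mine * 100 / total))
}

share_percent 512 512 512           # C1, initial state: prints 50
share_percent 512 512 512 1024      # C1, new state:     prints 25
share_percent 1024 512 512 1024     # C3, new state:     prints 50
```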
With --cpus we can cap the CPU time available to the container, expressed as a number of CPUs (fractions such as 1.5 are allowed).
Containers can also be pinned to specific CPUs with --cpuset-cpus.

test@localhost: sudo docker create --help | grep cpu
      --cpu-period int                 Limit CPU CFS (Completely Fair Scheduler) period
      --cpu-quota int                  Limit CPU CFS (Completely Fair Scheduler) quota
      --cpu-rt-period int              Limit CPU real-time period in microseconds
      --cpu-rt-runtime int             Limit CPU real-time runtime in microseconds
  -c, --cpu-shares int                 CPU shares (relative weight)
      --cpus decimal                   Number of CPUs
      --cpuset-cpus string             CPUs in which to allow execution (0-3, 0,1)
      --cpuset-mems string             MEMs in which to allow execution (0-3, 0,1)
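
The --cpus flag is a shorthand for the CFS period and quota pair listed above: the quota is the number of CPUs multiplied by the period. A small sketch of that relation, assuming the usual default period of 100000 microseconds (the helper name cpus_to_quota is invented here):

```shell
#!/bin/sh
# cpus_to_quota CPUS [PERIOD_US]
# Print the --cpu-quota value that corresponds to a --cpus value for a given
# --cpu-period. The 100000 default is an assumption about the CFS period.
cpus_to_quota() {
    awk -v cpus="$1" -v period="${2:-100000}" \
        'BEGIN { printf "%.0f\n", cpus * period }'
}

cpus_to_quota 1.5          # prints 150000
cpus_to_quota 0.5          # prints 50000
cpus_to_quota 2 50000      # prints 100000
```

So --cpus 1.5 should behave like --cpu-period 100000 --cpu-quota 150000.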

Shared memory

Some processes do IPC via shared memory, which is faster than named pipes or network-based IPC.
To allow this between containers, one container must be started with --ipc shareable, and the other containers must reference it with --ipc container:<name>.

test@localhost: sudo docker run --ipc
container:  host        none        private     shareable
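
A minimal sketch of the two-step setup follows. The container names and the share_ipc function are placeholders, and by default the script only prints the commands (DOCKER falls back to "echo docker"), so it can be tried without touching a daemon:

```shell
#!/bin/sh
# Dry run by default; set DOCKER="sudo docker" to actually execute.
DOCKER=${DOCKER:-echo docker}

# share_ipc OWNER PEER
# Start one container that exposes its IPC namespace and one that joins it.
share_ipc() {
    owner=$1
    peer=$2
    # The owner must opt in with "shareable"...
    $DOCKER run --detach --name "$owner" --ipc shareable busybox sleep 1d
    # ...and the peer joins that namespace by container name.
    $DOCKER run --detach --name "$peer" --ipc "container:$owner" busybox sleep 1d
}

share_ipc ipc-owner ipc-peer
```

The owner has to be running before the peer starts, since the peer refers to it by name.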

Users

By default (if not specified in the image) the container will run as the root user:

test@localhost: sudo docker image inspect nginx | gron | grep -i user
json[0].Config.User = "";
json[0].ContainerConfig.User = "";
test@localhost: sudo docker run --entrypoint whoami --rm nginx
root

The default can be overridden with --user, using any user that exists in the image:

test@localhost: sudo docker container run --user nginx --rm --entrypoint whoami nginx
nginx

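File ownership and permissions are shared between host and container when using bind mounts (or volumes). Here a file that only root can read on the host is also unreadable for www-data inside the container: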

test@localhost: echo hi > hi.txt
test@localhost: chmod 400 hi.txt ; sudo chown root:root hi.txt
test@localhost: sudo docker container run --rm --user www-data --volume $(pwd)/hi.txt:/container/hi.txt debian bash -c 'whoami ; cat /container/hi.txt ; ls -l /container'
cat: /container/hi.txt: Permission denied
www-data
total 4
-r-------- 1 root root 3 Nov 15 23:00 hi.txt
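
The denial comes from the ordinary Unix permission check, which compares numeric IDs and mode bits regardless of which side of the mount the process is on. A simplified sketch of that check (the can_read function is invented for this post, it ignores supplementary groups, ACLs, and root's ability to bypass the check, and it assumes www-data is UID/GID 33 as on Debian):

```shell
#!/bin/sh
# can_read MODE FILE_UID FILE_GID PROC_UID PROC_GID
# Return 0 if a non-root process with the given IDs may read a file with the
# given octal mode and numeric owner/group. Illustration only.
can_read() {
    mode=$1 file_uid=$2 file_gid=$3 uid=$4 gid=$5
    if [ "$uid" -eq "$file_uid" ]; then
        shift_by=6      # owner bits apply
    elif [ "$gid" -eq "$file_gid" ]; then
        shift_by=3      # group bits apply
    else
        shift_by=0      # "other" bits apply
    fi
    bits=$(( (0$mode >> shift_by) & 7 ))
    [ $(( bits & 4 )) -ne 0 ]    # 4 is the read bit
}

# hi.txt: mode 400, owned by root:root (0:0), read as www-data (33:33)
if can_read 400 0 0 33 33; then
    echo "readable"
else
    echo "permission denied"
fi
```

With mode 400 only the owner bits grant read access, so the script prints "permission denied", matching the cat error above.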

Capabilities

Specific capabilities can be added with --cap-add or removed with --cap-drop:

test@localhost: sudo docker container run --cap-add
ALL                     CAP_BLOCK_SUSPEND       CAP_LINUX_IMMUTABLE     CAP_SYS_BOOT            CAP_SYS_RESOURCE        IPC_OWNER               PERFMON                 SYS_PACCT
AUDIT_CONTROL           CAP_BPF                 CAP_MAC_ADMIN           CAP_SYSLOG              CAP_SYS_TIME            LEASE                   RESET                   SYS_PTRACE
AUDIT_READ              CAP_CHECKPOINT_RESTORE  CAP_MAC_OVERRIDE        CAP_SYS_MODULE          CAP_SYS_TTY_CONFIG      LINUX_IMMUTABLE         SYS_ADMIN               SYS_RAWIO
BLOCK_SUSPEND           CAP_DAC_READ_SEARCH     CAP_NET_ADMIN           CAP_SYS_NICE            CAP_WAKE_ALARM          MAC_ADMIN               SYS_BOOT                SYS_RESOURCE
BPF                     CAP_IPC_LOCK            CAP_NET_BROADCAST       CAP_SYS_PACCT           CHECKPOINT_RESTORE      MAC_OVERRIDE            SYSLOG                  SYS_TIME
CAP_AUDIT_CONTROL       CAP_IPC_OWNER           CAP_PERFMON             CAP_SYS_PTRACE          DAC_READ_SEARCH         NET_ADMIN               SYS_MODULE              SYS_TTY_CONFIG
CAP_AUDIT_READ          CAP_LEASE               CAP_SYS_ADMIN           CAP_SYS_RAWIO           IPC_LOCK                NET_BROADCAST           SYS_NICE                WAKE_ALARM
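
To see which capabilities a container really ended up with, the effective set can be read from inside it with grep CapEff /proc/self/status, which prints a hexadecimal bitmask. The sketch below decodes such a mask; the has_cap helper is made up, the bit numbers come from linux/capability.h, and the sample mask is a value commonly seen for Docker's default set, so treat it as an assumption rather than a guarantee:

```shell
#!/bin/sh
# has_cap HEXMASK BIT
# Return 0 if the capability with the given bit number is set in the mask.
has_cap() {
    mask=$1 bit=$2
    [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]
}

# Assumed sample: CapEff as often reported in a default (unprivileged) container.
default_mask=00000000a80425fb

# name:bit pairs (NET_ADMIN=12, NET_RAW=13, SYS_TIME=25)
for entry in NET_RAW:13 NET_ADMIN:12 SYS_TIME:25; do
    name=${entry%%:*}
    number=${entry##*:}
    if has_cap "$default_mask" "$number"; then
        echo "$name: present"
    else
        echo "$name: absent"
    fi
done
```

For this mask the loop reports NET_RAW as present and NET_ADMIN and SYS_TIME as absent; running the same check in a container started with --cap-add NET_ADMIN should flip the second line.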

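Another way to add all capabilities is --privileged, which also gives the container access to the host's devices and lifts the default seccomp and AppArmor confinement.
Docker also allows loading seccomp and LSM (AppArmor, SELinux) profiles with --security-opt: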

test@localhost: sudo docker container run --security-opt
apparmor=               label=                  no-new-privileges       seccomp=                systempaths=unconfined
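
As a rough sketch of how these options combine, the script below builds a run command that disables privilege escalation and loads a custom seccomp profile. The run_confined function and the profile path are hypothetical, and the script only prints the command unless DOCKER is overridden:

```shell
#!/bin/sh
# Dry run by default; set DOCKER="sudo docker" to actually execute.
DOCKER=${DOCKER:-echo docker}

# run_confined PROFILE IMAGE [COMMAND...]
# Run a throwaway container with no-new-privileges and a seccomp profile.
run_confined() {
    profile=$1
    image=$2
    shift 2
    $DOCKER container run --rm \
        --security-opt no-new-privileges \
        --security-opt "seccomp=$profile" \
        "$image" "$@"
}

# ./seccomp-profile.json is a placeholder path, not a file shipped with Docker.
run_confined ./seccomp-profile.json busybox true
```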