You can apply the principle of least privilege with a few simple modifications to your Containerfile and a couple of extra arguments to your container run command. I’ll be referencing a small FastAPI app called netinfo. Here is its Containerfile:
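FROM python:3.9
LABEL maintainer="Jarno Timmermans"
# Create an unprivileged system user and group for the app
RUN groupadd -r netinfo && useradd -r -g netinfo netinfo
# Disable interactive logins for root
RUN chsh -s /usr/sbin/nologin root
WORKDIR /home/netinfo
# Install dependencies before copying the app so this layer can be cached
COPY requirements.txt .
RUN pip install -r requirements.txt
EXPOSE 8000
COPY ./app ./app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]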
Use an unprivileged user
Don’t run your application as the root user within the container. Instead:
- Create an unprivileged user.
- Run the app from within that user’s home directory.
Add the following two lines to your Containerfile:
RUN groupadd -r netinfo && useradd -r -g netinfo netinfo
WORKDIR /home/netinfo
Instruct podman to run the container as the netinfo user:
podman run --rm -u netinfo -p 8000:8000 <IMAGE-ID>
If you were to run a shell inside the container, you would be connected as the netinfo user within the /home/netinfo directory:
netinfo@<CONTAINER-ID>:~$ pwd
/home/netinfo
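To catch a forgotten -u argument early, a start-up script could refuse to continue when it finds itself running as root. The following is only a minimal sketch and not part of netinfo: the function names is_root_uid and require_non_root are hypothetical, and the only external command used is id.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the netinfo app.

# Succeeds (exit status 0) when the given UID is root's UID.
# Without an argument it checks the UID of the current process.
is_root_uid() {
    uid="${1:-$(id -u)}"
    [ "$uid" -eq 0 ]
}

# Prints an error and returns 1 when running as root.
require_non_root() {
    if is_root_uid "$@"; then
        echo "refusing to run as root; start the container with -u netinfo" >&2
        return 1
    fi
}
```

Calling require_non_root || exit 1 at the top of an entrypoint script would stop the container before uvicorn starts. From the host, podman run --rm -u netinfo <IMAGE-ID> id -u should print a non-zero UID.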
Disable the root user
An easy way to do this is to change root’s default shell from /bin/bash to /usr/sbin/nologin. Add the following line to your Containerfile:
RUN chsh -s /usr/sbin/nologin root
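To confirm the change took effect, you could read root's login shell back from /etc/passwd inside the container. Below is a small sketch. login_shell is a hypothetical helper name, and its optional second argument (a path to a passwd-format file) exists only so the function can be tried against a sample file. Keep in mind that nologin only blocks programs that start root's login shell, such as su or login. It does not stop a process from being started with UID 0.

```shell
#!/bin/sh
# Hypothetical helper: print the login shell (7th passwd field) of a user.
# $1 = user name, $2 = passwd-format file (defaults to /etc/passwd).
login_shell() {
    awk -F: -v user="$1" '$1 == user { print $7 }' "${2:-/etc/passwd}"
}
```

Inside the container, login_shell root should then print /usr/sbin/nologin.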
Use a read-only file system
Add the argument --read-only to your run command:
podman run --rm --read-only -u netinfo -p 8000:8000 <IMAGE-ID>
If you were to run a shell inside the container, you would be unable to modify the file system:
netinfo@<CONTAINER-ID>:~$ touch hello
touch: cannot touch 'hello': Read-only file system
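Most applications still need somewhere to write scratch files. By default podman mounts a writable tmpfs on a few paths such as /tmp when --read-only is set, and the --tmpfs argument can add more. The sketch below shows one way to probe, from a shell inside the container, which directories are actually writable. can_write and report_writable are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a file can be created in directory $1.
can_write() {
    probe="$1/.write-probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Report the state of each directory passed as an argument.
report_writable() {
    for dir in "$@"; do
        if can_write "$dir"; then
            echo "$dir: writable"
        else
            echo "$dir: read-only or not writable"
        fi
    done
}
```

Running report_writable /home/netinfo /tmp inside the read-only container would be expected to report /home/netinfo as not writable and /tmp as writable, assuming podman's default tmpfs mounts are in place.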
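Prevent privilege escalation
Add the argument --security-opt=no-new-privileges to your run command. This stops processes in the container from gaining additional privileges, for example through setuid binaries:
podman run --rm --read-only --security-opt=no-new-privileges -u netinfo -p 8000:8000 <IMAGE-ID>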
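The kernel exposes this setting per process as the NoNewPrivs field in /proc/<pid>/status (1 when set), so you can check it from a shell inside the container. A minimal sketch follows. no_new_privs_flag is a hypothetical helper, and its optional argument (a status file path) is there only to make it easy to try against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: print the NoNewPrivs flag from a /proc status file.
# $1 = status file (defaults to /proc/self/status). The flag is inherited
# by child processes, so the value awk sees matches the calling shell's.
no_new_privs_flag() {
    awk '/^NoNewPrivs:/ { print $2 }' "${1:-/proc/self/status}"
}
```

With --security-opt=no-new-privileges the function should print 1 inside the container, and 0 without it.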
Drop all kernel capabilities and add as needed
Add the argument --cap-drop=all to your run command:
podman run --rm --read-only --cap-drop=all --security-opt=no-new-privileges -u netinfo -p 8000:8000 <IMAGE-ID>
You can then use the --cap-add argument to add back any capabilities your app needs. For example:
- CAP_NET_ADMIN allows the process to perform network administration tasks, such as configuring interfaces, routing tables and firewall rules,
- CAP_NET_BIND_SERVICE allows it to bind to port numbers below 1024,
- CAP_SYS_TIME allows it to set the system clock,
- etc…
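To see which capabilities a process ends up with, you can inspect the CapEff line in /proc/self/status, which holds the effective capability set as a hexadecimal bitmask. The sketch below tests a single bit of such a mask. has_cap is a hypothetical helper, and the bit numbers come from the Linux capability definitions (CAP_NET_BIND_SERVICE is bit 10, CAP_NET_ADMIN bit 12, CAP_SYS_TIME bit 25).

```shell
#!/bin/sh
# Hypothetical helper: succeed when bit $2 is set in hexadecimal mask $1.
# $1 = mask without the 0x prefix, as shown on the CapEff line.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Example: a mask with only CAP_NET_BIND_SERVICE (bit 10) set.
if has_cap 0000000000000400 10; then
    echo "CAP_NET_BIND_SERVICE is present"
fi
```

With --cap-drop=all and no --cap-add, the CapEff mask inside the container would be expected to be all zeros.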
Limit resource usage with Control Groups
While Linux Namespaces allow you to separate access to resources, they don’t allow you to limit usage. You need Linux Control Groups for that.
# only half a CPU core
podman run --cpus="0.5" ...
# at most 225 MiB of memory
podman run --memory="225m" ...
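On a host using cgroups v2, these limits show up inside the container as the files /sys/fs/cgroup/cpu.max and /sys/fs/cgroup/memory.max. The sketch below shows how the values map back to the two arguments. cpu_limit and mib_to_bytes are hypothetical helper names, and the file paths are an assumption that holds only for cgroups v2.

```shell
#!/bin/sh
# Hypothetical helpers for interpreting cgroups v2 limit files.

# Convert a cpu.max line ("<quota> <period>" or "max <period>") read from
# standard input into a number of CPU cores.
cpu_limit() {
    awk '{ if ($1 == "max") print "unlimited"; else printf "%.2f\n", $1 / $2 }'
}

# Convert mebibytes (the unit behind podman's "m" suffix) to bytes,
# which is the unit used in memory.max.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

echo "50000 100000" | cpu_limit   # prints 0.50
mib_to_bytes 225                  # prints 235929600
```

Inside a container started with both arguments, cpu_limit < /sys/fs/cgroup/cpu.max and cat /sys/fs/cgroup/memory.max should report matching values, provided the host uses cgroups v2.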
seccomp, SELinux and AppArmor
These are a bit more advanced and outside the scope of this article. In short, seccomp gives you even finer-grained control over the system calls a process within your container can make.
SELinux is a MAC (mandatory access control) mechanism that labels users, processes, files and system resources. It governs which user or process can access which files and resources. AppArmor is similar, but uses file paths and focuses on processes.