How Docker Containers Work
From Linux namespaces to macOS support and GPUs: a decade of technical evolution
The fundamental problem Docker solved is simple: running apps that require different library versions on the same machine.
VMs can solve this, but duplicating kernels, filesystems, and caches is heavy. Docker used Linux namespaces to achieve isolation without VMs.
Namespaces: The Core Mechanism
Namespaces are a kernel feature that separates each process's "view" of system resources such as filesystems, network interfaces, and PIDs. Two processes opening the same path, say /etc/passwd, can actually see different files.
Beginning with mount namespaces in 2002 (Linux 2.4.19) and continuing through network and user namespaces later in the decade, namespace types were added to the kernel incrementally. Docker's contribution was packaging these low-level features into something developers could actually use.
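You can see which namespaces a process belongs to without any special tooling: on Linux, /proc/&lt;pid&gt;/ns holds one symlink per namespace type. A minimal sketch (Linux-only; no root required):

```python
import os

def namespaces(pid="self"):
    """Return {namespace_type: identifier} by reading /proc/<pid>/ns symlinks."""
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}

if __name__ == "__main__":
    for ns_type, ident in namespaces().items():
        print(f"{ns_type:10s} {ident}")
```

Two processes sharing a namespace show the same identifier (e.g. mnt:[4026531841]); a containerized process shows different ones, which is exactly what "a different view of the system" means.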
Images: Layered Filesystems
docker build executes each Dockerfile instruction and stacks the resulting filesystem diffs as layers. Layers live in content-addressable storage, keyed by the hash of their contents, so identical layers are never stored twice.
At runtime, the layers are combined with copy-on-write filesystems such as overlayfs, Btrfs, or ZFS.
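The deduplication idea fits in a few lines: key each layer by the hash of its bytes, and storing an identical layer twice writes nothing new. This is an illustrative toy model, not Docker's actual storage code:

```python
import hashlib

class LayerStore:
    """Toy content-addressable store: layers keyed by the SHA-256 of their contents."""
    def __init__(self):
        self.blobs = {}  # digest -> layer bytes

    def put(self, layer_bytes):
        digest = "sha256:" + hashlib.sha256(layer_bytes).hexdigest()
        self.blobs.setdefault(digest, layer_bytes)  # identical content stored once
        return digest

store = LayerStore()
a = store.put(b"FROM debian: base filesystem diff")
b = store.put(b"RUN apt-get install curl: filesystem diff")
c = store.put(b"FROM debian: base filesystem diff")  # same bytes as `a`
assert a == c and len(store.blobs) == 2  # the shared base layer is deduplicated
```

This is why a hundred images built on the same base image cost the disk of one base layer, and why unchanged Dockerfile instructions hit the build cache.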
Container Execution: What containerd Does
When docker run is called, containerd (via the low-level runtime runc) sets up the container's isolation:
cgroups: limit CPU, memory, and other resources
Network namespaces: give the container its own network stack; published container ports are remapped to host ports
Storage volumes: attach persistent host filesystem storage
PID namespaces: isolate the process tree (the container's first process sees itself as PID 1)
User namespaces: map container UIDs to different, unprivileged host UIDs
Namespace setup has slight overhead but is far lighter than spawning a VM. Most containers start in under a second.
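Concretely, containerd hands runc an OCI runtime spec, a config.json that names the namespaces to create and the cgroup limits to apply. The fragment below, built as a Python dict, is illustrative: field names follow the OCI runtime spec, but the values are made up.

```python
# Illustrative OCI runtime-spec fragment: the knobs described above, as runc sees them.
oci_config = {
    "ociVersion": "1.0.2",
    "process": {"args": ["/bin/sh"], "cwd": "/"},
    "linux": {
        "namespaces": [                        # one entry per isolation boundary
            {"type": "mount"},
            {"type": "network"},
            {"type": "pid"},
            {"type": "user"},
        ],
        "resources": {                         # enforced via cgroups
            "memory": {"limit": 512 * 1024 * 1024},     # 512 MiB
            "cpu": {"quota": 50000, "period": 100000},  # 0.5 CPU
        },
        "uidMappings": [                       # user-namespace remapping
            {"containerID": 0, "hostID": 100000, "size": 65536},
        ],
    },
}
assert {ns["type"] for ns in oci_config["linux"]["namespaces"]} >= {"pid", "network"}
```

Because this spec is an open standard (OCI), the same config can be executed by runc, crun, or other compliant runtimes.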
macOS Support: The Inverted Architecture
Docker images only run on Linux kernels. For macOS/Windows developers, Docker embedded a hypervisor inside the app itself.
HyperKit: a library VMM that runs a Linux kernel inside an ordinary user process, using Intel CPU hardware virtualization extensions via macOS's Hypervisor.framework. Inspired by unikernel research.
LinuxKit: a custom Linux distribution containing only the minimum components needed to run Docker. Every system service runs inside its own container; nothing runs in the root namespace after boot.
Together, HyperKit and LinuxKit boot a Linux environment that runs containers at near-native speed on macOS.
Networking: The SLIRP Solution
Getting network traffic from the embedded Linux VM out to macOS turned out to be surprisingly difficult. Corporate firewalls and VPN clients flagged container traffic as malicious, generating thousands of bug reports during the beta.
The solution is a fun one: they revived SLIRP, a tool from the 1990s originally used to connect devices like Palm Pilots to the internet. vpnkit, a user-space TCP/IP stack written in OCaml, translates the Linux VM's networking requests into native macOS socket calls.
From a VPN policy perspective, outgoing traffic now appears to come from the Docker app itself, so the firewall issues vanished: bug reports from enterprise users dropped by more than 99% after vpnkit shipped.
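The core trick, terminating the guest's TCP in user space and re-issuing each flow as a fresh connection through the host's own socket API, can be sketched as a tiny one-shot relay. Everything below (the echo upstream, the single-message exchange) is invented for illustration and is far simpler than a real TCP/IP stack:

```python
import socket
import threading

def echo_once(srv):
    """One-shot echo server standing in for a remote service."""
    conn, _ = srv.accept()
    conn.sendall(conn.recv(4096))
    conn.close()
    srv.close()

def relay_once(srv, target_port):
    """Accept one 'guest' connection and re-originate it as a new native socket,
    the way vpnkit turns VM traffic into connections owned by the host app."""
    conn, _ = srv.accept()
    upstream = socket.create_connection(("127.0.0.1", target_port))
    upstream.sendall(conn.recv(4096))       # guest bytes leave via a host socket
    conn.sendall(upstream.recv(4096))       # reply is pushed back to the guest
    for s in (conn, upstream, srv):
        s.close()

def listen_on_free_port():
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))              # port 0: let the OS pick a free port
    srv.listen(1)
    return srv, srv.getsockname()[1]

echo_srv, echo_port = listen_on_free_port()
relay_srv, relay_port = listen_on_free_port()
threading.Thread(target=echo_once, args=(echo_srv,), daemon=True).start()
threading.Thread(target=relay_once, args=(relay_srv, echo_port), daemon=True).start()

client = socket.create_connection(("127.0.0.1", relay_port))
client.sendall(b"hello from the guest")
reply = client.recv(4096)
client.close()
```

To a firewall watching the host, the upstream connection belongs to this process, not to some opaque VM, which is exactly why the false positives disappeared.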
GPU Support: CDI
With AI workloads rising, GPU dependency management is the new challenge. GPU workloads need kernel drivers and user-space libraries whose versions match exactly, yet all containers share the host kernel: the same fundamental conflict Docker originally set out to solve.
Since 2023, Docker has supported CDI (Container Device Interface), which bind-mounts GPU device files and the matching user-space libraries into the container at start. Portability holds within a single GPU vendor's ecosystem, but running Nvidia GPU apps on Apple M-series hardware remains far off.
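A CDI spec is a JSON or YAML file (conventionally under /etc/cdi) that tells the runtime which device nodes and driver libraries to inject. The fragment below is illustrative only: the field names follow the CDI specification, but the driver version and paths are made up.

```python
# Illustrative CDI spec: what a CDI-aware runtime resolves a request like
# "nvidia.com/gpu=gpu0" against. Paths and versions here are hypothetical.
cdi_spec = {
    "cdiVersion": "0.6.0",
    "kind": "nvidia.com/gpu",                  # vendor.com/class
    "devices": [
        {
            "name": "gpu0",
            "containerEdits": {
                "deviceNodes": [{"path": "/dev/nvidia0"}],
            },
        },
    ],
    "containerEdits": {                        # applied for every device of this kind
        "deviceNodes": [{"path": "/dev/nvidiactl"}],
        "mounts": [
            {   # user-space driver library, bind-mounted to match the host kernel module
                "hostPath": "/usr/lib/x86_64-linux-gnu/libcuda.so.535.129.03",
                "containerPath": "/usr/lib/x86_64-linux-gnu/libcuda.so.535.129.03",
                "options": ["ro", "nosuid", "nodev", "bind"],
            },
        ],
    },
}
assert cdi_spec["kind"].count("/") == 1  # CDI device names take the form vendor.com/class=name
```

The bind-mounted library path encodes the driver version, which is how CDI keeps the user-space half in lockstep with the host's kernel module, and also why the result is portable only within one vendor's driver stack.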
How It Works
Write a Dockerfile → docker build produces a layered filesystem image
docker run → containerd (via runc) configures namespaces (mount, network, PID, user)
cgroups limit CPU/memory resources; volume mounts attach persistent storage
On macOS/Windows: HyperKit (library VMM) + LinuxKit (minimal Linux) run a Linux kernel inside the app
vpnkit (SLIRP-style) translates Linux networking into host-OS socket calls, sidestepping firewall false positives
On container exit, the kernel reclaims its resources as it would for any ordinary process
Pros
- ✓ Extremely lightweight vs VMs โ most start in under 1 second
- ✓ Layered images enable efficient deduplication and build caching
- ✓ OCI standard allows running on various runtimes without vendor lock-in
- ✓ Linux containers "just work" on macOS/Windows
Cons
- ✗ GPU workloads lack cross-vendor portability
- ✗ On macOS/Windows, an internal VM adds overhead compared to native Linux
- ✗ Image sizes are ballooning in the AI era — a PyTorch installation alone adds several GB per image
- ✗ Namespace isolation is not VM-level security isolation