Demystifying Containers 101: A Deep Dive Into Container Technology for Beginners

Containers have fundamentally changed how we develop, package, and deploy applications. But for developers new to the technology, containers can seem like a black box. What exactly are containers? How do they work under the hood? And what makes them so revolutionary?

In this deep dive, we'll unpack everything you need to know to truly understand containers, from the basics to advanced topics. Whether you're a container novice or already have some experience, this guide will help you level up your knowledge. Let's jump in!

The Birth of Containers

The idea of isolating processes and limiting their resource usage originated in the 1970s with the chroot system call in Unix. However, the modern container era began in earnest in 2008 when Linux Containers (LXC) leveraged cgroups and namespaces in the Linux kernel to provide a full OS-level virtualization environment.

But containers didn't really take off until Docker came along in 2013. Docker built on the foundation of LXC but focused on ease of use and a developer-friendly workflow. Docker introduced the concept of portable images and a registry model for sharing them. This improved the developer experience and made containerizing applications much more accessible.

Since then, containers have seen rapid adoption. According to a 2020 survey by the Cloud Native Computing Foundation, 92% of organizations are using containers in production, up from 84% in 2019 and 23% in 2016.

CNCF Container Adoption Survey

Virtual Machines vs Containers

To understand what makes containers so powerful, it's helpful to compare them to virtual machines (VMs). Both VMs and containers are ways to isolate an application and its dependencies. However, they differ in how they achieve that isolation and in how much overhead they incur.

VMs virtualize at the hardware level, with a hypervisor abstracting the physical hardware of a host and enabling multiple VMs to run on top of it. Each VM runs a full operating system (OS) on top of the virtualized hardware. In contrast, containers virtualize at the OS level. All containers share the host OS kernel but are isolated from each other using cgroups and namespaces.

As a result, containers are much more lightweight than VMs. They have less overhead, start up faster, and use fewer resources. We can run many more containers than VMs on a given host.

The different approaches to virtualization are illustrated well in this diagram from Docker:

Containers vs VMs

How Containers Work

Under the hood, containers rely on several key features of the Linux kernel:

Control Groups (cgroups) – cgroups limit and account for resource usage (CPU, memory, disk I/O, network) by groups of processes. For example, you can use cgroups to limit a container's memory usage to 512 MB. Cgroups ensure a single container doesn't hog all the resources on a host.
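
For example, Docker exposes these limits through flags on docker run, which it translates into cgroup settings for the container's processes. A minimal sketch (the image name here is just a placeholder):

# Cap the container at 512 MB of memory and one CPU core
docker run --memory=512m --cpus=1 my-app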

Namespaces – Namespaces provide isolation between processes, limiting what a process can see and access. There are several types of namespaces:

  • Mount (mnt) namespaces isolate the filesystem mount points seen by a group of processes
  • Process ID (pid) namespaces isolate the process ID number space
  • Network (net) namespaces virtualize the network stack
  • Interprocess Communication (ipc) namespaces isolate interprocess communication resources like shared memory
  • User ID (user) namespaces isolate the user and group ID number spaces
  • UTS namespaces isolate the hostname and domain name seen by a container

Using these building blocks, container runtimes like Docker can provide isolation between containers by giving each container its own cgroup and namespaces. The container only sees its own processes, filesystem, network interfaces, etc. It's almost as if the container is running on its own little virtual machine, even though it's sharing the host kernel.
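
You can get a feel for namespaces without a container runtime at all by using the unshare utility from util-linux. As a rough sketch, the following starts a shell in new PID and mount namespaces, so it only sees its own processes:

# Start a shell as PID 1 in new PID and mount namespaces (requires root)
sudo unshare --pid --fork --mount-proc bash

# Inside that shell, ps only shows processes in the new namespace
ps aux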

Anatomy of a Docker Container

While Docker isn't the only container runtime, it's by far the most popular, so it's worth diving into how Docker works specifically. The main building blocks of Docker are images and containers.

A Docker image is a lightweight, standalone, executable package that includes everything an application needs to run – the code, runtime, system tools, libraries, and settings. Images are built in layers, with each layer adding to or modifying the layers below it. Layers are read-only, so once an image is built, it's immutable.

The underlying technology that makes layering possible is the union filesystem. This allows files and directories of separate filesystems to be transparently overlaid, forming a single coherent filesystem. When you modify an existing file, the file is copied out into the top read-write layer, and changes are made there, leaving the underlying file unchanged. This is known as copy-on-write.
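
You can watch copy-on-write happen with docker diff, which lists files added, changed, or deleted in a container's writable layer. A small sketch (the container and file names are illustrative):

# Start a container and change a file inside it
docker run -d --name cow-demo ubuntu sleep 600
docker exec cow-demo touch /etc/example.conf

# docker diff shows what ended up in the thin read-write layer
docker diff cow-demo

# Clean up
docker rm -f cow-demo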

Some benefits of this layered approach include:

  • Smaller image sizes since shared layers are only stored once
  • Faster image builds since you can cache layers and only rebuild layers that have changed
  • Faster container startup times since you only need to copy the thin read-write layer
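
To see the layers behind an image and their sizes for yourself, you can use docker history (the exact layers you see depend on how the image was built):

# Pull an image and list its layers, newest first
docker pull node:14
docker history node:14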

Here's an example of what the layers of a Docker image based on Ubuntu might look like. All image layers are read-only (RO); the writable layer is only added when a container is started, as described below:

Layer          Contents
Layer 5 (RO)   Application code
Layer 4 (RO)   Application dependencies
Layer 3 (RO)   Updated Ubuntu packages
Layer 2 (RO)   Custom files added
Layer 1 (RO)   Base Ubuntu image

A running instance of an image is called a container. When you start a container, Docker creates a thin read-write layer on top of the image layers. Any changes made by the running application are stored in this thin layer. When the container is deleted, the thin read-write layer is also deleted, leaving the underlying image unchanged.

Multiple containers can be started from the same image, each with its own thin read-write layer. This makes containers very fast to start up since you're not actually copying the whole image, just creating a new write layer. You can think of images like a class in object-oriented programming, and containers as instances of that class.
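
As a quick illustration, here several containers run side by side from the same image, each with its own writable layer (the names are arbitrary, and any image will do):

# Start two independent containers from the same image
docker run -d --name web1 nginx
docker run -d --name web2 nginx

# Both show up as separate running instances
docker ps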

Using Containers for Development

To make these concepts more concrete, let's walk through the typical workflow of containerizing an application for development.

First, you define the image in a Dockerfile. The Dockerfile specifies the base image to start from and a series of instructions to build the image, like copying in files, installing dependencies, and setting environment variables. Here's a simple example:

FROM node:14

WORKDIR /app

COPY package.json .
RUN npm install

COPY . .

EXPOSE 3000  

CMD ["npm", "start"]

This Dockerfile:

  1. Starts from the official Node.js 14 base image
  2. Sets the working directory to /app in the image
  3. Copies package.json into the image and runs npm install to install dependencies
  4. Copies the rest of the application code into the image
  5. Exposes port 3000
  6. Specifies the command to run when a container is started from the image

To build the image, you run:

docker build -t my-app .

This builds the image based on the Dockerfile in the current directory and tags it as "my-app".

To run a container from the image:

docker run -p 3000:3000 my-app  

This starts a container from the "my-app" image, mapping port 3000 in the container to port 3000 on the host. You should be able to access the application at http://localhost:3000.

Some other handy docker commands:

  • docker ps – list running containers
  • docker stop <container-id> – stop a running container
  • docker rm <container-id> – remove a stopped container
  • docker images – list local images

Microservices and Orchestration

While running a single container is a good starting point, production applications often consist of multiple containers deployed across a cluster of servers. A common pattern is to break an application into multiple microservices, each running in its own container. This enables teams to develop, deploy, and scale services independently.

However, managing a fleet of containers across a cluster introduces additional complexity, like scheduling containers onto nodes, service discovery, load balancing, ensuring high availability, and rolling out updates. Container orchestrators like Kubernetes help solve these challenges.

Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units (called pods) for easy management and discovery. Some key concepts in Kubernetes include:

Deployments – A deployment is a way to declaratively define a desired state for a set of pods and replica sets. Kubernetes will ensure the current state matches the desired state, making deployments self-healing.

Services – A Kubernetes service groups a set of pods together and provides a stable networking endpoint for them. This decouples the pods from consumers, allowing pods to be added or removed without affecting the overall application.

Ingresses – An ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster, providing a way to handle external traffic.
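
As a minimal sketch of these concepts using kubectl, assuming the application image has been pushed to a registry the cluster can pull from (the registry path, names, and ports below are placeholders):

# Create a Deployment and scale it to three replicas
kubectl create deployment my-app --image=registry.example.com/my-app:1.0
kubectl scale deployment my-app --replicas=3

# Expose the pods behind a stable Service endpoint
kubectl expose deployment my-app --port=80 --target-port=3000

# Check that the pods and the service are up
kubectl get pods
kubectl get service my-app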

Other container orchestrators include Docker Swarm and Apache Mesos. While each has its own pros and cons, Kubernetes has emerged as the de facto standard, with wide adoption and a rich ecosystem of tools.

Future of Containers

As containers have become mainstream, the ecosystem has rapidly expanded to support additional use cases and make working with containers easier. Some notable emerging trends:

Serverless – Serverless computing aims to abstract away infrastructure management and enable developers to focus purely on code. Containers provide the underlying packaging mechanism for serverless functions. Platforms like AWS Fargate and Google Cloud Run enable you to run containers without managing servers or clusters.
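
For instance, a container image can be deployed to Google Cloud Run with a single command. The sketch below assumes the image has already been pushed to a registry Cloud Run can access; the project, region, and service names are placeholders:

# Deploy a container image to a fully managed, autoscaling service
gcloud run deploy my-app \
  --image gcr.io/my-project/my-app:1.0 \
  --region us-central1 \
  --allow-unauthenticated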

Service Meshes – As the number of microservices grows, managing communication between them becomes complex. A service mesh is a dedicated infrastructure layer that handles inter-service communication, providing features like service discovery, load balancing, encryption, authentication and authorization, and observability. Popular service meshes like Istio and Linkerd are built on top of containers.

eBPF – eBPF (extended Berkeley Packet Filter) is a Linux kernel feature that allows running sandboxed programs in the kernel without changing kernel source code or loading modules. eBPF is enabling a new class of software that can tap into kernel events for observability, networking, and security use cases. Tools like Cilium and Falco use eBPF to provide container-aware networking and runtime security.

Gartner predicts that by 2022, more than 75% of global organizations will be running containerized applications in production, up from less than 30% at the time of the forecast. As adoption grows, the ecosystem will continue to mature and make it even easier to build, deploy, and operate containerized applications at scale.

Conclusion

Containers offer a compelling way to consistently package and run applications across environments. By leveraging features of the OS like cgroups and namespaces, containers provide lightweight isolation between applications.

Getting started with containers is relatively easy thanks to developer-friendly tools like Docker. But to run containers in production, you need an orchestration layer like Kubernetes to deploy and manage apps at scale.

While we've covered a lot, we've only scratched the surface of what's possible with containers. The technology is rapidly evolving, with new projects emerging to tackle additional use cases like serverless, service meshes, and eBPF. Expect the container landscape to look quite different in the years ahead.

The important thing is to start learning the fundamentals now. Spin up containers on your local machine, break a monolith into containerized microservices, deploy a small cluster using a managed Kubernetes service. There's no substitute for hands-on experience.


Happy containerizing!
