Yes, I Coded a Semaphore and No, I Am Not an OS Developer

As a full-stack developer, you may think of semaphores as a low-level synchronization primitive used only by operating system developers. After all, modern high-level languages like Python already come with thread-safe data structures and easy-to-use locking mechanisms. Why would an application developer ever need to worry about something as esoteric as a semaphore?

It turns out, semaphores are a versatile concurrency tool with applications well beyond the operating system kernel. In fact, they're the secret sauce behind many of the high-level concurrent programming abstractions we use every day. By understanding how semaphores work and when to apply them, you can add a powerful tool to your concurrency toolbox and gain a deeper understanding of how tools like thread pools and work queues function under the hood.

A brief history of semaphores

The semaphore was invented by Dutch computer scientist Edsger W. Dijkstra in the early 1960s as a synchronization mechanism for coordinating access to shared resources between concurrent processes. He described the concept in his 1965 manuscript "Cooperating Sequential Processes", written while he was developing the THE multiprogramming system for the Electrologica X8 computer at the Eindhoven University of Technology.

Dijkstra observed that the usual method of process synchronization at the time, locking, suffered from several problems. Locks were cumbersome to use and error-prone, as they had to be acquired and released in a strict order to avoid deadlock. They also didn't provide a clear way for one process to signal an event to another process.

Semaphores provided an elegant solution. A semaphore is simply an integer variable coupled with two atomic operations, historically denoted P and V after the Dutch words "prolaag" (try to reduce) and "verhogen" (increment). The P operation waits for the semaphore to become positive and then decrements it, while the V operation increments the semaphore.

By initializing a semaphore to the number of available resources, concurrent processes could coordinate access to the resource pool simply by calling P before using a resource and V after releasing it back to the pool. The semaphore would keep track of how many resources were available, putting processes to sleep when the pool was exhausted and waking them back up when resources became free.

Dijkstra's semaphore concept quickly took hold in the operating systems community. Semaphores first appeared in Dijkstra's own THE multiprogramming system and were later adopted by influential systems such as Multics and Unix. They became a standard synchronization primitive taught in every operating systems textbook.

Semaphores for the application developer

Over time, as concurrency became more important at the application level, semaphores began to see use outside the operating system kernel. In his paper "Implementing Condition Variables with Semaphores" (2003), Andrew Birrell showed how semaphores could be used to construct other synchronization mechanisms such as condition variables. This cemented the role of semaphores as a general-purpose concurrency building block.

Today, most high-level languages provide some form of semaphore as part of their standard library. These may be called by different names such as "counting semaphores", "bounded semaphores", or simply "locks", but the core idea is the same. The semaphore tracks a pool of resources, allowing threads to reserve resources from the pool with an acquire/wait operation and return them to the pool with a release/signal operation.

So when might an application developer need to use a semaphore directly? Here are a few examples:

  • Limiting the number of simultaneous database connections
  • Throttling API requests to an external web service
  • Managing a pool of pre-allocated buffers or caches
  • Coordinating access to a shared hardware device on an embedded system

In each of these cases, semaphores provide a simple, efficient way to meter access to a pool of resources. By bounding how many threads can hold the resource simultaneously, semaphores prevent the system from being overwhelmed and ensure fair allocation of the constrained resource.
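To make the first of these use cases concrete, here is a minimal sketch of throttling database access with a semaphore. The real query is simulated with a sleep, and names like query_database and the concurrency counters are illustrative, not part of any real driver API:

```python
import threading
import time

# Illustrative limit: allow at most 3 simultaneous "connections".
MAX_CONNECTIONS = 3
db_semaphore = threading.Semaphore(MAX_CONNECTIONS)

active = 0                     # threads currently holding a slot
peak = 0                       # highest concurrency observed
stats_lock = threading.Lock()  # protects the two counters above

def query_database(query):
    global active, peak
    with db_semaphore:             # blocks while all 3 slots are taken
        with stats_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)           # stand-in for real query I/O
        with stats_lock:
            active -= 1

threads = [threading.Thread(target=query_database, args=(f"SELECT {i}",))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # never exceeds MAX_CONNECTIONS
```

Even though ten threads race for the database, the peak concurrency recorded never exceeds the three slots the semaphore hands out.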

Implementing semaphores in Python

Python's threading module has long included a Semaphore class. The Semaphore constructor takes an optional argument specifying the initial resource count, which defaults to 1. This makes it easy to use a semaphore for basic mutual exclusion:

from threading import Semaphore

# A shared resource requiring exclusive access
resource = []

# A lock protecting the shared resource 
lock = Semaphore(1)

# Accessing the shared resource
with lock:
    # Perform read/write operations on resource
    resource.append("new item")

The with statement ensures that the semaphore is always properly released, even if an exception is raised within the critical section. This is equivalent to the following more verbose code:

lock.acquire()
try:
    # Perform read/write operations on resource
    resource.append("new item")
finally:
    lock.release()

By setting the initial resource count to a value greater than 1, we can use a Semaphore to restrict concurrent access to a shared resource pool. Consider a web crawler that wants to limit how many simultaneous HTTP requests it makes to a target website in order to avoid overwhelming the site or triggering rate limits. We can implement this by creating a Semaphore with the maximum desired number of outstanding requests:

from threading import Semaphore
from urllib.request import urlopen

# Maximum number of outstanding requests
MAX_REQUESTS = 10

# Semaphore limiting outstanding requests
http_semaphore = Semaphore(MAX_REQUESTS)

def fetch_url(url):
    with http_semaphore:
        return urlopen(url).read()

Now if we create 100 threads that call fetch_url, at most 10 of them will be making HTTP requests at any given time. The other threads will block on the http_semaphore until a slot becomes free.
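A quick way to watch this slot accounting at work, without any network traffic, is to probe the semaphore with non-blocking acquire calls:

```python
from threading import Semaphore

MAX_REQUESTS = 10
http_semaphore = Semaphore(MAX_REQUESTS)

# Claim all ten slots without blocking.
for _ in range(MAX_REQUESTS):
    assert http_semaphore.acquire(blocking=False)

# An eleventh attempt fails immediately instead of blocking.
print(http_semaphore.acquire(blocking=False))  # False

# Returning one slot lets the next acquire succeed again.
http_semaphore.release()
print(http_semaphore.acquire(blocking=False))  # True
```

In the crawler, fetch_url uses the blocking form of acquire (via the with statement), so instead of failing, the extra threads simply wait their turn.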

Under the hood, Python's Semaphore class uses a combination of a Condition object and an internal counter to track the resource count. The acquire method waits on the condition until the counter is positive, then decrements the counter. The release method increments the counter and notifies any threads waiting on the condition.

Here's a simplified implementation of the Semaphore class in Python:

from threading import Condition

class Semaphore:
    def __init__(self, value=1):
        self._count = value
        self._condition = Condition()

    def acquire(self):
        with self._condition:
            while self._count == 0:
                self._condition.wait()
            self._count -= 1

    def release(self):
        with self._condition:
            self._count += 1
            self._condition.notify()

The actual CPython implementation is somewhat more involved, as it also supports timeouts on acquire and releasing multiple permits at once, but this captures the core idea. The semaphore is just a counter protected by a mutex, with waiting and signaling provided by a condition variable.
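One refinement worth knowing about: the threading module also provides BoundedSemaphore, which raises ValueError on an unmatched release (a plain Semaphore silently grows its count), and acquire accepts a timeout. A short demonstration:

```python
from threading import BoundedSemaphore, Semaphore

# A plain Semaphore accepts an unmatched release: its count just grows.
plain = Semaphore(1)
plain.release()  # count is now 2 -- a likely bug goes unnoticed

# BoundedSemaphore catches the same mistake loudly.
bounded = BoundedSemaphore(1)
try:
    bounded.release()
except ValueError:
    print("over-release detected")

# acquire can give up after a timeout instead of blocking forever.
busy = BoundedSemaphore(1)
busy.acquire()
print(busy.acquire(timeout=0.05))  # False: no permit became free
```

For resource-pool bookkeeping, where an extra release would quietly enlarge the pool, BoundedSemaphore is usually the safer choice.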

Semaphores in the wild

To see a real-world example of semaphores in action, let's take a look at the source code for Python's multiprocessing.Queue class. A Queue created with a maxsize is a bounded buffer shared between processes: producers block when the buffer is full, and consumers free up space as they drain it. Under the hood, that bound is enforced with a semaphore.

The relevant code is in the Queue class in Lib/multiprocessing/queues.py (lightly simplified here):

class Queue(object):
    def __init__(self, maxsize=0, *, ctx):
        if maxsize <= 0:
            # No explicit bound: fall back to the platform maximum
            from .synchronize import SEM_VALUE_MAX as maxsize
        self._maxsize = maxsize
        self._reader, self._writer = connection.Pipe(duplex=False)
        self._rlock = ctx.Lock()
        self._sem = ctx.BoundedSemaphore(maxsize)

The key line is the creation of self._sem, a BoundedSemaphore whose initial value equals the queue's maximum size. Each permit of the semaphore corresponds to one free slot in the buffer.

Each time an object is put on the queue, the put method first acquires the semaphore to reserve a slot. If every slot is in use, the call blocks (or raises queue.Full in the non-blocking and timeout cases) until a consumer makes room:

def put(self, obj, block=True, timeout=None):
    if not self._sem.acquire(block, timeout):
        raise Full
    with self._notempty:
        if self._thread is None:
            self._start_thread()
        self._buffer.append(obj)
        self._notempty.notify()

On the consumer side, the get method releases the semaphore after reading an object off the underlying pipe, freeing a slot for the next put:

def get(self, block=True, timeout=None):
    if block and timeout is None:
        with self._rlock:
            res = self._recv_bytes()
        self._sem.release()
    ...
    return _ForkingPickler.loads(res)

The semaphore and the pipe work together to form a classic bounded buffer, or producer-consumer system. Producer processes "produce" objects by adding them to the queue, consumer processes "consume" them by pulling them off, and the semaphore ensures that the number of buffered objects never exceeds maxsize, preventing a fast producer from overwhelming slow consumers.
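This producer-consumer pattern can be distilled into its textbook form: two semaphores, one counting free slots and one counting filled slots, around a mutex-protected buffer. Here is a sketch using threads rather than processes (BoundedBuffer is an illustrative name, not a standard library class):

```python
import threading
from collections import deque

class BoundedBuffer:
    """Textbook bounded buffer: two semaphores plus a mutex."""

    def __init__(self, capacity):
        self._items = deque()
        self._mutex = threading.Lock()               # protects _items
        self._free = threading.Semaphore(capacity)   # counts empty slots
        self._filled = threading.Semaphore(0)        # counts queued items

    def put(self, item):
        self._free.acquire()        # wait for an empty slot
        with self._mutex:
            self._items.append(item)
        self._filled.release()      # signal one more item available

    def get(self):
        self._filled.acquire()      # wait for an item
        with self._mutex:
            item = self._items.popleft()
        self._free.release()        # signal one more empty slot
        return item

# A producer and a consumer communicating through a 2-slot buffer.
buf = BoundedBuffer(2)
results = []

def consumer():
    for _ in range(5):
        results.append(buf.get())

t = threading.Thread(target=consumer)
t.start()
for i in range(5):
    buf.put(i)   # blocks whenever the buffer holds 2 unconsumed items
t.join()
print(results)  # [0, 1, 2, 3, 4]
```

Note how neither put nor get ever needs an explicit "is the buffer full/empty?" check; the two semaphores encode that state directly in their counts.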

Conclusion

Semaphores are a classic concurrency primitive with applications that extend well beyond the operating system kernel. By understanding how they work and when to use them, full-stack developers can add a powerful tool to their concurrency toolbox.

We've seen how Python's threading module provides a simple Semaphore class that can be used for everything from basic mutual exclusion to throttling and resource-pool management. Under the hood, this class is implemented using a condition variable and a counter, a nice illustration of how one synchronization primitive can be layered on top of another.

We've also looked at a real-world example of semaphores in the Python multiprocessing module, where a BoundedSemaphore bounds the amount of work buffered between producer and consumer processes. This is just one of many uses of semaphores in the Python ecosystem – asyncio, for instance, ships its own Semaphore for throttling concurrent coroutines.

So the next time you find yourself reaching for a lock or a condition variable, ask yourself if a semaphore might be a better fit. By providing a flexible way to manage shared resource pools, semaphores can help you write more efficient, scalable concurrent code. While you may never need to implement your own kernel-level semaphore, understanding this classic concurrency primitive will make you a better developer across the full stack.
