In 1983, Rob Pike and Brian Kernighan published a seminal paper titled "Program Design in the Unix Environment" that helped codify the Unix philosophy of software design. Nearly 40 years later, the key ideas and examples outlined in the paper are still highly relevant to how we build software today. Let's dive in and see what insights we can glean.

The success of Unix

The paper opens by examining the reasons behind Unix's success and widespread adoption:

  • Portability: Since the Unix kernel and userland programs were written in C, they could be easily ported to new hardware without having to rewrite everything in platform-specific assembly.
  • Uniformity across systems: The same Unix OS ran on many different hardware platforms. This meant users could leverage their existing knowledge when moving to new systems.
  • Small, hackable code base: The entire system, being written in C, was relatively small and easy for developers to modify and extend.
  • General purpose tools: Unix introduced a new paradigm based on small, general purpose tools that could be combined to solve larger problems. This was in contrast to monolithic programs that tried to provide every possible feature in one big blob of code.

This last point is really the core of the Unix philosophy – write programs that do one thing and do it well. By composing a suite of small, focused tools, users can mix and match to accomplish complex tasks. As we'll see, much of the paper focuses on examples of how this philosophy is applied (or in some cases, ignored).

Doing one thing well with cat

One of the most common Unix tools is cat. In its most basic usage, cat simply reads the contents of one or more files and prints them to standard output. By default, it concatenates the input files together, hence the name.

Combined with pipes and redirection, cat becomes an incredibly versatile tool:

# Print contents of file.txt
cat file.txt

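# Combine the contents of file1.txt and file2.txt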
cat file1.txt file2.txt

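# Number output lines (the paper suggests nl instead)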
cat -n file.txt

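# Make non-printing characters visible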
cat -v file.txt

As the paper notes, later versions of cat grew additional options, like numbering output lines or showing non-printable characters. The authors argue that instead of piling these features into cat, it would have been better to either use existing tools (like nl for numbering lines) or create new, separate programs for the additional functionality.
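
For example, line numbering already exists as a standalone filter, so it composes anywhere in a pipeline (note that nl numbers only non-empty lines by default; -b a numbers them all, matching cat -n):

# Number every line with a dedicated tool instead of cat -n
nl -b a file.txt

# Being a filter, nl works on any program's output, not just files
sort file.txt | nl -b a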

Overloading cat with options outside its core focus of simply concatenating input makes it harder to use cat in the middle of a pipeline. If some of the options transform the data passing through, downstream programs may not be able to handle the result. The lesson is that programs should stick to doing one thing and leave unrelated functionality to other tools.
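
A contrived but concrete illustration: piping through cat -v changes the byte count, because the BEL control character is rewritten as the two printable characters ^G (the usual rendering in common cat implementations), so downstream tools no longer see the original data:

# 6 bytes: b, e, e, p, BEL, newline
printf 'beep\a\n' | wc -c

# 7 bytes: cat -v rewrites BEL as the two characters ^G
printf 'beep\a\n' | cat -v | wc -c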

The right place for features

Another example the paper discusses is the difference between the ls and lsc commands for listing files in a directory.

ls is the classic tool for listing directory contents. It prints each file on its own line, one after the other.

lsc, on the other hand, adjusted its output based on where it was being sent. If printing to a terminal, it showed the list of files in columns to better fit the screen. However, if output was redirected to a file or pipe, it printed one file per line just like ls.

At first this may seem clever, but it violates the rule of least surprise. Programs should be predictable – the same input should always produce the same output, regardless of where that output is going. Imagine debugging a script that uses lsc, only to find the output looks totally different when you redirect it to a file!

The columnation feature is clearly useful, but it doesn't belong in a program like lsc. A better solution is to have a separate dedicated tool for formatting output into columns. That way any program can take advantage of it, not just lsc.

# List files in the current directory
ls

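# Format the listing into columns with a separate tool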
ls | column

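# Long listing, sorted numerically by the size field, then columnated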
ls -l | sort -n -k5 | column

By separating concerns in this way, each program can focus on its main purpose. Tools can be mixed and matched as needed without worrying about unpredictable side effects.

Composability through piping

A huge strength of the Unix approach is that programs can be chained together using pipes. The output of one program becomes the input to the next.

For example, let's say we want to find the 5 largest files in a directory. We can combine common tools like find, sort, and head to get our answer:

find . -type f -exec du -a {} + | sort -n -r | head -n 5

Here's what's happening:

  1. find recursively lists all files in the current directory and subdirectories
  2. The size of each file is calculated via du (disk usage)
  3. The results are piped to sort, which orders the list of files from largest to smallest
  4. Finally, head takes the first 5 lines of input and prints them

Notice how this is infinitely customizable. We can swap head for tail to get the smallest files instead. Or change the sort options to order by file name rather than size. The possibilities are endless!
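
For instance, swapping head for tail pulls the smallest files from the same descending sort:

# The 5 smallest files instead of the largest
find . -type f -exec du -a {} + | sort -n -r | tail -n 5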

This is much more powerful than having a monolithic "find_largest_files" program. Such a program would inevitably lack options to cover every use case. By using small, composable commands, users can build up the exact functionality they need.

When to add options

Now, this doesn't mean programs should never have options. The paper makes it clear that some features make sense as options rather than wholly separate tools.

The general guidance is that options are appropriate when they directly relate to the program's core function. For example, grep's purpose is to search for patterns in text. Options that control the type of pattern matching (basic vs. extended regular expressions) or the handling of case sensitivity make sense, since they extend grep's core functionality in a relevant way.
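
For instance (the file name and patterns here are just placeholders):

# Case-insensitive search: still pattern matching, just tweaked
grep -i 'error' app.log

# Extended regular expressions: a richer pattern language
grep -E '(warning|error): [0-9]+' app.log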

In contrast, if we wanted a version of grep that searched only C++ source code files, that would be better suited as a separate program, perhaps called cppgrep. The file type is a separate concern from pattern matching itself.
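
And indeed, composition already gets us there without writing cppgrep at all; a rough equivalent using standard tools (the pattern and extension are illustrative):

# Search only C++ source files by combining find and grep
find . -name '*.cpp' -exec grep -n 'parse_input' {} +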

Lessons learned

So what can we take away from all this?

First and foremost, it's clear that constraining programs to a single purpose is a powerful technique. It enables programs to be predictable, composable, and much easier to debug and maintain.

A common objection is that splitting up functionality results in less efficient code. After all, if we put column formatting inside ls, we can optimize it for that specific use case.

However, as the paper rightly points out, this is a false economy. 90% of the time, users invoke tools like ls without any formatting options. Keeping the default behavior simple ensures it is as fast as possible for the most common case. Additional features are only paid for when they are used.

More broadly, the Unix philosophy recognizes that software is as much about people as it is about code. Small, focused programs are easier for developers to understand, change, and debug. Composability empowers users to combine programs in ways the original authors may never have imagined.

Perhaps most importantly, constraints breed creativity. Working within the limits of "do one thing well" challenges us to craft the most elegant, minimal solution to a problem. It's a different mindset from the kitchen-sink approach of throwing every possible option into a program.

Ultimately, the examples and principles in this paper are not about Unix specifics so much as they are about managing complexity. The core ideas are just as relevant today as they were in 1983. Keeping programs small, focused, and composable is a time-tested way to build robust, adaptable software. As the paper concludes: "The right solution in the right place is always more effective than haphazard hacking."
