We need a new document markup language: here's why

The problem with today's markup languages

As a full-stack developer who has written a lot of technical documentation over the years, I have become frustrated with the shortcomings of existing document markup languages. Whether it's HTML, Markdown, LaTeX, or DocBook, each option either sacrifices ease of use or limits our ability to create complex documents.

On one end of the spectrum you have languages like HTML and DocBook that are extremely verbose and cumbersome to write by hand. Creating even a simple document involves wrapping everything in tags, which quickly becomes a strain on the eyes and the fingers. Here's the HTML required just to display an image with a border and center-align it:

<div style="text-align: center">
    <img src="image.png" style="border:1px solid black;">
</div>

Not exactly natural or intuitive. While HTML is immensely powerful, hand-coding it is simply not a pleasant experience, even with the help of modern editors.

On the other end are lightweight markup languages like Markdown. While Markdown is much more natural to type, it lacks many features needed for professional documents, such as an automatic table of contents, cross-references, and figure captions. Fundamentally, Markdown was designed for writing simple web pages, not complex technical docs.

So we're left with an unfortunate tradeoff: either sacrifice writability or sacrifice capability. Ideally, a markup language should be both easy to write and easy to extend for complex use cases. But such an option doesn't really exist today.

And this isn't just a theoretical issue. The amount of technical writing in the world is staggering and only growing:

  • There are over 21 million developers globally as of 2020 (source) – a number that has doubled in the past decade
  • The global technical writing market is $5.2 billion and growing at 12% per year (source)
  • IBM found that on average developers spend over 50% of their time reading and writing documentation (source)

Clearly, making the experience of writing technical documentation faster, easier, and more flexible would have a massive impact on overall developer productivity. And yet the tools at our disposal haven't fundamentally changed in over a decade. It's time for an overhaul.

The specific pain points

To understand why a new approach is desperately needed, let's look at some specific examples of the pain points with existing markup languages.

One common annoyance is how different markup is needed to style words in italics vs bold. In HTML it's:

This is <i>italics</i> and this is <b>bold</b>.

While in Markdown it's:

This is *italics* and this is **bold**.

This may seem trivial, but it creates cognitive overhead to have to mentally map different symbols to what are fundamentally similar styling operations.

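Another frequent frustration is italicizing just part of a word. Markdown handles it with asterisks, but the same thing with underscores (un_believ_able) is left as literal text under CommonMark's rules, so you have to remember which delimiter works: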

un*believ*able

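And in reStructuredText, where inline markup must be separated from surrounding text by whitespace or punctuation, you need backslash-escaped spaces: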

un\ *believ*\ able

Again, this is a distraction that takes us out of our natural writing flow. And it's the kind of thing I have to look up nearly every time because it's so unintuitive.

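Things get even hairier when we try to combine styles, like bold and italic. This isn't even supported natively in reStructuredText, which has no nested inline markup! We're forced to fall back to raw HTML, which means first declaring a custom role (.. role:: raw-html(raw) with the option :format: html) and then writing:

:raw-html:`<b><i>really important</i></b>`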

These are just a few simple examples. The list goes on and on, from awkward handling of nested inline tags to indentation quirks to escaping rules. Death by a thousand paper cuts.

I would bet that nearly every developer who has written a substantial amount of technical documentation has run into similar frustrations. Here are a few choice quotes from the web:

"Almost every tech writing syntax I've used gets really cumbersome for anything beyond the trivial." (source)

"Markdown is fantastic for writing basic HTML but if you want to leverage any additional features like automatic figure numbering, cross-references, hyperlinking, etc, it gets ugly fast." (source)

"Documentation can be incredibly time consuming… It's easy to spend more time fighting with your documentation tools than it is to actually write the docs." (source)

These sentiments perfectly capture the status quo: our markup languages are simultaneously too basic for non-trivial use cases yet too complex and inconsistent for intuitive writing. Something's gotta give.

Now one could argue that these are just minor syntactic quibbles. What really matters is the expressiveness of the language, right? Well, even there the options disappoint when you get into advanced use cases.

Consider this scenario: You're writing a complex technical document with multiple parts, chapters, and subchapters. After writing a few chapters, you decide to insert a new parent chapter to better organize the flow. Logically, this should be as simple as adding one line for the new parent.

But in every language I've used, it also requires manually changing the heading level of every single child chapter! For example, subchapters have to be bumped from level 2 to level 3 headings. That quickly becomes tedious and error-prone for larger documents, breaking the flow of editing and refactoring.

Curiously, this is NOT an issue in more programmery formats like JSON and XML, since nesting depth comes from the structure itself rather than from the tag name. Fixed heading levels are a self-imposed limitation in the name of simpler syntax, but in practice they seriously limit flexibility.
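To make the chore concrete, here is a minimal Python sketch of what re-leveling amounts to in Markdown: every ATX heading (#, ##, ...) in the affected range gets one more #. The helper name demote_headings is hypothetical, the sketch ignores Setext-style (underlined) and indented headings, and it assumes the range does not start inside a fenced code block.

```python
import re

# An ATX heading is 1-6 '#' characters followed by whitespace or end of line.
ATX_HEADING = re.compile(r"^(#{1,6})(?=\s|$)")


def demote_headings(lines: list[str], start: int, end: int) -> list[str]:
    """Return a copy of `lines` where each ATX heading in lines[start:end]
    is pushed one level deeper (## -> ###). Level-6 headings are left as-is
    because Markdown has no level 7. Hypothetical helper for illustration."""
    result = list(lines)
    in_fence = False
    for i in range(start, end):
        line = result[i]
        if line.startswith("```"):
            in_fence = not in_fence  # '#' lines inside code blocks are not headings
            continue
        if in_fence:
            continue
        match = ATX_HEADING.match(line)
        if match and len(match.group(1)) < 6:
            result[i] = "#" + line
    return result


doc = ["# Background", "## Prior Work", "Some text.", "## Key Challenges"]
print(demote_headings(doc, 1, len(doc)))
# ['# Background', '### Prior Work', 'Some text.', '### Key Challenges']
```

Every child heading has to be touched, which is exactly the busywork described above.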

So these are just a few specific examples, but they illustrate the fundamental issues faced by technical writers. On one side, the languages that are powerful enough are too cumbersome for humans to read and write. On the other, the languages that are simple to write are too limited for real-world technical docs.

In my many years writing documentation, I estimate that these little idiosyncrasies and limitations slow me down by as much as 20-30% compared to writing in a more natural, expressive way. Extrapolate that across all the developers in the world and it's a staggering amount of lost productivity.

A new approach – PML

Having struggled with these issues for years, I finally decided to do something about it. I created a new markup language, PML (Practical Markup Language), that aims to hit the sweet spot between simplicity and power.

The guiding principles behind PML are:

  1. Excellent readability, approaching natural text
  2. Simple, consistent syntax with minimal cognitive load
  3. Extensible for advanced technical writing needs
  4. Ease of implementation in parsers and tooling

To achieve this, PML borrows some of the best ideas from existing languages while adding some novel concepts of its own. Let's look at how it handles the pain points we discussed earlier.

First, inline text styling becomes a breeze in PML:

This is {i italics}, this is {b bold}, this is {b {i bold italics}}, and this is un{i believ}able.

The {tag content} syntax is natural and consistent for all types of styling. No more mental mapping or escaping required. Any style can be nested inside another without limits.
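Part of the appeal of a single {tag content} form is that it is easy to parse. As an illustration only (this is not PML's actual parser), here is a minimal Python sketch that turns inline markup into a tree of plain strings and (tag, children) tuples. It assumes a tag name runs from the opening brace to the first space or brace, that one space separates the name from the content, and that literal braces are never escaped; those rules are assumptions made for the sketch.

```python
def parse_inline(text: str) -> list:
    """Parse {tag content} markup into a list of str and (tag, children) nodes.

    Sketch assumptions (not from a PML spec): the tag name ends at the first
    space or brace, a single space separates name and content, and braces
    cannot be escaped.
    """
    root: tuple = ("root", [])
    stack = [root]  # currently open tags, innermost last
    buf: list = []  # plain text not yet attached to the tree

    def flush() -> None:
        if buf:
            stack[-1][1].append("".join(buf))
            buf.clear()

    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "{":
            flush()
            j = i + 1
            while j < len(text) and text[j] not in " {}":
                j += 1
            node = (text[i + 1:j], [])
            stack[-1][1].append(node)
            stack.append(node)
            # Skip the single space that separates the tag name from its content.
            i = j + 1 if j < len(text) and text[j] == " " else j
            continue
        if ch == "}":
            if len(stack) == 1:
                raise ValueError(f"unmatched '}}' at offset {i}")
            flush()
            stack.pop()
        else:
            buf.append(ch)
        i += 1
    if len(stack) > 1:
        raise ValueError(f"unclosed tag {{{stack[-1][0]}")
    flush()
    return root[1]


print(parse_inline("This is {i bol{b d}}."))
# ['This is ', ('i', ['bol', ('b', ['d'])]), '.']
```

Nesting falls out of a plain stack, and intraword styling needs no escaping because a tag is delimited by its braces rather than by surrounding whitespace.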

For complex document structures, PML ensures the markup itself never changes, only the relative nesting of sections:

{section Introduction}
...
{section Background}
    {section Prior Work}
    ...
    {section Key Challenges}
    ... 
{section Methodology}
...

Inserting a new section is as simple as adding a line in the proper nesting position. The parser takes care of determining the actual section number, table of contents, etc. It's a small change but a massive quality-of-life improvement when wrangling larger documents.
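Here is a minimal Python sketch of how a tool could derive section numbers and a table of contents from nesting alone. It assumes, hypothetically, that each {section Title} sits on its own line and that nesting depth is its indentation in multiples of four spaces, as in the example above; build_toc is an illustrative name, not part of any existing PML tooling.

```python
def build_toc(lines: list[str], indent_width: int = 4) -> list[str]:
    """Number '{section Title}' lines by indentation depth (spaces only)."""
    counters: list[int] = []  # counters[d] = current section number at depth d
    toc: list[str] = []
    for line in lines:
        stripped = line.strip()
        if not (stripped.startswith("{section ") and stripped.endswith("}")):
            continue  # body text, not a section heading
        depth = (len(line) - len(line.lstrip(" "))) // indent_width
        if depth > len(counters):
            raise ValueError(f"section skips a nesting level: {stripped}")
        del counters[depth + 1:]  # moving back out closes deeper sections
        if depth == len(counters):
            counters.append(1)    # first child at this depth
        else:
            counters[depth] += 1  # next sibling at this depth
        title = stripped[len("{section "):-1]
        toc.append(".".join(map(str, counters)) + " " + title)
    return toc


doc = [
    "{section Introduction}",
    "...",
    "{section Background}",
    "    {section Prior Work}",
    "    ...",
    "    {section Key Challenges}",
    "    ...",
    "{section Methodology}",
    "...",
]
for entry in build_toc(doc):
    print(entry)
```

This prints 1 Introduction, 2 Background, 2.1 Prior Work, 2.2 Key Challenges and 3 Methodology. Adding an indented {section Motivation} line above Prior Work renumbers its later siblings to 2.2 and 2.3 with no other edits.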

PML also has powerful features that go beyond just better syntax. Things like:

  • Built-in figure numbering, captioning, and cross-referencing, e.g.
      {figure src="diagram.png" caption="System overview showing main components."}
  • Extensible code block support with automatic syntax highlighting, e.g.
      {code:python}
      def hello():
          print("Hello world!")
  • Rich table support with cell merging, header/footer rows, and text alignment
  • First-class API documentation generation, e.g.
      {func get_user}
      Retrieves a user by ID.
      {param id int} ID of the user to retrieve
      {return User with the given ID}
      {except UserNotFoundError} if no user has the given ID
  • Highly customizable output via pluggable renderers, e.g. HTML, PDF, and EPUB from the same source
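Figure numbering and cross-referencing come down to a two-pass walk over the document. The Python sketch below is illustrative only: it assumes each {figure ...} tag fits on one line with quoted key="value" attributes, and it invents an id attribute plus a {ref id} tag for references, neither of which appears in the examples above.

```python
import re
import shlex

# Hypothetical syntax for this sketch: {figure key="value" ...} and {ref some-id}.
FIGURE = re.compile(r"\{figure\s+([^}]*)\}")
REF = re.compile(r"\{ref\s+([^\s}]+)\}")


def parse_attrs(raw: str) -> dict[str, str]:
    """Split 'src="a.png" caption="Some text."' into a dict; shlex handles the quotes."""
    return dict(token.split("=", 1) for token in shlex.split(raw))


def number_figures(text: str) -> str:
    # Pass 1: number figures in document order so references can point forward.
    numbers: dict[str, int] = {}
    for n, match in enumerate(FIGURE.finditer(text), start=1):
        attrs = parse_attrs(match.group(1))
        if "id" in attrs:
            numbers[attrs["id"]] = n

    # Pass 2: render each figure as a numbered caption line...
    count = 0

    def render_figure(match: re.Match) -> str:
        nonlocal count
        count += 1
        attrs = parse_attrs(match.group(1))
        return f"Figure {count}: {attrs.get('caption', '')} ({attrs['src']})"

    # ...and each reference as "Figure N" (KeyError if the id does not exist).
    def render_ref(match: re.Match) -> str:
        return f"Figure {numbers[match.group(1)]}"

    return REF.sub(render_ref, FIGURE.sub(render_figure, text))


doc = (
    "As {ref overview} shows, the system has three parts.\n"
    '{figure src="diagram.png" caption="System overview showing main components." id="overview"}'
)
print(number_figures(doc))
```

The first pass is what lets a reference appear before the figure it points to; a single pass could only resolve backward references.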

These are just a few examples. The key thing with PML is that all of these features are implemented in an orthogonal, composable way using a common {tag} syntax. So you get tremendous power and flexibility while still retaining the readability and learnability that attracted people to languages like Markdown in the first place.

It's a radically different approach compared to traditional markup languages that rely on bespoke syntax for each feature. By lifting those features into a unified, extensible system, PML can grow to handle advanced use cases without losing coherence or simplicity.
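One way to picture that unified system is a registry that maps tag names to handler functions, with one registry per output format. The Python sketch below assumes a simple tree of strings and (tag, children) tuples; the Renderer class and its tag decorator are hypothetical names for illustration, not an existing PML API.

```python
import html
from typing import Callable


class Renderer:
    """Maps tag names to handlers; adding a feature means registering a handler."""

    def __init__(self, escape: Callable[[str], str] = lambda s: s) -> None:
        self.escape = escape  # how plain text is escaped for this output format
        self.handlers: dict[str, Callable[[str], str]] = {}

    def tag(self, name: str):
        """Decorator that registers the decorated function as the handler for `name`."""
        def register(fn: Callable[[str], str]) -> Callable[[str], str]:
            self.handlers[name] = fn
            return fn
        return register

    def render(self, nodes: list) -> str:
        out = []
        for node in nodes:
            if isinstance(node, str):
                out.append(self.escape(node))
            else:
                name, children = node
                if name not in self.handlers:
                    raise KeyError(f"no handler for tag {name!r}")
                # Children are rendered first, so handlers compose when nested.
                out.append(self.handlers[name](self.render(children)))
        return "".join(out)


html_out = Renderer(escape=html.escape)
text_out = Renderer()


@html_out.tag("b")
def html_bold(inner: str) -> str:
    return f"<strong>{inner}</strong>"


@html_out.tag("i")
def html_italic(inner: str) -> str:
    return f"<em>{inner}</em>"


@text_out.tag("b")
def text_bold(inner: str) -> str:
    return f"*{inner}*"


@text_out.tag("i")
def text_italic(inner: str) -> str:
    return f"_{inner}_"


tree = ["1 < 2 is ", ("b", [("i", ["true"])])]
print(html_out.render(tree))  # 1 &lt; 2 is <strong><em>true</em></strong>
print(text_out.render(tree))  # 1 < 2 is *_true_*
```

The core render loop never changes; a new output format is a new registry, and a new tag is one more decorated function.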

But PML isn't just designed to be writer-friendly. It's also meant to be easy to parse and build tooling around. A few key aspects of its technical design:

  • Strict, unambiguous grammar that can be expressed in ABNF
  • Abstract syntax tree (AST) is regular and fully typed
  • Streaming-friendly, so large files can be parsed with memory that grows with nesting depth rather than file size
  • Whitespace-insensitive inline syntax, so reflowing text never changes its meaning (indentation matters only for block nesting, as in the section example)
  • Unicode safe with well-defined encoding behavior

These properties make it straightforward to write fast, robust parsers and build higher-level tools like syntax highlighters, formatters, linters, and more. So not only does PML help writers be more productive, it helps the whole documentation ecosystem level up.
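To illustrate the streaming point, here is a minimal Python sketch of an event-based (SAX-style) tokenizer that consumes input chunk by chunk. Apart from the current chunk, it keeps only the stack of open tag names, so memory grows with nesting depth rather than file size. It handles only the {tag content} form and assumes braces are never escaped and a tag name ends at the first space or brace; pml_events is a hypothetical name.

```python
from typing import Iterable, Iterator


def pml_events(chunks: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield ('open', tag), ('text', s) and ('close', tag) events.

    Text events may be split at chunk boundaries, as in other streaming parsers.
    """
    stack: list[str] = []  # open tag names: the only state that grows with input
    text: list[str] = []   # pending text within the current chunk
    name = None            # characters of a tag name being read, else None

    for chunk in chunks:
        for ch in chunk:
            if name is not None:
                if ch not in " {}":
                    name.append(ch)
                    continue
                tag = "".join(name)
                name = None
                stack.append(tag)
                yield ("open", tag)
                if ch == " ":
                    continue  # the separator space is not content
                # A '{' or '}' right after the name is handled below.
            if ch == "{":
                if text:
                    yield ("text", "".join(text))
                    text = []
                name = []
            elif ch == "}":
                if text:
                    yield ("text", "".join(text))
                    text = []
                if not stack:
                    raise ValueError("unmatched '}'")
                yield ("close", stack.pop())
            else:
                text.append(ch)
        if text:  # flush at each chunk boundary so pending text never exceeds one chunk
            yield ("text", "".join(text))
            text = []

    if name is not None or stack:
        raise ValueError("input ended inside an open tag")


for event in pml_events(["{i bo", "l{b d}}"]):
    print(event)
```

For a real file, iter(lambda: f.read(65536), "") turns an open text file f into such a stream of chunks.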

The vision going forward

My hope with PML is not to replace existing general purpose markup languages overnight. They each have their strengths and their place.

Rather, the goal is to provide a powerful yet approachable alternative for the sizable population of technical writers who are dissatisfied with their current options: people who want something as writable as Markdown but as flexible as DocBook.

Based on my experience and conversations with other developers, I believe this is a vastly underserved market. Documentation is only becoming more important as the software world grows in complexity. Yet the tools at our disposal have not evolved to keep pace.

So far, PML is just one humble developer's attempt to modernize technical writing. But already it has made my own doc writing faster, easier, and more enjoyable. If it can do the same for others, the future is bright.

Of course, PML is still a young language and doesn't claim to have all the answers yet. To truly fulfill its potential will require a community effort. Here are some of the key areas I see for future growth:

  • Formalized spec and compliance tests to ensure consistency across implementations
  • Diverse corpus of real-world PML documents to validate design decisions
  • Partnerships with major documentation platforms for first-class PML support
  • Integrations with popular developer tools like code editors, static site generators, etc.
  • Growing ecosystem of extensions, themes, and tooling to support a wide variety of use cases
  • Non-profit organization to oversee language evolution and steward community efforts

But the most important ingredient is the active involvement of technical writers themselves. So if you find yourself nodding along with the frustrations outlined in this post, I encourage you to take a look at PML. Try converting a few existing documents and see how it feels. Share your experiences, your successes, your stumbling blocks.

And if you find it makes your life a little easier, spread the word! The more people who get involved, the faster we can evolve PML into a truly powerful and productive tool. Together, I believe we can build a next-generation markup language that is a joy to use yet capable enough for the most advanced technical writing.

It won't be easy or quick. But few worthwhile things are. This is a cause that directly benefits all of us in the trenches doing the hard work of documenting the world's knowledge. In that way, it's a cause that can't afford to fail. I hope you'll join me in the effort to make PML a reality. Let's do this!
