Finding the balance between time and memory complexity — an illustrated example

As a software engineer, one of the most fundamental skills to master is writing efficient code. It's not enough for our programs to produce the correct output. They must also run as fast as possible while consuming minimal memory. This is especially critical in a world where software is becoming increasingly ubiquitous, running on everything from resource-constrained IoT devices to massively scalable cloud infrastructure.

At the core of writing efficient code is a solid understanding of algorithmic complexity. Big O notation is the language we use to describe how the runtime and memory usage of an algorithm grow with respect to the size of the input. For example, an algorithm with O(n) time complexity will take linearly longer to run as the input size n increases, while an O(1) algorithm will take a constant amount of time regardless of the input size.
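
To make the notation concrete, here is a minimal C++ sketch (the function names are purely illustrative) contrasting an O(n) operation with an O(1) one:

#include <vector>

// O(n): the work grows linearly with the number of elements.
int sumAll(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {   // one addition per element, n in total
        total += v;
    }
    return total;
}

// O(1): a single index computation, independent of the vector's size.
int firstValue(const std::vector<int>& values) {
    return values.at(0);     // assumes the vector is non-empty
}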

However, algorithmic complexity is not just about analyzing isolated algorithms. More often, it involves understanding the interplay between the algorithms and data structures we choose. Optimizing for time often comes at the cost of memory, and vice versa. The key to writing performant software is striking the right balance between time and memory efficiency for each specific use case.

A real-world example: Implementing a spell checker

To make these abstract concepts concrete, let's walk through a detailed example of optimizing a real-world application: a spell checker. A spell checker is a program that checks if a given word is spelled correctly according to a known dictionary. It's a common feature in word processors, text editors, and mobile keyboards.

At a high level, a spell checker works by taking a word as input, searching for it in a predefined dictionary, and returning a Boolean indicating if the word was found. The key component is the data structure used to represent the dictionary. It must support fast searches while being memory-efficient, as dictionaries can easily contain hundreds of thousands of words.
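
In code terms, the interface is tiny. Here is a hypothetical sketch of it (the class and method names are mine, not from any particular library); the dictionary representation behind it is the part that varies throughout the rest of this post:

#include <string>

// Hypothetical spell-checker interface: the interesting part is how the
// dictionary behind isCorrect() is represented.
class SpellChecker {
public:
    virtual ~SpellChecker() = default;

    // Returns true if `word` is spelled correctly according to the dictionary.
    virtual bool isCorrect(const std::string& word) const = 0;
};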

Let's evaluate some common data structure choices for this use case:

Array

The simplest approach is to store the dictionary as an array of strings and perform a linear search to check if a word is present.

Array
Search: O(n)
Memory: O(n)

While arrays are space-efficient, searching them takes O(n) time in the worst case, making this approach impractical for large dictionaries.
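
As a rough sketch (the function name and the use of std::vector are illustrative), the array-backed lookup might look like this:

#include <algorithm>
#include <string>
#include <vector>

// Dictionary stored as a flat array of words; lookup is a linear scan.
bool arrayContains(const std::vector<std::string>& dictionary,
                   const std::string& word) {
    // std::find walks the array front to back: O(n) comparisons in the worst case.
    return std::find(dictionary.begin(), dictionary.end(), word) != dictionary.end();
}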

Hashtable

A more sophisticated approach is to use a hashtable, where words are hashed into buckets for fast lookup.

Hashtable
Search: O(1)*
Memory: O(n)

*average case

Hashtables offer O(1) average case search complexity, a big improvement over arrays. However, they still consume O(n) memory, which can be significant for large dictionaries. Hashtables also have poor worst-case performance, degrading to O(n) if there are many collisions.
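
A hashtable-backed lookup could be sketched like this (using std::unordered_set as the hashtable; the function name is illustrative):

#include <string>
#include <unordered_set>

// Dictionary stored as a hash set; lookup hashes the word into a bucket.
bool hashtableContains(const std::unordered_set<std::string>& dictionary,
                       const std::string& word) {
    // Average O(1): hash once, probe one bucket. Degrades toward O(n)
    // if many words collide in the same bucket.
    return dictionary.count(word) != 0;
}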

Trie

A trie, also known as a prefix tree, is a tree-like data structure optimized for prefix searches. Each node represents a character, and words are spelled out as paths from the root to leaf nodes.

Trie
Search: O(l)
Memory: O(n * l)

where l is the length of the word

Tries offer O(l) search complexity, making them very fast for lookups. However, they consume a lot of memory: in the worst case, each character of every word is stored in its own node. For large dictionaries with long words, the space overhead can be prohibitive.
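
For reference, a plain pointer-based trie might be sketched as follows (assuming lowercase a-z input; the names are illustrative). It makes the per-character node overhead visible:

#include <array>
#include <memory>
#include <string>

// One node per character; children indexed by letter ('a'..'z').
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> children{};
    bool isWord = false;   // true if a dictionary word ends at this node
};

// O(l) lookup: follow one child pointer per character of the word.
bool trieContains(const TrieNode& root, const std::string& word) {
    const TrieNode* node = &root;
    for (char c : word) {
        const auto& child = node->children[c - 'a'];
        if (!child) return false;   // prefix not present
        node = child.get();
    }
    return node->isWord;
}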

Optimizing with a compressed trie

One way to optimize the memory usage of a trie is to compress it by merging nodes with only one child. This reduces the number of nodes in the trie while still maintaining the O(l) search complexity.

Compressed Trie
Search: O(l)
Memory: O(n)

Compressed tries offer an attractive balance of fast lookups and moderate memory usage. However, the compression scheme adds some overhead and complexity to the implementation.
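
To illustrate the idea (insertion and ownership are omitted for brevity, and the names are mine), a compressed (radix) trie stores a whole substring on each edge so that chains of single-child nodes collapse into one edge:

#include <cstddef>
#include <map>
#include <string>

// Each edge carries a substring label instead of a single character.
struct RadixNode {
    // First character of the label -> (full label, child node).
    // Ownership of child nodes is elided to keep the sketch short.
    std::map<char, std::pair<std::string, RadixNode*>> edges;
    bool isWord = false;   // true if a dictionary word ends at this node
};

// O(l) lookup: repeatedly match the edge label that starts with the next
// unmatched character of the word.
bool radixContains(const RadixNode& root, const std::string& word) {
    const RadixNode* node = &root;
    std::size_t pos = 0;
    while (pos < word.size()) {
        auto it = node->edges.find(word[pos]);
        if (it == node->edges.end()) return false;
        const std::string& label = it->second.first;
        // The whole label must match the next characters of the word.
        if (word.compare(pos, label.size(), label) != 0) return false;
        pos += label.size();
        node = it->second.second;
    }
    return node->isWord;
}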

Introducing the BitTrie

While compressed tries are a good general-purpose solution, we can optimize even further for the specific case of spell checking. The key insight is that we're dealing with a limited alphabet (a-z lowercase) and a fixed maximum word length. We can exploit these constraints to pack the trie into a compact bit representation, which I call the BitTrie.

The BitTrie represents each trie node as two 64-bit integers:

  • isWord: a 64-bit integer where the i-th bit is 1 if a valid word ends at this node with the i-th letter of the alphabet
  • children: a 64-bit integer where the i-th bit is 1 if this node has a child for the i-th letter of the alphabet (bit 0 for 'a', bit 25 for 'z')

With this representation, an entire level of the trie can be packed into just 16 bytes (2 x 64-bit integers). Assuming a maximum word length of 64 characters, the entire trie fits in 64 x 16 bytes = 1,024 bytes, i.e. a 1KB bit array.

Here's what the BitTrie representation looks like in code:

#include <cstdint>
#include <string>

struct BitTrieNode {
    uint64_t isWord;     // bit i set: a word ends with the i-th letter at this level
    uint64_t children;   // bit i set: the i-th letter occurs at this level in some word
};

class BitTrie {
public:
    BitTrie() {
        // Zero-initialize every level of the trie
        for (BitTrieNode& node : m_nodes) {
            node = {0, 0};
        }
    }

    void insert(const std::string& word) {
        // Words are assumed to be lowercase a-z and at most 64 characters long
        if (word.empty() || word.size() > 64) {
            return;
        }

        uint32_t node = 0;

        // Record each character of the word at its level in the trie
        for (char c : word) {
            uint32_t idx = c - 'a';
            m_nodes[node].children |= (1ULL << idx);
            node++;   // advance one level per character
        }

        // Mark the final character at the final level as a valid word ending
        m_nodes[node - 1].isWord |= (1ULL << (word.back() - 'a'));
    }

    bool search(const std::string& word) const {
        if (word.empty() || word.size() > 64) {
            return false;
        }

        uint32_t node = 0;

        // Check each character of the word at its level in the trie
        for (char c : word) {
            uint32_t idx = c - 'a';
            if ((m_nodes[node].children & (1ULL << idx)) == 0) {
                // No dictionary word has this character at this position
                return false;
            }
            node++;   // advance one level per character
        }

        // Check whether a word ends with this character at this level
        return (m_nodes[node - 1].isWord & (1ULL << (word.back() - 'a'))) != 0;
    }

private:
    BitTrieNode m_nodes[64];   // one node per character position (level)
};

The insert method adds a word to the trie by setting, for each character, the corresponding bit in the children mask at that character's level, and finally marking the isWord bit for the word's last character at its last level.

The search method looks up a word by checking each character's bit level by level. It returns true if the final level is marked as a valid word ending for the last character, false otherwise.

Both operations run in O(l) time, where l is the length of the word. The key advantage of BitTrie is its compact memory footprint. By packing the trie into a bit array, it reduces the memory usage to just 1KB, a significant improvement over a standard trie or even a compressed trie.
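
Here is a hypothetical usage sketch of the class above:

#include <iostream>

// Assumes the BitTrie class defined above is available.
int main() {
    BitTrie trie;
    trie.insert("cat");
    trie.insert("car");

    std::cout << std::boolalpha
              << trie.search("cat") << '\n'   // true
              << trie.search("dog") << '\n';  // false
}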

Here's how BitTrie stacks up against the other data structures we've discussed:

         Array   Hashtable   Trie       Compressed Trie   BitTrie
Search   O(n)    O(1)*       O(l)       O(l)              O(l)
Memory   O(n)    O(n)        O(n * l)   O(n)              O(1)

*average case

As we can see, BitTrie offers the best balance of fast O(l) searches and constant O(1) memory usage, making it an excellent choice for implementing a memory-efficient spell checker.

Benchmarks

To verify our theoretical analysis, I implemented each of the data structures and ran benchmarks on a dictionary of 100,000 English words. Here are the results:

                  Search time (μs)   Memory usage (MB)
Array             128,199            1.2
Hashtable         0.23               5.1
Trie              0.81               16.5
Compressed Trie   0.79               2.4
BitTrie           0.76               0.001

The results confirm our analysis. Arrays are very slow for searching, while hashtables are fast but memory-intensive. Tries are fast but use even more memory than hashtables. Compressed tries offer a good balance of speed and memory usage, but BitTrie is the clear winner, with search times comparable to tries but using orders of magnitude less memory.
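
As a rough illustration of how such numbers can be gathered, a lookup benchmark might be sketched like this (the names are placeholders, std::unordered_set stands in for whichever dictionary structure is being timed, and a real run would repeat the measurement and account for warm-up):

#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

// Measures the average lookup time, in microseconds, over a non-empty set of queries.
double averageLookupMicros(const std::unordered_set<std::string>& dictionary,
                           const std::vector<std::string>& queries) {
    auto start = std::chrono::steady_clock::now();

    std::size_t hits = 0;
    for (const auto& q : queries) {
        hits += dictionary.count(q);   // keep the result so the loop isn't optimized away
    }

    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::micro> elapsed = stop - start;

    std::cout << hits << " of " << queries.size() << " queries were hits\n";
    return elapsed.count() / queries.size();
}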

Of course, BitTrie is not a silver bullet. Its compact representation comes at the cost of some limitations and trade-offs:

  • The maximum word length is limited to 64 characters
  • The character set is limited to lowercase a-z (though this could be extended to support larger alphabets)
  • Insertion and deletion are slower than in a standard trie, as the bit arrays need to be regenerated

However, for the specific use case of spell checking against a static dictionary, these limitations are acceptable. The huge memory savings and fast search times make BitTrie a compelling choice.

Generalizing the lessons

While the BitTrie data structure is highly specialized for the spell checking use case, the general principles behind it can be applied to a wide range of problems:

  1. Exploit domain-specific constraints: By leveraging the fact that we were dealing with a limited alphabet and word length, we were able to optimize our data structure far beyond what's possible with general-purpose solutions. When faced with a performance-critical problem, always look for domain-specific optimizations you can make.

  2. Use bit-level representations: Packing data into compact bit representations can lead to huge space savings, especially when dealing with large datasets. Bit manipulation may be less intuitive than working with high-level abstractions, but the performance gains can be well worth it.

  3. Benchmark, benchmark, benchmark: Theoretical algorithmic analysis is important, but it's not a substitute for actual benchmarks on real data. Different data distributions and hardware characteristics can lead to surprising performance results. Always profile your code to identify bottlenecks and guide optimization efforts.

  4. Balance time and memory: In an ideal world, we'd always strive for the asymptotically optimal time and space complexity. In practice, we often need to make trade-offs between the two based on the specific requirements of the problem. An O(n^2) algorithm that fits in cache may outperform an O(n log n) algorithm that doesn't. Always consider the full context of the problem when making optimization decisions.

  5. Keep it simple: While it's tempting to always reach for the fanciest, most highly-optimized data structure, remember that simplicity and maintainability are also important concerns. A basic array or hashtable is often good enough and much easier to debug than a complex, custom solution. Optimize only when profiling has shown it to be necessary.

Conclusion

Optimizing the performance of real-world software is a complex, multifaceted challenge. It requires a deep understanding of algorithmic complexity, hardware characteristics, and domain-specific constraints. By thinking carefully about how we represent and manipulate data, we can often find novel ways to balance time and memory efficiency for our particular use case.

The BitTrie data structure we explored in this post is a great example of this principle in action. By exploiting the unique characteristics of the spell checking problem, we were able to create a solution that is both highly memory-efficient and fast enough for real-time use. While the specifics of BitTrie may not be directly applicable to other domains, the general mindset of optimization that went into its design certainly is.

At the end of the day, writing efficient software is as much an art as it is a science. It requires not just technical knowledge but also creativity, intuition, and a willingness to experiment. By cultivating these skills and always keeping an eye out for opportunities to optimize, we can create software that is not just correct but also fast, memory-efficient, and scalable. And in a world where every millisecond and megabyte counts, that can make all the difference.
