Data Structures 101: Linked Lists

As a full-stack developer, choosing the right data structure for the job is crucial for writing efficient and scalable code. One fundamental data structure every programmer should have in their toolkit is the linked list.

In this deep dive, we‘ll explore what linked lists are, their advantages and tradeoffs, how to implement them in code, and common scenarios where they can be leveraged effectively. By the end, you‘ll gain a solid grasp on linked lists and how to wield them in your own projects.

What is a linked list?

A linked list is a linear data structure that consists of a sequence of nodes. Each node contains both a value and a reference (or link) to the next node in the sequence.

Singly linked list diagram

Unlike an array which stores elements in contiguous memory locations, a linked list‘s nodes can be scattered throughout memory. The nodes are linked together by storing the memory address of the next node as part of each node‘s data.

This means that linked lists are a dynamic data structure – they can grow and shrink as needed and do not require a fixed amount of memory to be allocated upfront like arrays do. This flexibility is one of the key advantages of linked lists.

Advantages of linked lists

The dynamic sizing of linked lists lends itself well to several scenarios:

  1. You don‘t know the size of the data ahead of time
  2. You need constant-time insertions/deletions from the list (such as in stacks or queues)
  3. You don‘t need random access to elements
  4. You want to be able to rearrange elements in the list efficiently

Compared to arrays, the main benefits of linked lists are:

  • Efficient insertion and deletion of elements. Elements can be added or removed from a linked list without having to shift around other elements, as is necessary with arrays. Insertions and deletions can be done in constant O(1) time (assuming the location of the element to delete is known).

  • No need to pre-allocate memory. With arrays, you either have to know the size in advance or allocate more memory than you need upfront in case the array grows. Linked lists can grow dynamically without wasting memory.

Of course, linked lists also have some notable downsides:

  • Slow traversal and search. To access an element in the middle of the list, you have to traverse the list from the beginning node by node. There‘s no way to do efficient random access like you can with array indices. Traversing a linked list takes linear O(n) time.

  • Extra memory usage. Each linked list node has to store an additional reference to the next node, which takes up more space than just the element itself. With lots of small nodes, the overhead of these next pointers can add up to a nontrivial amount of memory.

  • Not cache-friendly. Arrays are very cache-friendly since elements are contiguous in memory. Linked lists don‘t get the advantage of locality of reference since nodes can be dispersed in memory.

So in general, linked lists make the most sense in cases where you‘ll primarily be adding or removing elements and don‘t need fast random access or searches. Let‘s look at how to actually implement linked lists in code.

Types of linked lists

The three main types of linked lists are:

  1. Singly linked lists – each node stores a single reference to the next node
  2. Doubly linked lists – each node stores references to both the next and previous nodes, allowing traversal in both directions
  3. Circular linked lists – the last node stores a reference back to the first node, forming a loop

Types of linked lists

We‘ll focus on singly linked lists for our code examples, but the concepts readily extend to the other types. A doubly linked list, for instance, just needs an extra ‘previous‘ pointer in each node.

Implementing linked lists

To build a linked list, we need to define two structures:

  1. A Node class will represent each node in the list. It will store the node‘s value and a pointer to the next node.
  2. A LinkedList class will hold a reference to the head node and contain methods to manipulate the list, such as adding and removing nodes.

Here‘s what a basic implementation looks like in Python:

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

The Node constructor takes in a value and initializes next to None, since we don‘t have the next node reference yet when first creating a node.

The LinkedList constructor just initializes head to None to represent an empty list.

Now let‘s add some key methods to make our linked list actually useful.

Insertion methods

We‘ll implement three insertion methods:

  1. prepend(value) – insert a new node at the beginning (head) of the list
  2. append(value) – insert a new node at the end (tail) of the list
  3. insertAfter(prev_node, value) – insert a new node after a given existing node in the list
def prepend(self, val):
    new_node = Node(val)
    new_node.next = self.head
    self.head = new_node

def append(self, val):
    new_node = Node(val)

    if self.head is None:
        self.head = new_node
        return

    cur = self.head
    while cur.next:
        cur = cur.next
    cur.next = new_node

def insertAfter(self, prev_node, val):
    if prev_node is None:
        print("The given previous node must not be None")
        return

    new_node = Node(val)
    new_node.next = prev_node.next
    prev_node.next = new_node

To prepend, we create a new node, have it point to the current head, and update head to be the new node. This takes O(1) time since we always have direct access to the head node.

To append, we create a new node and then:

  • If the list is empty, simply make the new node the head and return
  • Else, we traverse to the end of the list and tack on the new node there

In the worst case (if the list is not empty), appending takes O(n) time since we have to walk all the way from head to the tail node. But if we have a direct reference to the tail, appending can be done in O(1) by just updating tail.next.

Inserting after a given node takes O(1) time since we don‘t need to traverse the list at all. We just tweak a couple pointers and we‘re done, assuming we‘re given a reference to the node to insert after.

Deletion methods

Removing nodes is also a matter of carefully manipulating node references. We‘ll implement two methods:

  1. removeFirst() – remove the first node (head)
  2. remove(value) – remove the first occurrence of a node with the given value
def removeFirst(self):
    if self.head is None:
        return None

    temp = self.head
    self.head = self.head.next
    temp.next = None
    return temp.val

def remove(self, val):
    if self.head is None:
        return

    if self.head.val == val:
        self.head = self.head.next
        return

    cur = self.head
    while cur.next:
        if cur.next.val == val:
            cur.next = cur.next.next
            break
        cur = cur.next

Removing the head node is straightforward – we just make head point to head.next and sever the old head‘s next pointer for good measure before returning its value. This is an O(1) operation.

To remove a node with a certain value, we first check if the head contains the value, in which case we just update head and return. Else, we traverse the list looking for the value, keeping track of the current and next nodes. When we find the node to remove, we simply "skip over" it by pointing the previous node to the one after the removed node. Again, if the node to remove is not the head, this requires traversing the list, so it takes linear time in the worst case.

Note that we only remove the first occurrence of the given value. If there are duplicates, those after the first occurrence will remain.

Search and traversal

Searching for a node with a given value requires walking the list from head to tail in sequential order. There‘s no more efficient way to search since we can‘t do random access:

def search(self, val):
    cur = self.head
    while cur:
        if cur.val == val:
            return cur
        cur = cur.next

    return None

To print out the values in the list, we can similarly traverse from head to tail:

def print_list(self):
    cur = self.head
    while cur:
        print(cur.val, end=‘->‘)
        cur = cur.next
    print(‘None‘)

Both search and traversal are O(n) operations in the worst case when the sought element is at the tail or not present at all.

Linked lists vs arrays

We‘ve touched on some of the key differences between linked lists and arrays. Let‘s summarize them here:

Operation Linked List Array
Access i-th element O(n) O(1)
Insert/delete at beginning O(1) O(n)
Insert/delete at end O(n) (O(1) if tail ref) O(1) (if space), O(n) (if resize)
Insert/delete in middle Search time + O(1) O(n)
Wasted space O(n) (for refs) 0 (if size known)
O(n) (if allocated too much)

So in general, prefer a linked list over an array when:

  • You need to frequently add/remove from the front of the list
  • You don‘t know how many elements you‘ll need to store
  • You don‘t need indexing or random access

And prefer an array when:

  • You need indexed/random access to elements
  • You know the number of elements in advance
  • You need to iterate over the elements frequently

Applications of linked lists

Linked lists serve as the backbone for many other data structures and have a variety of uses:

  • Implementing stacks, queues, and hash tables
  • Creating circular lists (round-robin)
  • Representing sparse matrices and polynomials
  • Performing arithmetic operations on long integers
  • Maintaining a sorted list of elements
  • Undo functionality (history stack)
  • Adjacency lists for graphs
  • Separate chaining for collision resolution

Generally, linked lists are a good choice when insertion/deletion needs to be efficient and random access isn‘t required.

Other variations of linked lists

In addition to the singly/doubly/circular variations, some other notable linked list types include:

  • Skip lists – adds layers of express lanes to skip over nodes and enable fast search
  • Unrolled linked lists – stores multiple elements per node to improve cache performance
  • XOR linked lists – uses bitwise XOR of node addresses to save space

Conclusion

We‘ve taken a comprehensive look at linked lists, from their structure and basic operations to their strengths, weaknesses, and common use cases.

To recap, linked lists are useful when you need a dynamically-resizable structure and fast insertion/deletion, don‘t need random access to elements, and have enough extra space to store the node references. Compared to arrays, they‘re more flexible but less efficient for access.

I encourage you to try implementing your own linked list variations and using them to solve problems. With practice, you‘ll gain an intuition for when linked lists are the optimal structure and how to manipulate them effectively.

Similar Posts