Stuck in Legacy Code Hell? Here's How to Dig Your Way Out

If you're a software developer, chances are you've encountered legacy code at some point in your career. You know, that daunting, untested, undocumented codebase you inherit that everyone is afraid to touch. It's the stuff of nightmares for developers – a house of cards ready to topple over at the slightest change.

Sadly, legacy code is all too common in our industry. One 2020 survey of over 80,000 developers reportedly found that 66% work with legacy codebases. Even more concerning, 31% said they spend more than half of their time working with legacy code!

As unpleasant as it is, legacy code is a reality most of us have to deal with. It's easy to complain or daydream about a complete rewrite, but the pragmatic path forward is usually to incrementally improve the system. In this post, I'll share proven strategies to manage legacy code and modernize it safely, with a focus on getting it under test.

What Makes Legacy Code So Hellish

Let's start by defining "legacy code" and looking at why it's so problematic. I like Michael Feathers' definition from his book Working Effectively with Legacy Code:

"Legacy code is simply code without tests."

Code without tests is difficult and risky to change. If you make a change, manual testing is the only way to find out whether you broke something. Legacy code is:

  • Rigid: Hard to change because you fear introducing bugs. New features are tough to add.
  • Fragile: Small changes can trigger unexpected breakages. Fixing one bug causes another.
  • Tightly coupled: Everything depends on everything else. No separation of concerns.
  • Slow to develop: Features take weeks instead of days. Countless hours wasted debugging.

These factors lead to development paralysis. But there is a way out! It starts with getting tests in place so you can refactor and improve the code with confidence.

Why Legacy Code Lacks Tests

To break the vicious cycle, we first need to understand why legacy code typically lacks tests. Often it's because automated testing was treated as optional in the past.

Historically, the attitude was "ship fast and break things!" Testing, if done at all, was tacked on at the end by a separate QA team. This led to a testing backlog and quality issues.

There are also properties of legacy code that make it harder to test, such as:

  • Hard-coded dependencies: Code relies on global state and hard-to-mock dependencies.
  • Complex setup: Excessive setup needed to test small parts of the system in isolation.
  • Long functions: Low-cohesion modules and god functions can often only be exercised through slow end-to-end integration tests.
  • Side effects: Impure functions with hidden side effects are hard to verify.

Michael Feathers calls the tests you end up with in this environment "hell tests". Hell tests are a nightmare to write, take forever to run, and often don't test the right things. The key to escaping hell is to be strategic about how you add tests to legacy code.

Techniques for Testing Legacy Code

Characterization Tests

Instead of trying to redesign the system upfront to enable unit tests, start by writing "characterization tests". These capture the current behavior of the legacy code as-is. They don't test correctness; they just document what the code does today.

Characterization tests form a safety net for refactoring. You can change the code with more confidence when the tests verify that the observable behavior hasn't changed. As you improve the code design, you can gradually evolve them into finer-grained unit tests.

Break Dependencies

Legacy code often has hard-coded dependencies that are difficult to isolate in a test. To break these dependencies, you can use techniques like:

  • Parameterize Constructor: Pass dependencies in as constructor arguments instead of hard-coding them.
  • Extract Interface: Extract an interface for the dependency and pass in a test double that implements it.
  • Adapt Parameter: If a legacy method takes a concrete parameter type that is hard to fake, wrap it behind a narrower interface you control and have the method accept that interface instead.

These refactorings make the code more flexible and testable. You can find many more dependency-breaking techniques in Michael Feathers' Working Effectively with Legacy Code.
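To make the first two techniques concrete, here is a minimal sketch. The names (Mailer, SmtpMailer, RecordingMailer, OrderNotifier) are hypothetical and only for illustration; the point is that the dependency is extracted behind an interface and passed in through the constructor, with a default so existing callers keep working.

```php
<?php

// Extract Interface: hypothetical interface pulled out of a concrete mailer
// that the legacy class used to construct for itself.
interface Mailer
{
    public function send(string $to, string $subject): void;
}

// Production implementation (stubbed here; a real one would talk to a mail server).
class SmtpMailer implements Mailer
{
    public function send(string $to, string $subject): void
    {
        // Real delivery would happen here.
    }
}

// Test double that records calls instead of sending anything.
class RecordingMailer implements Mailer
{
    public array $sent = [];

    public function send(string $to, string $subject): void
    {
        $this->sent[] = [$to, $subject];
    }
}

class OrderNotifier
{
    private Mailer $mailer;

    // Parameterize Constructor: the dependency is passed in. The default
    // keeps existing calls to "new OrderNotifier()" working unchanged.
    public function __construct(?Mailer $mailer = null)
    {
        $this->mailer = $mailer ?? new SmtpMailer();
    }

    public function notifyShipped(string $email, int $orderId): void
    {
        $this->mailer->send($email, "Order #{$orderId} has shipped");
    }
}

// In a test: inject the double, then inspect what was "sent".
$mailer = new RecordingMailer();
(new OrderNotifier($mailer))->notifyShipped('a@example.com', 42);
```

The default argument is a stepping stone, not an end state. Once all callers pass the dependency explicitly, you can drop the fallback.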

Approval and Golden Master Tests

For code with complex output that's hard to verify with assertions, you can use techniques like approval tests or golden master tests:

  • Approval Tests: Save the output of the code under test to a file. Have a human "approve" the output the first time. Then compare the output to the approved file on each test run.
  • Golden Master: Similar to approval tests, but compare the full output of the legacy system to a "golden output" saved from a previous run. Useful for large, end-to-end flows.

These are good techniques to quickly capture and lock down legacy behavior. You can use them to identify unintended changes as you refactor.
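Here is a rough sketch of the approval workflow, written by hand rather than with an approval-testing library. The helper verifyApproved, the file naming scheme, and the renderInvoice stand-in are all assumptions made for this example.

```php
<?php

// Hypothetical helper: compares $actual against an approved snapshot file.
// On a mismatch (or when nothing is approved yet) it writes a ".received"
// file so a human can review it and promote it to ".approved".
function verifyApproved(string $name, string $actual, string $dir): bool
{
    $approvedPath = "{$dir}/{$name}.approved.txt";
    $receivedPath = "{$dir}/{$name}.received.txt";

    if (is_file($approvedPath) && file_get_contents($approvedPath) === $actual) {
        // Output matches the snapshot; remove any stale received file.
        if (is_file($receivedPath)) {
            unlink($receivedPath);
        }
        return true;
    }

    file_put_contents($receivedPath, $actual);
    return false;
}

// Stand-in for legacy code whose output is awkward to assert piece by piece.
function renderInvoice(array $lines): string
{
    $out = "INVOICE\n";
    foreach ($lines as [$label, $amount]) {
        $out .= sprintf("%-10s %8.2f\n", $label, $amount);
    }
    return $out;
}

$dir = sys_get_temp_dir() . '/approvals_' . uniqid();
mkdir($dir);

$output = renderInvoice([['Widget', 10.0], ['Gadget', 20.0]]);

// First run: nothing is approved yet, so the check fails and leaves a received file.
$firstRun = verifyApproved('invoice', $output, $dir);

// A human reviews the received file and approves it (simulated here by renaming).
rename("{$dir}/invoice.received.txt", "{$dir}/invoice.approved.txt");

// Later runs pass as long as the output is unchanged.
$secondRun = verifyApproved('invoice', $output, $dir);
```

In a real project the approved files would live in version control next to the tests, so any change to the snapshot shows up in code review.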

Test Critical Paths First

Be pragmatic about what you try to test, especially at first. Focus your efforts on the most critical user flows and business logic.

Trying to achieve 100% coverage on a legacy system is unrealistic. A common rule of thumb, the Pareto principle, suggests that roughly 80% of production failures come from about 20% of the code. Make sure that critical 20% is well-tested before moving on to less important areas.

Refactor in Small, Safe Steps

Resist the temptation to rewrite a legacy system from scratch. Even with good tests, a full rewrite is very risky. It's usually safer to refactor legacy code incrementally, with tests as a safety net:

  1. Pick a small area to refactor.
  2. Write characterization tests for it.
  3. Refactor the code in baby steps.
  4. Verify the tests still pass after each change.
  5. Commit the refactored code and tests.
  6. Repeat!

Over time, these small improvements compound to make the codebase more habitable and maintainable. Having fast, automated tests prevents quality from backsliding as you make changes.

Design New Code to Avoid Legacy Problems

As you pay down technical debt in legacy code, design new code to avoid repeating past mistakes:

  • Use TDD: Write tests first to get a fast feedback loop and document intended behavior.
  • Separate concerns: Keep modules focused and cohesive. Avoid tangled dependencies.
  • Depend on abstractions: Code against interfaces, not concrete classes. Enables loose coupling.
  • Pure functions: Prefer pure, deterministic functions over side effects. Easier to reason about and test.
  • Continuous refactoring: Don‘t let technical debt pile up. Refactor continuously as you add features.

Following these practices takes discipline, but keeps your codebase healthy and nimble over the long haul. Today's well-designed code prevents tomorrow's legacy nightmares.
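As a small illustration of the pure-function advice, compare a function that reads the system clock with one that takes the date as an argument. The weekend-sale rule and both function names are made up for this sketch.

```php
<?php

// Impure: reads the system clock, so the result changes from run to run
// and a test cannot control it.
function isWeekendSaleActive(): bool
{
    return in_array(date('N'), ['6', '7'], true);
}

// Pure: the date is an input, so the same argument always gives the same result.
function isWeekendSaleActiveOn(DateTimeImmutable $now): bool
{
    // Format 'N' is the ISO-8601 day of week: "1" (Monday) through "7" (Sunday).
    return in_array($now->format('N'), ['6', '7'], true);
}

$saturday = new DateTimeImmutable('2024-06-01'); // a Saturday
$monday   = new DateTimeImmutable('2024-06-03'); // a Monday

$onSaturday = isWeekendSaleActiveOn($saturday);
$onMonday   = isWeekendSaleActiveOn($monday);
```

The impure version can stay as a thin wrapper that passes the current time into the pure one, which keeps the untestable part as small as possible.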

Assessing Legacy Code Health with Metrics

To get a baseline of a legacy codebase's health and track improvements, you can use code metrics like:

  • Cyclomatic complexity: Measures the number of linearly independent paths through the code. High complexity means code that's harder to understand and test. A common guideline is to keep it under 10 per method.

  • Test coverage: Percentage of the codebase exercised by tests. Higher coverage means less risk of undetected bugs, although coverage alone doesn't prove the tests check the right things. Around 80% is a commonly cited target.

  • Coupling: Degree to which modules depend on each other. High coupling means changes are harder to make without breakage. If your tool reports a normalized score from 0 to 1, aim to keep it below 0.5.

Tools like SonarQube can automatically analyze a codebase and surface these metrics. Track the metrics in your team's dashboard. Make improving them a priority alongside delivering features.
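To get a feel for what cyclomatic complexity counts, here is a deliberately rough estimator built on PHP's tokenizer. The function name and the list of tokens it counts are my own simplification; real analyzers apply more rules (per-function scoping, ternaries, and so on), so treat this as an illustration rather than a replacement for a proper tool.

```php
<?php

// Rough estimate: 1 plus the number of branching tokens in the source.
function estimateCyclomaticComplexity(string $phpSource): int
{
    $branchTokens = [
        T_IF, T_ELSEIF, T_FOR, T_FOREACH, T_WHILE,
        T_CASE, T_CATCH, T_BOOLEAN_AND, T_BOOLEAN_OR,
    ];

    $complexity = 1;
    foreach (token_get_all($phpSource) as $token) {
        // Named tokens come back as arrays [id, text, line];
        // single characters such as "{" come back as plain strings.
        if (is_array($token) && in_array($token[0], $branchTokens, true)) {
            $complexity++;
        }
    }
    return $complexity;
}

$source = <<<'SRC'
<?php
function shippingFor($productCost) {
  $shipping = 5.0;
  if ($productCost > 100) {
    $shipping = 10.0;
  }
  return $shipping;
}
SRC;

// One base path plus one "if" gives a score of 2.
$score = estimateCyclomaticComplexity($source);
```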

Strangling Legacy Systems Incrementally

For more ambitious legacy modernizations, the strangler pattern is a powerful approach. The idea is to incrementally replace a legacy system with a new one:

  1. Put a facade in front of the legacy system to intercept calls.
  2. Route a portion of requests to the new system.
  3. If the new system handles the request successfully, the facade returns its result. If not, the facade falls back to the legacy system.
  4. Repeat until the new system fully replaces the old one. Then decommission the legacy system.
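
The steps above can be sketched as a small facade. Everything here is hypothetical (the StranglerFacade class, the percentage-based bucketing by user ID, and the two closures standing in for the old and new systems); it's one possible shape, not a prescribed implementation.

```php
<?php

// Hypothetical facade: routes a share of calls to the new implementation
// and falls back to the legacy one if the new code throws.
class StranglerFacade
{
    /** @var callable */
    private $legacy;
    /** @var callable */
    private $modern;
    private int $rolloutPercent;

    public function __construct(callable $legacy, callable $modern, int $rolloutPercent)
    {
        $this->legacy = $legacy;
        $this->modern = $modern;
        $this->rolloutPercent = $rolloutPercent;
    }

    public function handle(int $userId, array $request)
    {
        // Deterministic bucketing: the same user always lands on the same side.
        $useModern = ($userId % 100) < $this->rolloutPercent;

        if ($useModern) {
            try {
                return ($this->modern)($request);
            } catch (Throwable $e) {
                // A real system would log $e before falling back.
            }
        }
        return ($this->legacy)($request);
    }
}

$facade = new StranglerFacade(
    fn(array $r) => 'legacy:' . $r['path'],
    fn(array $r) => 'modern:' . $r['path'],
    10 // send roughly 10% of users to the new system
);

$a = $facade->handle(5, ['path' => '/cart']);  // bucket 5 is below 10: new system
$b = $facade->handle(42, ['path' => '/cart']); // bucket 42 is not: legacy system
```

One caveat: falling back is only safe when the new code fails before it has changed any state. For requests with side effects you would need something stricter than a blanket catch.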

Etsy has reportedly used this pattern to modernize their PHP monolith, replacing it over a few years with a new architecture that could scale with their growth.

Examples of Refactoring Legacy Code

Let's walk through a concrete example of refactoring a piece of legacy code safely with tests. Consider this tangled, untested PHP function:

function calculateTotalPrice($products, $user) {
  $total = 0;
  foreach ($products as $product) {
    if ($user->isAdmin()) {
      $total += $product->price * 0.8;
    } elseif ($user->isVIP()) {
      $total += $product->price * 0.9; 
    } else {
      $total += $product->price;
    }
  }

  $shipping = 5.0;
  if ($total > 100) {
    $shipping = 10.0;  
  }
  $total += $shipping;

  return $total;
}

This function has a couple of responsibilities mixed together: calculating the total product cost and the shipping cost. It's doing too much, which makes it harder to understand and test.

Let's start by writing a characterization test to capture the function's current behavior:

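use PHPUnit\Framework\TestCase;

class PriceCalculatorTest extends TestCase {
  // Characterization test: pins down what the code does today for an admin user.
  public function testCalculateTotalPrice() {
    $products = [
      new Product("Widget", 10.0),
      new Product("Gadget", 20.0)
    ];

    $user = new User("John Doe", "admin");

    $totalPrice = calculateTotalPrice($products, $user);

    // (10 + 20) * 0.8 admin discount = 24, plus 5 flat shipping = 29
    $this->assertEquals(29.0, $totalPrice);
  }
}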

With this test in place, we can start refactoring the function in small steps.

First, let's extract the product cost calculation into its own function:

function calculateProductCost($products, $user) {
  $total = 0;
  foreach ($products as $product) {
    if ($user->isAdmin()) {
      $total += $product->price * 0.8;
    } elseif ($user->isVIP()) { 
      $total += $product->price * 0.9;
    } else {
      $total += $product->price;  
    }
  }
  return $total;
}

function calculateTotalPrice($products, $user) {
  $productCost = calculateProductCost($products, $user);

  $shipping = 5.0;
  if ($productCost > 100) {
    $shipping = 10.0;
  } 

  return $productCost + $shipping;
}

We run the tests and they still pass. So far so good!

Next, let's extract the shipping cost calculation:

function calculateShippingCost($productCost) {
  $shipping = 5.0;
  if ($productCost > 100) {
    $shipping = 10.0;
  }
  return $shipping;
}

function calculateTotalPrice($products, $user) {
  $productCost = calculateProductCost($products, $user);
  $shippingCost = calculateShippingCost($productCost);

  return $productCost + $shippingCost;  
}

Our tests still pass. The original function is now doing a lot less, delegating to the new functions. We can also add unit tests for the new functions in isolation.
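
As a sketch of what testing one of the new functions in isolation could look like, here are a few boundary cases for calculateShippingCost. I've used a plain loop instead of PHPUnit so the snippet runs on its own; in the real suite these would be ordinary test methods.

```php
<?php

// Copied from the refactored code above so this sketch runs on its own.
function calculateShippingCost($productCost) {
  $shipping = 5.0;
  if ($productCost > 100) {
    $shipping = 10.0;
  }
  return $shipping;
}

// Boundary cases around the 100 threshold: [product cost, expected shipping].
$cases = [
  [0.0, 5.0],
  [100.0, 5.0],   // exactly 100 is not "greater than 100"
  [100.01, 10.0],
];

$failures = [];
foreach ($cases as [$productCost, $expected]) {
  $actual = calculateShippingCost($productCost);
  if ($actual !== $expected) {
    $failures[] = "cost {$productCost}: expected {$expected}, got {$actual}";
  }
}

echo $failures ? implode("\n", $failures) . "\n" : "all shipping cases pass\n";
```

Cases like the exact-100 boundary were hard to reach through the original function, because you first had to build products and a user that happened to total the right amount.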

There are more refactorings we could do (like using polymorphism instead of if/else for user roles), but this is good progress for one sitting! The tests give us confidence our refactorings haven't changed behavior.
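
For the curious, here is one way the polymorphism idea might look. Since the Product and User classes aren't shown in this post, the sketch works on plain prices and a role string, and every name in it (DiscountPolicy, NoDiscount, PercentageDiscount, discountPolicyFor, calculateDiscountedCost) is hypothetical.

```php
<?php

// One pricing rule per user role, behind a shared interface.
interface DiscountPolicy
{
    public function apply(float $price): float;
}

class NoDiscount implements DiscountPolicy
{
    public function apply(float $price): float
    {
        return $price;
    }
}

class PercentageDiscount implements DiscountPolicy
{
    private float $multiplier;

    public function __construct(float $multiplier)
    {
        $this->multiplier = $multiplier;
    }

    public function apply(float $price): float
    {
        return $price * $this->multiplier;
    }
}

// The role check now lives in exactly one place, using the same
// multipliers as the original if/else (0.8 for admins, 0.9 for VIPs).
function discountPolicyFor(string $role): DiscountPolicy
{
    switch ($role) {
        case 'admin':
            return new PercentageDiscount(0.8);
        case 'vip':
            return new PercentageDiscount(0.9);
        default:
            return new NoDiscount();
    }
}

// The loop no longer knows anything about roles.
function calculateDiscountedCost(array $prices, DiscountPolicy $policy): float
{
    $total = 0.0;
    foreach ($prices as $price) {
        $total += $policy->apply($price);
    }
    return $total;
}

$adminTotal = calculateDiscountedCost([10.0, 20.0], discountPolicyFor('admin'));
$guestTotal = calculateDiscountedCost([10.0, 20.0], discountPolicyFor('guest'));
```

There is still a conditional, but it has moved into a single factory function, so adding a new role no longer means touching the pricing loop.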

Shifting Organizational Culture

Paying down legacy code isn't just a technical challenge. It requires an organizational culture that values code quality and continual improvement.

Too often, the pressure to ship features leads to quality being sacrificed. Technical debt piles up until it hits a crisis point. But by then, it's very expensive to fix.

Changing the culture starts with awareness. Help your team understand the long-term costs of technical debt and the benefits of keeping a clean codebase. Make refactoring and testing part of your team's definition of done for features.

Leaders need to create space for reducing technical debt continuously, not just when it boils over. Saying "no" to some feature work in the short term is hard, but it keeps you healthy and competitive in the long run.

Celebrate victories along the legacy journey, whether it's increasing test coverage by 5% or eliminating a particularly gnarly code smell. Build momentum and pride in the legacy crusade!

Escaping Legacy Hell for Good

Legacy code can be daunting, but it doesn't have to be a death march. With smart testing and refactoring techniques, you can gradually make even the most tangled legacy code more habitable and malleable.

The journey takes patience and persistence, but it's so worth it. Working in a cleanly designed, well-tested codebase is a joy. Escaping legacy hell liberates your team to rapidly deliver value to users with confidence.

So the next time you curse that legacy monstrosity you've inherited, remember there's hope! Sharpen your refactoring sword, don your test-driven armor, and slay that legacy beast once and for all. Your team (and your future self) will thank you.
