Excel Tutorial – How to Clean Data with the TRIM() and CLEAN() Functions

As a full-stack developer and Excel expert, I've worked with my fair share of datasets over the years. And let me tell you, messy, inconsistent data is far more common than you might think. In fact, according to a survey by Experian, the average company believes 32% of its data is inaccurate. That's nearly a third of all information potentially being unusable or leading to incorrect conclusions!

Dirty data can come from a variety of sources – data entry errors, importing from multiple disparate systems, inconsistent formatting, and more. And the costs can be significant. IBM estimates that poor data quality costs the US economy $3.1 trillion per year. On an individual company level, issues stemming from bad data can lead to lost productivity, missed opportunities, and even reputational damage in the case of public reporting errors.

The good news is that Excel provides a robust set of functions for cleaning and standardizing data. Two of the most essential are TRIM() and CLEAN(). Used separately or in combination, these functions can help turn a mess of data into a lean, mean, analysis-ready machine. Let's dive in!

Removing Excess Whitespace with TRIM()

One of the most common issues I see in datasets is inconsistent spacing. Leading spaces, trailing spaces, double spaces between words – they all contribute to making data harder to work with and analyze. That's where the TRIM() function comes in.

TRIM() removes all leading and trailing spaces from a text string, and collapses any run of spaces between words into a single space character. Here's the syntax:

=TRIM(text)

The text argument can be a hard-coded string in quotes, like "   too   many   spaces   ", or more commonly, a reference to a cell containing the text you want to trim.

To see TRIM() in action, look at this example data:

Raw Data (quotation marks added so the spaces are visible)
"   Excess Spaces Ltd."
"Trailing Space Inc.   "
"Double  Space LLC"

Pretty messy, right? But watch what happens when we apply TRIM():

Raw Data | Trimmed Data
"   Excess Spaces Ltd." | "Excess Spaces Ltd."
"Trailing Space Inc.   " | "Trailing Space Inc."
"Double  Space LLC" | "Double Space LLC"

Voila! The leading, trailing, and double spaces are gone, leaving us with nice, consistently-formatted text.

Conceptually, TRIM() behaves as if it looks at each character in the string one by one. Space characters (ASCII code 32) at the very beginning or end of the string are dropped, and within any run of consecutive spaces only the first one is kept. The result contains nothing but single spaces between words.
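For the developers in the audience, here's a rough Python sketch of that behavior. It's my own illustration, not Excel's actual implementation, and the function name excel_trim is made up for this example. The one assumption baked in is that only the regular space (code 32) is treated as trimmable.

```python
def excel_trim(text: str) -> str:
    """Approximate Excel's TRIM(): only the regular space (code 32) counts."""
    # Splitting on a literal " " yields empty strings wherever spaces were
    # leading, trailing, or repeated; dropping them leaves just the words.
    words = [part for part in text.split(" ") if part]
    # Rejoin with exactly one space between words.
    return " ".join(words)


print(repr(excel_trim("   Excess  Spaces Ltd.  ")))  # 'Excess Spaces Ltd.'
print(repr(excel_trim("Tab\tstays")))                # 'Tab\tstays'
```

Note that the tab in the second call survives, which matches how TRIM() ignores anything that isn't a plain space.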

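One thing to note is that TRIM() only removes regular space characters. Non-breaking spaces (character code 160, common in text copied from web pages) and other types of whitespace like tabs or line breaks are left intact. Tabs and line breaks are a job for CLEAN(), covered next. Non-breaking spaces need SUBSTITUTE() first, for example =TRIM(SUBSTITUTE(A2,CHAR(160)," ")).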

Getting Rid of Non-Printing Characters with CLEAN()

While spaces are visible culprits of dirty data, there's a more insidious type of character that can lurk in your datasets unnoticed: non-printing characters. These include things like:

  • Line breaks (ASCII 10 and 13)
  • Non-breaking spaces (character code 160, which is outside the ASCII range)
  • Tabs (ASCII 9)
  • Other control characters in the 0-31 ASCII range

These characters are often invisible to the naked eye, but can still interfere with functions, formulas, and analysis. For example, if you have line breaks in what should be a single line of text, it can throw off string manipulation functions like LEFT(), RIGHT(), and MID().

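The CLEAN() function was designed specifically to address non-printing characters. It removes all characters in the 0-31 ASCII range, which includes tabs (9), line feeds (10), and carriage returns (13). It does not remove non-breaking spaces (code 160) or the higher non-printing codes such as 127, 129, 141, 143, 144, and 157. The syntax is identical to TRIM():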

=CLEAN(text)

Let's see it in action on some example data:

Raw Data
Line↵Breaks
Non Breaking Spaces
Control Chars

When we apply the CLEAN() function:

Raw Data | Cleaned Data
Line↵Breaks | LineBreaks
Non Breaking Spaces | Non Breaking Spaces
Control Chars | Control Chars

The line breaks and invisible control characters are removed (tabs would be removed too), but regular spaces and non-breaking spaces are left as-is.

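Conceptually, CLEAN() works by checking the character code of each character in the string. If the code is less than 32, the character is removed, and that includes the tab (9), line feed (10), and carriage return (13). Everything else, including regular and non-breaking spaces, is preserved. Note that line breaks are deleted rather than replaced with a space, so the words on either side end up joined together.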
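Here's the same idea as a small Python sketch. Again, this is my approximation of the documented behavior rather than Excel's real code, and excel_clean is a name I invented for the example.

```python
def excel_clean(text: str) -> str:
    """Approximate Excel's CLEAN(): drop characters with codes 0 through 31."""
    # Codes 0-31 cover tabs (9), line feeds (10), and carriage returns (13).
    # Anything at 32 or above, including the non-breaking space (160), stays.
    return "".join(ch for ch in text if ord(ch) >= 32)


print(repr(excel_clean("Line\nBreaks")))       # 'LineBreaks'
print(repr(excel_clean("Tab\there")))          # 'Tabhere'
print(repr(excel_clean("Non\u00a0Breaking")))  # 'Non\xa0Breaking'
```

The last call shows the non-breaking space coming through untouched, which is why it needs separate handling.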

Putting it All Together

As useful as TRIM() and CLEAN() are on their own, they become even more powerful when used together. In many cases, you'll want to remove both excess whitespace and non-printing characters from your data. There are two main ways to combine these functions:

  1. Nesting one inside the other
  2. Applying them in separate columns and chaining the results

To nest TRIM() and CLEAN(), you simply put one function as the text argument of the other, like so:

=CLEAN(TRIM(text))

This first applies TRIM() to the text, removing excess spaces, then applies CLEAN() to the trimmed result, removing non-printing characters. You can also reverse the order and do TRIM(CLEAN(text)). The two usually give the same result, but not always: if a control character sits between two spaces, CLEAN(TRIM(text)) removes it only after TRIM() has already run, leaving a double space behind. For that reason, TRIM(CLEAN(text)) is the safer default.

The other approach is to apply TRIM() and CLEAN() in separate columns, then chain them in a final column. For example (quotation marks added so the spaces are visible):

Raw Data | Trimmed Data | Cleaned Data | Clean & Trimmed Data
Line↵Breaks | Line↵Breaks | LineBreaks | LineBreaks
"   Excess Spaces Ltd." | "Excess Spaces Ltd." | "   Excess Spaces Ltd." | "Excess Spaces Ltd."

In this setup, column B contains the TRIM() formula, column C contains the CLEAN() formula, and column D applies TRIM() to the cleaned text in column C with a formula like:

=TRIM(C2)

Don't join the two helper columns with =B2&C2. That simply glues the trimmed text and the cleaned text together, so every value appears twice.

This approach takes up more columns, but can be useful if you want to see the individual steps or need the partially-cleaned data for other purposes.
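To see why the nesting order can matter, here's a short Python sketch that mimics both functions and runs them in each order. The helper names (excel_trim, excel_clean, scrub) are hypothetical and the functions only approximate what Excel does. The scrub helper also swaps non-breaking spaces for regular ones first, roughly what =TRIM(CLEAN(SUBSTITUTE(A2,CHAR(160)," "))) would do in a cell.

```python
NBSP = "\u00a0"  # non-breaking space, character code 160


def excel_trim(text: str) -> str:
    """Approximate TRIM(): strip and collapse regular spaces only."""
    return " ".join(part for part in text.split(" ") if part)


def excel_clean(text: str) -> str:
    """Approximate CLEAN(): drop characters with codes 0 through 31."""
    return "".join(ch for ch in text if ord(ch) >= 32)


def scrub(text: str) -> str:
    """Replace non-breaking spaces, then clean, then trim."""
    return excel_trim(excel_clean(text.replace(NBSP, " ")))


# A bell character (code 7) sitting between two spaces.
sample = "Excess \x07 Spaces"

print(repr(excel_trim(excel_clean(sample))))  # 'Excess Spaces'
print(repr(excel_clean(excel_trim(sample))))  # 'Excess  Spaces'
print(repr(scrub(NBSP + "Acme" + NBSP + " Corp\n ")))  # 'Acme Corp'
```

Cleaning first exposes the double space so that the trim step can collapse it. Trimming first sees two separate single spaces and leaves them alone.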

Checking Your Work

Before unleashing your newly-cleaned data on the world, it's always a good idea to do a quick audit and make sure everything looks as expected. A few things I like to check:

  • Consistent casing (e.g. all proper nouns capitalized)
  • No leading/trailing spaces (a quick way to check is to use =LEN() and compare the length of the original and trimmed data)
  • No unexpected characters or artefacts from the cleaning process
  • Spot check a few random cells to make sure the cleaning formulas were applied correctly
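If your data has already been exported out of Excel, the whitespace checks above can also be scripted. The sketch below is one possible way to do it in Python. The function name audit and the reason labels are my own inventions, and it only covers the spacing and stray-character checks, not casing.

```python
def audit(values):
    """Return (index, reason) pairs for values that still look dirty."""
    problems = []
    for i, value in enumerate(values):
        if value != value.strip(" "):
            problems.append((i, "leading/trailing space"))
        if "  " in value:
            problems.append((i, "repeated spaces"))
        if any(ord(ch) < 32 for ch in value):
            problems.append((i, "control character"))
        if "\u00a0" in value:
            problems.append((i, "non-breaking space"))
    return problems


cells = ["Acme Corp", " Acme", "Acme  Corp", "Acme\nCorp", "Acme\u00a0Corp"]
for index, reason in audit(cells):
    print(index, reason)
```

An empty result means none of the values tripped these particular checks. It doesn't prove the data is clean in every other respect.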

You can also use Excel's Filter feature to check for blanks, duplicates, or unusual values. Select your range of data, then go to Data > Filter. Click the filter arrow on each column and look for anything out of place.

Preventing Dirty Data

As the saying goes, an ounce of prevention is worth a pound of cure. While TRIM() and CLEAN() are great for tidying up messy data, it's even better to prevent dirty data from creeping in to begin with. Here are a few proactive steps you can take:

  • Use data validation to restrict cell entries to specific formats or value ranges
  • Provide clear instructions and examples for manual data entry
  • Use input masks and form controls to enforce consistent data entry
  • Perform data quality checks and validation on data imports
  • Train staff on best practices for data entry and management
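As an illustration of the import-check idea, here's a minimal Python sketch that scans CSV text before it ever reaches a workbook. Everything specific in it is an assumption for the example: the column names company and amount, the function name validate_rows, and the two checks it performs. A real import would need rules that match your own data.

```python
import csv
import io


def validate_rows(csv_text, required=("company", "amount")):
    """Return (line_number, message) pairs for rows that fail basic checks."""
    reader = csv.DictReader(io.StringIO(csv_text))
    issues = []
    for row in reader:
        for field in required:
            value = row.get(field) or ""
            if not value.strip():
                issues.append((reader.line_num, f"{field} is blank"))
            elif value != value.strip():
                issues.append((reader.line_num, f"{field} has stray whitespace"))
    return issues


data = "company,amount\nAcme,10\n Beta ,20\n,30\n"
for line_number, message in validate_rows(data):
    print(line_number, message)
```

Line numbers count the header as line 1, so the first data row is line 2.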

By combining proactive data quality measures with reactive cleaning functions like TRIM() and CLEAN(), you can ensure your data is always in tip-top shape.

Advanced Cleaning Techniques

While TRIM() and CLEAN() can handle a lot of common data quality issues, sometimes you need to bring out the big guns. Here are a couple of advanced techniques for taming particularly unruly data:

  • Flash Fill: This AI-powered feature detects patterns in your data and can automatically extract, combine, or format data based on a few examples. It's great for parsing inconsistently formatted data like addresses or names.

  • Power Query: For really complex data cleaning tasks, Power Query is your best friend. With Power Query, you can automate cleaning steps, parse JSON or XML data, pivot and unpivot tables, and much more. It's an essential tool for any data-wrangling Excel power user.

Of course, covering these advanced techniques in depth is beyond the scope of this article. But if you find yourself struggling to clean your data with standard Excel functions, I highly recommend checking them out.

Conclusion

Dirty data is a pervasive problem in the business world, but with the right tools and techniques, it doesn't have to be a show-stopper. The TRIM() and CLEAN() functions in Excel are two of the most powerful weapons in the fight against messy data.

In this article, we've covered:

  • What TRIM() and CLEAN() do and how they work under the hood
  • Combining TRIM() and CLEAN() for maximum data-cleaning efficiency
  • Auditing cleaned data to check for issues
  • Proactive steps for preventing dirty data entry
  • Advanced cleaning techniques like Flash Fill and Power Query

By applying these concepts consistently, you can turn even the ugliest data into clean, analysis-ready information. Clean data means better decisions, fewer errors, and ultimately, a healthier bottom line for your business.

So go forth and clean that data! With TRIM(), CLEAN(), and a bit of Excel elbow grease, you'll be amazed at how much insight you can extract from even the messiest of datasets.
