How to Rename a Column in Pandas – Python Pandas Dataframe Renaming Tutorial

As a data scientist or analyst, you'll frequently work with Pandas DataFrames to manipulate and analyze data in Python. A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It's like a spreadsheet or SQL table.

One common task when working with DataFrames is renaming the columns. Why rename columns? The reasons include:

  • Making column names more descriptive and readable
  • Following a consistent naming convention (like all lowercase)
  • Removing spaces, punctuation or special characters
  • Shortening long column names
  • Complying with naming requirements for other libraries or databases

In this tutorial, we'll cover various methods for renaming the columns of a DataFrame. But first, let's create an example DataFrame to work with throughout this guide.

Create an Example DataFrame

We'll use some stock market data for our example. Let's assume we have a CSV file called "stocks.csv" with these contents:

Ticker,Date,Open,Close,High,Low,Volume
AAPL,2022-01-03,182.01,182.01,182.94,179.12,104487500
AAPL,2022-01-04,181.92,179.70,182.94,179.12,99310400
MSFT,2022-01-03,334.75,336.32,338.50,333.67,28039100
MSFT,2022-01-04,339.53,334.75,342.00,334.01,28516500

We can read this CSV into a DataFrame using pandas.read_csv():

import pandas as pd

df = pd.read_csv('stocks.csv')
print(df)

This prints:

   Ticker        Date    Open   Close    High     Low     Volume
0    AAPL  2022-01-03  182.01  182.01  182.94  179.12  104487500
1    AAPL  2022-01-04  181.92  179.70  182.94  179.12   99310400
2    MSFT  2022-01-03  334.75  336.32  338.50  333.67   28039100
3    MSFT  2022-01-04  339.53  334.75  342.00  334.01   28516500
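If you don't have stocks.csv on disk, the same DataFrame can be built from an in-memory string. This is a minimal sketch; the csv_text variable name is just for illustration.

```python
import io

import pandas as pd

# Same rows as stocks.csv, embedded as a string so no file is needed.
csv_text = """Ticker,Date,Open,Close,High,Low,Volume
AAPL,2022-01-03,182.01,182.01,182.94,179.12,104487500
AAPL,2022-01-04,181.92,179.70,182.94,179.12,99310400
MSFT,2022-01-03,334.75,336.32,338.50,333.67,28039100
MSFT,2022-01-04,339.53,334.75,342.00,334.01,28516500
"""

# read_csv accepts any file-like object, including StringIO.
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())
```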

The DataFrame has columns for the stock ticker symbol, date, opening price, closing price, day high, day low, and trading volume. Notice a few issues with the column names we may want to address:

  • Ticker could use a more standard label, such as Symbol
  • Open, Close, High, and Low don't say that they hold prices
  • Volume is ambiguous and should state its units

Let's look at various ways to rename these columns to improve the DataFrame.

Rename Columns Using .rename()

The most straightforward way to rename DataFrame columns is using the .rename() method. You pass a dictionary mapping the old names to new names. Here's the general syntax:

df.rename(columns={'old_name1': 'new_name1', 'old_name2': 'new_name2'}, inplace=True)

The inplace=True argument modifies the DataFrame in place. If you omit it, .rename() will return a new DataFrame with the updated names instead of changing the original.

To fix our columns, we can do:

df.rename(columns={
    'Ticker': 'Symbol',
    'Open': 'OpenPrice',
    'Close': 'ClosePrice',
    'High': 'HighPrice',
    'Low': 'LowPrice',
    'Volume': 'VolumeShares'
}, inplace=True)

print(df)

The output:

   Symbol        Date  OpenPrice  ClosePrice  HighPrice  LowPrice  VolumeShares
0    AAPL  2022-01-03     182.01      182.01     182.94    179.12     104487500
1    AAPL  2022-01-04     181.92      179.70     182.94    179.12      99310400
2    MSFT  2022-01-03     334.75      336.32     338.50    333.67      28039100
3    MSFT  2022-01-04     339.53      334.75     342.00    334.01      28516500

This addresses the main issues: Ticker becomes Symbol, the price columns are labeled as prices, and Volume states its units (shares).
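The difference between returning a new DataFrame and modifying in place is easy to check on a tiny example. The sketch below uses a made-up two-column DataFrame and a deliberately misspelled key ('Tickr') to show the errors='raise' option of .rename(), which is not covered above.

```python
import pandas as pd

df = pd.DataFrame({"Ticker": ["AAPL"], "Close": [182.01]})

# Without inplace=True, rename() returns a new DataFrame and leaves df alone.
renamed = df.rename(columns={"Ticker": "Symbol"})
print(df.columns.tolist())
print(renamed.columns.tolist())

# By default, dictionary keys that match no column are silently ignored.
unchanged = df.rename(columns={"Tickr": "Symbol"})
print(unchanged.columns.tolist())

# errors="raise" turns a misspelled key into a KeyError instead.
caught = False
try:
    df.rename(columns={"Tickr": "Symbol"}, errors="raise")
except KeyError as exc:
    caught = True
    print("KeyError:", exc)
```

Passing errors='raise' is worth considering in scripts where a silently skipped rename would cause confusing failures later.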

Set Columns Directly with df.columns

Another way to rename all the columns is to assign a list of names directly to df.columns. The list must contain a name for each column, in order.

For example:

df.columns = ['Symbol', 'Date', 'OpenPrice', 'ClosePrice', 'HighPrice', 'LowPrice', 'VolumeShares']

print(df)

This prints the same output as the .rename() example. This method is convenient when you want to specify every name from scratch instead of mapping names individually.
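The requirement that the list contain one name per column is enforced by pandas. A short sketch with a made-up three-column DataFrame shows what happens when the list is too short:

```python
import pandas as pd

df = pd.DataFrame({"Ticker": ["AAPL"], "Open": [182.01], "Close": [182.01]})

# Three columns but only two names: pandas rejects the assignment.
message = None
try:
    df.columns = ["Symbol", "OpenPrice"]
except ValueError as exc:
    message = str(exc)
    print("ValueError:", message)

# The failed assignment leaves the original names in place.
print(df.columns.tolist())
```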

Rename with .set_axis()

The .set_axis() method is another option for assigning new column names from a list:

df = df.set_axis(['Symbol', 'Date', 'OpenPrice', 'ClosePrice', 'HighPrice', 'LowPrice', 'VolumeShares'], axis=1)

The axis=1 argument indicates we're modifying the column labels (axis=0 is for row labels). Unlike the .rename() call above, .set_axis() is used here without inplace=True: that parameter was deprecated in pandas 1.5 and removed in 2.0, so assign the returned DataFrame back to df.
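Because .set_axis() hands back a relabeled DataFrame, it fits naturally into a method chain. A minimal sketch on a made-up two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Ticker": ["AAPL", "MSFT"], "Close": [182.01, 336.32]})

# set_axis() returns a relabeled DataFrame, so later steps in the chain
# can already use the new names.
result = (
    df.set_axis(["Symbol", "ClosePrice"], axis=1)
      .sort_values("ClosePrice", ascending=False)
      .reset_index(drop=True)
)
print(result)
```

The original df keeps its old labels, since nothing in the chain modifies it.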

Apply a Function to Column Names

To perform more complex renaming operations, you can apply string methods to df.columns. For instance, this code lowercases all the column names and removes any underscores:

df.columns = df.columns.str.lower().str.replace('_', '')

print(df)

Output:
   symbol        date  openprice  closeprice  highprice  lowprice  volumeshares
0    AAPL  2022-01-03     182.01      182.01     182.94    179.12     104487500
1    AAPL  2022-01-04     181.92      179.70     182.94    179.12      99310400
2    MSFT  2022-01-03     334.75      336.32     338.50    333.67      28039100
3    MSFT  2022-01-04     339.53      334.75     342.00    334.01      28516500

df.columns returns an Index of column names. Its .str accessor exposes vectorized string methods, so we call str.lower() to lowercase the names and str.replace() to substitute characters. You can chain any of these string methods this way.
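A function or lambda can also be passed straight to .rename(), which calls it once per column name. The sketch below uses made-up column names containing spaces to show both forms:

```python
import pandas as pd

df = pd.DataFrame(columns=["Ticker", "Open Price", "Close Price"])

# Any callable that maps one name to another works, including str.lower.
lowered = df.rename(columns=str.lower)
print(lowered.columns.tolist())

# A lambda can combine several steps: lowercase, then spaces to underscores.
snake = df.rename(columns=lambda name: name.lower().replace(" ", "_"))
print(snake.columns.tolist())
```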

Another useful trick is to use a list comprehension to add a prefix or suffix to all names. Starting again from the original column names:

df.columns = ['stock_' + col for col in df.columns]

print(df)

Result:
   stock_Ticker  stock_Date  stock_Open  stock_Close  stock_High  stock_Low  stock_Volume
0          AAPL  2022-01-03      182.01       182.01      182.94     179.12     104487500
1          AAPL  2022-01-04      181.92       179.70      182.94     179.12      99310400
2          MSFT  2022-01-03      334.75       336.32      338.50     333.67      28039100
3          MSFT  2022-01-04      339.53       334.75      342.00     334.01      28516500
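Pandas also ships dedicated methods for this, .add_prefix() and .add_suffix(), which return a new DataFrame. A short sketch; the '_usd' suffix is only an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({"Open": [182.01], "Close": [182.01]})

# Both methods return a new DataFrame and leave df untouched.
prefixed = df.add_prefix("stock_")
suffixed = df.add_suffix("_usd")

print(prefixed.columns.tolist())
print(suffixed.columns.tolist())
```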

Renaming Specific Columns

So far we've renamed all the columns at once. But what if you only want to change specific names?

The simplest way is to use .rename() and only include the columns to change in the dictionary:

df.rename(columns={'Ticker': 'Symbol', 'Close': 'ClosePrice'}, inplace=True)

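Alternatively, when several columns get the same change, you can build the mapping with a dictionary comprehension:

df.rename(columns={col: col + 'Price' for col in ['Open', 'High', 'Low']}, inplace=True)

This creates the mapping {'Open': 'OpenPrice', 'High': 'HighPrice', 'Low': 'LowPrice'} and passes it to .rename(), which appends 'Price' to those three names and leaves the rest untouched. Selecting a subset with df[[...]] and assigning back to it only overwrites values, so labels must be changed through .rename() or df.columns.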
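Another option, sketched here, is to pass .rename() a function that changes only the names you care about. The price_cols set and add_price_suffix helper are hypothetical names used for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Ticker": ["AAPL"],
    "Open": [182.01],
    "High": [182.94],
    "Volume": [104487500],
})

# Hypothetical set of columns that should receive the suffix.
price_cols = {"Open", "High"}

def add_price_suffix(name):
    """Append 'Price' to selected names and return all others unchanged."""
    return name + "Price" if name in price_cols else name

renamed = df.rename(columns=add_price_suffix)
print(renamed.columns.tolist())
```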

Best Practices for Column Names

When renaming DataFrame columns, it's best to follow these guidelines:

  • Use lowercase names for consistency
  • Replace spaces with underscores
  • Remove punctuation and special characters
  • Keep names concise but meaningful
  • Avoid using reserved keywords (like "class", "def", "import", etc.)

Adhering to these rules will make your code more readable and avoid errors when interacting with databases or other libraries.
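These guidelines can be bundled into one small function. The following is a sketch only: clean_name is a hypothetical helper, the sample column names are made up, and appending an underscore to reserved keywords is one possible convention among several.

```python
import keyword
import re

import pandas as pd

def clean_name(name):
    """Hypothetical helper: lowercase, underscores for spaces, no punctuation."""
    name = str(name).strip().lower()
    name = re.sub(r"\s+", "_", name)        # whitespace -> underscore
    name = re.sub(r"[^0-9a-z_]", "", name)  # drop punctuation and special characters
    name = name.strip("_")                  # tidy leftover leading/trailing underscores
    if keyword.iskeyword(name):
        name += "_"                         # e.g. "class" -> "class_"
    return name

df = pd.DataFrame(columns=["Open Price ($)", "Class", "Volume"])
df = df.rename(columns=clean_name)
print(df.columns.tolist())
```

Note that this helper drops non-ASCII letters as well, so it would need adjusting for column names in other alphabets.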

Advanced Renaming with Regex

For more sophisticated renaming tasks, you can use regular expressions (regex) to match patterns in the column names. The .str accessor on df.columns provides a replace() method for this. Pass regex=True so the pattern is treated as a regular expression; since pandas 2.0 the default is a literal string match.

For instance, this appends "Price" to "Open" or "Close" at the start of a column name:

df.columns = df.columns.str.replace(r'^(Open|Close)', lambda x: x.group(1) + 'Price', regex=True)

The ^ anchors the match to the start of the string, (Open|Close) matches either "Open" or "Close", and x.group(1) refers to the matched text, to which "Price" is appended.

You can also use regex to remove or substitute characters:

df.columns = df.columns.str.replace(r'[!@#$%^&*()]', '', regex=True) # Remove special chars
df.columns = df.columns.str.replace(r'\s+', '_', regex=True) # Replace whitespace with underscore

With the power of regex, you can handle just about any renaming pattern you encounter.
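As a further sketch, a backreference in the replacement string can stand in for the lambda, and a trailing $ limits the match to whole names. The column names below, including 'OpenInterest' and 'Volume (shares)', are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=["Open", "Close", "OpenInterest", "Volume (shares)"])

# ^...$ matches the whole name, so "OpenInterest" is left alone;
# \1 reinserts the captured text before "Price".
df.columns = df.columns.str.replace(r"^(Open|Close)$", r"\1Price", regex=True)

# Remove a parenthesised unit such as " (shares)".
df.columns = df.columns.str.replace(r"\s*\(.*?\)", "", regex=True)

print(df.columns.tolist())
```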

Summary

In this tutorial, we covered several methods to rename the columns of a Pandas DataFrame:

  • Using .rename() with a dictionary to map old to new names
  • Assigning a list of names to df.columns
  • Renaming with .set_axis()
  • Applying string methods or a list comprehension to df.columns
  • Renaming specific columns by passing only those names to .rename()
  • Utilizing regular expressions for advanced renaming

We also discussed motivations for renaming columns and best practices to follow. With these techniques in your toolbox, you can whip any DataFrame columns into shape!

The key takeaways are:

  1. Rename columns to make them more meaningful and consistent
  2. .rename() is the go-to method, but df.columns assignment or .set_axis() also work
  3. Access the Index of column names with df.columns and apply string methods through .str
  4. Use a dictionary to map old to new names, or a list to assign all new names
  5. Pass only the columns you want to change to .rename()
  6. Follow naming conventions like lowercase and underscores
  7. Leverage regular expressions for advanced matching and substitution

Now you're ready to tackle renaming columns in your own DataFrames. Happy coding!
