Matplotlib Course – Learn Python Data Visualization

If a picture is worth a thousand words, a well-crafted data visualization is worth at least a million. Being able to visually represent data in a clear and compelling way is an invaluable skill for data scientists, analysts, and really anyone working with quantitative information. Fortunately, Python provides an amazingly powerful yet easy-to-use library for creating data visualizations: Matplotlib.

In this course, we'll take a deep dive into Matplotlib and learn how to create a wide variety of charts, plots, and other visualizations. By the end, you'll have the skills to make your data truly shine. Let's get started!

What is Matplotlib?

At its core, Matplotlib is a Python library for creating static, animated, and interactive visualizations. It was originally created by John D. Hunter and is now maintained by a large team of developers.

Some key features of Matplotlib include:

  • Wide range of plot types including line plots, scatter plots, bar charts, histograms, heatmaps, and many more
  • Fine-grained control over every aspect of a figure
  • Output in many file formats and GUI backends across platforms
  • Integrates closely with other Python libraries like NumPy and Pandas

Matplotlib has become the foundational plotting library for the scientific Python stack and is used widely in commercial and academic settings. Newer Python visualization libraries such as Seaborn (which is itself built on top of Matplotlib) and Plotly are also popular, but Matplotlib remains one of the most widely used and comprehensive options.

Installing Matplotlib

Before we start coding, we need to make sure Matplotlib is installed. The easiest way to install Matplotlib is using pip:

pip install matplotlib

Matplotlib has a few dependencies like NumPy that will also be installed if you don't have them already. Alternatively, you can install Matplotlib as part of the Anaconda distribution, which includes many of the most popular Python libraries for data science.
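To confirm the installation worked, you can run a quick check like the one below. It is only a sanity check. It imports Matplotlib and NumPy, then prints their versions and the backend Matplotlib selected. On a machine without a display, the backend is typically the non-interactive "agg" backend.

```python
# Quick sanity check that Matplotlib and its NumPy dependency import correctly.
import matplotlib
import numpy as np

print("Matplotlib version:", matplotlib.__version__)
print("NumPy version:", np.__version__)

# The backend is what Matplotlib uses to render figures
# (a GUI toolkit, or "agg" when no display is available).
print("Backend:", matplotlib.get_backend())
```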

Anatomy of a Matplotlib Figure

To use Matplotlib effectively, it's important to understand the components that make up a figure. Here's a labeled diagram of a basic line plot:

[Figure: Anatomy of a Matplotlib figure]

The main components are:

  • Figure: The top-level container for all plot elements
  • Axes: The actual plot where data is represented. A figure can contain multiple Axes
  • Axis: The number-line objects that manage the data limits, ticks, and tick labels. Each 2D Axes has an x-axis and a y-axis
  • Artist: Everything visible on the figure is an Artist, including the Figure, Axes, and Axis objects themselves as well as Text, Line2D, collection, and Patch objects

When you call a pyplot plotting function like plt.plot(), Matplotlib draws on the current Axes. If no Figure exists yet, it first creates a Figure with a single Axes. You can then use a variety of functions to customize the title, labels, axis limits, colors, line styles, and more.
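To make this hierarchy concrete, the sketch below creates a Figure and Axes explicitly and checks how the pieces relate. The plotted data are arbitrary. The checks use only Matplotlib's public classes: Figure, Axes, XAxis/YAxis, Line2D, and the common Artist base class.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.axes import Axes
from matplotlib.axis import XAxis, YAxis
from matplotlib.figure import Figure
from matplotlib.lines import Line2D

# Create the Figure and its single Axes explicitly.
fig, ax = plt.subplots()

# plot() returns a list of Line2D artists; unpack the single line.
(line,) = ax.plot([0, 1, 2], [0, 1, 4])

print(isinstance(fig, Figure), isinstance(ax, Axes))             # True True
print(fig.axes == [ax])                                           # the Figure tracks its Axes: True
print(isinstance(ax.xaxis, XAxis), isinstance(ax.yaxis, YAxis))   # True True
print(line.axes is ax, ax.figure is fig)                          # each artist knows its parent: True True

# Figure, Axes, Axis and Line2D objects are all Artists.
print(all(isinstance(obj, Artist) for obj in (fig, ax, ax.xaxis, line)))  # True
```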

Let's see a basic example:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.title("My first plot")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.show()

This code generates a simple line plot of a sine wave:

[Figure: Basic line plot in Matplotlib]

We first imported the pyplot module from matplotlib and gave it the common alias plt. We also imported numpy to generate our data.

We then created a Figure and Axes implicitly by calling plt.plot() and passing the x and y data to plot. We customized the title and labels using the title(), xlabel() and ylabel() functions.

Finally, we displayed the plot by calling plt.show(), which opens a window (or, in a Jupyter notebook, renders the figure inline). This basic template can be used to create all kinds of plots by passing different data and customizing with the various formatting functions.
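Behind the scenes, pyplot keeps track of a "current" Figure and Axes. The sketch below retrieves the objects that plt.plot() created implicitly, using plt.gcf() and plt.gca(). It then saves the figure with savefig(), which is handy when there is no display, for example on a server. The filename "sine.png" is only an example.

```python
import matplotlib.pyplot as plt
import numpy as np

plt.close("all")  # start with no open figures, so plt.plot() must create one

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))       # implicitly creates a Figure and an Axes
plt.title("My first plot")   # acts on the same "current" Axes

fig = plt.gcf()   # the current Figure that pyplot created
ax = plt.gca()    # the current Axes within it
print(len(fig.axes))       # 1
print(ax.get_title())      # My first plot

# Save to a file instead of (or before) calling plt.show().
# The file format is inferred from the extension.
fig.savefig("sine.png", dpi=150)
plt.close(fig)    # release the figure once it is no longer needed
```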

Essential Plot Types

Matplotlib supports a large number of different plot types to represent different kinds of data. Here are some of the most commonly used ones:

Line Plot

A line plot is used to plot series data like time series or parametric curves. Consecutive data points are connected by straight line segments; markers at the data points are optional and are not drawn by default. Here's an example with multiple series:

x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

plt.plot(x, y1, label='sin(x)')
plt.plot(x, y2, label='cos(x)')
plt.legend()
plt.show()

[Figure: Line plot with multiple series]
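By default, plot() only joins the points with line segments. You can switch markers on with a format string such as "o-" or with the marker= and linestyle= keyword arguments. The sketch below compares the two cases using arbitrary sine and cosine data.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)
fig, ax = plt.subplots()

# Default: straight line segments between points, no markers.
(plain,) = ax.plot(x, np.sin(x), label="sin(x), line only")

# Format string "o-": circle markers joined by a solid line.
# Equivalent keyword form: ax.plot(x, y, marker="o", linestyle="-")
(marked,) = ax.plot(x, np.cos(x), "o-", label="cos(x), line + markers")

ax.legend()
print(plain.get_marker(), marked.get_marker(), marked.get_linestyle())
```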

Scatter Plot

A scatter plot is used to plot two-dimensional data where each data point has an x and y value. Points are not connected by lines. Here's an example with custom colors and marker sizes:

x = np.random.rand(50)
y = np.random.rand(50)
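colors = np.random.rand(50)           # one value per point, mapped to a color via the default colormap
area = (30 * np.random.rand(50))**2   # marker sizes, given as areas in points squared

plt.scatter(x, y, s=area, c=colors, alpha=0.5)  # alpha=0.5 keeps overlapping points visible
plt.show()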

[Figure: Scatter plot with custom colors and sizes]
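The object returned by scatter() stores the per-point sizes and color values. That is what lets fig.colorbar() draw a scale for the colors. The sketch below repeats the example with a seeded random generator so it is reproducible. It names the "viridis" colormap explicitly, which is also Matplotlib's default.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)     # seeded so the example is reproducible
x = rng.random(50)
y = rng.random(50)
colors = rng.random(50)            # one number per point, mapped through the colormap
area = (30 * rng.random(50)) ** 2  # marker areas in points squared

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=area, c=colors, cmap="viridis", alpha=0.5)

# The returned PathCollection keeps the sizes and color values,
# which fig.colorbar() uses to draw the color scale.
fig.colorbar(sc, ax=ax, label="color value")
print(sc.get_sizes()[:3])
print(sc.get_array()[:3])
```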

Bar Chart

A bar chart is used to represent categorical data with rectangular bars, where the length of each bar indicates its value. Here's an example of a horizontal bar chart:

x = ['A', 'B', 'C', 'D', 'E']
y = [3, 7, 2, 5, 1]

plt.barh(x, y)
plt.xlabel('Value')
plt.ylabel('Category')
plt.show()

[Figure: Horizontal bar chart]
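The vertical counterpart of barh() is bar(). The sketch below draws the same data vertically and uses Axes.bar_label() to print each value on its bar. bar_label() is available in Matplotlib 3.4 and later.

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D", "E"]
values = [3, 7, 2, 5, 1]

fig, ax = plt.subplots()
bars = ax.bar(categories, values)  # vertical bars: categories on x, values as heights
labels = ax.bar_label(bars)        # annotate each bar with its value
ax.set_xlabel("Category")
ax.set_ylabel("Value")

print([rect.get_height() for rect in bars])
```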

Histogram

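A histogram represents the distribution of a dataset by dividing the range of values into bins and plotting the number of values that fall into each bin. Here's an example with custom bin edges. Note that values outside the outermost edges (below 150 cm or above 190 cm) are not counted in any bin: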

x = np.random.normal(170, 10, 250)

plt.hist(x, bins=[150, 160, 170, 180, 190], edgecolor='black')
plt.xlabel('Height (cm)')
plt.ylabel('Count')
plt.show()

[Figure: Histogram with custom bins]
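hist() also returns the numbers it plotted: the count in each bin, the bin edges, and the drawn bars. The sketch below uses a seeded generator to show this. With explicit edges, any values outside [150, 190] are left out, so the counts can sum to less than 250. The counts match what numpy.histogram() reports for the same edges.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(170, 10, 250)        # 250 simulated heights in cm
edges = [150, 160, 170, 180, 190]

fig, ax = plt.subplots()
counts, bin_edges, patches = ax.hist(x, bins=edges, edgecolor="black")

# Values outside the outermost edges fall into no bin.
print(counts)
print("counted:", int(counts.sum()), "of", len(x))

# The counts agree with NumPy's histogram for the same edges.
print(np.array_equal(counts, np.histogram(x, bins=edges)[0]))  # True
```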

These are just a few of the many plot types available in Matplotlib. I encourage you to explore the gallery and documentation to learn about all the possibilities.

Customizing Plots

One of Matplotlib's greatest strengths is the ability to customize nearly every element of a figure. You can control things like:

  • Line and marker styles
  • Colors and colormaps
  • Text and font properties
  • Axes limits and scaling
  • Legend and annotations
  • Figure size and resolution

Here's an example showcasing a few common customizations:

fig, ax = plt.subplots(figsize=(8, 4), dpi=100)

x = np.linspace(0, 10, 100)
y = np.exp(-x / 2) * np.sin(2 * np.pi * x)

ax.plot(x, y, ls='--', lw=2, c='purple', marker='o', ms=5, label='damped sine')
ax.set_xlim(0, 10)
ax.set_ylim(-1, 1)
ax.set_title('Damped Sine Wave', fontsize=16)
ax.set_xlabel('Time (s)', fontsize=14)
ax.set_ylabel('Amplitude', fontsize=14)
ax.legend(fontsize=12)

fig.tight_layout()
plt.show()

[Figure: Customized plot of damped sine wave]

In this example, we explicitly created a Figure and Axes using plt.subplots() so we could set the figure size and resolution. We then called ax.plot() to plot on the Axes and used several formatting arguments to control the line style, width, color, marker style, and label.

We set the x and y axis limits using set_xlim() and set_ylim() and added a title and labels with increased font sizes. We also created a legend with a specified font size.

Finally, we used fig.tight_layout() to automatically adjust the padding between elements and avoid clipping.

This is just scratching the surface of what's possible with Matplotlib's customization. You can create your own colormaps, define custom styles, use LaTeX for rendering text, and much more. Refer to the API docs to see all the available options.
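As one example of custom styles, Matplotlib's defaults live in the rcParams dictionary. The sketch below overrides two of them temporarily with plt.rc_context(), then applies "ggplot", one of the style sheets that ship with Matplotlib. The specific settings are arbitrary and chosen only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Temporarily override defaults for everything created inside the block.
with plt.rc_context({"lines.linewidth": 3, "axes.grid": True}):
    fig, ax = plt.subplots()
    (line,) = ax.plot(x, np.sin(x))

print(line.get_linewidth())               # 3.0
print(plt.rcParams["lines.linewidth"])    # restored to the previous default

# Built-in style sheets bundle many settings at once.
print("ggplot" in plt.style.available)    # True
with plt.style.context("ggplot"):
    fig2, ax2 = plt.subplots()
    ax2.plot(x, np.cos(x))
```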

Plotting Real-World Data

So far we've been plotting toy data generated by NumPy. Let's see how we can load a real-world dataset and create some useful visualizations.

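We'll work with the Titanic passenger dataset, which contains information about the passengers on the Titanic like age, sex, ticket fare, and whether they survived. You can download the data from Kaggle: https://www.kaggle.com/c/titanic/data (the training file, train.csv, is the one with the Survived column; the code below assumes you saved it as titanic.csv).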

Here's how we can load the data into a Pandas DataFrame:

import pandas as pd

df = pd.read_csv('titanic.csv')
print(df.head())

Output (pandas hides the middle columns with "..." when they don't fit on one line):

   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S
3            4         1       1  ...  53.1000  C123         S
4            5         0       3  ...   8.0500   NaN         S

Let's create a stacked bar chart showing, for each passenger class, how many passengers survived and how many did not:

survived_counts = df.groupby(['Pclass', 'Survived']).size().unstack()

survived_counts.plot(kind='bar', stacked=True)
plt.xlabel('Passenger Class')
plt.ylabel('Count')
plt.show()

[Figure: Titanic survivors by passenger class]

We first grouped the data by passenger class and survival status, counted the number of passengers in each group, and reshaped the result into a DataFrame with survival status as columns.

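We then created a stacked bar chart by calling the DataFrame's plot() method with kind='bar' and stacked=True. Pandas' plotting methods are built on Matplotlib, so you can plot directly from a DataFrame and then refine the result with the usual pyplot functions, as we did with xlabel() and ylabel().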
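To see what the groupby/unstack step produces, the sketch below runs it on a tiny made-up DataFrame with the same two column names. These rows are invented for illustration and are not Titanic data. One difference from the code above: it passes fill_value=0 to unstack(). That puts 0 instead of NaN in any class/outcome combination that never occurs, such as class 2 with Survived == 0 here.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic file.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# Rows: Pclass; columns: Survived (0 or 1); values: passenger counts.
counts = toy.groupby(["Pclass", "Survived"]).size().unstack(fill_value=0)
print(counts)

# pandas draws the chart with Matplotlib and returns the Axes it used.
ax = counts.plot(kind="bar", stacked=True)
ax.set_xlabel("Passenger Class")
ax.set_ylabel("Count")
ax.legend(["Did not survive", "Survived"])
plt.tight_layout()
```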

Let's look at the age distribution of passengers using a histogram with custom bins:

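# Drop passengers with no recorded age, and cap ages at 70 so that
# everyone aged 60 or over falls into the last bin, [60, 70].
ages = df['Age'].dropna().clip(upper=70)

plt.hist(ages, bins=[0, 10, 20, 30, 40, 50, 60, 70], edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Count')
plt.title('Titanic Passenger Ages')
plt.show()

[Figure: Histogram of Titanic passenger ages]

We selected the 'Age' column from the DataFrame, dropped the missing values with dropna(), and passed the result to plt.hist() along with the desired bin edges. Rather than using np.inf as the last edge, which would ask Matplotlib to draw a bar of infinite width, we capped the ages at 70 with clip(upper=70). The last bar therefore counts every passenger aged 60 or over.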
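To see the counts behind the bars as a table, pandas' cut() can bin the ages directly. Unlike a plotted bar, a pd.cut interval can be open-ended, so np.inf works fine as the last edge there. The sketch below uses a few made-up ages, including one missing value, in place of df['Age']. With right=False the bins are half-open, [0, 10), [10, 20), …, [60, inf). Missing ages are not counted.

```python
import numpy as np
import pandas as pd

# Made-up ages standing in for df['Age'] (NaN = missing age).
ages = pd.Series([2, 15, 22, 28, 35, 61, 80, np.nan])

edges = [0, 10, 20, 30, 40, 50, 60, np.inf]
# right=False -> half-open bins [0, 10), ..., [60, inf);
# the last bin holds every age of 60 and over.
age_groups = pd.cut(ages, bins=edges, right=False)

# sort=False keeps the bins in age order (empty bins show 0);
# NaN ages are dropped by value_counts().
table = age_groups.value_counts(sort=False)
print(table)
```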

Finally, let's compare the fares paid by survivors vs. non-survivors using box plots:

fig, ax = plt.subplots()

# Boolean masks select the 'Fare' values of each group.
survived_fares = df.loc[df['Survived'] == 1, 'Fare']
not_survived_fares = df.loc[df['Survived'] == 0, 'Fare']

data = [survived_fares, not_survived_fares]
labels = ['Survived', 'Did not survive']

ax.boxplot(data, labels=labels)  # in Matplotlib 3.9+ this parameter is named tick_labels
ax.set_ylabel('Fare')
ax.set_title('Fares by Survival Status')

plt.show()

[Figure: Box plots of fares by survival status]

We first selected the 'Fare' data for survivors and non-survivors separately. We then passed the data and labels to ax.boxplot() to create side-by-side box plots.

The box plots show that survivors tended to pay higher fares than non-survivors. However, both groups contain many high-fare outliers and their ranges overlap. This suggests that fare was associated with survival but was not the only factor.
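Each box is drawn from a handful of summary numbers: the quartiles, the median, the whisker ends, and the outliers ("fliers"). Matplotlib computes them with matplotlib.cbook.boxplot_stats(). By default, the whiskers reach the most extreme data points within 1.5 × IQR of the box. The sketch below applies that function to a few made-up fares containing one unusually large value. For these six values the median is 9, the whiskers end at 5 and 12, and 100 is the only flier.

```python
import numpy as np
from matplotlib import cbook

# Made-up fares for one group, with one unusually large value.
fares = np.array([5, 7, 8, 10, 12, 100])

# boxplot_stats() returns one dict per dataset; whis=1.5 is the default.
(stats,) = cbook.boxplot_stats(fares)
for key in ("q1", "med", "q3", "whislo", "whishi"):
    print(key, stats[key])
print("fliers:", stats["fliers"])
```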

These examples demonstrate how Matplotlib can be used to gain insights from real-world datasets. By creating visualizations of the data, we can identify patterns, relationships, and outliers that may not be apparent from the raw numbers.

What's Next?

In this course, we've covered the fundamentals of creating visualizations with Matplotlib, from basic plot types to customizing and styling plots to exploring real-world data. But there's still much more to learn!

Here are some suggestions for further exploration:

  • Experiment with other plot types like contour plots, pie charts, polar charts, 3D plots, etc.
  • Go deeper into Matplotlib's object-oriented interface (explicit Figure and Axes objects, as returned by plt.subplots()) for more control over your figures
  • Explore other Python viz libraries like Seaborn (statistical plotting) and Plotly (interactive web plots)
  • Check out Matplotlib's gallery and examples for inspiration and code snippets
  • Practice visualizing your own datasets to hone your skills

Remember, the best way to learn is by doing. Don't be afraid to experiment, make mistakes, and iterate. With practice, you'll be creating beautiful, insightful visualizations in no time!

Happy plotting!
