Pandas Count Rows – How to Get the Number of Rows in a Dataframe
As a data scientist, one of the most fundamental things to know about your dataset is its size – especially the number of rows. Pandas is the go-to Python library for data manipulation and analysis, so let's dive into how to count the number of rows in a pandas DataFrame.
Why the Number of Rows in a DataFrame Matters
Before we get to the "how", let's discuss the "why". Knowing the number of rows in your DataFrame is important for several reasons:
- It gives you a sense of the size and scale of your dataset.
- Many machine learning algorithms are sensitive to sample size. Knowing your row count helps determine if you have enough data to train a model.
- Understanding the size guides your choice of computational approaches. Certain operations may be too slow or memory-intensive for large datasets.
- Verifying the row count serves as a quick data integrity check after filtering, merging, or other data transformations.
Clearly, being able to get the row count of a DataFrame is a critical skill. Fortunately, pandas provides multiple ways to achieve this.
Setting Up a Sample DataFrame
Before we explore the different methods to count rows, let's create a sample DataFrame to work with. We'll use pandas' built-in read_csv() function to load data from a CSV file containing information about planets in our solar system.
import pandas as pd
planets_df = pd.read_csv('planets.csv')
print(planets_df)
This gives us the following DataFrame:
Name Mass (10^24kg) Diameter (km) Density (kg/m^3) Gravity (m/s^2) Escape Velocity (km/s) Rotation Period (hours) Length of Day (hours) Distance from Sun (10^6 km) Perihelion (10^6 km) Aphelion (10^6 km) Orbital Period (days) Orbital Velocity (km/s) Orbital Inclination (degrees) Orbital Eccentricity Obliquity to Orbit (degrees) Mean Temperature (C) Surface Pressure (bars) Number of Moons Has Ring System Has Global Magnetic Field
0 Mercury 0.330 4879 5429 3.7 4.3 1407.6 4222.6 57.9 46.0 69.8 88.0 47.4 7.0 0.206 0.034 167 0 0 False True
1 Venus 4.87 12104 5234 8.9 10.4 5832.5 2802.0 108.2 107.5 108.9 224.7 35.0 3.4 0.007 177.4 464 92 0 False False
2 Earth 5.97 12756 5514 9.8 11.2 23.9 24.0 149.6 147.1 152.1 365.2 29.8 0.0 0.017 23.4 15 1 1 False True
3 Mars 0.642 6792 3934 3.7 5.0 24.6 24.7 227.9 206.7 249.2 687.0 24.1 1.8 0.094 25.2 -65 0.01 2 False False
4 Jupiter 1898 142984 1326 23.1 59.5 9.9 9.9 778.5 740.6 816.4 4331 13.1 1.3 0.049 3.1 -110 Unknown 79 True True
5 Saturn 568 120536 687 9.0 35.5 10.7 10.7 1432.0 1357.6 1506.5 10747 9.7 2.5 0.052 26.7 -140 Unknown 82 True True
6 Uranus 86.8 51118 1270 8.7 21.3 17.2 17.2 2867.0 2732.7 3001.4 30589 6.8 0.8 0.047 97.8 -195 Unknown 27 True True
7 Neptune 102 49528 1638 11.0 23.5 16.1 16.1 4515.0 4471.1 4558.9 59800 5.4 1.8 0.010 28.3 -200 Unknown 14 True True
Great, we now have a DataFrame called planets_df containing 8 rows of data about the planets. Let's use this to demonstrate various methods for getting the row count.
Using the len() Function
The simplest way to get the number of rows in a DataFrame is to use Python's built-in len() function. Simply pass your DataFrame to len() and it will return the number of rows.
num_rows = len(planets_df)
print(f'The number of rows is: {num_rows}')
Output:
The number of rows is: 8
The len() function returns the length of the DataFrame, which is the number of rows. Easy!
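As a minimal sketch, the snippet below builds a small stand-in DataFrame inline (the name demo_df and its three rows are illustrative, not loaded from planets.csv) to show that len() counts rows rather than cells, and that it returns 0 for an empty DataFrame.

```python
import pandas as pd

# Hypothetical stand-in data; not loaded from planets.csv
demo_df = pd.DataFrame({
    "Name": ["Mercury", "Venus", "Earth"],
    "Number of Moons": [0, 0, 1],
})

row_count = len(demo_df)               # counts rows, not cells (3 rows x 2 columns)
empty_count = len(demo_df.iloc[0:0])   # an empty slice keeps the columns but has no rows

print(row_count)    # 3
print(empty_count)  # 0
```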
Using the shape Attribute
DataFrames have a shape attribute that returns a tuple specifying the dimensions of the DataFrame. The first element is the number of rows and the second is the number of columns.
num_rows, num_cols = planets_df.shape
print(f'The number of rows is: {num_rows}')
print(f'The number of columns is: {num_cols}')
Output:
The number of rows is: 8
The number of columns is: 21
If you only need the row count, you can index the first element of the shape tuple:
num_rows = planets_df.shape[0]
print(f'The number of rows is: {num_rows}')
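One detail worth seeing in code: shape is an attribute, not a method, so it takes no parentheses, and a single column (a Series) reports a one-element tuple. The sketch below uses a small illustrative DataFrame (demo_df is a made-up name) rather than the full planets file.

```python
import pandas as pd

# Hypothetical stand-in data; values taken from the first three planets
demo_df = pd.DataFrame({
    "Name": ["Mercury", "Venus", "Earth"],
    "Diameter (km)": [4879, 12104, 12756],
})

frame_shape = demo_df.shape           # attribute access, no parentheses
series_shape = demo_df["Name"].shape  # a Series has only one dimension

print(frame_shape)   # (3, 2)
print(series_shape)  # (3,)
```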
Using the index Attribute
A DataFrame's index attribute contains the row labels. You can count the number of labels in the index to get the number of rows.
One way is to use the size property of the index:
num_rows = planets_df.index.size
print(f'The number of rows is: {num_rows}')
Alternatively, you can pass the index to len():
num_rows = len(planets_df.index)
print(f'The number of rows is: {num_rows}')
Both approaches yield the same result – the number of rows in the DataFrame.
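Counting labels works regardless of what the labels are, which matters once the index is no longer 0, 1, 2, and so on. The following sketch (with an illustrative demo_df) shows that after filtering, the surviving rows keep their original labels, so the count should come from the length of the index rather than from its largest label.

```python
import pandas as pd

# Hypothetical stand-in data; values taken from the first four planets
demo_df = pd.DataFrame({
    "Name": ["Mercury", "Venus", "Earth", "Mars"],
    "Number of Moons": [0, 0, 1, 2],
})

# Filtering keeps the original row labels (2 and 3), not a fresh 0..n-1 range
with_moons = demo_df[demo_df["Number of Moons"] > 0]
kept_labels = list(with_moons.index)
kept_count = len(with_moons.index)

# A string index is counted the same way
by_name = demo_df.set_index("Name")
named_count = by_name.index.size

print(kept_labels)  # [2, 3]
print(kept_count)   # 2
print(named_count)  # 4
```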
Using the axes Attribute
The axes attribute of a DataFrame contains the row and column labels. The row labels are contained in axes[0].
Similar to the index attribute, you can use either the size property or len() on axes[0] to get the row count:
num_rows = planets_df.axes[0].size
# Or equivalently: num_rows = len(planets_df.axes[0])
print(f'The number of rows is: {num_rows}')
Using the info() Method
The info() method prints a concise summary of a DataFrame, including the number of rows. While it doesn't directly return the row count, it can be a handy way to quickly inspect your DataFrame.
planets_df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 21 columns):
 #   Column                         Non-Null Count  Dtype
---  ------                         --------------  -----
 0   Name                           8 non-null      object
 1   Mass (10^24kg)                 8 non-null      float64
 2   Diameter (km)                  8 non-null      int64
 3   Density (kg/m^3)               8 non-null      int64
 4   Gravity (m/s^2)                8 non-null      float64
 5   Escape Velocity (km/s)         8 non-null      float64
 6   Rotation Period (hours)        8 non-null      float64
 7   Length of Day (hours)          8 non-null      float64
 8   Distance from Sun (10^6 km)    8 non-null      float64
 9   Perihelion (10^6 km)           8 non-null      float64
 10  Aphelion (10^6 km)             8 non-null      float64
 11  Orbital Period (days)          8 non-null      float64
 12  Orbital Velocity (km/s)        8 non-null      float64
 13  Orbital Inclination (degrees)  8 non-null      float64
 14  Orbital Eccentricity           8 non-null      float64
 15  Obliquity to Orbit (degrees)   8 non-null      float64
 16  Mean Temperature (C)           8 non-null      int64
 17  Surface Pressure (bars)        4 non-null      object
 18  Number of Moons                8 non-null      int64
 19  Has Ring System                8 non-null      bool
 20  Has Global Magnetic Field      8 non-null      bool
dtypes: bool(2), float64(13), int64(4), object(2)
memory usage: 1.4+ KB
The second line of the output tells us there are 8 entries (rows) in the DataFrame.
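Because info() prints rather than returns, its summary can be captured by passing a text buffer through the buf parameter. This is a small sketch on an illustrative demo_df; for an actual row count, len() remains the simpler tool.

```python
import io
import pandas as pd

# Hypothetical stand-in data; not loaded from planets.csv
demo_df = pd.DataFrame({
    "Name": ["Mercury", "Venus", "Earth"],
    "Number of Moons": [0, 0, 1],
})

# info() writes to stdout by default; buf redirects the text into a buffer
buffer = io.StringIO()
demo_df.info(buf=buffer)
summary_text = buffer.getvalue()

# Pick out the line that reports the number of entries (rows)
entries_line = next(line for line in summary_text.splitlines() if "entries" in line)
print(entries_line)
```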
Comparing the Methods
We've seen five different ways to get the number of rows in a DataFrame – len(), shape, index, axes, and info(). Which one should you use?
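In terms of performance, len(df), df.shape[0], and the index and axes approaches all read the length of the row index, so they run in constant time and the differences between them are negligible in practice. The info() method is the slowest because it also gathers non-null counts, dtypes, and memory usage for every column.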
I recommend using len(df) or df.shape[0] in most cases. They are concise, readable, and efficient. Use info() when you want a more comprehensive overview of your DataFrame.
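If you want to check the comparison on your own machine, a rough timing sketch with the standard-library timeit module might look like the following. The DataFrame size and repetition count are arbitrary choices, and the timings will vary by pandas version and hardware, so no expected numbers are shown.

```python
import timeit
import pandas as pd

# Arbitrary illustrative size; any DataFrame works
demo_df = pd.DataFrame({"value": range(100_000)})

candidates = {
    "len(df)": lambda: len(demo_df),
    "df.shape[0]": lambda: demo_df.shape[0],
    "len(df.index)": lambda: len(demo_df.index),
    "df.index.size": lambda: demo_df.index.size,
}

# Every approach should report the same row count
results = {label: fn() for label, fn in candidates.items()}

# Time 10,000 calls of each approach
timings = {label: timeit.timeit(fn, number=10_000) for label, fn in candidates.items()}

for label in candidates:
    print(f"{label:15s} rows={results[label]} total_seconds={timings[label]:.4f}")
```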
Handling Large DataFrames
Counting the rows of a DataFrame that is already in memory is cheap no matter how large it is: len(), shape, index, and axes simply read the length of the row index rather than scanning the data. The real cost of a very large dataset lies in loading it. If a file is too big to load in one go, you can read it in pieces with the chunksize parameter of read_csv() and add up the length of each chunk.
Note that pandas' sample() method is handy for exploring a smaller, random subset of the data, but it does not estimate the row count, since the size of the subset is something you choose yourself.
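When the data lives in a file that is too large to load at once, one option is to count rows chunk by chunk with the chunksize parameter of read_csv(). The sketch below uses an in-memory string as a stand-in for a large CSV file; the column names, the 1,000 generated rows, and the chunk size of 250 are all illustrative assumptions.

```python
import io
import pandas as pd

# In-memory stand-in for a large CSV file (hypothetical contents)
csv_text = "Name,Number of Moons\n" + "\n".join(
    f"body_{i},{i % 5}" for i in range(1000)
)

total_rows = 0
# With chunksize set, read_csv yields DataFrames of at most 250 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    total_rows += len(chunk)

print(total_rows)  # 1000
```

Only one chunk is held in memory at a time, so peak memory depends on the chunk size rather than on the size of the file.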
Counting Rows in Filtered or Grouped DataFrames
Often, you'll want to count the number of rows meeting certain criteria or belonging to different groups. You can combine the counting methods we've learned with boolean indexing and the groupby() method.
For example, to count the number of planets with a diameter greater than 10,000 km:
num_large_planets = len(planets_df[planets_df['Diameter (km)'] > 10000])
print(f'There are {num_large_planets} planets with a diameter greater than 10,000 km')
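An alternative that avoids building the filtered DataFrame is to sum the boolean mask directly, since True counts as 1. The sketch below rebuilds just the name and diameter columns from the table above (demo_df is an illustrative name) and shows both approaches agreeing.

```python
import pandas as pd

# Diameters copied from the sample table; the DataFrame itself is a stand-in
demo_df = pd.DataFrame({
    "Name": ["Mercury", "Venus", "Earth", "Mars",
             "Jupiter", "Saturn", "Uranus", "Neptune"],
    "Diameter (km)": [4879, 12104, 12756, 6792,
                      142984, 120536, 51118, 49528],
})

is_large = demo_df["Diameter (km)"] > 10000   # boolean Series, one value per row

count_via_len = len(demo_df[is_large])        # filter first, then count rows
count_via_sum = int(is_large.sum())           # True counts as 1, False as 0

print(count_via_len, count_via_sum)  # 6 6
```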
Or to count the number of planets with and without rings:
ring_counts = planets_df.groupby('Has Ring System').size()
print(ring_counts)
Output:
Has Ring System
False 4
True 4
dtype: int64
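It is easy to confuse size() with count() on a grouped DataFrame: size() counts rows per group, while count() counts non-missing values per group. The sketch below illustrates the difference on a reduced stand-in DataFrame, under the assumption that the "Unknown" surface pressures of the ringed planets are stored as missing values.

```python
import pandas as pd

# Stand-in data; assumes "Unknown" surface pressures are stored as NaN
demo_df = pd.DataFrame({
    "Name": ["Mercury", "Venus", "Earth", "Mars",
             "Jupiter", "Saturn", "Uranus", "Neptune"],
    "Has Ring System": [False, False, False, False, True, True, True, True],
    "Surface Pressure (bars)": [0, 92, 1, 0.01] + [float("nan")] * 4,
})

grouped = demo_df.groupby("Has Ring System")

rows_per_group = grouped.size()                                 # rows in each group
non_null_per_group = grouped["Surface Pressure (bars)"].count() # non-missing values only

print(rows_per_group.tolist())      # [4, 4]  (False group, then True group)
print(non_null_per_group.tolist())  # [4, 0]
```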
Counting Non-Null Rows
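Sometimes your DataFrame may contain missing values represented as NaN (Not a Number). The output below assumes the "Unknown" surface-pressure entries were loaded as missing values (for example by passing na_values=['Unknown'] to read_csv()); read as plain strings, they would count as non-null. To count the number of non-missing values in each column, you can use the count() method: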
non_null_counts = planets_df.count()
print(non_null_counts)
Output:
Name 8
Mass (10^24kg) 8
Diameter (km) 8
Density (kg/m^3) 8
Gravity (m/s^2) 8
Escape Velocity (km/s) 8
Rotation Period (hours) 8
Length of Day (hours) 8
Distance from Sun (10^6 km) 8
Perihelion (10^6 km) 8
Aphelion (10^6 km) 8
Orbital Period (days) 8
Orbital Velocity (km/s) 8
Orbital Inclination (degrees) 8
Orbital Eccentricity 8
Obliquity to Orbit (degrees) 8
Mean Temperature (C) 8
Surface Pressure (bars) 4
Number of Moons 8
Has Ring System 8
Has Global Magnetic Field 8
dtype: int64
This is especially useful when cleaning data – a low non-null count indicates a column with many missing values that may need special handling.
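Whether a placeholder such as "Unknown" counts as missing depends on how the file is loaded. The following sketch reads a reduced, in-memory version of the planets data twice, once with default settings and once with na_values, and then counts missing values per column and rows with no missing values at all. The CSV text is a stand-in built from two columns of the sample table.

```python
import io
import pandas as pd

# Reduced in-memory stand-in for planets.csv
csv_text = """Name,Surface Pressure (bars)
Mercury,0
Venus,92
Earth,1
Mars,0.01
Jupiter,Unknown
Saturn,Unknown
Uranus,Unknown
Neptune,Unknown
"""

# Default load: "Unknown" is just a string, so it counts as non-null
as_strings = pd.read_csv(io.StringIO(csv_text))
# na_values tells read_csv to treat "Unknown" as a missing value
as_missing = pd.read_csv(io.StringIO(csv_text), na_values=["Unknown"])

non_null_strings = int(as_strings["Surface Pressure (bars)"].count())
non_null_missing = int(as_missing["Surface Pressure (bars)"].count())
missing_per_column = as_missing.isna().sum()   # missing values in each column
complete_rows = len(as_missing.dropna())       # rows with no missing values

print(non_null_strings, non_null_missing)  # 8 4
print(complete_rows)                       # 4
```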
Counting Rows with Specific Values
To count the number of rows with a specific value in a column, you can use the value_counts() method. For instance, to count the number of planets with each possible number of moons:
moon_counts = planets_df['Number of Moons'].value_counts()
print(moon_counts)
Output:
0 2
1 1
2 1
14 1
27 1
79 1
82 1
Name: Number of Moons, dtype: int64
This tells us there are 2 planets with 0 moons (Mercury and Venus), 1 planet with 1 moon, 1 planet with 2 moons, and so on.
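If you only care about one particular value, a direct comparison is often enough. The sketch below uses the moon counts from the sample table (the Series name moons is illustrative) and shows two ways to count the planets with no moons, plus a safe lookup for a value that never occurs.

```python
import pandas as pd

# Moon counts copied from the sample table, Mercury through Neptune
moons = pd.Series([0, 0, 1, 2, 79, 82, 27, 14], name="Number of Moons")

# Option 1: compare, then sum the resulting booleans
zero_moon_count = int((moons == 0).sum())

# Option 2: look the value up in value_counts(); get() supplies a default
moon_counts = moons.value_counts()
zero_via_counts = int(moon_counts.get(0, 0))
five_via_counts = int(moon_counts.get(5, 0))   # no planet here has 5 moons

print(zero_moon_count, zero_via_counts, five_via_counts)  # 2 2 0
```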
Summary
In this post, we've covered several ways to count the number of rows in a pandas DataFrame:
- Using the len() function
- Using the shape attribute
- Using the index attribute with size or len()
- Using the axes attribute with size or len()
- Using the info() method
We also discussed performance considerations for large DataFrames and counting rows in filtered, grouped, or aggregated results.
Counting the number of rows is a fundamental operation in data analysis. With the techniques covered here, you're well-equipped to assess the size and dimensions of your DataFrames. Go forth and analyze!