Pandas Count Rows – How to Get the Number of Rows in a DataFrame

One of the most fundamental things to know about a dataset is its size, especially the number of rows. Pandas is the go-to Python library for data manipulation and analysis, so let's dive into how to count the number of rows in a pandas DataFrame.

Why the Number of Rows in a DataFrame Matters

Before we get to the "how", let's discuss the "why". Knowing the number of rows in your DataFrame is important for several reasons:

  1. It gives you a sense of the size and scale of your dataset.
  2. Many machine learning algorithms are sensitive to sample size. Knowing your row count helps determine if you have enough data to train a model.
  3. Understanding the size guides your choice of computational approaches. Certain operations may be too slow or memory-intensive for large datasets.
  4. Verifying the row count serves as a quick data integrity check after filtering, merging, or other data transformations.

Clearly, being able to get the row count of a DataFrame is a critical skill. Fortunately, pandas provides multiple ways to achieve this.

Setting Up a Sample DataFrame

Before we explore the different methods to count rows, let's load a sample DataFrame to work with. We'll use pandas' read_csv() function to load data from a CSV file containing information about the planets in our solar system.

import pandas as pd

planets_df = pd.read_csv('planets.csv')
print(planets_df)

This gives us the following DataFrame:

   Name  Mass (10^24kg)  Diameter (km)  Density (kg/m^3)  Gravity (m/s^2)  Escape Velocity (km/s)  Rotation Period (hours)  Length of Day (hours)  Distance from Sun (10^6 km)  Perihelion (10^6 km)  Aphelion (10^6 km)  Orbital Period (days)  Orbital Velocity (km/s)  Orbital Inclination (degrees)  Orbital Eccentricity  Obliquity to Orbit (degrees)  Mean Temperature (C)  Surface Pressure (bars)  Number of Moons  Has Ring System  Has Global Magnetic Field
0  Mercury   0.330          4879            5429              3.7               4.3                      1407.6                  4222.6                   57.9                46.0               69.8                  88.0              47.4                        7.0                 0.206               0.034                       167                  0                     0              False             True                     
1  Venus     4.87           12104           5234              8.9               10.4                     5832.5                  2802.0                  108.2               107.5              108.9                 224.7             35.0                        3.4                 0.007               177.4                      464                  92                    0              False             False                    
2  Earth     5.97           12756           5514              9.8               11.2                     23.9                    24.0                    149.6               147.1              152.1                 365.2             29.8                        0.0                 0.017               23.4                       15                   1                     1              False             True                     
3  Mars      0.642          6792            3934              3.7               5.0                      24.6                    24.7                    227.9               206.7              249.2                 687.0             24.1                        1.8                 0.094               25.2                       -65                  0.01                  2              False             False                    
4  Jupiter   1898            142984          1326              23.1              59.5                     9.9                     9.9                     778.5               740.6              816.4                4331              13.1                        1.3                 0.049               3.1                        -110                 Unknown               79             True              True                     
5  Saturn    568             120536          687               9.0               35.5                     10.7                    10.7                    1432.0              1357.6             1506.5               10747             9.7                         2.5                 0.052               26.7                       -140                 Unknown               82             True              True                     
6  Uranus    86.8            51118           1270              8.7               21.3                     17.2                    17.2                    2867.0              2732.7             3001.4               30589             6.8                         0.8                 0.047               97.8                       -195                 Unknown               27             True              True                     
7  Neptune   102             49528           1638              11.0              23.5                     16.1                    16.1                    4515.0              4471.1             4558.9               59800             5.4                         1.8                 0.010               28.3                       -200                 Unknown               14             True              True                     

Great, we now have a DataFrame called planets_df containing 8 rows of data about the planets. Let's use it to demonstrate various methods for getting the row count.
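If you don't have a planets.csv file at hand, a cut-down stand-in can be built inline. This is only a sketch, not the original file: it keeps four of the columns, with values copied from the table above, and the name planets_small is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for planets.csv: four columns copied from the table above.
planets_small = pd.DataFrame({
    "Name": ["Mercury", "Venus", "Earth", "Mars",
             "Jupiter", "Saturn", "Uranus", "Neptune"],
    "Diameter (km)": [4879, 12104, 12756, 6792, 142984, 120536, 51118, 49528],
    "Number of Moons": [0, 0, 1, 2, 79, 82, 27, 14],
    "Has Ring System": [False, False, False, False, True, True, True, True],
})

print(planets_small.shape)  # (8, 4)
```

The row-counting techniques below behave the same way on this smaller frame; only the column count differs.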

Using the len() Function

The simplest way to get the number of rows in a DataFrame is to use Python's built-in len() function. Pass your DataFrame to len() and it returns the number of rows.

num_rows = len(planets_df)
print(f'The number of rows is: {num_rows}')

Output:

The number of rows is: 8

The len() function returns the length of the DataFrame, which is the number of rows. Easy!

Using the shape Attribute

DataFrames have a shape attribute that returns a tuple specifying the dimensions of the DataFrame. The first element is the number of rows and the second is the number of columns.

num_rows, num_cols = planets_df.shape
print(f'The number of rows is: {num_rows}')
print(f'The number of columns is: {num_cols}')

Output:

The number of rows is: 8
The number of columns is: 21

If you only need the row count, you can index the first element of the shape tuple:

num_rows = planets_df.shape[0]
print(f'The number of rows is: {num_rows}')

Using the index Attribute

A DataFrame's index attribute contains the row labels. You can count the number of labels in the index to get the number of rows.

One way is to use the size property of the index:

num_rows = planets_df.index.size
print(f'The number of rows is: {num_rows}')

Alternatively, you can pass the index to len():

num_rows = len(planets_df.index)
print(f'The number of rows is: {num_rows}')

Both approaches yield the same result – the number of rows in the DataFrame.

Using the axes Attribute

The axes attribute of a DataFrame contains the row and column labels. The row labels are contained in axes[0].

Similar to the index attribute, you can use either the size property or len() on axes[0] to get the row count:

num_rows = planets_df.axes[0].size
# Or equivalently: num_rows = len(planets_df.axes[0])

print(f'The number of rows is: {num_rows}')
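As a quick sanity check, the approaches seen so far can be compared side by side. The sketch below uses a tiny made-up frame (df is an illustrative name, not planets_df) and also shows that an empty selection reports zero rows under each approach.

```python
import pandas as pd

# Small illustrative frame; values copied from the first three planets.
df = pd.DataFrame({"Name": ["Mercury", "Venus", "Earth"],
                   "Number of Moons": [0, 0, 1]})

# Every approach reads the same underlying index length.
counts = {
    "len(df)": len(df),
    "df.shape[0]": df.shape[0],
    "df.index.size": df.index.size,
    "len(df.index)": len(df.index),
    "df.axes[0].size": df.axes[0].size,
}
print(counts)

# A filter that matches nothing yields an empty DataFrame with zero rows.
empty = df[df["Number of Moons"] > 5]
print(len(empty), empty.shape[0], empty.index.size)  # 0 0 0
```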

Using the info() Method

The info() method prints a concise summary of a DataFrame, including the number of rows. It doesn't return the row count (it prints the summary and returns None), but it can be a handy way to quickly inspect your DataFrame.

planets_df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 21 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Name                          8 non-null      object 
 1   Mass (10^24kg)                8 non-null      float64
 2   Diameter (km)                 8 non-null      int64  
 3   Density (kg/m^3)              8 non-null      int64  
 4   Gravity (m/s^2)               8 non-null      float64
 5   Escape Velocity (km/s)        8 non-null      float64
 6   Rotation Period (hours)       8 non-null      float64
 7   Length of Day (hours)         8 non-null      float64
 8   Distance from Sun (10^6 km)   8 non-null      float64
 9   Perihelion (10^6 km)          8 non-null      float64
 10  Aphelion (10^6 km)            8 non-null      float64
 11  Orbital Period (days)         8 non-null      float64
 12  Orbital Velocity (km/s)       8 non-null      float64
 13  Orbital Inclination (degrees) 8 non-null      float64
 14  Orbital Eccentricity          8 non-null      float64
 15  Obliquity to Orbit (degrees)  8 non-null      float64
 16  Mean Temperature (C)          8 non-null      int64  
 17  Surface Pressure (bars)       4 non-null      object 
 18  Number of Moons               8 non-null      int64  
 19  Has Ring System               8 non-null      bool   
 20  Has Global Magnetic Field     8 non-null      object 
dtypes: bool(1), float64(13), int64(4), object(3)
memory usage: 1.4+ KB

The second line of the output tells us there are 8 entries (rows) in the DataFrame.
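Because info() prints rather than returns, its text has to be captured if you want to use it in code. A minimal sketch, assuming a small made-up frame, uses the buf parameter of info() to write the summary into an in-memory buffer:

```python
import io
import pandas as pd

# Small illustrative frame, not the planets data.
df = pd.DataFrame({"Name": ["Mercury", "Venus", "Earth"],
                   "Number of Moons": [0, 0, 1]})

# info() writes to stdout and returns None.
result = df.info()

# Passing a buffer redirects the summary text so it can be inspected.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print("3 entries" in summary)
```

For the row count itself, len(df) or df.shape[0] remains the more direct route; parsing the summary text is rarely worth it.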

Comparing the Methods

We've seen five different ways to get the number of rows in a DataFrame: len(), shape, index, axes, and info(). Which one should you use?

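In terms of performance, len(df), df.shape[0], and the index and axes attributes are all cheap: each reads the length of the index rather than scanning the data, so the differences between them are tiny and rarely matter in practice. The info() method is the slowest because it also gathers non-null counts, dtypes, and memory usage for every column.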

I recommend using len(df) or df.shape[0] in most cases. They are concise, readable, and efficient. Use info() when you want a more comprehensive overview of your DataFrame.
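If you want to check the relative cost on your own machine, a rough timing sketch with the standard timeit module could look like the following. The frame and its size are arbitrary choices for illustration, and the timings will vary by machine and pandas version, so no numbers are shown here.

```python
import timeit
import pandas as pd

# Arbitrary illustrative frame with 100,000 rows.
df = pd.DataFrame({"value": range(100_000)})

candidates = {
    "len(df)": lambda: len(df),
    "df.shape[0]": lambda: df.shape[0],
    "len(df.index)": lambda: len(df.index),
    "df.index.size": lambda: df.index.size,
}

results = {}
for label, fn in candidates.items():
    # Time 10,000 calls and report the average per call in microseconds.
    seconds = timeit.timeit(fn, number=10_000)
    results[label] = fn()
    print(f"{label:15s} {seconds / 10_000 * 1e6:.3f} us per call")
```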

Handling Large DataFrames

Once a DataFrame is in memory, counting its rows is cheap regardless of its size: len(), shape, and the index attribute all read the length of the index and do not scan the data. The expensive part of working with very large datasets is loading them in the first place. If a file is too large to load at once, you can read it in pieces and add up the row counts of the pieces.

pandas' sample() method is useful for exploring a smaller, random subset of a large DataFrame, but it does not estimate the row count: it needs the full DataFrame to draw from, and the length of the sample is simply the size you asked for.
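When the file itself is too large to load comfortably, one option is to count rows chunk by chunk instead of building a single DataFrame. The sketch below assumes a CSV source and uses the chunksize parameter of read_csv(); the in-memory csv_text is a made-up stand-in for a large file on disk.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large CSV file: 1,000 generated data rows.
csv_text = "Name,Number of Moons\n" + "".join(
    f"body_{i},{i % 5}\n" for i in range(1000)
)

# With chunksize set, read_csv yields DataFrames of at most 250 rows each,
# so only one chunk is held in memory at a time.
total_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    total_rows += len(chunk)

print(total_rows)  # 1000
```

With a real file you would pass its path to read_csv() in place of the StringIO object and pick a chunk size that fits your memory budget.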

Counting Rows in Filtered or Grouped DataFrames

Often, you'll want to count the number of rows meeting certain criteria or belonging to different groups. You can combine the counting methods we've learned with boolean indexing and the groupby() method.

For example, to count the number of planets with a diameter greater than 10,000 km:

num_large_planets = len(planets_df[planets_df['Diameter (km)'] > 10000])
print(f'There are {num_large_planets} planets with a diameter greater than 10,000 km')
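The same count can be obtained without building the filtered DataFrame, by summing the boolean mask directly (True counts as 1). A small sketch, using a made-up four-row frame with diameters copied from the table above:

```python
import pandas as pd

# Illustrative subset of the planets data.
df = pd.DataFrame({
    "Name": ["Mercury", "Venus", "Earth", "Mars"],
    "Diameter (km)": [4879, 12104, 12756, 6792],
})

# Boolean Series: True where the condition holds.
is_large = df["Diameter (km)"] > 10000

count_via_filter = len(df[is_large])   # builds a filtered copy, then counts
count_via_sum = int(is_large.sum())    # counts the True values directly

print(count_via_filter, count_via_sum)  # 2 2
```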

Or to count the number of planets with and without rings:

ring_counts = planets_df.groupby('Has Ring System').size()
print(ring_counts)

Output:

Has Ring System
False    4
True     4
dtype: int64
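One detail worth knowing is that size() and count() answer different questions on a grouped DataFrame: size() counts rows per group, while count() counts non-missing values per group. The sketch below assumes the "Unknown" surface pressures are read as missing values (NaN); that is an assumption for illustration, not something the CSV is known to do.

```python
import numpy as np
import pandas as pd

# Illustrative frame: Mercury, Venus, Jupiter, Saturn.
# NaN stands in for the "Unknown" surface pressures (an assumption).
df = pd.DataFrame({
    "Has Ring System": [False, False, True, True],
    "Surface Pressure (bars)": [0.0, 92.0, np.nan, np.nan],
})

grouped = df.groupby("Has Ring System")

sizes = grouped.size()                                  # rows per group
non_null = grouped["Surface Pressure (bars)"].count()   # non-missing values per group

print(sizes.tolist())     # [2, 2]
print(non_null.tolist())  # [2, 0]
```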

Counting Non-Null Rows

Sometimes your DataFrame may contain missing values represented as NaN (Not a Number). If you want to count the number of non-missing values in each column, you can use the count() method:

non_null_counts = planets_df.count()
print(non_null_counts)

Output:

Name                             8
Mass (10^24kg)                   8
Diameter (km)                    8
Density (kg/m^3)                 8
Gravity (m/s^2)                  8
Escape Velocity (km/s)           8
Rotation Period (hours)          8
Length of Day (hours)            8
Distance from Sun (10^6 km)      8
Perihelion (10^6 km)             8
Aphelion (10^6 km)               8
Orbital Period (days)            8
Orbital Velocity (km/s)          8
Orbital Inclination (degrees)    8
Orbital Eccentricity             8
Obliquity to Orbit (degrees)     8
Mean Temperature (C)             8
Surface Pressure (bars)          4
Number of Moons                  8
Has Ring System                  8
Has Global Magnetic Field        8
dtype: int64

This is especially useful when cleaning data – a low non-null count indicates a column with many missing values that may need special handling.
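count() works column by column. If what you need is the number of rows that are complete, or the number that contain at least one missing value, dropna() and isna() can be combined with the row-counting methods. A minimal sketch on a made-up frame, again assuming unknown pressures are stored as NaN:

```python
import numpy as np
import pandas as pd

# Illustrative frame; NaN marks the unknown surface pressures (an assumption).
df = pd.DataFrame({
    "Name": ["Earth", "Mars", "Jupiter", "Saturn"],
    "Surface Pressure (bars)": [1.0, 0.01, np.nan, np.nan],
})

rows_total = len(df)
rows_complete = len(df.dropna())                      # rows with no missing values
rows_with_missing = int(df.isna().any(axis=1).sum())  # rows with at least one NaN

print(rows_total, rows_complete, rows_with_missing)  # 4 2 2
```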

Counting Rows with Specific Values

To count the number of rows with a specific value in a column, you can use the value_counts() method. For instance, to count the number of planets with each possible number of moons:

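moon_counts = planets_df['Number of Moons'].value_counts().sort_index()
print(moon_counts)

Output:

0     2
1     1
2     1
14    1
27    1
79    1
82    1
Name: Number of Moons, dtype: int64

This tells us there are 2 planets with 0 moons (Mercury and Venus), 1 planet with 1 moon, 1 planet with 2 moons, and so on. The sort_index() call orders the result by number of moons; without it, value_counts() orders by frequency, most common first. The exact header and footer lines can vary between pandas versions.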
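If you only care about one particular value, comparing the column against it and summing the result avoids building the full table of counts. The sketch below uses the moon counts copied from the table above; the names moons and moonless are made up for illustration.

```python
import pandas as pd

# Moon counts copied from the table above, in planet order.
moons = pd.Series([0, 0, 1, 2, 79, 82, 27, 14], name="Number of Moons")

# Count rows equal to a single value by summing a boolean mask.
moonless = int((moons == 0).sum())

# Looking a value up in value_counts(); get() supplies 0 for absent values.
counts = moons.value_counts()

print(moonless)          # 2
print(counts.get(0, 0))  # 2
print(counts.get(5, 0))  # 0, since no planet here has exactly 5 moons
```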

Summary

In this post, we've covered several ways to count the number of rows in a pandas DataFrame:

  1. Using the len() function
  2. Using the shape attribute
  3. Using the index attribute with size or len()
  4. Using the axes attribute with size or len()
  5. Using the info() method

We also discussed performance considerations for large DataFrames and counting rows in filtered, grouped, or aggregated results.

Counting the number of rows is a fundamental operation in data analysis. With the techniques covered here, you're well-equipped to assess the size and dimensions of your DataFrames. Go forth and analyze!
