How to Get the Last N Rows of a Pandas DataFrame

Retrieving the last N rows of a pandas DataFrame is one of the most common tasks in data manipulation, whether you are a data scientist or a software engineer. In this blog post, we will explore three ways to do it in pandas: the tail() method, slicing, and iloc.

Table of Contents

  1. What Is a Pandas DataFrame?
  2. How to Get the Last N Rows of a Pandas DataFrame?
  3. Common Errors and Solutions
  4. Conclusion

What Is a Pandas DataFrame?

Before we delve into the solution, let’s first understand what a pandas DataFrame is. A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a SQL table. You can think of it as a dictionary of Series objects, where each Series represents a column of data.

Pandas is a popular data analysis library for Python, which provides powerful data manipulation and analysis capabilities. It is built on top of the NumPy library and provides easy-to-use data structures and data analysis tools.
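To make the “dictionary of Series” picture concrete, here is a minimal sketch. The column names and values are made up for illustration only.

```python
import pandas as pd

# Hypothetical two-column DataFrame for illustration
df = pd.DataFrame({'name': ['John', 'Alice', 'Bob'],
                   'age': [25, 30, 35]})

# Selecting one column by its label returns a Series
ages = df['age']
print(type(ages).__name__)   # Series

# Each column can hold a different data type
print(df.dtypes)
```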

How to Get the Last N Rows of a Pandas DataFrame?

Now, let’s get into the main topic of this blog post: retrieving the last N rows of a pandas DataFrame. There are several ways to accomplish this task, and we will focus on three commonly used methods: tail(), slicing, and iloc.

Using the tail() Method

The first method is tail(), which returns the last n rows of a DataFrame. By default, n is 5, but you can pass a different value to specify how many rows you want. Here’s an example:

import pandas as pd

# Create a sample DataFrame
data = {'name': ['John', 'Alice', 'Bob', 'Mary', 'Jane', 'Mark', 'Emma', 'Luke', 'Lucy', 'Tom'],
        'age': [25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
        'city': ['New York', 'Paris', 'Tokyo', 'London', 'San Francisco', 'Sydney', 'Toronto', 'Dubai', 'Moscow', 'Berlin']}

df = pd.DataFrame(data)

# Get the last 3 rows of the DataFrame using tail()
last_n_rows = df.tail(3)

print(last_n_rows)

Output:

   name  age    city
7  Luke   60   Dubai
8  Lucy   65  Moscow
9   Tom   70  Berlin

In the above example, we created a sample DataFrame and used the tail() method to retrieve the last 3 rows of the DataFrame.
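Two details of tail() are easy to miss: the default value of n, and what a zero or negative n does. The sketch below uses a made-up single-column DataFrame to show each case; it is an illustration, not an exhaustive description of the method.

```python
import pandas as pd

df = pd.DataFrame({'value': range(10)})  # 10 rows, index 0..9

default_tail = df.tail()      # n defaults to 5: rows 5..9
skip_first_3 = df.tail(-3)    # negative n: every row except the first 3
empty_tail = df.tail(0)       # n=0: an empty DataFrame with the same columns

print(len(default_tail), len(skip_first_3), len(empty_tail))  # 5 7 0
```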

Using Slicing

Another method is slicing. When you put a slice inside square brackets, pandas selects rows rather than columns, so df[-N:] returns the last N rows of a DataFrame. Here’s an example:

import pandas as pd

# Create a sample DataFrame
data = {'name': ['John', 'Alice', 'Bob', 'Mary', 'Jane', 'Mark', 'Emma', 'Luke', 'Lucy', 'Tom'],
        'age': [25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
        'city': ['New York', 'Paris', 'Tokyo', 'London', 'San Francisco', 'Sydney', 'Toronto', 'Dubai', 'Moscow', 'Berlin']}

df = pd.DataFrame(data)

# Get the last 3 rows of the DataFrame using slicing
last_n_rows = df[-3:]

print(last_n_rows)

Output:

   name  age    city
7  Luke   60   Dubai
8  Lucy   65  Moscow
9   Tom   70  Berlin

In the above example, we used slicing notation to retrieve the last 3 rows of the DataFrame.
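One edge case is worth knowing when N comes from a variable: with N equal to 0, the slice df[-N:] becomes df[0:] and returns every row, whereas tail(0) returns none. The sketch below shows the difference; last_n is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'value': range(10)})

n = 0
via_slice = df[-n:]     # -0 is 0, so this is df[0:], i.e. every row
via_tail = df.tail(n)   # tail(0) returns no rows

print(len(via_slice), len(via_tail))  # 10 0

# Hypothetical helper (not part of pandas) that guards the n == 0 case
def last_n(frame, n):
    return frame[-n:] if n > 0 else frame[:0]

print(len(last_n(df, 0)), len(last_n(df, 3)))  # 0 3
```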

Using iloc

The iloc indexer selects rows and columns by integer position, regardless of the index labels. To get the last N rows, pass a negative slice such as df.iloc[-N:]. Here’s an example:

import pandas as pd

# Create a sample DataFrame
data = {'name': ['John', 'Alice', 'Bob', 'Mary', 'Jane', 'Mark', 'Emma', 'Luke', 'Lucy', 'Tom'],
        'age': [25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
        'city': ['New York', 'Paris', 'Tokyo', 'London', 'San Francisco', 'Sydney', 'Toronto', 'Dubai', 'Moscow', 'Berlin']}

df = pd.DataFrame(data)

# Get the last 3 rows of the DataFrame using iloc
last_n_rows = df.iloc[-3:]
print(last_n_rows)

Output:

   name  age    city
7  Luke   60   Dubai
8  Lucy   65  Moscow
9   Tom   70  Berlin
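Because iloc works by position, the same expression should behave identically whatever the index labels are, and it can restrict the columns in the same step. A small sketch, using a made-up DataFrame indexed by name rather than by number:

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [25, 30, 35, 40], 'city': ['New York', 'Paris', 'Tokyo', 'London']},
    index=['John', 'Alice', 'Bob', 'Mary'],
)

last_two = df.iloc[-2:]            # last 2 rows, all columns
last_two_age = df.iloc[-2:, :1]    # last 2 rows, first column only

print(last_two.index.tolist())        # ['Bob', 'Mary']
print(last_two_age.columns.tolist())  # ['age']
```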

Common Errors and Solutions

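Error: “IndexError: single positional indexer is out-of-bounds”

This error occurs when you request a single row by a position that does not exist, for example df.iloc[-N] (without the colon) when N is greater than the number of rows. The slice-based forms shown above, tail(N), df[-N:], and df.iloc[-N:], do not raise when N exceeds the row count; they simply return the whole DataFrame. If you see this error, check that you wrote a slice (-N:) rather than a single index.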
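It can help to check which expressions actually raise. In the sketch below, which uses a made-up 3-row DataFrame and assumes a recent pandas version, the slice-based forms return every available row when N is larger than the DataFrame, while a single positional lookup raises IndexError.

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3]})
n = 5  # larger than the number of rows

# Slice-based forms clip to the available rows instead of raising
print(len(df.tail(n)), len(df[-n:]), len(df.iloc[-n:]))  # 3 3 3

# A single position (no colon) outside the valid range raises IndexError
try:
    df.iloc[-n]
    raised = False
except IndexError:
    raised = True

print(raised)  # True
```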

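Error: “KeyError: -N”

This error occurs when the colon is left out of the slice, as in df[-N]. Without the colon, pandas treats -N as a column label rather than a row position, and since no column has that label, it raises a KeyError. The same happens with df.loc[-N] when the index has no label -N. Use df[-N:], df.iloc[-N:], or tail(N) instead.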
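A quick way to see where this KeyError comes from is to compare a bare negative number with a negative slice inside the square brackets. The DataFrame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Alice', 'Bob'], 'age': [25, 30, 35]})

# df[-2] looks for a COLUMN labelled -2, which does not exist
try:
    df[-2]
    key_error = False
except KeyError:
    key_error = True

print(key_error)                 # True

# With the colon, the same brackets slice ROWS by position
print(df[-2:]['name'].tolist())  # ['Alice', 'Bob']
```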

Conclusion

In this blog post, we explored three ways to retrieve the last N rows of a pandas DataFrame: the tail() method, the slicing notation df[-N:], and the position-based indexer df.iloc[-N:]. For a positive N, all three return the same rows. They are simple to use, so choose the one that suits your needs: tail() states the intent most directly, while iloc is convenient when you also want to select columns by position.

