How to Get the Last N Rows of a Pandas DataFrame
As a data scientist or software engineer, working with data is a crucial part of your job. One of the most common tasks you may encounter when working with data is retrieving the last N rows of a pandas DataFrame. In this blog post, we will explore some ways to accomplish this task using pandas.
Table of Contents
- What Is a Pandas DataFrame?
- How to Get the Last N Rows of a Pandas DataFrame?
- Common Errors and Solutions
- Conclusion
What Is a Pandas DataFrame?
Before we delve into the solution, let’s first understand what a pandas DataFrame is. A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a SQL table. You can think of it as a dictionary of Series objects, where each Series represents a column of data.
Pandas is a popular data analysis library for Python, which provides powerful data manipulation and analysis capabilities. It is built on top of the NumPy library and provides easy-to-use data structures and data analysis tools.
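The "dictionary of Series" view described above can be sketched in a few lines. The sample names and ages here are made up for illustration only:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"name": ["John", "Alice"], "age": [25, 30]})

# Selecting a single column by its label returns a Series
ages = df["age"]
print(type(ages).__name__)

# Each column keeps its own dtype
print(df.dtypes)
```

Each key of the dictionary becomes a column label, and each column can hold a different type of data.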
How to Get the Last N Rows of a Pandas DataFrame?
Now, let’s get into the main topic of this blog post - retrieving the last N rows of a pandas DataFrame. There are several ways to accomplish this task, but we will focus on the three most commonly used methods: tail(), slicing, and iloc.
Using the tail() Method
The first method to retrieve the last N rows of a pandas DataFrame is to use the tail() method. The tail() method returns the last N rows of a DataFrame. By default, it returns the last 5 rows, but you can pass a parameter to specify the number of rows you want to retrieve. Here’s an example:
import pandas as pd
# Create a sample DataFrame
data = {'name': ['John', 'Alice', 'Bob', 'Mary', 'Jane', 'Mark', 'Emma', 'Luke', 'Lucy', 'Tom'],
        'age': [25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
        'city': ['New York', 'Paris', 'Tokyo', 'London', 'San Francisco', 'Sydney', 'Toronto', 'Dubai', 'Moscow', 'Berlin']}
df = pd.DataFrame(data)
# Get the last 3 rows of the DataFrame using tail()
last_n_rows = df.tail(3)
print(last_n_rows)
Output:
   name  age    city
7  Luke   60   Dubai
8  Lucy   65  Moscow
9   Tom   70  Berlin
In the above example, we created a sample DataFrame and used the tail() method to retrieve the last 3 rows of the DataFrame.
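A few edge cases of tail() are worth checking as well. The sketch below uses a smaller, hypothetical one-column DataFrame and shows the default of 5 rows, a value of n larger than the DataFrame, and a negative n:

```python
import pandas as pd

# Hypothetical one-column DataFrame with 10 rows
df = pd.DataFrame({"age": [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]})

default_tail = df.tail()       # no argument: the last 5 rows
oversized = df.tail(50)        # n larger than the DataFrame: all 10 rows, no error
all_but_first_8 = df.tail(-8)  # negative n: every row except the first 8

print(len(default_tail), len(oversized), len(all_but_first_8))
```

With a negative argument, tail() drops rows from the top instead of counting from the bottom, which can be handy when the number of rows to skip is known but the total length is not.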
Using Slicing
Another method to retrieve the last N rows of a pandas DataFrame is to use slicing. You can use the slicing notation df[-N:] to retrieve the last N rows of a DataFrame. Here’s an example:
import pandas as pd
# Create a sample DataFrame
data = {'name': ['John', 'Alice', 'Bob', 'Mary', 'Jane', 'Mark', 'Emma', 'Luke', 'Lucy', 'Tom'],
        'age': [25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
        'city': ['New York', 'Paris', 'Tokyo', 'London', 'San Francisco', 'Sydney', 'Toronto', 'Dubai', 'Moscow', 'Berlin']}
df = pd.DataFrame(data)
# Get the last 3 rows of the DataFrame using slicing
last_n_rows = df[-3:]
print(last_n_rows)
Output:
   name  age    city
7  Luke   60   Dubai
8  Lucy   65  Moscow
9   Tom   70  Berlin
In the above example, we used slicing notation to retrieve the last 3 rows of the DataFrame.
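When N comes from a variable, one edge case is easy to miss: with N equal to 0, the slice df[-N:] becomes df[0:] and returns every row, while tail(0) returns an empty DataFrame. A minimal sketch, using a hypothetical one-column DataFrame:

```python
import pandas as pd

# Hypothetical one-column DataFrame with 10 rows
df = pd.DataFrame({"age": [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]})

# For a positive n, slicing and tail() select the same rows
n = 3
same_as_tail = df[-n:].equals(df.tail(n))

# For n == 0 they differ: -0 is 0, so the slice starts at the first row
n = 0
sliced = df[-n:]     # equivalent to df[0:], i.e. all rows
tailed = df.tail(n)  # empty DataFrame

print(same_as_tail, len(sliced), len(tailed))
```

If N can be zero in your code, tail() is the safer choice of the two.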
Using iloc
The iloc indexer selects rows by integer position rather than by label. You can use it to obtain the last N rows by passing a negative slice, df.iloc[-N:].
import pandas as pd
# Create a sample DataFrame
data = {'name': ['John', 'Alice', 'Bob', 'Mary', 'Jane', 'Mark', 'Emma', 'Luke', 'Lucy', 'Tom'],
        'age': [25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
        'city': ['New York', 'Paris', 'Tokyo', 'London', 'San Francisco', 'Sydney', 'Toronto', 'Dubai', 'Moscow', 'Berlin']}
df = pd.DataFrame(data)
# Get the last 3 rows of the DataFrame using iloc
last_n_rows = df.iloc[-3:]
print(last_n_rows)
Output:
   name  age    city
7  Luke   60   Dubai
8  Lucy   65  Moscow
9   Tom   70  Berlin
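Because iloc works on positions, it behaves the same way when the index labels are not 0, 1, 2, and so on, and it can restrict the columns in the same call. The sketch below assumes a small DataFrame with deliberately out-of-order index labels, which are hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["John", "Alice", "Bob", "Mary"], "age": [25, 30, 35, 40]},
    index=[3, 0, 2, 1],  # hypothetical out-of-order labels
)

last_two = df.iloc[-2:]          # last two rows by position, whatever their labels
last_two_ages = df.iloc[-2:, 1]  # same rows, second column only (a Series)

print(list(last_two.index))
print(list(last_two_ages))
```

The rows returned are the ones physically last in the DataFrame (Bob and Mary), not the ones with the largest labels.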
Common Errors and Solutions
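Error: “IndexError: single positional indexer is out-of-bounds”
This error occurs when you request a single row by position, for example df.iloc[-N], and N is greater than the number of rows in the DataFrame. The forms shown in this post, df.tail(N), df[-N:], and df.iloc[-N:], do not raise it; when N exceeds the number of rows they simply return the whole DataFrame. To avoid the error, use one of those forms or check len(df) first.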
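Error: “KeyError: -N”
This error occurs when a negative integer is passed without a slice, as in df[-N] or df.loc[-N]. df[-N] looks for a column labeled -N and df.loc[-N] looks for a row labeled -N, so both fail unless such a label exists. Keep the colon (df[-N:]) or use df.iloc for positional access.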
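To see when these exceptions actually appear, the following sketch tries each form on a hypothetical three-row DataFrame. The slice-based forms return all rows when N is too large, while single-position and label-based lookups raise:

```python
import pandas as pd

# Hypothetical three-row DataFrame
df = pd.DataFrame({"name": ["John", "Alice", "Bob"], "age": [25, 30, 35]})

# Slice-based forms tolerate an N larger than the DataFrame
lengths = (len(df.tail(10)), len(df[-10:]), len(df.iloc[-10:]))
print(lengths)

# A single out-of-range position raises IndexError
try:
    df.iloc[-10]
    index_error = None
except IndexError as exc:
    index_error = exc

# A bare negative integer is treated as a column label and raises KeyError
try:
    df[-1]
    key_error = None
except KeyError as exc:
    key_error = exc

print(type(index_error).__name__, type(key_error).__name__)
```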
Conclusion
In this blog post, we explored three ways to retrieve the last N rows of a pandas DataFrame. The first method is to use the tail() method, which returns the last N rows of a DataFrame. The second method is to use slicing notation df[-N:] to retrieve the last N rows of a DataFrame. The last method is to use iloc, which is similar to the second one. All methods are simple and easy to use, and you can choose the one that suits your needs.