How to Filter Pandas DataFrames by Column of Strings

This post shows how to filter a Pandas DataFrame by the values in a string column, using str.contains() for substring and regular-expression matches and isin() for exact matches.

Pandas is a popular library in Python that is used extensively in data science and software engineering. It provides data structures and tools for data manipulation, analysis, and visualization. In this article, we will discuss how to filter Pandas DataFrames by a column of strings.

Introduction

Pandas DataFrames are two-dimensional labeled data structures that can hold data of different types, including strings. Filtering a DataFrame by a column of strings is a common task in data science and software engineering. The usual tool is the str accessor of a DataFrame column (a Series), which exposes element-wise string methods for searching, testing, and transforming the values. Methods that test a condition, such as str.contains(), return a Boolean Series that can be used as a mask to select rows.
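As a minimal sketch of how the str accessor produces such a mask (the Series below is illustrative sample data, and the variable names are made up for this example):

```python
import pandas as pd

# Illustrative sample data, not taken from a real dataset
cities = pd.Series(["New York", "Los Angeles", "Chicago", "Houston"])

# Each .str method is applied element-wise and returns a new Series
starts_with_new = cities.str.startswith("New")  # Boolean Series
lengths = cities.str.len()                      # integer Series
upper = cities.str.upper()                      # string Series

# A Boolean Series can be used directly as a row mask
print(cities[starts_with_new])
```

Only the methods that return Booleans are useful as filters; the others are handy for normalizing values (for example, upper-casing) before a comparison.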

Filtering by a Single String

To filter a DataFrame by a single string value in a column, we can use the str.contains() method. It tests whether each value contains the given pattern as a substring and returns a Boolean mask that selects the matching rows. By default the pattern is interpreted as a regular expression and the match is case-sensitive.

import pandas as pd

# Create a DataFrame
data = {"Name": ["John", "Jane", "Mary", "Adam"],
        "City": ["New York", "Los Angeles", "Chicago", "Houston"]}
df = pd.DataFrame(data)

# Filter the DataFrame by a string value in the "City" column
filtered_df = df[df["City"].str.contains("Los Angeles")]

print(filtered_df)

The output should be:

   Name         City
1  Jane  Los Angeles

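In this example, we filtered the DataFrame by the string value “Los Angeles” in the “City” column using the str.contains() method. Because str.contains() matches substrings, a value such as “East Los Angeles” would also be selected.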
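Real columns often contain mixed case, missing values, or characters that are special in regular expressions. The sketch below shows how the case, na, and regex parameters of str.contains() can handle these; the sample data (including the lowercase city, the missing value, and the "(TX)" suffix) is invented for illustration.

```python
import pandas as pd

# Hypothetical data with mixed case, a missing value, and regex metacharacters
df = pd.DataFrame({
    "Name": ["John", "Jane", "Mary", "Adam"],
    "City": ["New York", "los angeles", None, "Houston (TX)"],
})

# case=False ignores letter case; na=False treats missing values as non-matches
la = df[df["City"].str.contains("Los Angeles", case=False, na=False)]

# regex=False treats the pattern as literal text, so "(" and ")" are
# matched as characters rather than as a regular-expression group
tx = df[df["City"].str.contains("(TX)", regex=False, na=False)]

print(la)
print(tx)
```

Passing na=False is worth considering whenever a column may hold missing values, since the mask must be fully Boolean to be used for row selection.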

Filtering by Multiple Strings

To filter a DataFrame by multiple string values in a column, we can use the str.contains() method with a regular expression. Because the pattern is treated as a regular expression by default, alternatives can be joined with the alternation operator (|), and a row is selected if its value contains any of them.

import pandas as pd

# Create a DataFrame
data = {"Name": ["John", "Jane", "Mary", "Adam"],
        "City": ["New York", "Los Angeles", "Chicago", "Houston"]}
df = pd.DataFrame(data)

# Filter the DataFrame by multiple string values in the "City" column
filtered_df = df[df["City"].str.contains("Los Angeles|Chicago")]

print(filtered_df)

The output should be:

   Name         City
1  Jane  Los Angeles
2  Mary     Chicago

In this example, we filtered the DataFrame by the string values Los Angeles and Chicago in the City column using the str.contains() method with a regular expression.
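When the search terms come from a list rather than a hand-written pattern, one option is to build the regular expression programmatically. The sketch below assumes a hypothetical list named terms and uses re.escape() from the standard library so that any regex metacharacters in the terms are matched literally.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Mary", "Adam"],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# Hypothetical list of search terms
terms = ["Los Angeles", "Chicago"]

# Escape each term, then join the alternatives with "|"
pattern = "|".join(re.escape(term) for term in terms)

filtered_df = df[df["City"].str.contains(pattern)]
print(filtered_df)
```

Escaping matters for terms such as "St. Louis", where an unescaped "." would match any character.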

Filtering by a List of Strings

To filter a DataFrame by a list of string values in a column, we can use the isin() method. The isin() method returns a Boolean mask that selects the rows whose value exactly equals one of the specified strings. Unlike str.contains(), it does not match substrings or regular expressions.

import pandas as pd

# Create a DataFrame
data = {"Name": ["John", "Jane", "Mary", "Adam"],
        "City": ["New York", "Los Angeles", "Chicago", "Houston"]}
df = pd.DataFrame(data)

# Filter the DataFrame by a list of string values in the "City" column
filtered_df = df[df["City"].isin(["Los Angeles", "Chicago"])]

print(filtered_df)

The output should be:

   Name         City
1  Jane  Los Angeles
2  Mary     Chicago

In this example, we filtered the DataFrame by the string values Los Angeles and Chicago in the City column using the isin() method.
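The difference between exact and substring matching is easy to overlook, so here is a short comparison on the same data. It also shows that a mask can be inverted with the ~ operator to exclude values; the variable names are chosen for this illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Mary", "Adam"],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# isin() compares whole values: "Los" is not a full city name, so only Chicago matches
exact = df[df["City"].isin(["Los", "Chicago"])]

# str.contains() matches substrings: "Los" is found inside "Los Angeles"
partial = df[df["City"].str.contains("Los")]

# ~ inverts the mask, keeping the rows that are NOT in the list
excluded = df[~df["City"].isin(["Los Angeles", "Chicago"])]

print(exact)
print(partial)
print(excluded)
```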

Conclusion

Filtering Pandas DataFrames by a column of strings is a common task in data science and software engineering. In this article, we discussed three approaches: str.contains() for a single substring, str.contains() with a regular expression for several alternatives, and isin() for exact matches against a list of values. Each returns a Boolean mask that selects the rows whose string values meet the criteria.
