How to select rows by column value in Pandas

If you need to select rows in your pandas DataFrame based on the value of one or more columns, you’re in luck - there are several methods for accessing the data you need. Which method to use depends on performance considerations and how your data is set up.

One straightforward method is boolean indexing. Pandas.DataFrame.loc allows you to simply select rows by value:

import pandas as pd

data = pd.DataFrame({'Color': 'Tabby Black Calico Tabby Tabby Black'.split(),
                   'Name': 'Maxine Angel Delilah Tom Jeff Fluffy'.split(),
                   'Age': [2, 5, 17, 10, 7, 2]})

#select by scalar value
data.loc[data['Color'] == 'Tabby']

#select by iterable value
data.loc[data['Age'].isin([2, 5])]

Boolean indexing also allows for selection by negation, or by multiple conditions (with &, |):

#select rows where column value does NOT equal some scalar value
data.loc[data['Color'] != 'Calico']

#select rows where column value is NOT in some iterable value
data.loc[~data['Name'].isin(['Tom', 'Fluffy'])]

#select rows by multiple conditions
data.loc[(data['Color'] == 'Tabby') & (data['Age'] <= 7)]

Note: Using the loc method is usually not strictly necessary. These two lines of code accomplish the same task, with no major differences in performance:

data.loc[data['Color'] == 'Tabby']
data[data['Color'] == 'Tabby']

However, loc has the advantage of being explicit rather than implicit, which means that is often a safer choice for edge cases. Additionally, loc does have a performance advantage in certain situations, such as chained assignment.

If you are calling your boolean indexing function repeatedly, it’s more efficient to first use set_index():

#set index
data = data.set_index(['Color'])

#select by scalar value
data.loc['Tabby']

#select by iterable value
data.loc[data.index.isin(['Tabby', 'Calico'])]

For larger datasets (over approximately 200,000 rows), using the query() function may speed up your computation time:

#select by scalar value
data.query('Color == "Tabby"')

#select by iterable value
data.query('Age in [2, 5]')

Finally, in some cases, Numpy methods can offer a faster alternative to Pandas methods, although at the cost of some readability:

import numpy as np

#select by scalar value
data[data['Age'].values == 2]

#select by iterable value
data[np.in1d(data['Age'].values, [2, 5])]

To wrap up, there are a variety of ways to select DataFrame rows by column value. Boolean indexing (with or without loc) offers a quick and intuitive way to index DataFrames, especially for smaller datasets. For bigger datasets, query() is an elegant alternative. And for some use cases, it may be more efficient to skip the Pandas API altogether and use an applicable Numpy method.

Additional Resources:

About Saturn Cloud

Related articles

Choosing an MLOps Platform in 2026

SageMaker vs. Saturn Cloud: Which One Is Better for Your Team?

Production Inference at Scale with Saturn Cloud & Nebius Token Factory