📣 Introducing $2.95/Hr H100, H200, B200s, and B300s: train, fine-tune, and scale ML models affordably, without having to DIY the infrastructure   📣 Run Saturn Cloud on AWS, GCP, Azure, Nebius, Crusoe, or on-prem. 📣 Introducing $2.95/Hr H100, H200, B200s, and B300s: train, fine-tune, and scale ML models affordably, without having to DIY the infrastructure   📣 Run Saturn Cloud on AWS, GCP, Azure, Nebius, Crusoe, or on-prem. 📣 Introducing $2.95/Hr H100, H200, B200s, and B300s: train, fine-tune, and scale ML models affordably, without having to DIY the infrastructure   📣 Run Saturn Cloud on AWS, GCP, Azure, Nebius, Crusoe, or on-prem.
← Back to Blog

How to select rows by column value in Pandas

Methods for efficiently selecting rows by column value

How to select rows by column value in Pandas

If you need to select rows in your pandas DataFrame based on the value of one or more columns, you’re in luck - there are several methods for accessing the data you need. Which method to use depends on performance considerations and how your data is set up.

One straightforward method is boolean indexing. Pandas.DataFrame.loc allows you to simply select rows by value:

import pandas as pd

data = pd.DataFrame({'Color': 'Tabby Black Calico Tabby Tabby Black'.split(),
                   'Name': 'Maxine Angel Delilah Tom Jeff Fluffy'.split(),
                   'Age': [2, 5, 17, 10, 7, 2]})

#select by scalar value
data.loc[data['Color'] == 'Tabby']

#select by iterable value
data.loc[data['Age'].isin([2, 5])]

Boolean indexing also allows for selection by negation, or by multiple conditions (with &, |):

#select rows where column value does NOT equal some scalar value
data.loc[data['Color'] != 'Calico']

#select rows where column value is NOT in some iterable value
data.loc[~data['Name'].isin(['Tom', 'Fluffy'])]

#select rows by multiple conditions
data.loc[(data['Color'] == 'Tabby') & (data['Age'] <= 7)]

Note: Using the loc method is usually not strictly necessary. These two lines of code accomplish the same task, with no major differences in performance:

data.loc[data['Color'] == 'Tabby']
data[data['Color'] == 'Tabby']

However, loc has the advantage of being explicit rather than implicit, which means that is often a safer choice for edge cases. Additionally, loc does have a performance advantage in certain situations, such as chained assignment.

If you are calling your boolean indexing function repeatedly, it’s more efficient to first use set_index():

#set index
data = data.set_index(['Color'])

#select by scalar value
data.loc['Tabby']

#select by iterable value
data.loc[data.index.isin(['Tabby', 'Calico'])]

For larger datasets (over approximately 200,000 rows), using the query() function may speed up your computation time:

#select by scalar value
data.query('Color == "Tabby"')

#select by iterable value
data.query('Age in [2, 5]')

Finally, in some cases, Numpy methods can offer a faster alternative to Pandas methods, although at the cost of some readability:

import numpy as np

#select by scalar value
data[data['Age'].values == 2]

#select by iterable value
data[np.in1d(data['Age'].values, [2, 5])]

To wrap up, there are a variety of ways to select DataFrame rows by column value. Boolean indexing (with or without loc) offers a quick and intuitive way to index DataFrames, especially for smaller datasets. For bigger datasets, query() is an elegant alternative. And for some use cases, it may be more efficient to skip the Pandas API altogether and use an applicable Numpy method.

Additional Resources:

Keep reading

Related articles

How to select rows by column value in Pandas
Dec 11, 2025

Choosing an MLOps Platform in 2026

How to select rows by column value in Pandas
Dec 10, 2025

SageMaker vs. Saturn Cloud: Which One Is Better for Your Team?

How to select rows by column value in Pandas
Dec 1, 2025

Production Inference at Scale with Saturn Cloud & Nebius Token Factory