How to Create a New Column Based on the Value of Another Column in Pandas
As a data scientist or software engineer, you may encounter situations where you need to create a new column in a pandas DataFrame based on the value of another column. This can be useful for a variety of reasons, such as calculating new metrics or transforming data for analysis. In this article, we will explore the process of creating a new column based on the value of another column in pandas.
What is Pandas?
Pandas is an open-source data analysis and manipulation library for Python. It provides easy-to-use data structures and data analysis tools for handling tabular data. Pandas is widely used in data science and machine learning projects for data cleaning, transformation, and analysis.
How to Create a New Column Based on the Value of Another Column in Pandas
Step 1: Load Data into a Pandas DataFrame
Before we can create a new column based on the value of another column, we need to load our data into a pandas DataFrame.
import pandas as pd
# Load data into a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Marie', 'Jude'],
    'Age': [25, 30, 35, 40, 67, 10],
}
df = pd.DataFrame(data)
df
Output:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40
4    Marie   67
5     Jude   10
Step 2: Create a New Column Based on the Value of Another Column
Once our data is in a pandas DataFrame, we can create a new column based on the value of another column using the apply() method. Called on a single column (a Series), apply() runs a function on each value and returns a new Series of the results. Called on a DataFrame, it applies the function to each row or column instead.
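To make the element-wise behavior concrete, here is a minimal sketch of apply() on a single Series. The `ages` Series and the `is_adult` name are illustrative only and are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical three-value Series, used only for illustration
ages = pd.Series([25, 30, 10], name='Age')

# The lambda runs once per value; the result is a new Series
# with the same index and length as the input
is_adult = ages.apply(lambda age: age >= 18)

print(is_adult.tolist())
```

The original Series is left unchanged; the result only becomes a column once it is assigned to one.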
For example, our DataFrame has two columns, Name and Age, and we want to create a new column called Age_Group based on the age of each individual. We can write a function that takes an age value as input and returns an age group according to the following criteria:
def get_age_group(age):
    # Each branch is reached only if the earlier ones failed,
    # so checking the upper bound alone is enough
    if age < 18:
        return 'Under 18'
    elif age < 25:
        return '18-24'
    elif age < 35:
        return '25-34'
    elif age < 45:
        return '35-44'
    elif age < 55:
        return '45-54'
    elif age < 65:
        return '55-64'
    else:
        return '65+'
We can then apply this function to the Age column using the apply() method and assign the result to a new column called Age_Group.
# Apply the get_age_group function to the Age column
df['Age_Group'] = df['Age'].apply(get_age_group)
Output:
      Name  Age Age_Group
0    Alice   25     25-34
1      Bob   30     25-34
2  Charlie   35     35-44
3    David   40     35-44
4    Marie   67       65+
5     Jude   10  Under 18
This creates a new column called Age_Group in our DataFrame, which contains the age group for each individual based on their age.
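As an alternative to apply(), the same grouping can be sketched with pd.cut(), which bins a numeric column in one vectorized call. This is a different technique from the one the article uses. The `bins` and `labels` lists below are assumptions chosen to mirror the age ranges above, and the sketch assumes ages are never negative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Marie', 'Jude'],
    'Age': [25, 30, 35, 40, 67, 10],
})

# Bin edges (assumed to mirror the ranges above); with right=False each
# bin includes its left edge and excludes its right edge, e.g. [25, 35)
bins = [0, 18, 25, 35, 45, 55, 65, float('inf')]
labels = ['Under 18', '18-24', '25-34', '35-44', '45-54', '55-64', '65+']

df['Age_Group'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)

print(df)
```

Unlike the apply() version, the resulting column has a categorical dtype rather than plain strings, and any age below the first bin edge would become NaN instead of 'Under 18'.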
Step 3: Save the Data to a New CSV File
Once we have created the new column, we may want to save the data to a new CSV file for further analysis or sharing with others. We can do this using the to_csv() method, which writes the contents of a DataFrame to a CSV file.
# Save the DataFrame (df, not the original data dict) to a new CSV file
df.to_csv('new_data.csv', index=False)
This will save the contents of our DataFrame to a new CSV file called new_data.csv, without including the index column.
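To check what index=False produces, the following sketch writes a small DataFrame to an in-memory buffer instead of a file and reads it back. The two-row DataFrame and the use of io.StringIO are assumptions made to keep the example self-contained; the article itself writes to new_data.csv.

```python
import io

import pandas as pd

# Hypothetical two-row sample with the same columns as the article's DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Jude'],
    'Age': [25, 10],
    'Age_Group': ['25-34', 'Under 18'],
})

# Write CSV text to an in-memory buffer rather than to disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# The header row holds only the column names, with no index column
csv_text = buffer.getvalue()
header = csv_text.splitlines()[0]
print(header)

# Read the CSV back to confirm the columns and values survive the round trip
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

With index=True (the default), the CSV would gain an extra unnamed first column holding the row labels.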
Conclusion
In this article, we have explored the process of creating a new column based on the value of another column in pandas. We have learned how to load data into a pandas DataFrame, create a new column using the apply() method, and save the data to a new CSV file. By following these steps, you can transform and analyze your data using pandas and create new columns based on the values of existing columns.