How to Split One Column into Multiple Columns in a Pandas DataFrame
As a data scientist or software engineer, you may have come across the need to split a column in a Pandas DataFrame into multiple columns. This is a common task, especially when dealing with messy or unstructured data. In this tutorial, we’ll explore different ways to split one column into multiple columns in a Pandas DataFrame.
What Is a Pandas DataFrame?
Pandas is a popular open-source library used for data manipulation and analysis in Python. A DataFrame is a two-dimensional table-like data structure that consists of rows and columns. It is similar to a spreadsheet or SQL table, where each column can have a different data type.
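To make the description concrete, here is a minimal sketch of a DataFrame whose columns hold different data types. The Age and Score columns and their values are made up for illustration and do not appear elsewhere in this tutorial.

```python
import pandas as pd

# hypothetical sample data: one text column, one integer column, one float column
people = pd.DataFrame({
    'Name': ['John Doe', 'Jane Smith'],
    'Age': [34, 29],
    'Score': [88.5, 92.0],
})

# each column reports its own data type
print(people.dtypes)
```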
Splitting a Column into Multiple Columns
Let’s start with an example. Suppose we have a Pandas DataFrame with a column named Name that contains names in the format First Last. We want to split this column into two separate columns, one for first names and one for last names.
Using the str.split() Method
One way to split a column into multiple columns is by using the str.split() method in Pandas. This method splits each string in the column into a list of strings based on a separator.
Here’s an example of how we can use the str.split() method to split the Name column into two separate columns:
import pandas as pd
# create a sample DataFrame
df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'Bob Johnson']})
# split the Name column into two columns
df[['First Name', 'Last Name']] = df['Name'].str.split(' ', expand=True)
# view the updated DataFrame
print(df)
Output:
          Name  First Name  Last Name
0     John Doe        John        Doe
1   Jane Smith        Jane      Smith
2  Bob Johnson         Bob    Johnson
In the above example, we first create a sample DataFrame with the Name column. Then, we use the str.split() method to split the Name column on a space. Setting the expand parameter to True makes the method return a DataFrame with one column per piece rather than a Series of lists, and we assign those two columns to the original DataFrame as First Name and Last Name using the double bracket notation.
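The assignment above only works when every name splits into exactly two pieces. As a minimal sketch, assuming the data may also contain names with more than one space (the 'Mary Ann Smith' row is made up for illustration), the n parameter of str.split() can cap the number of splits so the result always has two columns:

```python
import pandas as pd

# hypothetical data: the second name contains two spaces
df_n = pd.DataFrame({'Name': ['John Doe', 'Mary Ann Smith', 'Bob Johnson']})

# n=1 splits only at the first space, so each row yields exactly two pieces
df_n[['First Name', 'Last Name']] = df_n['Name'].str.split(' ', n=1, expand=True)

print(df_n)
```

With n=1, everything after the first space lands in Last Name. Using str.rsplit() with the same arguments would split at the last space instead, which may suit data with middle names better.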
Using the str.extract() Method
Another way to split a column into multiple columns is by using the str.extract() method in Pandas. This method extracts substrings from each string based on the capture groups of a regular expression.
Here’s an example of how we can use the str.extract() method to split the “Name” column into two separate columns:
import pandas as pd
# create a sample DataFrame
df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'Bob Johnson']})
# extract first and last names using a regular expression
df[['First Name', 'Last Name']] = df['Name'].str.extract(r'(\w+)\s(\w+)', expand=True)
# view the updated DataFrame
print(df)
Output:
          Name  First Name  Last Name
0     John Doe        John        Doe
1   Jane Smith        Jane      Smith
2  Bob Johnson         Bob    Johnson
In the above example, we first create a sample DataFrame with the “Name” column. Then, we use the str.extract() method to extract the first and last names using a regular expression. The regular expression (\w+)\s(\w+) matches one or more word characters followed by a space, followed by one or more word characters. We set the expand parameter to True to create two new columns, “First Name” and “Last Name”, and assign them to the original DataFrame using the double bracket notation.
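One practical difference from str.split() is how str.extract() treats rows that do not match the pattern. The sketch below assumes a single-word name ('Cher', made up for illustration) and uses named capture groups, whose names (first and last here, chosen arbitrarily) become the column labels of the result:

```python
import pandas as pd

# hypothetical data: the last row has no space, so the pattern cannot match
df_x = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'Cher']})

# named groups (?P<name>...) become the column names of the returned DataFrame
parts = df_x['Name'].str.extract(r'(?P<first>\w+)\s(?P<last>\w+)')

# rows that do not match the pattern are filled with NaN in every group
df_x = df_x.join(parts)

print(df_x)
```

Checking the new columns with isna() is a simple way to find rows that did not fit the expected format.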
Using the pd.Series.str.split() Method
pd.Series.str.split() is the full name of the str.split() method: df['Name'] is a Series, so calling df['Name'].str.split() calls pd.Series.str.split(). By default, the method returns a Series in which each element is a list of substrings. With expand=True, it returns a new DataFrame with each element of the list in its own column.
Here’s the split of the “Name” column into two separate columns again, written with that in mind:
import pandas as pd
# create a sample DataFrame
df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'Bob Johnson']})
# split the Name column into two columns using pd.Series.str.split()
df[['First Name', 'Last Name']] = df['Name'].str.split(' ', expand=True)
# view the updated DataFrame
print(df)
Output:
          Name  First Name  Last Name
0     John Doe        John        Doe
1   Jane Smith        Jane      Smith
2  Bob Johnson         Bob    Johnson
In the above example, we first create a sample DataFrame with the “Name” column. Then, we use the pd.Series.str.split() method to split the “Name” column using a space as the separator. Because expand is set to True, the method returns a new DataFrame with each element of the list as a separate column. We assign the columns of that new DataFrame to the original DataFrame using the double bracket notation.
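The effect of the expand parameter is easiest to see side by side. This is a small sketch on a standalone Series (the variable names are chosen for illustration) comparing the two return types:

```python
import pandas as pd

names = pd.Series(['John Doe', 'Jane Smith', 'Bob Johnson'], name='Name')

# default (expand=False): a Series whose elements are lists of substrings
as_lists = names.str.split(' ')

# expand=True: a DataFrame with one column per list position, labeled 0, 1, ...
as_frame = names.str.split(' ', expand=True)

# a single piece can also be picked out of the lists with the .str[] indexer
first_names = names.str.split(' ').str[0]

print(type(as_lists).__name__, type(as_frame).__name__)
print(first_names.tolist())
```

The list form can be convenient when only one piece is needed, while the expanded form is what allows assignment to several new columns at once.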
Conclusion
In this tutorial, we explored ways to split one column into multiple columns in a Pandas DataFrame. We learned how to use the str.split() method (pd.Series.str.split() in full) and the str.extract() method to split a column into multiple columns. These methods are useful for cleaning up messy or unstructured data before analysis.