How to Split One Column into Multiple Columns in a Pandas DataFrame

In this blog, we’ll discuss various techniques for breaking down a column in a Pandas DataFrame into multiple columns, a task often encountered in data science and software engineering, particularly when working with unstructured or messy data. Explore the methods to efficiently manage and extract valuable information from your data.

As a data scientist or software engineer, you may have needed to split a column in a Pandas DataFrame into multiple columns. This is a common task, especially when dealing with messy or unstructured data. In this tutorial, we’ll explore different ways to split one column into multiple columns in a Pandas DataFrame.

What is a Pandas DataFrame?

Pandas is a popular open-source library used for data manipulation and analysis in Python. A DataFrame is a two-dimensional table-like data structure that consists of rows and columns. It is similar to a spreadsheet or SQL table, where each column can have a different data type.
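As a small illustration of columns holding different data types, here is a minimal sketch. The Age and Active columns and their values are made up for this example only:

```python
import pandas as pd

# a small DataFrame whose columns hold strings, integers, and booleans
# (the column names and values are illustrative assumptions)
people = pd.DataFrame({
    'Name': ['John Doe', 'Jane Smith'],
    'Age': [34, 29],
    'Active': [True, False],
})

# each column reports its own data type
print(people.dtypes)
```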

Splitting a Column into Multiple Columns

Let’s start with an example. Suppose we have a Pandas DataFrame with a column named Name that contains names in the format First Last. We want to split this column into two separate columns, one for first names and one for last names.

Using the str.split() Method

One way to split a column into multiple columns is by using the str.split() method in Pandas. This method splits a string into a list of strings based on a separator.

Here’s an example of how we can use the str.split() method to split the Name column into two separate columns:

import pandas as pd

# create a sample DataFrame
df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'Bob Johnson']})

# split the Name column into two columns
df[['First Name', 'Last Name']] = df['Name'].str.split(' ', expand=True)

# view the updated DataFrame
print(df)

Output:

          Name First Name Last Name
0     John Doe       John       Doe
1   Jane Smith       Jane     Smith
2  Bob Johnson        Bob   Johnson

In the above example, we first create a sample DataFrame with the Name column. Then, we use the str.split() method to split each value in the Name column on a space. Setting the expand parameter to True makes the method return a DataFrame with one column per piece instead of a single column of lists. We assign those two columns to the original DataFrame as First Name and Last Name using the double bracket notation. Note that this assignment only works when every name splits into exactly two pieces.
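The two-column assignment above relies on every name containing exactly one space. As a minimal sketch of one way to cope with longer names, the n parameter of str.split() caps the number of splits. The sample name 'Mary Ann Smith' and the column name Rest are assumptions made for illustration:

```python
import pandas as pd

# sample data with one three-part name (an illustrative assumption)
df = pd.DataFrame({'Name': ['John Doe', 'Mary Ann Smith', 'Bob Johnson']})

# n=1 splits only at the first space, so every row yields exactly two pieces
df[['First Name', 'Rest']] = df['Name'].str.split(' ', n=1, expand=True)

print(df)
```

With n=1, 'Mary Ann Smith' becomes 'Mary' and 'Ann Smith', so the number of resulting columns stays at two regardless of how many spaces a name contains.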

Using the str.extract() Method

Another way to split a column into multiple columns is by using the str.extract() method in Pandas. This method extracts substrings from a string based on a regular expression.

Here’s an example of how we can use the str.extract() method to split the “Name” column into two separate columns:

import pandas as pd

# create a sample DataFrame
df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'Bob Johnson']})

# extract first and last names using a regular expression
# (a raw string keeps Python from interpreting \w and \s as string escapes)
df[['First Name', 'Last Name']] = df['Name'].str.extract(r'(\w+)\s(\w+)', expand=True)

# view the updated DataFrame
print(df)

Output:

          Name First Name Last Name
0     John Doe       John       Doe
1   Jane Smith       Jane     Smith
2  Bob Johnson        Bob   Johnson

In the above example, we first create a sample DataFrame with the “Name” column. Then, we use the str.extract() method to extract the first and last names using a regular expression. The regular expression (\w+)\s(\w+) matches one or more word characters, followed by a single whitespace character, followed by one or more word characters. Each pair of parentheses is a capture group, and each capture group becomes one column in the result. We set the expand parameter to True so that the result is a DataFrame, and assign its two columns to the original DataFrame as “First Name” and “Last Name” using the double bracket notation.
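As a variation on the example above, named capture groups let str.extract() label the resulting columns directly. This is a minimal sketch: the group names first and last and the single-word sample 'Cher' are assumptions added for illustration:

```python
import pandas as pd

# sample data, including one value that does not fit the "First Last" pattern
df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'Bob Johnson', 'Cher']})

# named groups (?P<name>...) become the column names of the extracted DataFrame
names = df['Name'].str.extract(r'(?P<first>\w+)\s(?P<last>\w+)')

# rows that do not match the pattern get missing values in every extracted column
df = df.join(names)

print(df)
```

Joining the extracted DataFrame back on the index keeps the original column intact and makes rows that failed to match easy to spot.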

Using the pd.Series.str.split() Method

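pd.Series.str.split() is the full name of the str.split() method used in the first section: df['Name'] is a Series, so calling df['Name'].str.split() invokes this same method. By default it splits each string into a list of strings based on a separator and returns a Series of lists. With expand=True it returns a new DataFrame with each element in the list as a new column.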

Here’s an example of how we can use the pd.Series.str.split() method to split the “Name” column into two separate columns:

import pandas as pd

# create a sample DataFrame
df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'Bob Johnson']})

# split the Name column into two columns using pd.Series.str.split()
df[['First Name', 'Last Name']] = df['Name'].str.split(' ', expand=True)

# view the updated DataFrame
print(df)

Output:

          Name First Name Last Name
0     John Doe       John       Doe
1   Jane Smith       Jane     Smith
2  Bob Johnson        Bob   Johnson

In the above example, we first create a sample DataFrame with the “Name” column. Then, we use the pd.Series.str.split() method to split the “Name” column into two columns using a space as the separator. Because expand is set to True, the method returns a new DataFrame with each element in the list as a new column. We assign the columns of that new DataFrame to two new columns in the original DataFrame using the double bracket notation.
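To make the role of the expand parameter concrete, the following minimal sketch calls the method on a standalone Series with and without it. The variable names as_lists, as_frame, and first are chosen here for illustration:

```python
import pandas as pd

names = pd.Series(['John Doe', 'Jane Smith', 'Bob Johnson'], name='Name')

# without expand, the result is a Series whose values are lists of strings
as_lists = names.str.split(' ')

# with expand=True, the result is a DataFrame with integer column labels 0 and 1
as_frame = names.str.split(' ', expand=True)

# individual pieces can also be taken from the list form with .str[index]
first = as_lists.str[0]

print(type(as_lists).__name__, type(as_frame).__name__)
```

The list form can be handy when only one piece is needed, while the expanded form suits assigning several new columns at once.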

Conclusion

In this tutorial, we explored different ways to split one column into multiple columns in a Pandas DataFrame. We learned how to use the str.split() method (whose full name is pd.Series.str.split()) with a fixed separator, and the str.extract() method with a regular expression. These methods are useful for handling messy or unstructured data and can help make data analysis more efficient and accurate.

