# How to Convert Categorical Data to Numerical Data with Pandas

As a data scientist or software engineer, you may encounter datasets that contain categorical data. Categorical data is data that is divided into groups or categories, such as colors, types of fruit, or educational levels. Many statistical and machine learning tools accept only numeric input, so this data often has to be converted to numerical data first. In this post, we will explore how to use Pandas, a popular Python library for data manipulation and analysis, to convert categorical data to numerical data.

## What is Pandas?

Pandas is an open-source Python library that is designed for data manipulation and analysis. It provides tools for reading and writing data, as well as powerful data structures for working with tabular data. Pandas is widely used in the data science community and is a popular choice for data analysis tasks.

## Converting Categorical Data to Numerical Data

Converting categorical data to numerical data is a common preprocessing step. There are several ways to do it with Pandas, either on its own or together with scikit-learn, including the following:

### Method 1: Using the cat.codes Attribute

The easiest way to convert categorical data to numerical data in Pandas is to use the `cat.codes` attribute. This attribute is available on columns with the `category` dtype and returns the integer code of each value's category.

Here is an example of how to use the `cat.codes` attribute to convert categorical data to numerical data:

```
import pandas as pd
# Create a DataFrame with categorical data
df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Red', 'Green']})
# Convert categorical data to numerical data using cat.codes
df['Color'] = df['Color'].astype('category')
df['Color_Codes'] = df['Color'].cat.codes
# View the converted DataFrame
print(df)
```

The output of this code would be:

```
   Color  Color_Codes
0    Red            2
1   Blue            0
2  Green            1
3    Red            2
4  Green            1
```

In this example, we created a DataFrame with a column `Color` that contains categorical data. We then converted this column to a categorical data type using the `astype()` method. Finally, we used the `cat.codes` attribute to create a new column `Color_Codes` with numerical representations of each category. Because Pandas sorts the categories alphabetically by default when it infers them, `Blue` is coded as 0, `Green` as 1, and `Red` as 2.
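When the categories have a natural order, such as educational levels, the alphabetical default may not be what you want. The following is a minimal sketch, assuming a hypothetical education column (the data and the names `levels`, `education_type`, and `code_to_category` are illustrative, not from the example above). It declares the order explicitly with `pd.CategoricalDtype`, shows that missing values receive the code -1, and builds a lookup for decoding the codes later.

```python
import pandas as pd

# Hypothetical example data: an education column with one missing value
levels = pd.Series(["High School", "PhD", None, "Bachelor", "PhD"])

# Declare the categories in a meaningful order instead of alphabetical order
education_type = pd.CategoricalDtype(
    categories=["High School", "Bachelor", "PhD"], ordered=True
)
education = levels.astype(education_type)

# Codes follow the declared order; missing values get the code -1
codes = education.cat.codes

# Build a code-to-category lookup so the codes can be decoded later
code_to_category = dict(enumerate(education.cat.categories))

print(codes.tolist())      # [0, 2, -1, 1, 2]
print(code_to_category)    # {0: 'High School', 1: 'Bachelor', 2: 'PhD'}
```

Keeping the lookup alongside the encoded column makes it possible to translate results back into readable labels.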

### Method 2: Using the replace() Method

Another way to convert categorical data to numerical data in Pandas is to use the `replace()` method. This method replaces each category with a specified numerical value.

Here is an example of how to use the `replace()` method to convert categorical data to numerical data:

```
import pandas as pd
# Create a DataFrame with categorical data
df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Red', 'Green']})
# Convert categorical data to numerical data using replace
df['Color'] = df['Color'].replace({'Red': 0, 'Blue': 1, 'Green': 2})
# View the converted DataFrame
print(df)
```

The output of this code would be:

```
   Color
0      0
1      1
2      2
3      0
4      2
```

In this example, we created a DataFrame with a column `Color` that contains categorical data. We then used the `replace()` method to replace each category with a specified numerical value.
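One thing to watch with a hand-written mapping is a value that the mapping does not cover: `replace()` leaves such a value unchanged, so the column ends up mixing numbers and strings. The sketch below uses `Series.map()` instead, which is a different technique from the one above, and which yields `NaN` for unmapped values so that they are easy to detect. The extra `Purple` value and the names `color_map`, `mapped`, and `unmapped` are assumptions added for illustration.

```python
import pandas as pd

# Same colors as above, plus a hypothetical value missing from the mapping
df = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Red", "Purple"]})

# Same mapping as in the replace() example
color_map = {"Red": 0, "Blue": 1, "Green": 2}

# map() returns NaN for values that are not keys of the mapping
mapped = df["Color"].map(color_map)

# Collect the unmapped values so they can be reviewed or added to the mapping
unmapped = df.loc[mapped.isna(), "Color"].unique().tolist()

# The nullable Int64 dtype keeps integers and marks missing entries as <NA>
df["Color_Codes"] = mapped.astype("Int64")

print(unmapped)  # ['Purple']
print(df)
```

Whether an unmapped value should raise an error, receive its own code, or stay missing depends on the analysis, so this sketch only surfaces it.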

### Method 3: Using the LabelEncoder Class

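A third way to convert categorical data to numerical data is to use the `LabelEncoder` class alongside Pandas. This class is part of scikit-learn's `sklearn.preprocessing` module, not Pandas itself, and encodes labels as integers from 0 to the number of classes minus 1. The scikit-learn documentation describes it as intended for target labels rather than input features, for which the same module offers `OrdinalEncoder`.

Here is an example of how to use the `LabelEncoder` class to convert categorical data to numerical data: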

```
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Create a DataFrame with categorical data
df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Red', 'Green']})
# Convert categorical data to numerical data using LabelEncoder
le = LabelEncoder()
df['Color'] = le.fit_transform(df['Color'])
# View the converted DataFrame
print(df)
```

The output of this code would be:

```
   Color
0      2
1      0
2      1
3      2
4      1
```

In this example, we created a DataFrame with a column `Color` that contains categorical data. We then used the `LabelEncoder` class from the `sklearn.preprocessing` module to overwrite the `Color` column with numerical representations of each category. As with `cat.codes`, the labels are sorted before codes are assigned, so the result matches Method 1.
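A fitted `LabelEncoder` also remembers the labels it has seen, which makes the encoding reversible. The following sketch, which writes the codes to a separate `Color_Codes` column rather than overwriting `Color` (a choice made here for illustration), uses the `classes_` attribute and the `inverse_transform()` method to inspect and undo the encoding. The names `label_to_code` and `decoded` are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Red", "Green"]})

le = LabelEncoder()
# Keep the original column and store the codes in a new one
df["Color_Codes"] = le.fit_transform(df["Color"])

# classes_ holds the learned labels; each label's position is its code
label_to_code = {str(label): code for code, label in enumerate(le.classes_)}

# inverse_transform() maps the codes back to the original labels
decoded = le.inverse_transform(df["Color_Codes"])

print(label_to_code)   # {'Blue': 0, 'Green': 1, 'Red': 2}
print(list(decoded))
```

Reusing the same fitted encoder with `transform()` on new data keeps the codes consistent between datasets.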

## Conclusion

In this post, we explored how to use Pandas, a popular Python library for data manipulation and analysis, to convert categorical data to numerical data. We discussed three methods: the `cat.codes` attribute, the `replace()` method, and scikit-learn's `LabelEncoder` class. With these methods, you can prepare categorical columns for analyses and models that require numeric input.
