How to Access a JSON Column with Pandas

As a data scientist or software engineer you may come across JSON columns in your data JSON is a popular data format used for storing and exchanging data on the web It is a lightweight textbased format that is easy to read and write

As a data scientist or software engineer, you may come across JSON columns in your data. JSON is a popular data format used for storing and exchanging data on the web. It is a lightweight, text-based format that is easy to read and write.

When working with JSON data in Python, Pandas is an excellent library to use. Pandas is a powerful data manipulation tool that provides efficient data structures for working with structured data.

In this tutorial, we will walk through the steps of accessing a JSON column with Pandas. We will cover the following topics:

  1. What is a JSON column?
  2. How to load a JSON file into Pandas
  3. How to access a JSON column in Pandas
  4. How to manipulate a JSON column in Pandas

Table of Contents

  1. What is a JSON Column?
  2. How to Load a JSON File into Pandas
  3. How to Access a JSON Column in Pandas
  4. How to Manipulate a JSON Column in Pandas
  5. Conclusion

What is a JSON Column?

A JSON column is a column in a table that contains data in JSON format. JSON stands for JavaScript Object Notation, and it is a lightweight data format that is easy to read and write.

JSON columns are commonly used in databases such as MySQL, Postgres, and MongoDB. They are used to store data that is not easily represented in a traditional tabular format.

Let’s consider the following json-formatted DF.

[
    {
        "id": 1,
        "name": "John Doe",
        "age": 30,
        "attribute": {
            "height": 176,
            "weight": 80
        }
    },
    {
        "id": 2,
        "name": "Alice Smith",
        "age": 28,
        "attribute": {
            "height": 167,
            "weight": 55
        }
    },
    {
        "id": 3,
        "name": "Bob Johnson",
        "age": 35,
        "attribute": {
            "height": 192,
            "weight": 85
        }
    }
]

How to Load a JSON File into Pandas

Before we can access a JSON column with Pandas, we need to load the JSON data into a Pandas DataFrame. The easiest way to do this is to use the read_json function in Pandas.

Here is an example of how to load a JSON file into a Pandas DataFrame:

import pandas as pd

df = pd.read_json('data.json')
print(df)

Output:

   id         name  age                      attribute
0   1     John Doe   30  {'height': 176, 'weight': 80}
1   2  Alice Smith   28  {'height': 167, 'weight': 55}
2   3  Bob Johnson   35  {'height': 192, 'weight': 85}

In this example, we are loading a JSON file called data.json into a Pandas DataFrame called df.

How to Access a JSON Column in Pandas

Once we have loaded the JSON data into a Pandas DataFrame, we can access the JSON column using the bracket notation.

Here is an example of how to access a JSON column in Pandas:

import pandas as pd

df = pd.read_json('data.json')

json_column = df['attribute']
print(json_column)

Output:

0    {'height': 176, 'weight': 80}
1    {'height': 167, 'weight': 55}
2    {'height': 192, 'weight': 85}
Name: attribute, dtype: object

In this example, we are accessing a JSON column called attribute in the Pandas DataFrame called df.

How to Manipulate a JSON Column in Pandas

Pandas provides several methods for manipulating JSON data. Here are some common operations:

Extracting Data from a JSON Column

To extract data from a JSON column in Pandas, we can use the apply method along with a lambda function.

Here is an example of how to extract data from a JSON column in Pandas:

import pandas as pd

df = pd.read_json('data.json')

json_column = df['attribute']

extracted_data = json_column.apply(lambda x: x['height'])

print(extracte_data)

Output:

0    176
1    167
2    192
Name: attribute, dtype: int64

In this example, we are extracting data from a JSON column called json_column_name in the Pandas DataFrame called df. We are extracting the value associated with the key 'key' in the JSON data.

Adding Data to a JSON Column

To add data to a JSON column in Pandas, we can use the apply method along with a lambda function.

Here is an example of how to add data to a JSON column in Pandas:

import pandas as pd

df = pd.read_json('data.json')

json_column = df['attribute']
# Add new data to json column
json_column = json_column.apply(lambda x: {**x, 'city': 'DC'})

df['attribute'] = json_column
print(df)

Output:

   id         name  age                                    attribute
0   1     John Doe   30  {'height': 176, 'weight': 80, 'city': 'DC'}
1   2  Alice Smith   28  {'height': 167, 'weight': 55, 'city': 'DC'}
2   3  Bob Johnson   35  {'height': 192, 'weight': 85, 'city': 'DC'}

In this example, we are adding a new key-value pair to a JSON column called attribute in the Pandas DataFrame called df. We are adding the key 'city' with the value 'DC' to each JSON object in the column.

Updating Data in a JSON Column

To update data in a JSON column in Pandas, we can use the apply method along with a lambda function.

Here is an example of how to update data in a JSON column in Pandas:

import pandas as pd

df = pd.read_json('data.json')

json_column = df['attribute']
# replace a value in json column
json_column = json_column.apply(lambda x: {**x, 'city': 'New York'})

df['attribute'] = json_column

print(df)

Output:

   id         name  age                                          attribute
0   1     John Doe   30  {'height': 176, 'weight': 80, 'city': 'New York'}
1   2  Alice Smith   28  {'height': 167, 'weight': 55, 'city': 'New York'}
2   3  Bob Johnson   35  {'height': 192, 'weight': 85, 'city': 'New York'}

In this example, we are updating the value associated with the key 'city' in a JSON column called attribute in the Pandas DataFrame called df. We are updating the value to 'New York' for each JSON object in the column.

Conclusion

In this tutorial, we have covered the basics of accessing a JSON column with Pandas. We have learned how to load a JSON file into Pandas, how to access a JSON column in Pandas, and how to manipulate a JSON column in Pandas.

JSON columns are a powerful tool for storing and exchanging data in a flexible and lightweight format. With Pandas, you can easily work with JSON columns and perform common data manipulation operations.


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.