Python: Display All Columns of a Pandas DataFrame in '.describe()'

In the world of data science, Python’s Pandas library is a powerful tool for data manipulation and analysis. One of its most useful features is the .describe() method, which provides a summary of the central tendencies, dispersion, and shape of a dataset’s distribution. However, when working with large datasets with numerous columns, you might have noticed that not all columns are displayed. In this blog post, we’ll explore how to display all columns of a Pandas DataFrame using the .describe() method.

Setting Up Your Environment

Before we dive in, ensure you have the necessary tools installed. You’ll need Python and the Pandas library. Python itself cannot be installed with pip; get it from python.org or your system’s package manager. Once Python is available, you can install Pandas using the following command:

pip install pandas

Understanding the “.describe()” Method

The .describe() method in Pandas is a convenient way to get a quick overview of your data. By default, it summarizes the numeric columns and reports, for each one, the count, mean, standard deviation, minimum, 25th percentile (Q1), median (50th percentile or Q2), 75th percentile (Q3), and maximum.

import pandas as pd

# Create a simple DataFrame with 18 numeric columns
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 3, 4, 5, 6],
    'C': [3, 4, 5, 6, 7],
    'D': [1, 2, 3, 4, 5],
    'E': [2, 3, 4, 5, 6],
    'F': [3, 4, 5, 6, 7],
    'G': [1, 2, 3, 4, 5],
    'H': [2, 3, 4, 5, 6],
    'I': [3, 4, 5, 6, 7],
    'J': [1, 2, 3, 4, 5],
    'K': [2, 3, 4, 5, 6],
    'L': [3, 4, 5, 6, 7],
    'M': [1, 2, 3, 4, 5],
    'N': [2, 3, 4, 5, 6],
    'O': [3, 4, 5, 6, 7],
    'P': [1, 2, 3, 4, 5],
    'Q': [2, 3, 4, 5, 6],
    'R': [3, 4, 5, 6, 7]
})

print(df.describe())

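If your DataFrame has many columns, as above, the printed summary may not show all of them. When the output exceeds the display.max_columns limit (or, in a terminal, the available width), Pandas replaces the middle columns with an ellipsis (...). This is where we need to tweak our settings.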
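The truncation only affects what is printed; the object returned by .describe() still holds a summary for every column. The following is a minimal sketch to check that, assuming a hypothetical 18-column frame named wide_df that is built here purely for illustration:

```python
import pandas as pd

# Hypothetical frame for illustration: 18 numeric columns named A through R
wide_df = pd.DataFrame({name: [1, 2, 3, 4, 5] for name in "ABCDEFGHIJKLMNOPQR"})

summary = wide_df.describe()

# The column limit currently used for printing (the default depends on your environment)
print(pd.get_option("display.max_columns"))

# The summary keeps one column per input column, regardless of what gets printed
print(summary.shape)
print(list(summary.columns))
```

If the shape reports all of your columns but the printed table does not, the limit is in the display settings rather than in the data.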

Displaying All Columns

To display all columns, you need to adjust the Pandas display options. You can do this by setting the display.max_columns option to None, which removes the limit and tells Pandas to display as many columns as there are in the DataFrame.

pd.set_option('display.max_columns', None)

Now, when you use the .describe() method, all columns will be displayed.

print(df.describe())
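Note that pd.set_option changes the setting for the rest of the session. If you would rather lift the limit for a single printout, pd.option_context may be a better fit, since it restores the previous value when the block exits. A minimal sketch, assuming a hypothetical 30-column frame named wide built only for this example:

```python
import pandas as pd

# Hypothetical wide frame for illustration: 30 numeric columns, col_0 ... col_29
wide = pd.DataFrame({f"col_{i}": range(5) for i in range(30)})

# Remember the current limit so we can confirm it is restored afterwards
default_max = pd.get_option("display.max_columns")

# Temporarily remove the column limit; the old value comes back on exit
with pd.option_context("display.max_columns", None):
    inside = pd.get_option("display.max_columns")
    full_summary = repr(wide.describe())
    print(full_summary)

restored = pd.get_option("display.max_columns")
print(restored == default_max)
```

Wide output may still wrap across several blocks of lines depending on your console width, but no columns are replaced by an ellipsis inside the with block.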

Customizing the “.describe()” Method

While the default summary statistics provided by .describe() are useful, you might want to customize them to better suit your needs. You can do this by passing a list of percentiles, written as fractions between 0 and 1, to the percentiles parameter.

print(df.describe(percentiles=[.10, .20, .30, .40, .50, .60, .70, .80, .90]))

This will display the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles of your data.
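Each requested percentile becomes a row of the summary, labeled with its percentage, so individual values can be looked up by label. A small sketch of this, assuming a hypothetical one-column frame named scores to keep the output short:

```python
import pandas as pd

# Hypothetical single-column frame, used only to keep the output small
scores = pd.DataFrame({"A": [1, 2, 3, 4, 5]})

# Percentiles are given as fractions between 0 and 1
tails = scores.describe(percentiles=[0.05, 0.5, 0.95])

# Row labels of the summary, including one row per requested percentile
print(tails.index.tolist())

# Look up a single statistic by row label and column name
print(tails.loc["50%", "A"])
```

The same lookup works on a wide DataFrame, for example to pull one percentile row for all columns with tails.loc["95%"].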

Conclusion

The Pandas .describe() method is a powerful tool for quickly understanding your data. By adjusting the Pandas display options, you can ensure that all columns are displayed, giving you a complete overview of your dataset. Remember, data understanding is a crucial step in the data science process, and tools like Pandas make this step easier and more efficient.

