Exporting Pandas DataFrame into a PDF file using Python

As a data scientist or software engineer, you often work with large amounts of data in the form of Pandas DataFrames. DataFrames are convenient for analysis, but sometimes you need to share your data with people who do not have access to the same tools and software that you do. In such cases, exporting your DataFrame to a PDF file can be a good option. This article shows how to export a Pandas DataFrame to a PDF file using Python.

Table of Contents

  1. Why Export Pandas DataFrame to a PDF file?
  2. Exporting Pandas DataFrame to a PDF file using Python
  3. Conclusion

Why Export Pandas DataFrame to a PDF file?

Before diving into the technical details, let’s first understand why exporting a Pandas DataFrame into a PDF file can be useful. There are several reasons why you might want to do this:

  • Sharing data with non-technical stakeholders: If you need to share your data with people who are not familiar with programming or data science, a PDF file can be a great way to present your data in a readable and accessible format.

  • Archiving data for future reference: PDF files are a popular format for archiving documents and data. By exporting your DataFrame to a PDF file, you can ensure that your data is stored in a format that is easily accessible and can be viewed by anyone.

  • Creating reports: If you need to create reports based on your data, exporting your DataFrame to a PDF file can be a great way to present your findings in a professional and readable format.

Now that we have established why exporting a Pandas DataFrame into a PDF file can be useful, let’s move on to the technical details.

Exporting Pandas DataFrame to a PDF file using Python

There are several libraries in Python that can be used to export a Pandas DataFrame into a PDF file. In this article, we will focus on using the reportlab library to create a PDF file and the Pandas library to read in the data.

Installing the required libraries

Before we can start exporting our DataFrame to a PDF file, we need to install the required libraries. To install the reportlab library, simply run the following command in your terminal:

pip install reportlab

To install the Pandas library, run the following command:

pip install pandas

Creating a PDF file

To create a PDF file using the reportlab library, we first need to import the required modules:

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle
from reportlab.lib import colors

Next, we create a SimpleDocTemplate object, which will be used to create our PDF file:

pdf = SimpleDocTemplate("dataframe.pdf", pagesize=letter)

In this example, we are creating a PDF file named dataframe.pdf. You can choose any name you like for your PDF file.

Reading in the data

Now that we have created our PDF document object, we need to read in our data using the Pandas library. For this example, let’s assume that we have a CSV file named data.csv that contains a header row followed by three records. The header row matters because read_csv treats the first line of the file as the column names by default:

Name,Age,Gender
Alice,25,Female
Bob,30,Male
Charlie,35,Male

To read this data into a DataFrame, we use the following code:

import pandas as pd

data = pd.read_csv("data.csv")
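If a CSV file has no header row, read_csv would by default use the first record as the column names. Here is a minimal sketch of one way to handle that, assuming the hypothetical column names Name, Age, and Gender, and using an in-memory string in place of a file on disk:

```python
import io

import pandas as pd

# Headerless sample with a space after each comma (illustrative data)
csv_text = "Alice, 25, Female\nBob, 30, Male\nCharlie, 35, Male\n"

headerless = pd.read_csv(
    io.StringIO(csv_text),
    header=None,                     # the file has no header row
    names=["Name", "Age", "Gender"], # hypothetical column names
    skipinitialspace=True,           # drop the space after each comma
)
```

With header=None, the first record stays in the data, and skipinitialspace=True keeps values such as "Female" free of leading spaces.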

Creating a table

Once we have read in our data, we can create a table using the Table class from the reportlab library:

# The first row holds the column names, so the header styling applies to it
table_data = [data.columns.tolist()] + data.values.tolist()

table = Table(table_data)

In this example, we build a list of lists: the column names come first, followed by one list per DataFrame row. Without the column names, the first data row would be styled as the header. We then create a Table object from this list.
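Real data often contains missing values, which would otherwise show up as "nan" in the table. As a hedged sketch, the conversion can be wrapped in a small helper; dataframe_to_rows and its missing parameter are hypothetical names introduced here, not part of Pandas or reportlab:

```python
import pandas as pd


def dataframe_to_rows(df, missing=""):
    """Return [header, row, row, ...] as plain Python lists."""
    # Cast to object so missing values can be replaced by a string
    cleaned = df.astype(object).where(df.notna(), missing)
    header = [str(column) for column in df.columns]
    return [header] + cleaned.values.tolist()


sample = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, None]})
rows = dataframe_to_rows(sample)
```

The resulting list can be passed to Table in the same way as table_data.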

Adding style to the table

To make our table look more professional, we can add some style to it using the TableStyle class:

table_style = TableStyle([
    ('BACKGROUND', (0, 0), (-1, 0), colors.grey),
    ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
    ('ALIGN', (0, 0), (-1, -1), 'CENTER'),
    ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
    ('FONTSIZE', (0, 0), (-1, 0), 14),
    ('BOTTOMPADDING', (0, 0), (-1, 0), 12),
    ('BACKGROUND', (0, 1), (-1, -1), colors.beige),
    ('TEXTCOLOR', (0, 1), (-1, -1), colors.black),
    ('ALIGN', (0, 1), (-1, -1), 'CENTER'),
    ('FONTNAME', (0, 1), (-1, -1), 'Helvetica'),
    ('FONTSIZE', (0, 1), (-1, -1), 12),
    ('BOTTOMPADDING', (0, 1), (-1, -1), 8),
])

table.setStyle(table_style)

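In this example, we add some basic styling to our table. Each style command takes a start cell and an end cell as (column, row) coordinates, where -1 refers to the last column or row. The first row gets a grey background, near-white (whitesmoke) text, the Helvetica-Bold font at size 14, and some bottom padding. The remaining rows get a beige background with black Helvetica text at size 12, and every cell is centered.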

Adding the table to the PDF file

Finally, we can add our table to the PDF file using the build method of our SimpleDocTemplate object:

pdf_table = []
pdf_table.append(table)

pdf.build(pdf_table)

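In this example, we create a list containing our table and pass it to the build method of our SimpleDocTemplate object. The build method expects a list of flowables, the elements that reportlab lays out on the page, so further tables could be appended to the same list. Calling build writes the PDF file to disk.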

That’s it! We have now successfully exported our Pandas DataFrame to a PDF file using Python.

Conclusion

Exporting a Pandas DataFrame to a PDF file can be a great way to share your data with others in a readable and accessible format. In this article, we have discussed how to export a Pandas DataFrame to a PDF file using Python. We used the Pandas library to read in the data and the reportlab library to create the PDF file. We then created a table using the Table class and styled it using the TableStyle class. Finally, we passed the table to the build method, which wrote the PDF file. With this knowledge, you can now export your own Pandas DataFrames to PDF files and share your data with others.

