Writing a Pandas Dataframe to MySQL

As a data scientist or software engineer, you may often find yourself working with large datasets that need to be stored and accessed in a relational database management system (RDBMS) such as MySQL. One of the most popular ways to work with data in Python is to use the Pandas library, which provides tools for data manipulation and analysis. In this article, we will discuss how to write a Pandas dataframe to MySQL using Python.

What Is Pandas?

Pandas is a powerful Python library for data manipulation and analysis. It provides data structures such as Series and DataFrame that allow you to work with structured data in a tabular format. You can use Pandas to load data from various sources, such as CSV files, Excel files, SQL databases, and more. Pandas also provides tools for data cleaning, transformation, and aggregation.

What Is MySQL?

MySQL is a popular open-source RDBMS that is widely used for web applications and data-driven applications. It allows you to store and retrieve data in a structured format using SQL (Structured Query Language). MySQL is known for its reliability, scalability, and performance.

Writing a Pandas Dataframe to MySQL

To write a Pandas dataframe to MySQL, you will need to perform the following steps:

  1. Connect to the MySQL database
  2. Create a table in the database to store the data
  3. Convert the Pandas dataframe to a format that can be inserted into the MySQL table
  4. Insert the data into the MySQL table

Let’s go through each step in detail.

Step 1: Connect to the MySQL database

To connect to a MySQL database in Python, you can use the mysql-connector-python package. This package provides a Python interface for connecting to MySQL databases using the MySQL Connector/Python API.

import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

In the above code snippet, we are connecting to a MySQL database running on the local machine with a username and password. Replace yourusername, yourpassword, and mydatabase with your own values.
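Hard-coding credentials in a script is easy to get wrong when the code is shared. As a minimal sketch, the connection settings could instead be read from environment variables. The function name load_mysql_config and the variable names MYSQL_HOST, MYSQL_USER, MYSQL_PASSWORD, and MYSQL_DATABASE are assumptions chosen for this example, not names required by mysql-connector-python.

```python
import os


def load_mysql_config(env=os.environ):
    """Collect connection settings from a mapping of environment variables.

    The variable names are hypothetical; adjust them to your own setup.
    """
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "user": env["MYSQL_USER"],          # required, raises KeyError if missing
        "password": env["MYSQL_PASSWORD"],  # required, raises KeyError if missing
        "database": env.get("MYSQL_DATABASE", "mydatabase"),
    }


# An explicit mapping is passed here so the sketch runs without real credentials.
config = load_mysql_config(
    {"MYSQL_USER": "yourusername", "MYSQL_PASSWORD": "yourpassword"}
)
print(config["host"], config["database"])

# With mysql-connector-python installed, the dictionary can be unpacked directly:
# mydb = mysql.connector.connect(**config)
```

The keys of the returned dictionary match the keyword arguments used in the connect() call above.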

Step 2: Create a table in the database to store the data

Before we can insert data into the MySQL database, we need to create a table to store it. The table's columns should match the columns of the Pandas DataFrame, plus any columns the database manages itself, such as an auto-incrementing key. You can use the following code to create a table:

mycursor = mydb.cursor()

mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")

In the above code, we are creating a table called customers with three columns: id, name, and address. The id column is an auto-incrementing primary key.
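When a DataFrame has many columns, writing the CREATE TABLE statement by hand becomes tedious. The sketch below derives the statement from a mapping of column names to Pandas dtype names. The helper build_create_table and the dtype-to-MySQL type mapping are assumptions made for illustration; choose column types that suit your own data.

```python
# Hypothetical mapping from Pandas dtype names to MySQL column types.
DTYPE_TO_MYSQL = {
    "int64": "BIGINT",
    "float64": "DOUBLE",
    "bool": "BOOLEAN",
    "datetime64[ns]": "DATETIME",
    "object": "VARCHAR(255)",
}


def build_create_table(table, column_dtypes):
    """Return a CREATE TABLE statement with an auto-incrementing id column.

    Table and column names are interpolated directly into the SQL, so they
    must come from a trusted source.
    """
    column_defs = ["id INT AUTO_INCREMENT PRIMARY KEY"]
    for column, dtype_name in column_dtypes.items():
        # Unknown dtypes fall back to VARCHAR(255) in this sketch.
        mysql_type = DTYPE_TO_MYSQL.get(dtype_name, "VARCHAR(255)")
        column_defs.append(f"{column} {mysql_type}")
    return f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(column_defs)})"


ddl = build_create_table("customers", {"name": "object", "address": "object"})
print(ddl)

# With an open connection from Step 1 (not executed here):
# mycursor.execute(ddl)
```

For an existing DataFrame df, a mapping of this shape can be obtained with df.dtypes.astype(str).to_dict(), although the exact dtype names may vary between Pandas versions.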

Step 3: Convert the Pandas DataFrame to a Format for MySQL Table Insertion

To insert data from a Pandas DataFrame into a MySQL table, you can use the to_sql() method provided by Pandas, which converts the rows into INSERT statements and executes them for you. to_sql() expects a SQLAlchemy engine or connection rather than the raw mysql.connector connection from Step 1, so the example below creates one:

# Import necessary libraries
import pandas as pd
from sqlalchemy import create_engine

# Step 1: Create a DataFrame with the data
data = {'name': ['Alice', 'Bob'],
        'address': ['Street 123', 'Avenue 456']}
df = pd.DataFrame(data)

# Step 2: Create a SQLAlchemy engine to connect to the MySQL database
engine = create_engine("mysql+mysqlconnector://root:new_password@localhost/mydatabase")

# Step 3: Append the DataFrame rows to the existing customers table
df.to_sql('customers', con=engine, if_exists='append', index=False)

The to_sql() call prints nothing itself. After it runs, the customers table contains:

+----+-------+------------+
| id | name  | address    |
+----+-------+------------+
|  1 | Alice | Street 123 |
|  2 | Bob   | Avenue 456 |
+----+-------+------------+

In the above code:

Step 1: The DataFrame df is directly created with the data.

Step 2: Creates a SQLAlchemy engine to establish a connection to the MySQL database. Replace “root,” “new_password,” “localhost,” and “mydatabase” with your actual MySQL username, password, host, and database name, respectively.

Step 3: Uses the to_sql() method to insert the data from the Pandas DataFrame into the “customers” table in the MySQL database. The if_exists parameter is set to ‘append’ to add the data to the existing table. The index parameter is set to False to exclude the Pandas index from being inserted into the MySQL table.

Make sure to customize the MySQL connection details according to your database configuration.
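If your password contains characters such as @, / or :, the connection URL passed to create_engine() will be parsed incorrectly unless the credentials are URL-encoded. The following is a minimal sketch of one way to build the URL; the helper name build_mysql_url and the sample password are illustrative assumptions, not part of Pandas or SQLAlchemy.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database):
    """Build a SQLAlchemy URL for the mysqlconnector driver (hypothetical helper).

    The user name and password are URL-encoded so that characters such as
    '@' or '/' do not break the URL.
    """
    return (
        "mysql+mysqlconnector://"
        f"{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"
    )


# The sample password is made up for illustration.
url = build_mysql_url("root", "p@ss/word", "localhost", "mydatabase")
print(url)

# With SQLAlchemy installed, the URL would then be used as in the example above:
# engine = create_engine(url)
```

The resulting string can be passed to create_engine() in place of the hard-coded URL.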

Step 4: Insert the data into the MySQL table

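The to_sql() call in Step 3 has already written the DataFrame rows to the table. If you prefer to insert rows yourself, or need to add individual records, you can run a parameterized INSERT statement through a cursor on the mysql.connector connection created in Step 1: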

mycursor = mydb.cursor()

sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")

mycursor.execute(sql, val)

mydb.commit()

print(mycursor.rowcount, "record inserted.")

Output:

1 record inserted.

The customers table now contains:

+----+-------+------------+
| id | name  | address    |
+----+-------+------------+
|  1 | Alice | Street 123 |
|  2 | Bob   | Avenue 456 |
|  3 | John  | Highway 21 |
+----+-------+------------+

In the above code, we are using the execute() method of the MySQL cursor object to insert a record into the customers table. The sql variable contains the SQL query to insert a record into the table. The val variable contains the values to be inserted into the table. After executing the query, we commit the changes to the database using the commit() method.
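The example above inserts one record. To insert many rows through the cursor, the same parameterized statement can be passed to the cursor's executemany() method together with a list of tuples. The sketch below prepares both pieces; the helper names build_insert_statement and rows_from_records are hypothetical, and the database calls are left as comments because they need a live connection.

```python
def build_insert_statement(table, columns):
    """Return a parameterized INSERT statement with one %s per column.

    Table and column names are interpolated directly, so they must come
    from a trusted source. Values are always passed separately.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


def rows_from_records(records, columns):
    """Turn a list of dicts into a list of tuples in column order."""
    return [tuple(record[column] for column in columns) for record in records]


columns = ["name", "address"]
# For a DataFrame df, a list like this can be produced with
# df.to_dict(orient="records").
records = [
    {"name": "Alice", "address": "Street 123"},
    {"name": "Bob", "address": "Avenue 456"},
]

sql = build_insert_statement("customers", columns)
rows = rows_from_records(records, columns)
print(sql)
print(rows)

# With an open connection from Step 1 (not executed here):
# mycursor.executemany(sql, rows)
# mydb.commit()
```

Keeping the values out of the SQL string and passing them as parameters lets the driver handle quoting and escaping.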

Conclusion

In this article, we have discussed how to write a Pandas DataFrame to MySQL using Python. We covered connecting to a MySQL database, creating a table, writing a DataFrame to that table with to_sql(), and inserting individual records using SQL. By following these steps, you can store Pandas data in a MySQL database from Python.

