Writing a Pandas Dataframe to MySQL
As a data scientist or software engineer, you may often find yourself working with large datasets that need to be stored and accessed in a relational database management system (RDBMS) such as MySQL. One of the most popular ways to work with data in Python is to use the Pandas library, which provides tools for data manipulation and analysis. In this article, we will discuss how to write a Pandas dataframe to MySQL using Python.
What Is Pandas?
Pandas is a powerful Python library for data manipulation and analysis. It provides data structures such as Series and DataFrame that allow you to work with structured data in a tabular format. You can use Pandas to load data from various sources, such as CSV files, Excel files, SQL databases, and more. Pandas also provides tools for data cleaning, transformation, and aggregation.
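For a quick sense of what this looks like, here is a minimal sketch that builds a small DataFrame from a Python dictionary; the column names and the CSV file name in the comment are just placeholders:
import pandas as pd

# Build a small DataFrame from a Python dictionary
df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [30, 25]})

# The same data could also be loaded from a file, e.g.:
# df = pd.read_csv('customers.csv')

print(df.head())    # preview the first rows
print(df.dtypes)    # inspect the inferred column types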
What Is MySQL?
MySQL is a popular open-source RDBMS that is widely used for web applications and data-driven applications. It allows you to store and retrieve data in a structured format using SQL (Structured Query Language). MySQL is known for its reliability, scalability, and performance.
Writing a Pandas Dataframe to MySQL
To write a Pandas dataframe to MySQL, you will need to perform the following steps:
- Connect to the MySQL database
- Create a table in the database to store the data
- Convert the Pandas dataframe to a format that can be inserted into the MySQL table
- Insert the data into the MySQL table
Let’s go through each step in detail.
Step 1: Connect to the MySQL database
To connect to a MySQL database from Python, you can use the mysql-connector-python package, which provides the MySQL Connector/Python API. If it is not already installed, you can install it with pip install mysql-connector-python.
import mysql.connector

# Establish a connection to the MySQL server
mydb = mysql.connector.connect(
    host="localhost",
    user="yourusername",
    password="yourpassword",
    database="mydatabase"
)
In the above code snippet, we are connecting to a MySQL database running on the local machine with a username and password. Replace yourusername, yourpassword, and mydatabase with your own values.
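Before moving on, it can be useful to confirm that the connection actually succeeded. Here is a minimal sketch using the connector's is_connected() method:
# Verify that the connection to the MySQL server is alive
if mydb.is_connected():
    print("Connected to MySQL server")

# When you are finished, close the connection
# mydb.close()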
Step 2: Create a table in the database to store the data
Before we can insert data into the MySQL database, we need to create a table to store the data. This table should have the same structure as the Pandas dataframe. You can use the following code to create a table:
# Create a cursor and define the customers table
mycursor = mydb.cursor()
mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")
In the above code, we are creating a table called customers with three columns: id, name, and address. The id column is an auto-incrementing primary key.
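Note that running the script a second time will raise an error because the customers table already exists. One common way to make this step safe to re-run is CREATE TABLE IF NOT EXISTS; a minimal sketch:
# Create the table only if it does not already exist
mycursor.execute(
    "CREATE TABLE IF NOT EXISTS customers ("
    "id INT AUTO_INCREMENT PRIMARY KEY, "
    "name VARCHAR(255), "
    "address VARCHAR(255))"
)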
Step 3: Convert the Pandas DataFrame to a format for MySQL table insertion
To insert data from a Pandas DataFrame into a MySQL table, the DataFrame needs to be converted into a format the table can accept. Pandas handles this through the to_sql() method, which works together with a SQLAlchemy engine. Here's a code example:
# Import necessary libraries
import pandas as pd
from sqlalchemy import create_engine

# Step 1: Create a DataFrame with the data
data = {'name': ['Alice', 'Bob'],
        'address': ['Street 123', 'Avenue 456']}
df = pd.DataFrame(data)

# Step 2: Create a SQLAlchemy engine to connect to the MySQL database
engine = create_engine("mysql+mysqlconnector://root:new_password@localhost/mydatabase")

# Step 3: Write the DataFrame to the customers table (to_sql handles the conversion and insertion)
df.to_sql('customers', con=engine, if_exists='append', index=False)
After running this code, the customers table contains:
+----+-------+------------+
| id | name | address |
+----+-------+------------+
| 1 | Alice | Street 123 |
| 2 | Bob | Avenue 456 |
+----+-------+------------+
In the above code:
- Step 1: The DataFrame df is created directly from the data.
- Step 2: Creates a SQLAlchemy engine to establish a connection to the MySQL database. Replace root, new_password, localhost, and mydatabase with your actual MySQL username, password, host, and database name, respectively.
- Step 3: Uses the to_sql() method to insert the data from the Pandas DataFrame into the customers table in the MySQL database. The if_exists parameter is set to 'append' to add the data to the existing table, and the index parameter is set to False so the Pandas index is not inserted into the MySQL table.
Make sure to customize the MySQL connection details according to your database configuration.
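To check that the rows actually reached MySQL, you can read the table back into a DataFrame with pd.read_sql(), reusing the same engine; a minimal sketch:
# Read the table back into a DataFrame to verify the insert
result = pd.read_sql("SELECT * FROM customers", con=engine)
print(result)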
Step 4: Insert the data into the MySQL table
In addition to to_sql(), you can insert rows into the MySQL table directly with SQL, using the connection and cursor from Step 1.
mycursor = mydb.cursor()

# Parameterized INSERT statement and the values to insert
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")

mycursor.execute(sql, val)
mydb.commit()

print(mycursor.rowcount, "record inserted.")
Output:
1 record inserted.
+----+-------+------------+
| id | name | address |
+----+-------+------------+
| 1 | Alice | Street 123 |
| 2 | Bob | Avenue 456 |
| 3 | John | Highway 21 |
+----+-------+------------+
In the above code, we are using the execute() method of the MySQL cursor object to insert a record into the customers table. The sql variable contains the SQL query to insert a record into the table, and the val variable contains the values to be inserted. After executing the query, we commit the changes to the database using the commit() method.
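If you need to insert several rows at once, the cursor also provides an executemany() method that takes a list of value tuples; here is a minimal sketch (the names and addresses are placeholders):
# Insert multiple rows in a single call
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
vals = [
    ("Carol", "Park Lane 7"),
    ("Dave", "Main St 9")
]
mycursor.executemany(sql, vals)
mydb.commit()

print(mycursor.rowcount, "records inserted.")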
Conclusion
In this article, we have discussed how to write a Pandas dataframe to MySQL using Python. We covered the basic steps involved in connecting to a MySQL database, creating a table, converting a Pandas dataframe to a format that can be inserted into the MySQL table, and inserting the data into the MySQL table using SQL. By following these steps, you can easily store and retrieve data from a MySQL database using Pandas in Python.
About Saturn Cloud
Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.
Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams.
Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.