Using pandas to read/write data in Excel.
In this post we’re going to explore how easy it is to read and write data in Excel using Python. There’s a few different ways to do this. We’re going to use pandas. The pandas DataFrame is the main data structure that we’re going to be working with.
The sample Excel data we’ll be using is available on Tableau’s Community page.
To load a single sheet of the Excel file into Python, we’ll use the read_excel function:
import pandas as pd
sales_data=pd.read_excel(r'C:\Users\Craig\Downloads\Sample - Superstore Sales (Excel).xls')
This loads one tab of the spreadsheet (.xls, .xlsx, or .xlsm) into a DataFrame.
In fact, if we didn’t want to download the Excel file locally, we can load it into Python directly from the URL:
Note that we can load specific sheets (sheetname), grab specific columns (parse_cols), and handle N/A values (na_values) by using the optional keyword arguments.
To load all of the sheets/tabs within an Excel file into Python, we can set sheetname=None:
sales_data_all=pd.read_excel(r'C:\Users\Craig\Downloads\Sample - Superstore Sales (Excel).xls', sheetname=None)
This will return a dictionary of DataFrames – one for each sheet.
Writing existing Python data to an Excel file is just as straightforward. If our data is already a DataFrame, we can call the pd.DataFrame.to_excel(‘filename.xlsx’) function. If not, we can just convert the data into a DataFrame and then call to_excel.
import pandas as pd
import numpy as np
This will work for the .xls, .xlsx, and .xlsm. Pandas also writer functions such as to_csv, to_sql, to_html, and a few others.
To write data on multiple sheets, we can use the pd.ExcelWriter function as shown in the pandas documentation:
with pd.ExcelWriter('filename.xlsx') as writer:
Quick Data Grabs
Try experimenting with the
functions to quickly transfer data from Excel to Python and vice-versa.
Thank you, pandas, for creating and maintaining excellent documentation.