Tools For Working With Excel And Python

Microsoft Excel is widely used in almost every industry. Its intuitive interface and ease of use for organizing data, performing calculations, and analysis of data sets has led to it being commonly used in countless different fields globally.

Whether you’re a fan of Excel or not, at some point you will have to deal with it! For many applications, you won’t want to do complex calculations or manage large data sets in Excel itself, but you may need to take values from Excel as inputs, produce reports in an Excel format, or provide tools to Excel users. Python can be a better choice for complex tasks and fortunately, there are many tools for the Python developer to work with so Excel and Python can be used together.

This post gives an overview of some of the most popular and useful tools out there to help you choose which is the right one for your specific application.

Below there’s a feature matrix outlining the different features of the packages for calling Python from Excel.

Building Interactive Python Tools with Excel as a Front-End

Excel is a well known and really good user interface for many tasks. When you get into more complex tasks and processing larger datasets however you can soon reach the limits of what can sensibly be achieved in Excel. Python is a popular choice for data science and other disciplines as it can handle these complex cases far better than Excel alone. By using both together and recognizing the strengths of each it’s possible to build really powerful interactive tools using Excel as a user-friendly front end, with all the heavy lifting done in Python.

 

Python is an extremely powerful language with an extensive ecosystem of 3rd party libraries. Leveraging Python in Excel spreadsheets can be a fantastic way to enhance productivity and remove the need for importing and exporting data into and out of Excel. Interactive worksheets can be developed using Python code in the same way as you might use VBA, but with all of the advantages of Python.

There are a few tools available that can be used to bring Python to Excel, and it can be difficult to know which one is right for different situations. Below is an overview of each, which I hope will highlight the differences between them and help you decide which ones are right for what you need to achieve.

See the table of features along with the packages that support them below.

PyXLL – The Python Excel Add-In

PyXLL is currently the only package that enables developers to write fully featured Excel add-ins in Python. It embeds the Python interpreter into Excel so that it can be used as a complete VBA replacement. You can think of it conceptually to be similar to something like Excel-DNA for C#, except that it is dynamic and imports your Python code while Excel is running – so there’s no add-in to build and no need to restart Excel when modifying your Python code.

Using PyXLL, Python code can be used to write:

Writing a user-defined function with PyXLL requires the ‘xl_func’ decorator to be applied to a normal Python function:

from pyxll import xl_func

@xl_func

def py_test(a, b, c):

return (a + b) * c

PyXLL has a config file (pyxll.cfg) which contains a list of all the modules that will be imported when Excel starts. By adding the module above to the list in that file, PyXLL will expose the ‘py_test’ function to Excel as a user-defined function to be called from a worksheet.

Some additional features of PyXLL are:

1. Array functions

PyXLL can work with arrays of data and has support for NumPy and Pandas types. Functions returning arrays can automatically resize to avoid errors when the dimensions of a result change.

2. Real-Time Data

Stream real-time data into Excel from Python with PyXLL’s Real-Time Data feature.

3. Object Cache

For functions that return Python objects, rather than simple types (strings, numbers etc) or arrays (NumPy arrays and Pandas DataFrames or Series) PyXLL has a clever ‘object cache’. Object identifiers are returned, and when passed into another function the identifier is used to find the original object. This allows objects to be passed between Python functions using Excel formulas. This can be very useful when dealing with large data sets where the whole dataset doesn’t need to be visible in Excel all at once, but instead is passed between Python functions – for example, loading a large data set and performing some aggregation operations and presenting the aggregate results in Excel.

4. Excel Object Model

PyXLL has integration with the main COM packages, pywin32, and comtypes, which allow the entire Excel Object Model to be used from Excel macros and functions written with PyXLL. This enables anything that could be done in VBA to be done in Python. It also integrates with xlwings so that the xlwings API can also be used to read and write from Excel.

For more features take a look at the feature matrix below.

Home Page | Download PyXLL | Documentation

pywin32 / comtypes

The entire Excel API (or Object Model) is exposed via COM. Everything that can be written as a VBA macro can also be written using the Excel COM API in Python by using pywin32 or comtypes.

The Excel COM API can be used from outside of Excel (e.g. from a running Python prompt, script or Jupyter notebook). If you already know how to do something in VBA then doing the equivalent task in Python via the COM API is generally quite straightforward. Calling a routine using pywin32 or comtypes from Excel (e.g. from a button on the ribbon bar, menu item or macro) can be done using PyXLL.

The Excel Object Model is documented here and once you understand the basic differences between VBA and Python you will find it’s fairly simple to translate between the two.

To demonstrate let’s go through an example. Suppose you had the following VBA code and want to translate it into Python:

Sub Macro1()

Range("B11:K11").Select

Selection.AutoFill Destination:=Range("B11:K16"), Type:=xlFillDefault

Columns("B:K").Select

Selection.ColumnWidth = 4

End Sub

First of all, we must get the Excel Application object in Python. This code can be run from an interactive Python prompt or a Jupyter notebook, or even run inside Excel itself using PyXLL.

from win32com.client.gencache import EnsureDispatch

# Get the Excel Application COM object

xl = EnsureDispatch("Excel.Application")

Now we have the Application object we can call the Range method in the same way as the VBA code above. The first important difference to notice is that in VBA simply calling ‘Range(). Select’ calls the Select method, but in Python, we need to use ‘()’ to call the method.

xl.Range("B11:K11").Select()

The next line requires a constant, ‘xlFillDefault’. To access the same constant in Python we use the ‘win32com.client.constants’ module. Also notice that in VBA no parentheses are used when calling an object method, but in Python there are.

from win32com.client import constants

xl.Selection.AutoFill(Destination=xl.Range("B11:K16"), Type=constants.xlFillDefault)

The rest of the code is similar to those lines we’re just translated, and so the entire function looks like:

from win32com.client.gencache import EnsureDispatch
from win32com.client import constants

def Macro1():
xl = EnsureDispatch("Excel.Application")
xl.Range("B11:K11").Select()
xl.Selection.AutoFill(Destination=xl.Range("B11:K16"),
Type=constants.xlFillDefault)

xl.Columns("B:K").Select()
xl.Selection.ColumnWidth = 4

The translated Python code looks very similar to the original VBA code! Automating tasks in Excel, or just calling it interactively in this way from a Jupyter notebook can be very powerful.

This Python code could be called from Excel as a macro using PyXLL’s “@xl_macro” decorator. Instead of using EnsureDispatch, pyxll.xl_app() should be used to ensure that if there are multiple Excel processes running the correct one is returned.

xlwings

xlwings provides a wrapper around the Excel COM API described above for simplifying many common tasks, such as writing Pandas DataFrames to an open Excel workbook. It uses pywin32’s COM wrappers and gives you access to those, so you can always drop down to using the normal Excel API should you need to.

In the same way as pywin32 and comtypes, xlwings can talk to Excel from a normal Python prompt or Jupyter notebook. For calling code using xlwings from Excel itself, PyXLL provides a convenient way of getting the Excel Application object as an xlwings object. This allows you to script Excel in Python and trigger running your code from a ribbon button or menu item. An example use-case could be a ribbon button for fetching data from a database, building a report, and writing it straight into the running Excel.

The following shows how values can be read and written to a running Excel workbook, including a Pandas DataFrame.

import xlwings as xw

wb = xw.Book('workbook.xlsx')  # Open an existing Workbook
sheet = wb.sheets['Sheet1']

# read and write values from the worksheet
sheet.range('A1').value = 'Foo'
print(sheet.range('A1').value)

# Write a Pandas DataFrames directly to the Excel sheet
import pandas as pd
df = pd.DataFrame([[1,2], [3,4]], columns=['a', 'b'])

sht.range('A1').value = df

# Read the DataFrame back, using the 'expand' option to read the whole table
sht.range('A1').options(pd.DataFrame, expand='table').value

xlwings includes a way of writing user defined functions (UDFs) or worksheet functions in Python that are called from a formula in Excel, similar to the user defined functions offered by PyXLL. These rely on a server process running outside of Excel and VBA wrappers to call into that server. It’s a simple solution with some drawbacks, such as poor performance and that those functions are only available from the workbook containing the VBA wrappers.

DataNitro

DataNitro is another API to control Excel from Python. It’s not clear what the advantage over its API and the existing and well understood Microsoft Excel COM API is, but it does allow you to write and run scripts without leaving Excel. It has rudimentary support for user-defined functions (worksheet functions), but they run outside of the Excel process and only work if there is only one Excel process running.

It’s currently unknown whether DataNitro is still under active development or not but was included here for completeness.

 

Reading and Writing Excel workbooks

For some tasks, you may need to read or write an Excel file directly. For batch processing or tasks running on a server, Excel may not be installed. The following packages allow you to read and write Excel files directly without needing to use Excel.

 

OpenPyXL

For working with Excel 2010 onwards, OpenPyXL is a great all-around choice. Using OpenPyXL you can read and write xlsx, xlsm, xltx, and xltm files. The following code shows how an Excel workbook can be written as an xlsx file with a few lines of Python.

from openpyxl import Workbook
wb = Workbook()

# grab the active worksheet
ws = wb.active

# Data can be assigned directly to cells
ws['A1'] = 42

# Rows can also be appended
ws.append([1, 2, 3])

# Save the file
wb.save('sample.xlsx')

Don’t confuse OpenPyXL with PyXLL. The two are completely different and serve different purposes. OpenPyXL is a package for reading and writing Excel files, whereas PyXLL is a tool for building fully-featured Excel Add-Ins for integrating Python code into Excel.

OpenPyXL covers more advanced features of Excel such as charts, styles, number formatting, and conditional formatting. It even includes a tokenizer for parsing Excel formulas!

One really nice feature for writing reports is its built-in support for NumPy and Pandas data. To write a Pandas DataFrame all that’s required is the included ‘dataframe_to_rows’ function:

from openpyxl.utils.dataframe import dataframe_to_rows

wb = Workbook()
ws = wb.active

for r in dataframe_to_rows(df, index=True, header=True):
ws.append(r)

wb.save('pandas_openpyxl.xlsx’)

If you need to read Excel files to extract data then OpenPyXL can do that too. The Excel file types are incredibly complicated and openpyxl does an amazing job of reading them into a form that’s easy to access in Python. There are some things that openpyxl can’t load though, such as charts and images, so if you open a file and save it with the same name then some elements may be lost.

from openpyxl import load_workbook

wb = load_workbook(filename = 'book.xlsx')
sheet_ranges = wb['range names']
print(sheet_ranges['D18'].value)

A possible downside of OpenPyXL is that it can be quite slow for handling large files. If you have to write reports with thousands of rows and your application is time-sensitive then XlsxWriter or PyExcelerate may be better choices.

XlsxWriter

If you only need to write Excel workbooks and not read them then XlsxWriter is an easy to use package that works well. If you are working with large files or are particularly concerned about speed then you may find XlsxWriter a better choice than OpenPyXL.

XlsxWriter is a Python module that can be used to write text, numbers, formulas and hyperlinks to multiple worksheets in an Excel 2007+ XLSX file. It supports features such as formatting and many more, including:

  • 100% compatible Excel XLSX files.
  • Full formatting.
  • Merged cells.
  • Defined names.
  • Charts.
  • Autofilters.
  • Data validation and drop down lists.
  • Conditional formatting.
  • Worksheet PNG/JPEG/BMP/WMF/EMF images.
  • Rich multi-format strings.
  • Cell comments.
  • Textboxes.
  • Integration with Pandas.
  • Memory optimization mode for writing large files.

Writing Excel workbooks using XlsxWriter is simple enough. Cells can be written to using the Excel address notation (eg ‘A1’) or row and column numbers. Below is a basic example that shows creating a workbook, adding some data and saving it as an xlsx file.

import xlsxwriter

workbook = xlsxwriter.Workbook('hello.xlsx')
worksheet = workbook.add_worksheet()

worksheet.write('A1', 'Hello world')

workbook.close()

If you are using Pandas then you’ll want to use XlsxWriter’s Pandas integration. It takes the hard work out of writing Pandas DataFrames to Excel, and even creating charts.

import pandas as pd

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

# Get the xlsxwriter objects from the dataframe writer object.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Create a chart object.
chart = workbook.add_chart({'type': 'column'})

# Configure the series of the chart from the dataframe data.
chart.add_series({'values': '=Sheet1!$B$2:$B$8'})

# Insert the chart into the worksheet.
worksheet.insert_chart('D2', chart)

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')

# Close the Pandas Excel writer and output the Excel file.
writer.save()

When referencing the Pandas data in the worksheet (as the formula in the chart above does), you have to figure out where the data will be in the worksheet so that the formulas point to the correct cells. For reports involving a lot of formulas or charts this can become problematic as doing something as as simple as adding an extra row requires adjusting all affected formulas. For reports like that the package ‘xltable’ can help.

XLTable

XLTable is a higher level library for building Excel reports from pandas DataFrames. Rather than writing the workbook cell by cell or row by row, whole tables are added and can include formulas that reference other tables without having to know ahead of time where those tables will be. For more complex reports involving formulas, xltable can be very useful.

The main feature that makes xltable more useful than just writing the Excel files directly is that it can handle tables with formulas that relate to cells in the workbook without having to know in advance where those tables will be placed on a worksheet. Only when all the tables have been added to the workbook and the workbook is being written are formulas resolved to their final cell addresses.

If you need to write a report that includes formulas rather than just data, XLTable makes it easier by tracking the cell references so you don’t have to construct the formulas by hand and worry about references changing when tables grow or new rows or columns are added.

from xltable import *
import pandas as pd

# create a dataframe with three columns where the last is the sum of the first two
dataframe = pd.DataFrame({
"col_1": [1, 2, 3],
"col_2": [4, 5, 6],
"col_3": Cell("col_1") + Cell("col_2"),
}, columns=["col_1", "col_2", "col_3"])

# create the named xltable Table instance
table = Table("table", dataframe)

# create the Workbook and Worksheet objects and add table to the sheet
sheet = Worksheet("Sheet1")
sheet.add_table(table)

workbook = Workbook("example.xlsx")
workbook.add_sheet(sheet)

# write the workbook to the file using xlsxwriter
workbook.to_xlsx()

XLTable can use either XlsxWriter to write an xlsx file, or it can use pywin32 (win32com) to write directly to an open Excel application (Windows only). Writing directly to Excel is good for interactive reports. For example, you could have a button in the Excel ribbon that a user could press to query some data and produce a report. By writing it directly to Excel they can get that report immediately in Excel without having it written to a file first. For details of how to customize the Excel ribbon in Excel sees PyXLL: Customizing the Ribbon.

Pandas

For working with ranges of data and reading or writing them to Excel workbooks with no frills, using pandas can be a very quick and effective method. If you don’t need much in the way of formatting and just care about getting data into or out of Excel workbooks then the pandas functions “read_excel” and “to_excel” may be just what you need.

df = pd.DataFrame([('string1', 1),
('string2', 2),
('string3', 3)],
columns=['Name', 'Value'])

# Write dataframe to an xlsx file
df.to_excel('tmp.xlsx')

For more complex tasks, XlsxWriter, OpenPyXL, and XLTable all have Pandas integration and so many of those can also be used to write Pandas DataFrames to Excel. But, for just getting data into Excel using Pandas directly as above is very convenient.

pandas homepage

xlrd/xlwt

xlrd and xlwt read and write the old Excel .xls files respectively. These are included in this list for completeness but are now really only used when you are forced to deal with the legacy xls file format. They are both extremely mature packages that are very capable and stable, but xlwt will never be extended to support the newer xlsx/xlsm file formats and so for new code dealing with modern Excel file formats they are no longer the best choice.

  • Top articles, research, podcasts, webinars and more delivered to you monthly.

  • Leave a Comment
    Next Post

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    Software & UX/UI
    How to Become a Product Designer?

    Introduction Product design is an especially high priority for startups and businesses because of the constantly growing digital world. As a result, this creates an opportunity for creative individuals to become product designers. Listed below are several techniques that can help you start a career in product design as Product Designer. Understand design prototyping tools

    4 MINUTES READ Continue Reading »
    Software & UX/UI
    Your Body Is An Interface, Part 1: Three strategies

    TL;DR: Your body is the interface between you and the world around you. The part of your brain that is reading this gets all outside information through a series of interfaces. Everything you know about the world has been filtered and translated and turned into things you are ready to believe. Everything you remember is

    5 MINUTES READ Continue Reading »
    Software & UX/UI
    A Little (More) Evolutionary Neuroanatomy, Part 2

    TL;DR: Today we’re going to finish up our very brief look at evolution. We took this little detour so that we could eventually talk about the neuropsychology of interaction with the common understanding that the way humans experience the world is ridiculously complicated. Most of us  think that basic human senses like vision or hearing

    6 MINUTES READ Continue Reading »