Python in Excel: Python User Defined Functions

This post explores using Python in Excel to create Python User Defined Functions. The example used is a function to calculate Gini coefficients (see https://fastexcel.wordpress.com/2011/05/21/fast-gini/).

Defining the UDF

Python UDFs live in the workbook global Python space and must be defined before they can be used.
Because the Python global space of a workbook cannot be accessed by other open workbooks there does not seem to currently be a way of sharing UDFs from a library or addin.

Python UDFs start with a Python def statement that defines the name of the UDF and its parameters, and ends with a return statement. The body of the UDF must be indented from the def statement.

The function is called from a PY cell:

I had a lot of trouble getting this to work correctly until I discovered that multiplying 2 dataframes works by matching the index values of the dataframes, and that sorting a dataframe by default also sorts the index so that multiply effectively ignores the sort!
This problem is solved by using ignore_index=True, which means do not sort the index (the default is false).

I sorted in descending sequence to simplify multiplying by ascending sequence number as Rank.

Performance

This Python UDF is not optimised for performance. It would be faster to do the calculation using Numpy, but the limiting factor is probably the overhead of using Python in the cloud.

It takes somewhere between 0.5 and 2 seconds for 64000 data points, depending on caching. For comparison the FastExcel SpeedTools equivalent GINICOEFF function takes 8 milliseconds.

Conclusion

It can be convenient to package chunks of Python code as functions.
If the code is going to be used in multiple PY cells creating a function simplifies maintenance.
Performance is not great, particularly when the function will be called from many cells. But because it is simple to create array functions in Python you can often mimimise the overhead of multiple calls by using an array-capable version.

The major drawback is the inability to create and share a library of Python functions. A better performing and shareable methodology for Python worksheet functions would be to use the XLL interface. One such (commercial) product is PyXLL (https://www.pyxll.com).

This entry was posted in Calculation, Excel, Python, UDF and tagged , , , . Bookmark the permalink.

Leave a comment