Python in Excel: Round-Tripping Gotchas

Python is a full-blown programming language and so supports a rich variety of datatypes.
So what happens when you move data from Excel to Python and back to Excel (Round-Trip)?

Excel Datatypes

It is important to distinguish between formatted Excel values (what Excel shows you on the screen) and what the underlying value is that Excel formulas work with. These are the underlying value types (ignoring the new Rich datatypes).

  • Empty Cells
  • Floating point numbers
  • Text/Strings
  • Boolean
  • Error

Dates, times and datetimes are held in Excel as floating point numbers where the integer part represents the date as the number of days since January 1 1900, and the decimal part represents a fraction of 24 hours. These floating point numbers get converted to Dates and Times by the chosen formatting.

Floating point numbers may also be formatted as

  • Currency
  • Percentage
  • Fraction
  • Scientific
  • Different number of decimals
  • With thousands separators
  • Integers

Excel Tables are composed of basic Excel datatypes with some additional metadata to enable things like structured referencing.

Converting Excel Datatypes to Python Datatypes and Back to Excel

These examples show formatted Excel values converted to Python Objects and Types and then the Python objects are converted back to Excel values.
Most of these examples are self-explanatory and behave as expected.

1 The Excel value is a floating point number but gets converted to a Python Integer then back to a float.
3 The Excel value is a float but formatted as Text. Python ignores the formatting so it becomes a float.
4,5 Python ignores the Currency and Percentage Excel formatting.
7 Python does not have an Excel Error datatype.
8 Excel empty cells get converted to Python type None, which then returns to Excel as zero.

The problem case is cells formatted as Excel Dates and Times. These get converted to Python DateTime/Time objects and when returned to Excel the dates have lost the decimal part and the times have lost the integer part.

This breaks Excel rules: Formatting should not change Excel values.

And changing formatting does not trigger a recalculation, so changing the format to a Date/Time or from a Date/Time will not change the corresponding Python values.

Joe McDaid points out that if you format the cell as Date & Time as in example 14 round-tripping works correctly, but I still don’t like it.

Excel Tables

Excel Tables get converted to Python Dataframes and returned as spilled arrays:

Conclusion

Round-tripping data between Excel and Python looks mostly OK.
There are some potential gotchas with empty cells, Excel errors etc. but a significant gotcha with Excel Date and Time formatted cells. This looks wrong from an Excel point of view.

I think some additional method is needed to be able to pass Date/Times to Python as floating point numbers.

This entry was posted in Calculation, Python and tagged , . Bookmark the permalink.

Leave a comment