Excel and UDF Performance Stuff

VBA Corruption Registry Fix

fastexcel — Sat, 24 Feb 2024 17:31:57 +0000

There seem to be more cases recently where the VBA in a workbook gets corrupted and is removed when the workbook is reopened.

Circumstances

Nobody seems to know exactly why this happens but there there do seem to be one or more circumstances where it is more likely to happen:

The workbook was created in an older build of Excel.
The workbook is saved frequently.
The PC either has a fresh install of Windows and has been recently updated.

Sometimes the workbook can be repaired using Excel’s repair tool, but personally I don’t like using that because you don’t know what has been repaired or changed.

Preventing VBA Corruption

For performance optimisation reasons the VBA project is usually saved and loaded in compile form as well as source form: recompiling the VBA every time the workbook loads takes time (although these days I think that the Anti-Virus VBA scanners take much longer than the compilation). It seems that sometimes this compiled form is corrupted and this causes the VBA project to be discarded on load.

Registry Fix

One way to avoid this problem is to force Excel to load and recompile the VBA from source every time it loads. You can do this with a Registry change:

With RegEdit, locate the key HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\Excel\Options\
Add a new DWORD value called ForceVBALoadFromSource
Assign the value 1 to this DWORD

This will slow down the workbook opening process but I think the slowdown will not be noticeable in the majority of cases.

Force a clean recompile before saving the VBA Project

I tend to use a VBA indenter every time I make a change to my VBA: because this alters the source code it will tend to force a clean VBA compile.
I either use Stephen Bullen’s Smart Indenter or the MZTools Indenter or Rob Bovey’s venerable Code Cleaner.

For XLAM addins that are going to be distributed for installation by users I recommend this as a preventative measure: the Addins will not be saved by the user so cumulative recompiles are unlikely.

Conclusion

It may be that today’s more frequent updates to Excel are a contributing factor to this problem. In any case hopefully the approaches outlined above will help alleviate it.

Let me know if this helps you.

Sheet Calculate Change

fastexcel — Sat, 24 Feb 2024 16:40:50 +0000

Recent builds of Excel 365 (version 2308 onwards) have introduced a change to the way Sheet Calculate (Shift-F9) works.

Previously Sheet Calculate did a recalculate of all changed and volatile cells on the selected worksheet(s), but did not marke these cells as having been calculated.

Note that this only gives you “correct” results if all precedent inter-sheet and inter-workbook dependencies have already been fully calculated, and that cells on other worksheets that are dependent on the sheet(s) are not recalculated.
After a sheet calculate the formulae in the sheet, the sheet and the workbook are flagged as uncalculated (CALCULATE shows in the statusbar), unless there were no uncalculated or volatile cells.

In Manual calculation mode this meant that if you did another Sheet Calculate the same cells got calculated again.

From Version 2308 Sheet Calculate resets the dirtied and uncalculated cells on the calculated worksheet(s) as having been calculated and hence clean.

Python in Excel: Python User Defined Functions

fastexcel — Mon, 06 Nov 2023 16:00:13 +0000

This post explores using Python in Excel to create Python User Defined Functions. The example used is a function to calculate Gini coefficients (see https://fastexcel.wordpress.com/2011/05/21/fast-gini/).

Defining the UDF

Python UDFs live in the workbook global Python space and must be defined before they can be used.
Because the Python global space of a workbook cannot be accessed by other open workbooks there does not seem to currently be a way of sharing UDFs from a library or addin.

Python UDFs start with a Python def statement that defines the name of the UDF and its parameters, and ends with a return statement. The body of the UDF must be indented from the def statement.

The function is called from a PY cell:

I had a lot of trouble getting this to work correctly until I discovered that multiplying 2 dataframes works by matching the index values of the dataframes, and that sorting a dataframe by default also sorts the index so that multiply effectively ignores the sort!
This problem is solved by using ignore_index=True, which means do not sort the index (the default is false).

I sorted in descending sequence to simplify multiplying by ascending sequence number as Rank.

Performance

This Python UDF is not optimised for performance. It would be faster to do the calculation using Numpy, but the limiting factor is probably the overhead of using Python in the cloud.

It takes somewhere between 0.5 and 2 seconds for 64000 data points, depending on caching. For comparison the FastExcel SpeedTools equivalent GINICOEFF function takes 8 milliseconds.

Conclusion

It can be convenient to package chunks of Python code as functions.
If the code is going to be used in multiple PY cells creating a function simplifies maintenance.
Performance is not great, particularly when the function will be called from many cells. But because it is simple to create array functions in Python you can often mimimise the overhead of multiple calls by using an array-capable version.

The major drawback is the inability to create and share a library of Python functions. A better performing and shareable methodology for Python worksheet functions would be to use the XLL interface. One such (commercial) product is PyXLL (https://www.pyxll.com).

Python in Excel: Controlling Python Calculation

fastexcel — Thu, 02 Nov 2023 15:57:15 +0000

Because Python calculations can be slow it is worth looking at ways of controlling Python calculation.

Partial Calculation

The calculation setting that used to be called “Automatic except Data Tables” has been renamed to “Partial”. Partial Calculation works like Automatic Calculation mode except that Data Tables and Python cells are NOT automatically calculated. Pressing F9 or the Calculate Now button will trigger recalculation of any Python cells and Data Tables that need to be calculated.

Python cells within a workbook are still treated as Co-Volatile – if one PY cell calculates then all the PY cells in the workbook recalculate.

Disabling/Enabling the Python Calculations on a Worksheet

If you have some PY cells on a worksheet that you do not want to calculate (perhaps because they are very slow) but you do want to enable PY calculation on other worksheets, you can use the Worksheet.EnableCalculation property. This could give you some control over Python Co-Volatility.

You can change this property using the Developer Tab and Properties, or by sheet properties from the VBE (select the worksheet in the VBE and press F4), or by using a VBA macro.

When this property is False none of the Excel formulas or Python cells on that worksheet can be calculated.
This also means that any Python variable definitions or other actions from this sheet that affect the Global Python space will not occur.

Toggling EnableCalculation from False to True dirties all the cells on the worksheet and triggers a recalculation.

The EnableCalculation property is not preserved in a saved workbook.

Conclusion

Partial calculation mode is a useful way of controlling slow Python calculations.

Finer control can be achieved by using the Worksheet.EnableCalculation property to isolate particular worksheets within a workbook.

Python in Excel: How do Excel and Python Formulas Work Together?

fastexcel — Wed, 01 Nov 2023 18:28:28 +0000

In the previous post I showed how Python and Excel calculation methods are completely different:

Python calculates Left-to-Right and Top-to-Bottom, ignoring dependencies, and when a PY calculates then ALL PY cells calculate – Co-Volatility.
Excel calculates in dependency sequence and only calculates the cells that need recalculating.

So when Excel formulas and Python formulas depend on one another how does this all work?

Excel->Python->Excel->Python

Lets look at a simple example of inter-dependent Excel and Python formulas.

Cell A10 is referenced by PY Cell E10
Then the Excel formula in E11 references PY cell E10 and multiples it by 10
Then PY Cell E12 references Excel cell E11 and adds 10

The workbook is in Automatic Mode so when I change cell A10 to 42 Excel and Python need to recalculate.

The pink cells show the sequence in which the cells calculate (using a calculation tracing function), and there is a function in D8 that shows how many times Excel has Recalculated – it has changed from 1 to 2.

So what steps do Excel and Python actually have to do to make this interaction work?

Excel started by doing a recalculation of all the dirty Excel cells.
A calculation event fired.
Cell A10 was the first cell to be calculated and the result returned : Calculation sequence 1

But then cells E10:E12 all got deferred down the calculation chain. This typically happens when formulas are dependent on uncalculated cells. E10:E12 are uncalculated because the Python calculation has not yet happened,

Next a Python calc is started and the calculation event shows calculation as pending.
The PY cell E10 is calculated (Calculation sequence 2).
Scanning down the calculation chain Excel can now calculate E11 (calculation sequence 3) because the cell it is dependent on has been calculated.
But E12 has to be deferred again because it needs another PY calculation.

So another PY calc is started and E12 is calculated (calculation sequence 4)

Finally another calculation event fires as Done because there are no more dirty cells waiting to be calculated.

Conclusion

Amazingly this all works!
By flip-flopping between Excel calculation and Python calculation, and making extensive use of deferring calculations further down the calculation chain Excel manages to correctly handle 2 completely different calculation methods.
Testing this with much longer Xl->PY->XL->PY->Xl chains it all works, but inevitably the longer chains start getting a bit slow.

Python in Excel: PY Calculation, Globals & Co-Volatility

fastexcel — Wed, 01 Nov 2023 16:03:15 +0000

Python Calculation

Python in Excel calculates all the PY cells in a workbook sheet by sheet, and row by row, starting at the top left PY cell.

If you break this calculation sequence rule, by for instance moving cell E5 to A6, you get a circular reference or Python error:

By contrast Excel does not really care about the positions of inter-dependent cells: cell C12 has a backwards dependency on C10. To achieve this Excel uses dependency trees that show how the cells depend on one another.

And you can freely move Excel cells around: move E12 to A12 and Excel happily copes.

The Python Global Space

When you define and use variables in Python they are stored and accessed in a Global space. This is the real reason for the left-to-right top-to-bottom PY calculation sequence rule.

Cell C4 defines a Python variable called DF and sets it to 1, then the PY cell in E4 adds 5 to the variable and the PY cell in C6 adds 10 to the variable.
There are no dependencies on Excel cells. (Note the dependency arrows are black).

Now if I move the PY cell in E4 to E6 Python recalculates, and the values change because the calculation sequence has changed to C4->C6->E6.

And if I move E6 to A1 I get a Python error because the PY formula references DF before it has been defined,

If I do the equivalent in Excel then moving cells does not change the values because they are calculated in dependency sequence regardless of position:

Python Co-Volatility

In this example I have values in Cells A3:A10 and simple Python references to each of these cells in C3:C10. The Python cells will calculate top-to bottom, row by row.

Bu when I change cell A10 to 99 then ALL the PY cells will recalculate in sequence even though only cell C10 is dependent on A10 (the image shows that PY cells C3 and C4 have calculated but C5:C10 are pending a Python recalculation)

Whenever a PY cell in a workbook recalculates then ALL the PY Cells recalculate.

I call this “Co-Volatility”.

Conclusion

Whilst Python Globals and Co-volatility make it easy to split a Python script between cells there is a resulting heavy performance penalty when there are several PY cells. This can be somewhat mitigated by using the new “Partial” calculation method which makes Excel calculations automatic but Python calculations only done on request.

But it would be nice to have an option to bypass the co-volatility when the Python Global space is not being used.

Python in Excel: Round-Tripping Gotchas

fastexcel — Tue, 24 Oct 2023 17:24:51 +0000

Python is a full-blown programming language and so supports a rich variety of datatypes.
So what happens when you move data from Excel to Python and back to Excel (Round-Trip)?

Excel Datatypes

It is important to distinguish between formatted Excel values (what Excel shows you on the screen) and what the underlying value is that Excel formulas work with. These are the underlying value types (ignoring the new Rich datatypes).

Empty Cells
Floating point numbers
Text/Strings
Boolean
Error

Dates, times and datetimes are held in Excel as floating point numbers where the integer part represents the date as the number of days since January 1 1900, and the decimal part represents a fraction of 24 hours. These floating point numbers get converted to Dates and Times by the chosen formatting.

Floating point numbers may also be formatted as

Currency
Percentage
Fraction
Scientific
Different number of decimals
With thousands separators
Integers
…

Excel Tables are composed of basic Excel datatypes with some additional metadata to enable things like structured referencing.

Converting Excel Datatypes to Python Datatypes and Back to Excel

These examples show formatted Excel values converted to Python Objects and Types and then the Python objects are converted back to Excel values.
Most of these examples are self-explanatory and behave as expected.

1 The Excel value is a floating point number but gets converted to a Python Integer then back to a float.
3 The Excel value is a float but formatted as Text. Python ignores the formatting so it becomes a float.
4,5 Python ignores the Currency and Percentage Excel formatting.
7 Python does not have an Excel Error datatype.
8 Excel empty cells get converted to Python type None, which then returns to Excel as zero.

Dates and Times

The problem case is cells formatted as Excel Dates and Times. These get converted to Python DateTime/Time objects and when returned to Excel the dates have lost the decimal part and the times have lost the integer part.

Round-Tripping has failed!

This breaks Excel rules: Formatting should not change Excel values.

And changing formatting does not trigger a recalculation, so changing the format to a Date/Time or from a Date/Time will not change the corresponding Python values.

Joe McDaid points out that if you format the cell as Date & Time as in example 14 round-tripping works correctly, but I still don’t like it.

Excel Tables

Excel Tables get converted to Python Dataframes and returned as spilled arrays:

Conclusion

Round-tripping data between Excel and Python looks mostly OK.
There are some potential gotchas with empty cells, Excel errors etc. but a significant gotcha with Excel Date and Time formatted cells. This looks wrong from an Excel point of view.

I think some additional method is needed to be able to pass Date/Times to Python as floating point numbers.

Manual Calculation No Longer Viral!

fastexcel — Tue, 14 Mar 2023 13:40:11 +0000

Excel has 3 basic calculation modes: Automatic, Manual and Automatic except Tables.

But an Excel session can currently only have one Calculation mode (unless you are using FastExcel which allows open workbooks and their worksheets to have different Calculation Modes).
Excel chooses the session calculation mode from the first workbook opened (the calculation mode is stored in every saved workbook).

Viral Manual Calculation Mode in previous Excel Versions

If the first (non-Addin) workbook you open was saved in Manual mode then the Excel session stays in Manual mode even when you open another workbook that was saved in Automatic mode (Of course you can change Excel’s calculation mode using the Ribbon or VBA).

In Excel versions before Office 365 or Excel 2021:

Start Excel
Open a Manual Workbook
Open an Automatic workbook
Save the Automatic Workbook
Close Excel and then re-open the saved Automatic Workbook
Excel is now in Manual Mode, the saved workbook has been changed to Manual!

If your users are in the habit of having multiple workbooks open at the same time and one of these workbooks is in Manual mode then Manual mode will soon spread virally to the other workbooks.

Excel 365 and Excel 2021

The change in Excel 365 and Excel 2021 is not to change the calculation mode of a workbook when saving it unless the user (or VBA) has changed Excel’s calculation mode before the save.

Start Excel 365 or 2021
Open a Manual Workbook
Open an Automatic workbook
Save the Automatic Workbook
Close Excel and then re-open the saved Automatic Workbook
Excel is now in Automatic Mode, the saved workbook has NOT been changed to Manual!

Only Calculating the Active Workbook

fastexcel — Wed, 01 Mar 2023 14:39:21 +0000

When you have more than one workbook open Excel has always recalculated all of the workbooks at each recalculation. That can be very painful if one or more of the workbooks are slow to calculate.

Now the Excel Dev Team has made a significant change to this process. When in Automatic calculation mode only the active workbook and any workbooks that it links to/from will be recalculated.

This change is currently only available to Insiders in the Beta channel running Windows version 2208 build 15529 or later and Mac version 16.64 build 22081401.

Testing Active Workbook Calc

I ran some tests using Insider Excel build 16224.20000.

Workbook 1 has a volatile formula =NOW() in D1 that shows when the formula was last calculated.

Workbook 2 has a formula in A2 =A1+1 and a volatile formula =NOW() in cell C1.

Both workbooks are visible in their separate windows.

With Workbook 2 active I changed cell A1 several times over the course of a few minutes and checked that the time shown in cell C1 changed – it did. But the time shown in Workbook 1 cell D1 did not change. And when I switched the active window to Workbook 1 then cell D1 did not change: just changing the active workbook does not trigger a recalc.

Next I added a FastExcel UDF CalcSeqCountRef(C1) to book 2 and CalcSeqCountRef(D1) to book 1. CalcSeqCountRef counts the number of times the referenced cell has been recalculated.

Now the time shown in book 1 always remains in sync with book2. This is because VBA UDFs are considered as workbook links to the workbook (in this case an XLAM addin) that contains the VBA code, so Excel considered Book1 and Book2 to be interlinked.
(By the way – UDFs in an XLL addin as opposed to an XLAM addin are not considered as links)

Exceptions to Active Workbook Calc

Active Workbook Only calc will not happen if:

The workbook uses real-time streaming.
Excel is in Manual Calc mode.
You press F9 or Calculate Now.
Your VBA triggers a recalculation.

Comparison with FastExcel Active Workbook Mode

For my FastExcel product I chose to focus more on Manual Calc mode, because when you have large slow workbooks you will tend to be using Manual.

FastExcel allows you to set Active Workbook Mode so that Excel will only calculate the active workbook.
When you set Active Workbook Mode the default is that each open workbook is in Manual calculation mode, so that you have to press F9 to calculate the Active Workbook but not any other workbooks.
(For more complicated situations FastExcel allows you to set different calculation modes for each open workbook.)

Conclusion

The beauty of the new Excel Active Workbook Calc is that it is automatic – you don’t have to do anything and it works well.
But the downside is that it does not work in Manual mode, which is arguably when you need it most!

Excel Screen-updating Performance Comparison.

fastexcel — Sat, 14 Jan 2023 19:10:41 +0000

The time that Excel takes to refresh the screen display can be a significant factor in overall processing speed.

Things that affect screen updating Time

Excel only updates the visible part of the Excel windows – changing Zoom, the size of the visible window or changing to a different resolution monitor will change screen updating time.
The more cells that change within the visible windows the slower screen updating will be. So switching to an empty worksheet minimises screen update time.
Complexity of formatting – the more work Excel has to do to format visible cells the slower screen updating will be. Using heavy conditional formatting rules is the most severe cause of slow screen updating.
Updating the statusbar – frequently updating statusbar text can be slow.

My benchmarking workbook

The workbook contains one sheet (Sheet1) with 4895 formulas and 1000 constants. Sheet2 and Sheet3 are empty. The workbook does not contain any Conditional Formatting or VBA.

You can down load the workbook from here

For Range Calculate timings select Sheet1!A1:BE89 and then View – Zoom to selection so that all the 5073 cells are visible.

Excel should be in Manual calculation mode.

Screen Updating times for XL 2010, XL 2013, and XL 2021

Why are screen updating times so different by Excel Version? Some possible explanations:

In early versions of Excel parts of Excel’s rendering engine were coded in Assembler so as to maximise performance. As more function was built-in to the engine and Excel got supported on multiple end-points (Windows, IOS, Android) the use of Assembler became impractical and it became necessary to migrate as far as possible to a common cross-platform code-base.

But these changes had a drastic impact on screen-updating time, and the team decided that the only way to minimise this slow-down was to minimise the number of times the screen was updated. This turns out to be a complex task with potentially unfortunate side effects. Long-running tasks (calculation, VBA etc) running on the main Excel thread tended to show crazy looking or empty screens: Excel had cleared screen components but not yet redrawn them.

Excel 2010 has not fully implemented the minimisation of the number of screen redraws: screen update time is large.

Excel 2013 and 2021 have implemented minimising screen redraws.

I think Excel 2021 has fixes for some of the screen updating glitches, and also some features require additional work in the screen refresh, so 2021 is slightly slower than 2013.

Benchmarking screen updating time

I use 2 different methods to isolate and measure screen updating time. Both methods require the use of the Windows high-resolution timer. See https://fastexcel.wordpress.com/2011/05/31/a-quantum-of-time/

The Range Calculate method

The VBA Range Calculate method does a single threaded calculate of a range. If your VBA code switches ScreenUpdating to false then calls a timed Range.Calculate the time taken will exclude screen refresh time. Subtracting that from the Range Calculate time with ScreenUpdating true gives you screen refresh time. (Note you should be in Manual calculation mode. to avoid including the automatic recalc after Range Calculate).

First select and make visible the range of cells for which you want to measure screen update time then use the timed Range Calculate on the selection.

The CalculateFull method

The VBA Calculate Full method forces Excel to calculate every formula in the active workbooks.
Activate the worksheet that contains the formulas you want measure screen update time for. Use the high-resolution timer to measure the full calction, then switch to a completely empty sheet and repeat the Calculate Full timing. The difference between the 2 timings is the screen refresh time.

With this method you do not need to be in manual calculation mode or switch ScreenUpdating to False, but you need to do several successive full calculations to allow the multi-threaded recalc to self optimise.