I was looking at some multi-cell array formula UDFs with John and Rich and could not understand why they seemed so much slower than I expected.
Each UDF (written as C++ XLLs) read data from around 200 cells using around 40 parameters, did some very simple logic and arithmetic and then returned values to a different 200 cells. Each of these UDF was taking around 16-20 milliseconds and the workbook contained several thousand calls to these UDFs.
So I started to investigate:
First Hypothesis: Maybe marshalling input data from multiple parameters is slow?
We know that the way to speed up reading data from Excel Cells is to read data in as large a block as possible rather than in many small chunks. So maybe the number of parameters was the problem. To test this I wrote a couple of XLL functions.
MarshallAll takes a single parameter of a range of cells and returns the number of columns. The parameter is type Q so arrives in the function as values rather than a reference.
MarshallMany takes 40 parameters (also type Q) and returns a constant value of 88.
CXlOper* MarshallAll_Impl(CXlOper& xloResult, const CXlOper* Arg1) { long nCols=0; nCols=Arg1->GetWidth2(); xloResult=(double)nCols; return xloResult.Ret(); } CXlOper* MarshallMany_Impl(CXlOper& xloResult, const CXlOper* Arg1, const CXlOper* Arg2, const CXlOper* Arg3, const CXlOper* Arg4, const CXlOper* Arg5, const CXlOper* Arg6, const CXlOper* Arg7, const CXlOper* Arg8, const CXlOper* Arg9, const CXlOper* Arg10, const CXlOper* Arg11, const CXlOper* Arg12, const CXlOper* Arg13, const CXlOper* Arg14, const CXlOper* Arg15, const CXlOper* Arg16, const CXlOper* Arg17, const CXlOper* Arg18, const CXlOper* Arg19, const CXlOper* Arg20, const CXlOper* Arg21, const CXlOper* Arg22, const CXlOper* Arg23, const CXlOper* Arg24, const CXlOper* Arg25, const CXlOper* Arg26, const CXlOper* Arg27, const CXlOper* Arg28, const CXlOper* Arg29, const CXlOper* Arg30, const CXlOper* Arg31, const CXlOper* Arg32, const CXlOper* Arg33, const CXlOper* Arg34, const CXlOper* Arg35, const CXlOper* Arg36, const CXlOper* Arg37, const CXlOper* Arg38, const CXlOper* Arg39, const CXlOper* Arg40) { xloResult=88.0; return xloResult.Ret(); }
But when I compared the execution times of these functions they were both fast and there was not much difference in timings, although it was faster to read as many cells as possible with each parameter.
So hypothesis 1 failed.
I checked how the time taken varied with the number of cells read and their datatype.
As expected large strings take longer than small strings which take more time than numbers. And it is more efficient to read as many cells as possible for each parameter.
Second Hypothesis: Maybe returning results to multiple cells is slow?
We know that writing data back to Excel cells from VBA is significantly slower than reading data from cells.
(see VBA read-write speed and Getting cell data with VBA and C++)
I wrote another simple XLL UDF: MarshallOut. This took a single argument of a range and returned it.
CXlOper* MarshallOut_Impl(CXlOper& xloResult, const CXlOper* Arg1) { xloResult=*Arg1; return xloResult.Ret(); }
Bingo: returning data to multiple cells is comparatively slow.
I used FastExcel Profile Formulas to time the tests:
Reading and returning a 255 character string to each of 100 cells takes 13 milliseconds.
Notice that the time taken is NOT linear with respect to the number of cells.
Multi-Threading Effects
I also noticed that FastExcel Workbook Profiler showed that these functions were making inefficient use of multi-threaded recalculation. Presumably this is because they need an exclusive lock on Excel’s results table whilst writing out the results, and most of the time used is doing exactly that.
By contrast, the first set of “read-many cells but write one cell” functions made efficient use of multi-threading.
Comparing XLL functions with VBA Functions.
I did a small number of comparison with VBA equivalents of these XLL functions. The VBA functions showed the same kind of performance behaviour as the XLL functions and were slightly slower. Of course VBA functions cannot multi-thread.
Conclusions
- Large numbers of array formulas that return results to multiple cells can be slow.
- Multi-cell array formulas do not multi-thread efficiently whilst writing their results back.
- It is more efficient to have larger ranges and fewer parameters than many parameters referencing small ranges.