Just in case you thought the previous posts on writing efficient VBA UDfs (Part1, Part2, Part3, Part 4) meant we had finished making UDFs run faster, think again – its time to explore UDF Array Formulas.
Single and Multi-Cell Array Formulas
Excel array formulas can do amazing things. They are like ordinary formulas except that you enter them with Control/Shift/Enter rather than just enter.
There are two kinds of array formulae:
- Single cell array formulae are entered into a single cell, loop through their arguments (which are often calculated arguments) and return a single answer.
- Multi-cell array formulae are entered into multiple cells and return an answer to each of the cells.
With this power comes a cost: because array formulae are doing a lot of work they can be slow to calculate (particularly single-cell array formulas).
UDF Multi-cell Array Formulas go Faster!
You can break down the time taken by a VBA UDF into these components:
- Overhead time to call the UDF.
- Time to fetch the data thats going to be used by the UDF.
- Time to do the calculations.
- Overhead time to return the answer(s).
In the post on Excel VBA Read/Write timeings you could see that there was quite a significant overhead on each VBA read and write call, so that its usually much faster to read and write large blocks of data at a time.
So it sounds like a good idea to make your VBA UDF read as much data as possible in a single block and return data to Excel in as large a block as possible.
Enter the Multi-cell array formula – it does exactly that – and also minimises the calling overhead – and often it can read the data once and re-use it lots of times.
So how do you make a Multi-Cell Array formula?
Lets create an array version of the AverageTolE function shown in the first Writing Efficient VBA UDFs post.
The scenario is that you want to find the Averages of the data excluding a number of different tolerances rather just one tolerance.
To keep things simple I am assuming that
- the tolerances are all in one row
- both the data and the tolerances will be supplied as ranges
- error-handling is largely ignored
- the function returns a row of answers that correspond to the row of tolerances.
Public Function AverageTolM(theRange As Range, theTols As Range) As Variant
Dim vArr As Variant
Dim vArrTols As Variant
Dim v As Variant
Dim d As Double
Dim r As Double
Dim k As Long
Dim vOut() As Variant
Dim dTol As Double
Dim lCount As Long
On Error GoTo FuncFail
vArr = theRange.Value2
vArrTols = theTols.Value2
ReDim vOut(1 To 1, 1 To UBound(vArrTols, 2))
On Error GoTo skip
For k = 1 To UBound(vArrTols, 2)
dTol = CDbl(vArrTols(1, k))
r = 0#
lCount = 0
For Each v In vArr
d = CDbl(v)
If Abs(d) > dTol Then
r = r + d
lCount = lCount + 1
End If
skip:
Next v
vOut(1, k) = r / lCount
Next k
AverageTolM = vOut
Exit Function
FuncFail:
AverageTolM = CVErr(xlErrNA)
End Function
The changes to the UDF are quite simple:
- theTols range is coerced into a variant array: vArrTols = theTols.Value2
- an output array of the same size is created: ReDim vOut(1 To 1, 1 To UBound(vArrTols, 2))
- The UDF loops on the tolerance array and populates the output array
- The output array is assigned to the function variable: AverageTolM = vOut
Note that the Function is declared as returning a variant (which will contain an array) rather than being declared as returning an array of variants.
Assuming that the data is in H27:AA27 then enter the array function with Ctrl/Shift/Enter as
{=AVERAGETOLM(Data!$A$1:$A$32000,$H$27:$AA$27)} into 20 rows (so we will get 20 x 20 = 400 cells of answers).
Calculating this 20 formulas takes 975 milliseconds.
Using the original AVERAGETOLE formula for the 400 cells takes 1660 milliseconds, an improvement factor of 1.7
Summary
- In many real-life cases using multi-cell array UDFs can be the fastest way to calculate.
- Converting a conventional UDF to a multi-cell array UDf is straightforward.
October 20, 2011 at 3:28 pm |
So I did an array UDF and I’m getting different results, time wise, when I do Application.Calculation=manual vs automatic.
When I test the time for automatic my array one goes faster, but when I don’t it goes slower than the non arrayed formulas. Also, when I compare it to the SumProduct function (the one I’m replacing) it goes twice as fast as it in automatic mode, but goes slower in manual mode.
I’m thinking it might be because one of the dependents is a volatile function (Today()). Have you experienced this? Using my own function does seem like it is faster than SumProduct but (the delay doesn’t seem quite as long now when calculating) just wondering why the difference.
October 20, 2011 at 4:43 pm |
There is a small bug with UDF array formulas that I have not yetmentioned: as it says on my website:
http://www.decisionmodels.com/calcsecretsj.htm
–
Note that Excel behaves unexpectedly when a multi-cell UDF is entered or modified and depends on volatile formulae: the UDF is evaluated once for each cell it occupies. This does not happen when the UDF is recalculated, only when it is entered or changed.
–
Maybe thats whats happening to you?
October 20, 2011 at 6:40 pm |
Thanks, that page was very useful.
January 31, 2012 at 12:11 am |
Hi! First of all congratulations for your great job in this page!
I did an UDF array formula but I don’t know how to improve this in order to enhance time in calculation.
Basically, I need to compare one specific cell with an array that I don’t know previously its dimensions (I just know where it starts and that just have only one column). The minimum difference between these values (the cell and all array values) will be the final answer for me. The problem here is that I need to use this UDF array formula in 35040 cells (every 15 minutes of a year) and the calculation time in a normal pc is too slow.
The code of that UDF that I am talking is following:
Function MinofDiff(r1 As Long) As Variant
Dim r2 As Range
Dim TempDif As Variant
Dim TempDif1 As Variant
Dim j As Long
Dim LastRow As Long
On Error GoTo FuncFail
If r1 = 0 Then GoTo skip
With Sheets(“Dados”)
LastRow = .Cells(.Rows.Count, “P”).End(xlUp).Row
Set r2 = .Range(“P8″, “P” & LastRow)
End With
TempDif1 = Application.Max(r2)
For j = 1 To LastRow – 7
If r1 >= r2(j) Then
TempDif = r1 – r2(j)
Else
TempDif = r1
End If
MinofDiff = Application.Min(TempDif, TempDif1)
TempDif1 = MinofDiff
Next j
skip:
Exit Function
FuncFail:
AverageTolM = CVErr(xlErrNA)
End Function
Please, could you help me to find some problems in writing of this short code and how can I gain time with that? I will really appreciate that. Thanks in advance.
Best Regards,
Pedro
January 31, 2012 at 2:12 pm |
Pedro, Good question for StackOverflow. You need to use arrays instead of ranges to loop through the data. Using ranges goes really slow. Also, you should look at more material written by Charles he has a ton of good information and if you follow his advice (i.e., read what he has written) you’ll see.
Here’s an answer I gave that shows you how to use arrays that you grab from excel:
http://stackoverflow.com/a/7263655/632495
January 31, 2012 at 5:30 pm |
Hi Pedro,
I thought this made a nice example using the techniques outlined in a number of previous posts, so I made another post for you!
January 31, 2012 at 5:19 pm |
Try this…
‘—
Function MinofDiff_R(r1 As Long) As Variant
On Error GoTo FuncFail
Dim r2 As Excel.Range
Dim TempDif As Long
Dim TempDif1 As Long
Dim j As Long
Dim LastRow As Long
Dim vRngValues As Variant
If r1 = 0 Then GoTo skip
With ActiveSheet
LastRow = .Cells(.Rows.Count, “P”).End(xlUp).Row
Set r2 = .Range(“P8″, .Cells(LastRow, 16))
End With
vRngValues = r2.Value2
TempDif1 = Excel.WorksheetFunction.Max(r2)
For j = 1 To LastRow – 7
If r1 >= vRngValues(j, 1) Then
TempDif = r1 – vRngValues(j, 1)
Else
TempDif = r1
End If
If TempDif < TempDif1 Then MinofDiff_R = TempDif Else MinofDiff_R = TempDif1
TempDif1 = MinofDiff_R
Next 'j
skip:
Exit Function
FuncFail:
MinofDiff_R = CVErr(xlErrNA)
End Function
'—
January 31, 2012 at 5:32 pm |
Hi Jim,
Looks good to me: I made an array formula version for my latest post as an example.
February 1, 2012 at 8:39 pm |
Thank you so much Jim! It works fine as the array formula version that “fastexcel” did in his latest post. At least I solved the overflow problem that I had in my version.
February 29, 2012 at 10:43 pm |
Can we import an access data table/query through EXCEL UDF
February 29, 2012 at 11:27 pm |
Yes you can: use ADO or DAO
March 29, 2012 at 3:28 pm |
Do you why array formulas for Excel functions are slower than regular formulas for Excel? Is this always the case? Chandoo.org was doing a fast Excel week and that is one of the suggestions, don’t use array formulas for Excel functions. I tested one formula and came up with it being slower for built in Excel functions, of course, for UDFs it was much faster to do the arrays.
March 29, 2012 at 3:37 pm |
[...] at FastExcel the answer was given why UDF arrays go faster to the question I had on why UDF array formulas are faster than regular formulas. I had forgotten [...]