Just in case you thought the previous posts on writing efficient VBA UDfs (Part1, Part2, Part3, Part 4) meant we had finished making UDFs run faster, think again – its time to explore UDF Array Formulas.
Single and Multi-Cell Array Formulas
Excel array formulas can do amazing things. They are like ordinary formulas except that you enter them with Control/Shift/Enter rather than just enter.
There are two kinds of array formulae:
- Single cell array formulae are entered into a single cell, loop through their arguments (which are often calculated arguments) and return a single answer.
- Multi-cell array formulae are entered into multiple cells and return an answer to each of the cells.
With this power comes a cost: because array formulae are doing a lot of work they can be slow to calculate (particularly single-cell array formulas).
UDF Multi-cell Array Formulas go Faster!
You can break down the time taken by a VBA UDF into these components:
- Overhead time to call the UDF.
- Time to fetch the data thats going to be used by the UDF.
- Time to do the calculations.
- Overhead time to return the answer(s).
In the post on Excel VBA Read/Write timeings you could see that there was quite a significant overhead on each VBA read and write call, so that its usually much faster to read and write large blocks of data at a time.
So it sounds like a good idea to make your VBA UDF read as much data as possible in a single block and return data to Excel in as large a block as possible.
Enter the Multi-cell array formula – it does exactly that – and also minimises the calling overhead – and often it can read the data once and re-use it lots of times.
So how do you make a Multi-Cell Array formula?
Lets create an array version of the AverageTolE function shown in the first Writing Efficient VBA UDFs post.
The scenario is that you want to find the Averages of the data excluding a number of different tolerances rather just one tolerance.
To keep things simple I am assuming that
- the tolerances are all in one row
- both the data and the tolerances will be supplied as ranges
- error-handling is largely ignored
- the function returns a row of answers that correspond to the row of tolerances.
Public Function AverageTolM(theRange As Range, theTols As Range) As Variant Dim vArr As Variant Dim vArrTols As Variant Dim v As Variant Dim d As Double Dim r As Double Dim k As Long Dim vOut() As Variant Dim dTol As Double Dim lCount As Long On Error GoTo FuncFail vArr = theRange.Value2 vArrTols = theTols.Value2 ReDim vOut(1 To 1, 1 To UBound(vArrTols, 2)) On Error GoTo skip For k = 1 To UBound(vArrTols, 2) dTol = CDbl(vArrTols(1, k)) r = 0# lCount = 0 For Each v In vArr d = CDbl(v) If Abs(d) > dTol Then r = r + d lCount = lCount + 1 End If skip: Next v vOut(1, k) = r / lCount Next k AverageTolM = vOut Exit Function FuncFail: AverageTolM = CVErr(xlErrNA) End Function
The changes to the UDF are quite simple:
- theTols range is coerced into a variant array: vArrTols = theTols.Value2
- an output array of the same size is created: ReDim vOut(1 To 1, 1 To UBound(vArrTols, 2))
- The UDF loops on the tolerance array and populates the output array
- The output array is assigned to the function variable: AverageTolM = vOut
Note that the Function is declared as returning a variant (which will contain an array) rather than being declared as returning an array of variants.
Assuming that the data is in H27:AA27 then enter the array function with Ctrl/Shift/Enter as
{=AVERAGETOLM(Data!$A$1:$A$32000,$H$27:$AA$27)} into 20 rows (so we will get 20 x 20 = 400 cells of answers).
Calculating this 20 formulas takes 975 milliseconds.
Using the original AVERAGETOLE formula for the 400 cells takes 1660 milliseconds, an improvement factor of 1.7
Summary
- In many real-life cases using multi-cell array UDFs can be the fastest way to calculate.
- Converting a conventional UDF to a multi-cell array UDf is straightforward.
So I did an array UDF and I’m getting different results, time wise, when I do Application.Calculation=manual vs automatic.
When I test the time for automatic my array one goes faster, but when I don’t it goes slower than the non arrayed formulas. Also, when I compare it to the SumProduct function (the one I’m replacing) it goes twice as fast as it in automatic mode, but goes slower in manual mode.
I’m thinking it might be because one of the dependents is a volatile function (Today()). Have you experienced this? Using my own function does seem like it is faster than SumProduct but (the delay doesn’t seem quite as long now when calculating) just wondering why the difference.
There is a small bug with UDF array formulas that I have not yetmentioned: as it says on my website:
http://www.decisionmodels.com/calcsecretsj.htm
—
Note that Excel behaves unexpectedly when a multi-cell UDF is entered or modified and depends on volatile formulae: the UDF is evaluated once for each cell it occupies. This does not happen when the UDF is recalculated, only when it is entered or changed.
—
Maybe thats whats happening to you?
Thanks, that page was very useful.
Hi! First of all congratulations for your great job in this page!
I did an UDF array formula but I don’t know how to improve this in order to enhance time in calculation.
Basically, I need to compare one specific cell with an array that I don’t know previously its dimensions (I just know where it starts and that just have only one column). The minimum difference between these values (the cell and all array values) will be the final answer for me. The problem here is that I need to use this UDF array formula in 35040 cells (every 15 minutes of a year) and the calculation time in a normal pc is too slow.
The code of that UDF that I am talking is following:
Function MinofDiff(r1 As Long) As Variant
Dim r2 As Range
Dim TempDif As Variant
Dim TempDif1 As Variant
Dim j As Long
Dim LastRow As Long
On Error GoTo FuncFail
If r1 = 0 Then GoTo skip
With Sheets(“Dados”)
LastRow = .Cells(.Rows.Count, “P”).End(xlUp).Row
Set r2 = .Range(“P8”, “P” & LastRow)
End With
TempDif1 = Application.Max(r2)
For j = 1 To LastRow – 7
If r1 >= r2(j) Then
TempDif = r1 – r2(j)
Else
TempDif = r1
End If
MinofDiff = Application.Min(TempDif, TempDif1)
TempDif1 = MinofDiff
Next j
skip:
Exit Function
FuncFail:
AverageTolM = CVErr(xlErrNA)
End Function
Please, could you help me to find some problems in writing of this short code and how can I gain time with that? I will really appreciate that. Thanks in advance.
Best Regards,
Pedro
Pedro, Good question for StackOverflow. You need to use arrays instead of ranges to loop through the data. Using ranges goes really slow. Also, you should look at more material written by Charles he has a ton of good information and if you follow his advice (i.e., read what he has written) you’ll see.
Here’s an answer I gave that shows you how to use arrays that you grab from excel:
http://stackoverflow.com/a/7263655/632495
Hi Pedro,
I thought this made a nice example using the techniques outlined in a number of previous posts, so I made another post for you!
Try this…
‘—
Function MinofDiff_R(r1 As Long) As Variant
On Error GoTo FuncFail
Dim r2 As Excel.Range
Dim TempDif As Long
Dim TempDif1 As Long
Dim j As Long
Dim LastRow As Long
Dim vRngValues As Variant
If r1 = 0 Then GoTo skip
With ActiveSheet
LastRow = .Cells(.Rows.Count, “P”).End(xlUp).Row
Set r2 = .Range(“P8”, .Cells(LastRow, 16))
End With
vRngValues = r2.Value2
TempDif1 = Excel.WorksheetFunction.Max(r2)
For j = 1 To LastRow – 7
If r1 >= vRngValues(j, 1) Then
TempDif = r1 – vRngValues(j, 1)
Else
TempDif = r1
End If
If TempDif < TempDif1 Then MinofDiff_R = TempDif Else MinofDiff_R = TempDif1
TempDif1 = MinofDiff_R
Next 'j
skip:
Exit Function
FuncFail:
MinofDiff_R = CVErr(xlErrNA)
End Function
'—
Hi Jim,
Looks good to me: I made an array formula version for my latest post as an example.
Thank you so much Jim! It works fine as the array formula version that “fastexcel” did in his latest post. At least I solved the overflow problem that I had in my version.
Can we import an access data table/query through EXCEL UDF
Yes you can: use ADO or DAO
Do you why array formulas for Excel functions are slower than regular formulas for Excel? Is this always the case? Chandoo.org was doing a fast Excel week and that is one of the suggestions, don’t use array formulas for Excel functions. I tested one formula and came up with it being slower for built in Excel functions, of course, for UDFs it was much faster to do the arrays.
Pingback: » Why UDF Array Functions are Faster Spreadsheet Budget and Consulting
Thanks Charles, these are great tutorials, but I have not quite found what I am HOPING to find (but suspect it isn’t possible).
My goal is to create a general purpose multi-key lookup UDF.
I have one, and it works fine except it is SLOW, because I have large tables to search, and up to 6-field keys to support, and I need to do MANY lookups!
Here is the code. It uses Evaluate() as many solutions that I have found, use. Here’s the code:
Function KeyLookup(datatable As Variant, datacol As String, _
key1table As Variant, key1 As String, _
Optional key2table As Variant, Optional key2 As String, _
Optional key3table As Variant, Optional key3 As String, _
Optional key4table As Variant, Optional key4 As String, _
Optional key5table As Variant, Optional key5 As String, _
Optional key6table As Variant, Optional key6 As String) As Variant
Dim cmd As String
cmd = “INDEX(” & datatable.Address & “,MATCH(1,(”
cmd = cmd & key1table.Address & “=””” & key1 & “””)”
If Not IsMissing(key2table) Then cmd = cmd & “*(” & key2table.Address & “=””” & key2 & “””)”
If Not IsMissing(key3table) Then cmd = cmd & “*(” & key3table.Address & “=””” & key3 & “””)”
If Not IsMissing(key4table) Then cmd = cmd & “*(” & key4table.Address & “=””” & key4 & “””)”
If Not IsMissing(key5table) Then cmd = cmd & “*(” & key5table.Address & “=””” & key5 & “””)”
If Not IsMissing(key6table) Then cmd = cmd & “*(” & key6table.Address & “=””” & key6 & “””)”
cmd = cmd & “,0),” & datacol & “)”
KeyLookup = Evaluate(cmd)
End Function
This generates cmd values that look like this:
INDEX($K$3:$L$8993,MATCH(1,($B$3:$B$8993=”a1-5″)*($C$3:$C$8993=”Tarp”)*($E$3:$E$8993=”Sydney”)*($F$3:$F$8993=”Highest Reach”)*($G$3:$G$8993=”1+”)*($J$3:$J$8993=”T0″),1),1)
Unfortunately, my test takes 3 minutes to run (I worked out there are 970 million calcs involved so of course it would be slow).
I have implemented other methods but I want to use INDEX/MATCH WITHOUT Evaluate() because I know this adds a significant overhead.
It seems the only way to use INDEX/MATCH natively in VBA is with a single range and single key unless someone can tell me how to implement that lovely (range1=key1)*(range2=key2)… notation that worksheets are blessed with, but VBA somehow isn’t.
I need this to be as generic and as fast as possible so any other suggestions for optimisation would be grateful. I want to avoid VLOOKUP() based solutions as I want to avoid having to pre-calculate the combined keys, and avoid SUMPRODUCT solutions which only work with numbers and don’t return the first value but sum all values if multiple matches are found.
Many Thanks,
Gregg.
Gregg,
Well I don’t think there is a simple but fast solution to the general problem. I use a combination of cascading hi-lo binary search and multi-threaded memory written as a C++ XLL UDF, (and well over a thousand lines of code).
I suggest you try out my SpeedTools Lookups to see if yjet arec fast enough:
AVLOOKUP2, AVLOOKUPS2 & AVLOOKUPNTH Functions
Search for values in one or more columns of a table, and return values from the rows where a match is found.
Advanced Lookup functions returning:
· either the first value found (AVLOOKUP2)
· or all the values found (AVLOOKUPS2)
· or the Nth value found (AVLOOKUPNTH)
AVLOOKUP2, AVLOOKUPS2 and AVLOOKUPNTH are NOT case-sensitive.
The AVLOOKUP functions are multi-threaded, non-volatile, array functions.
AVLOOKUP Family Syntax
AVLOOKUP2(Lookup_Values, Lookup_Table, Answer_Columns, Sorted,
Exact_Match, Lookup_Columns,MemType_Name)
The first 3 parameters are required; the last 4 parameters are optional.
AVLOOKUPS2(Lookup_Values, Lookup_Table, Answer_Columns, Sorted,
Exact_Match, Lookup_Columns)
The first 3 parameters are required; the last 3 parameters are optional.
AVLOOKUPNTH(Lookup_Values, Lookup_Table, Answer_Columns, Sorted,
Exact_Match, Lookup_Columns, Position)
The first 3 parameters are required; the last 4 parameters are optional.
Lookup_Values (required)
The value(s) to be found in the Lookup_Columns.
Can be a single value or multiple values arranged in columns (lookup in multiple lookup columns) and rows (return multiple answer Rows). A single value can be a constant or a cell reference.
Multiple lookup values can be specified either as an array of constants or as a range referring to multiple cells. There should be the same number of columns of lookup values as there are columns in the Lookup_Columns.
When doing multi-column lookups (multiple columns of both lookup values and lookup columns) the AVLOOKUP functions will look for a row where ALL the lookup values are matched in the corresponding columns (Columns are treated as AND).
When doing multi-row lookups (multiple rows of lookup values) the AVLOOKUP functions will look for a row separately for each row of lookup values (Rows are treated as OR).
A single AVLOOKUP2 or AVLOOKUPNTH statement will return the same number of rows and columns of result values as there are rows in Lookup_Values and columns in Answer_Columns.
Lookup_Values can contain the wildcard characters ? and * for exact matches on unsorted text data. To find actual question marks or asterisks add a tilde (~) preceding the character.
Lookup_Table (required)
The rectangular range of cells or array or expression yielding an array to be used for the lookup table. The array/range must resolve to a single contiguous rectangular array.
Answer_Columns (required)
Specifies the column or columns in Lookup_Table that the AVLOOKUP functions will return values from for the row or rows that are found in the lookup operation.
Answer_Columns can be a constant, an array of constants, an expression or a reference.
· If Answer_Columns evaluates to a number it will be treated as relative column number(s) within Lookup_Table.
· If Answer_Columns evaluates to text then the text will be treated as column labels to be found in the first row of Lookup_Table.
If the column labels are not found the AVLOOKUP functions return #REF.
Thanks for your speedy reply. I figured this might be the case. It’s a shame they didn’t expose the (range1=key1)*… notation for us VBA people.
I am developing this product for a client and I have a requirement to use vanilla Excel/VBA, but I’ll see if I can get approval to try SpeedTools in the app. Many thanks again.
What about using Excel’s DSUM function to do this? It’s like a cross between the Advanced Filter and VLOOKUP.
Thanks Jeff, I looked at this one and my understanding is that the filter criteria cannot be specified in the function call itself. It needs to be in a range. I need the criteria in the function call as I need many lookups, all with different criteria.
Or write a function that leverages SQL?
Thanks again. In our case, where we might have 10,000+ lookup UDFs, this would probably take even more time. It’s a curly one, isn’t it. The lookups with acceptable performance are SpeedTools and VLOOKUP, but I’m not supposed to use add-ins (must be vanilla Excel) and VLOOKUP requires some preprocessing to create the single key column, which I am trying to avoid if possible.
Rather than doing it as a UDF, just do the one SQL call and process the entire table in one go?
Well yes, there is that. In fact, we’re kind of doing that. We are retrieving the raw data table using a REST call to our server (and even this table is a tiny subset of all of the available data). Once the data is in there, we want to give our clients the freedom to enter our lookup formulas anywhere and copy across and down as required. Ideally, we don’t want to force them to have to turn auto-recalc off due to slow performance.
You mention not using a UDF. I’m not sure then, what native Excel function would take its place.
I wasn’t thinking about a function, just a routine that they woud trigger somehow, and that would simply append the lookup table to the existing table via ADO/SQL wherever there was a match.
But that approach forces them to do the lookups in a designated column, and not just anywhere in the spreadsheet.
This would be an interesting question to post on a forum like http://www.chandoo.org/forums (where I’m a moderator) because the hive mind that is a forum knows lots more than you and me individually. You’d have to whip up some sample data to give an idea of the challenge though. It needent be a big dataset.
Thanks Jeff, have done that.
Hello! I am trying to solve a related problem in a workbook I am assembling. I have two columns Latitude, Longitude – approximately 40K records. I would like to alter the UDF below to compare Lat1 & Lat2 to ALL of the other Lat/Long combinations and return the distance to the closest lat/long.
Public Function getDistance(latitude1, longitude1, latitude2, longitude2)
earth_radius = 6371
Pi = 3.14159265
deg2rad = Pi / 180
dLat = deg2rad * (latitude2 – latitude1)
dLon = deg2rad * (longitude2 – longitude1)
a = Sin(dLat / 2) * Sin(dLat / 2) + Cos(deg2rad * latitude1) * Cos(deg2rad * latitude2) * Sin(dLon / 2) * Sin(dLon / 2)
c = 2 * WorksheetFunction.Asin(Sqr(a))
d = earth_radius * c
getDistance = d
End Function
I know that I will need to use an array in my UDF to accomplish this task – can anyone give me some assistance?
Thanks in advance,
-Bill
I’m building a excel file to do complex engineering calculations. I’m not skilled at Excel, but I wrote hundreds of fortran programs from ’83 to ’99. I need a little help to understand how to write a UDF that will return a one-dimensional array (always numerical values) as a result of multiple array arguments.
The calling statement in Excel with look like this:
=EqK ( $E$3,$F$3,$G3:$G$7,$H$3:$H$7,$I$3:$I$7 )
The UDF module will look something like this
FUNCTION EqK( Temperature, Pressure, Cp, Eta, Beta )
FOR I = 1 TO 5
EqK(I) = log(Temperature)+Pressure*(1/Cp(I) + Eta(I)^Beta(I))
NEXT I
END FUNCTION
The real functions I’m writing are about 100 times more complex, but if I see how to do this, then I might be able to figure the rest out.
I want the values EqK(1) through EqK(5) to be placed in cells A11,A12,A13,A14,A15
I have already learned how to pass arrays into a UDF and have the UDF return a single value. I have not figured out the protocol for calculating a whole array yet.
Hi Dan,
try this: you need to select cells A11:A15 and enter the function formula using Control-Shit-Enter to make it an array formula.
Note its usually a good idea to always work with 2-dimenionsal arrays even if one of the dimensions is single
Function EqK(Temperature, Pressure, Cp, Eta, Beta)
Dim ArrayOut(1 To 5, 1 To 1) As Double
Dim I As Long
For I = 1 To 5
ArrayOut(I, 1) = Log(Temperature) + Pressure * (1 / Cp(I) + Eta(I) ^ Beta(I))
Next I
EqK = ArrayOut
End Function
My cells (A11:A15) for the results show #Name?
I have excel 2007.. will that matter?
This is a cut and paste of what I entered into the UDF. Looks identical to what you suggested.
Function EqK(Temperature, Pressure, Cp, Eta, Beta)
Dim ArrayOut(1 To 5, 1 To 1) As Double
Dim I As Long
For I = 1 To 5
ArrayOut(I, 1) = Log(Temperature) + Pressure * (1 / Cp(I) + Eta(I) ^ Beta(I))
Next I
EqK = ArrayOut
End Function
I used Ctrl-Shift-Enter, and I tried it while just one cell (A11) was highlighted and also while highlighting all 5 cells (A11:A15)
I forgot to enable macros… seems to be fine now! Thank you
Accessing ranges on different pages
Hi
Thank you very much for your valuable guidance.
I’m an infrequent Excel programmer and have written an Excel Array Formula function with the following parameters:
Public Function Test(Params As Range, Dates As Range, Rain_pc As Range, Evap As Range, OutputPeriod As String) As Variant
Inside the function I have statements like:
Datesv = Dates.Value2
i1 = LBound(Datesv, 1)
i2 = UBound(Datesv, 1)
ReDim EndDates_plus_startv(0 To i2, 1 To 1)
It works fine when, for instance, the ranges for the “Dates” parameters are from the same page as the formula but when I use ranges from different pages the program does not work. Must the parameter passing be adjusted to allow the use of parameters from different pages?
I know this is not the most efficient calculation method but it allows the data and the program results to be on different pages which is convenient in this application
Many thanks
Anton
Should work OK – the most likely reason is that somewhere in your UDF the VBA is referring directly or indirectly to a range that is not a parameter of the function