Writing efficient VBA UDFs (Part5) – UDF Array Formulas go faster!

Just in case you thought the previous posts on writing efficient VBA UDfs (Part1, Part2, Part3, Part 4) meant we had finished making UDFs run faster, think again – its time to explore UDF Array Formulas.

Single and Multi-Cell Array Formulas

Excel array formulas can do amazing things. They are like ordinary formulas except that you enter them with Control/Shift/Enter rather than just enter.

There are two kinds of array formulae:

  • Single cell array formulae are entered into a single cell, loop through their arguments (which are often calculated arguments) and return a single answer.
  • Multi-cell array formulae are entered into multiple cells and return an answer to each of the cells.

With this power comes a cost: because array formulae are doing a lot of work they can be slow to calculate (particularly single-cell array formulas).

UDF Multi-cell Array Formulas go Faster!

You can break down the time taken by a VBA UDF into these components:

  • Overhead time to call the UDF.
  • Time to fetch the data thats going to be used by the UDF.
  • Time to do the calculations.
  • Overhead time to return the answer(s).

In the post on Excel VBA Read/Write timeings you could see that there was quite a significant overhead on each VBA read and write call, so that its usually much faster to read and write large blocks of data at a time.

So it sounds like a good idea to make your VBA UDF read as much data as possible in a single block and return data to Excel in as large a block as possible.
Enter the Multi-cell array formula – it does exactly that – and also minimises the calling overhead – and often it can read the data once and re-use it lots of times.

So how do you make a Multi-Cell Array formula?

Lets create an array version of the AverageTolE function shown in the first Writing Efficient VBA UDFs post.
The scenario is that you want to find the Averages of the data excluding a number of different tolerances rather just one tolerance.
To keep things simple I am assuming that

  • the tolerances are all in one row
  • both the data and the tolerances will be supplied as ranges
  • error-handling is largely ignored
  • the function returns a row of answers that correspond to the row of tolerances.
Public Function AverageTolM(theRange As Range, theTols As Range) As Variant
Dim vArr As Variant
Dim vArrTols As Variant
Dim v As Variant
Dim d As Double
Dim r As Double
Dim k As Long
Dim vOut() As Variant
Dim dTol As Double
Dim lCount As Long
On Error GoTo FuncFail
vArr = theRange.Value2
vArrTols = theTols.Value2
ReDim vOut(1 To 1, 1 To UBound(vArrTols, 2))
On Error GoTo skip
For k = 1 To UBound(vArrTols, 2)
dTol = CDbl(vArrTols(1, k))
r = 0#
lCount = 0
For Each v In vArr
d = CDbl(v)
If Abs(d) > dTol Then
r = r + d
lCount = lCount + 1
End If
skip:
Next v
vOut(1, k) = r / lCount
Next k
AverageTolM = vOut
Exit Function
FuncFail:
AverageTolM = CVErr(xlErrNA)
End Function

The changes to the UDF are quite simple:

  • theTols range is coerced into a variant array: vArrTols = theTols.Value2
  • an output array of the same size is created: ReDim vOut(1 To 1, 1 To UBound(vArrTols, 2))
  • The UDF loops on the tolerance array and populates the output array
  • The output array is assigned to the function variable: AverageTolM = vOut

Note that the Function is declared as returning a variant (which will contain an array) rather than being declared as returning an array of variants.

Assuming that the data is in H27:AA27 then enter the array function with Ctrl/Shift/Enter as

{=AVERAGETOLM(Data!$A$1:$A$32000,$H$27:$AA$27)} into 20 rows (so we will get 20 x 20 = 400 cells of answers).

Calculating this 20 formulas takes 975 milliseconds.

Using the original AVERAGETOLE formula for the 400 cells takes 1660 milliseconds, an improvement factor of 1.7

Summary

  • In many real-life cases using multi-cell array UDFs can be the fastest way to calculate.
  • Converting a conventional UDF to a multi-cell array UDf is straightforward.
This entry was posted in UDF, VBA. Bookmark the permalink.

31 Responses to Writing efficient VBA UDFs (Part5) – UDF Array Formulas go faster!

  1. Jon says:

    So I did an array UDF and I’m getting different results, time wise, when I do Application.Calculation=manual vs automatic.

    When I test the time for automatic my array one goes faster, but when I don’t it goes slower than the non arrayed formulas. Also, when I compare it to the SumProduct function (the one I’m replacing) it goes twice as fast as it in automatic mode, but goes slower in manual mode.

    I’m thinking it might be because one of the dependents is a volatile function (Today()). Have you experienced this? Using my own function does seem like it is faster than SumProduct but (the delay doesn’t seem quite as long now when calculating) just wondering why the difference.

  2. fastexcel says:

    There is a small bug with UDF array formulas that I have not yetmentioned: as it says on my website:
    http://www.decisionmodels.com/calcsecretsj.htm

    Note that Excel behaves unexpectedly when a multi-cell UDF is entered or modified and depends on volatile formulae: the UDF is evaluated once for each cell it occupies. This does not happen when the UDF is recalculated, only when it is entered or changed.

    Maybe thats whats happening to you?

  3. Jon says:

    Thanks, that page was very useful.

  4. Pedro says:

    Hi! First of all congratulations for your great job in this page!
    I did an UDF array formula but I don’t know how to improve this in order to enhance time in calculation.
    Basically, I need to compare one specific cell with an array that I don’t know previously its dimensions (I just know where it starts and that just have only one column). The minimum difference between these values (the cell and all array values) will be the final answer for me. The problem here is that I need to use this UDF array formula in 35040 cells (every 15 minutes of a year) and the calculation time in a normal pc is too slow.

    The code of that UDF that I am talking is following:

    Function MinofDiff(r1 As Long) As Variant

    Dim r2 As Range
    Dim TempDif As Variant
    Dim TempDif1 As Variant
    Dim j As Long
    Dim LastRow As Long

    On Error GoTo FuncFail
    If r1 = 0 Then GoTo skip

    With Sheets(“Dados”)
    LastRow = .Cells(.Rows.Count, “P”).End(xlUp).Row
    Set r2 = .Range(“P8”, “P” & LastRow)
    End With

    TempDif1 = Application.Max(r2)
    For j = 1 To LastRow – 7

    If r1 >= r2(j) Then
    TempDif = r1 – r2(j)
    Else
    TempDif = r1
    End If

    MinofDiff = Application.Min(TempDif, TempDif1)
    TempDif1 = MinofDiff
    Next j

    skip:
    Exit Function
    FuncFail:
    AverageTolM = CVErr(xlErrNA)

    End Function

    Please, could you help me to find some problems in writing of this short code and how can I gain time with that? I will really appreciate that. Thanks in advance.

    Best Regards,
    Pedro

    • Jon says:

      Pedro, Good question for StackOverflow. You need to use arrays instead of ranges to loop through the data. Using ranges goes really slow. Also, you should look at more material written by Charles he has a ton of good information and if you follow his advice (i.e., read what he has written) you’ll see.

      Here’s an answer I gave that shows you how to use arrays that you grab from excel:
      http://stackoverflow.com/a/7263655/632495

    • fastexcel says:

      Hi Pedro,
      I thought this made a nice example using the techniques outlined in a number of previous posts, so I made another post for you!

  5. Jim Cone says:

    Try this…
    ‘—
    Function MinofDiff_R(r1 As Long) As Variant
    On Error GoTo FuncFail

    Dim r2 As Excel.Range
    Dim TempDif As Long
    Dim TempDif1 As Long
    Dim j As Long
    Dim LastRow As Long
    Dim vRngValues As Variant

    If r1 = 0 Then GoTo skip

    With ActiveSheet
    LastRow = .Cells(.Rows.Count, “P”).End(xlUp).Row
    Set r2 = .Range(“P8”, .Cells(LastRow, 16))
    End With
    vRngValues = r2.Value2
    TempDif1 = Excel.WorksheetFunction.Max(r2)

    For j = 1 To LastRow – 7
    If r1 >= vRngValues(j, 1) Then
    TempDif = r1 – vRngValues(j, 1)
    Else
    TempDif = r1
    End If
    If TempDif < TempDif1 Then MinofDiff_R = TempDif Else MinofDiff_R = TempDif1
    TempDif1 = MinofDiff_R
    Next 'j

    skip:
    Exit Function
    FuncFail:
    MinofDiff_R = CVErr(xlErrNA)
    End Function
    '—

  6. Ram says:

    Can we import an access data table/query through EXCEL UDF

  7. Jon says:

    Do you why array formulas for Excel functions are slower than regular formulas for Excel? Is this always the case? Chandoo.org was doing a fast Excel week and that is one of the suggestions, don’t use array formulas for Excel functions. I tested one formula and came up with it being slower for built in Excel functions, of course, for UDFs it was much faster to do the arrays.

  8. Pingback: » Why UDF Array Functions are Faster Spreadsheet Budget and Consulting

  9. Gregg says:

    Thanks Charles, these are great tutorials, but I have not quite found what I am HOPING to find (but suspect it isn’t possible).

    My goal is to create a general purpose multi-key lookup UDF.

    I have one, and it works fine except it is SLOW, because I have large tables to search, and up to 6-field keys to support, and I need to do MANY lookups!

    Here is the code. It uses Evaluate() as many solutions that I have found, use. Here’s the code:

    Function KeyLookup(datatable As Variant, datacol As String, _
    key1table As Variant, key1 As String, _
    Optional key2table As Variant, Optional key2 As String, _
    Optional key3table As Variant, Optional key3 As String, _
    Optional key4table As Variant, Optional key4 As String, _
    Optional key5table As Variant, Optional key5 As String, _
    Optional key6table As Variant, Optional key6 As String) As Variant
    Dim cmd As String

    cmd = “INDEX(” & datatable.Address & “,MATCH(1,(”
    cmd = cmd & key1table.Address & “=””” & key1 & “””)”
    If Not IsMissing(key2table) Then cmd = cmd & “*(” & key2table.Address & “=””” & key2 & “””)”
    If Not IsMissing(key3table) Then cmd = cmd & “*(” & key3table.Address & “=””” & key3 & “””)”
    If Not IsMissing(key4table) Then cmd = cmd & “*(” & key4table.Address & “=””” & key4 & “””)”
    If Not IsMissing(key5table) Then cmd = cmd & “*(” & key5table.Address & “=””” & key5 & “””)”
    If Not IsMissing(key6table) Then cmd = cmd & “*(” & key6table.Address & “=””” & key6 & “””)”
    cmd = cmd & “,0),” & datacol & “)”

    KeyLookup = Evaluate(cmd)
    End Function

    This generates cmd values that look like this:

    INDEX($K$3:$L$8993,MATCH(1,($B$3:$B$8993=”a1-5″)*($C$3:$C$8993=”Tarp”)*($E$3:$E$8993=”Sydney”)*($F$3:$F$8993=”Highest Reach”)*($G$3:$G$8993=”1+”)*($J$3:$J$8993=”T0″),1),1)

    Unfortunately, my test takes 3 minutes to run (I worked out there are 970 million calcs involved so of course it would be slow).

    I have implemented other methods but I want to use INDEX/MATCH WITHOUT Evaluate() because I know this adds a significant overhead.

    It seems the only way to use INDEX/MATCH natively in VBA is with a single range and single key unless someone can tell me how to implement that lovely (range1=key1)*(range2=key2)… notation that worksheets are blessed with, but VBA somehow isn’t.

    I need this to be as generic and as fast as possible so any other suggestions for optimisation would be grateful. I want to avoid VLOOKUP() based solutions as I want to avoid having to pre-calculate the combined keys, and avoid SUMPRODUCT solutions which only work with numbers and don’t return the first value but sum all values if multiple matches are found.

    Many Thanks,

    Gregg.

    • fastexcel says:

      Gregg,

      Well I don’t think there is a simple but fast solution to the general problem. I use a combination of cascading hi-lo binary search and multi-threaded memory written as a C++ XLL UDF, (and well over a thousand lines of code).
      I suggest you try out my SpeedTools Lookups to see if yjet arec fast enough:
      AVLOOKUP2, AVLOOKUPS2 & AVLOOKUPNTH Functions
      Search for values in one or more columns of a table, and return values from the rows where a match is found.

      Advanced Lookup functions returning:

      · either the first value found (AVLOOKUP2)

      · or all the values found (AVLOOKUPS2)

      · or the Nth value found (AVLOOKUPNTH)

      AVLOOKUP2, AVLOOKUPS2 and AVLOOKUPNTH are NOT case-sensitive.

      The AVLOOKUP functions are multi-threaded, non-volatile, array functions.

      AVLOOKUP Family Syntax

      AVLOOKUP2(Lookup_Values, Lookup_Table, Answer_Columns, Sorted,
      Exact_Match, Lookup_Columns,MemType_Name)

      The first 3 parameters are required; the last 4 parameters are optional.

      AVLOOKUPS2(Lookup_Values, Lookup_Table, Answer_Columns, Sorted,
      Exact_Match, Lookup_Columns)

      The first 3 parameters are required; the last 3 parameters are optional.

      AVLOOKUPNTH(Lookup_Values, Lookup_Table, Answer_Columns, Sorted,
      Exact_Match, Lookup_Columns, Position)

      The first 3 parameters are required; the last 4 parameters are optional.

      Lookup_Values (required)

      The value(s) to be found in the Lookup_Columns.
      Can be a single value or multiple values arranged in columns (lookup in multiple lookup columns) and rows (return multiple answer Rows). A single value can be a constant or a cell reference.
      Multiple lookup values can be specified either as an array of constants or as a range referring to multiple cells. There should be the same number of columns of lookup values as there are columns in the Lookup_Columns.

      When doing multi-column lookups (multiple columns of both lookup values and lookup columns) the AVLOOKUP functions will look for a row where ALL the lookup values are matched in the corresponding columns (Columns are treated as AND).
      When doing multi-row lookups (multiple rows of lookup values) the AVLOOKUP functions will look for a row separately for each row of lookup values (Rows are treated as OR).
      A single AVLOOKUP2 or AVLOOKUPNTH statement will return the same number of rows and columns of result values as there are rows in Lookup_Values and columns in Answer_Columns.

      Lookup_Values can contain the wildcard characters ? and * for exact matches on unsorted text data. To find actual question marks or asterisks add a tilde (~) preceding the character.

      Lookup_Table (required)

      The rectangular range of cells or array or expression yielding an array to be used for the lookup table. The array/range must resolve to a single contiguous rectangular array.

      Answer_Columns (required)

      Specifies the column or columns in Lookup_Table that the AVLOOKUP functions will return values from for the row or rows that are found in the lookup operation.
      Answer_Columns can be a constant, an array of constants, an expression or a reference.

      · If Answer_Columns evaluates to a number it will be treated as relative column number(s) within Lookup_Table.

      · If Answer_Columns evaluates to text then the text will be treated as column labels to be found in the first row of Lookup_Table.

      If the column labels are not found the AVLOOKUP functions return #REF.

  10. Gregg says:

    Thanks for your speedy reply. I figured this might be the case. It’s a shame they didn’t expose the (range1=key1)*… notation for us VBA people.

    I am developing this product for a client and I have a requirement to use vanilla Excel/VBA, but I’ll see if I can get approval to try SpeedTools in the app. Many thanks again.

  11. Jeff Weir says:

    What about using Excel’s DSUM function to do this? It’s like a cross between the Advanced Filter and VLOOKUP.

    • Gregg says:

      Thanks Jeff, I looked at this one and my understanding is that the filter criteria cannot be specified in the function call itself. It needs to be in a range. I need the criteria in the function call as I need many lookups, all with different criteria.

  12. Jeff Weir says:

    Or write a function that leverages SQL?

    • Gregg says:

      Thanks again. In our case, where we might have 10,000+ lookup UDFs, this would probably take even more time. It’s a curly one, isn’t it. The lookups with acceptable performance are SpeedTools and VLOOKUP, but I’m not supposed to use add-ins (must be vanilla Excel) and VLOOKUP requires some preprocessing to create the single key column, which I am trying to avoid if possible.

  13. Jeff Weir says:

    Rather than doing it as a UDF, just do the one SQL call and process the entire table in one go?

    • Gregg says:

      Well yes, there is that. In fact, we’re kind of doing that. We are retrieving the raw data table using a REST call to our server (and even this table is a tiny subset of all of the available data). Once the data is in there, we want to give our clients the freedom to enter our lookup formulas anywhere and copy across and down as required. Ideally, we don’t want to force them to have to turn auto-recalc off due to slow performance.

      You mention not using a UDF. I’m not sure then, what native Excel function would take its place.

      • Jeff Weir says:

        I wasn’t thinking about a function, just a routine that they woud trigger somehow, and that would simply append the lookup table to the existing table via ADO/SQL wherever there was a match.
        But that approach forces them to do the lookups in a designated column, and not just anywhere in the spreadsheet.

        This would be an interesting question to post on a forum like http://www.chandoo.org/forums (where I’m a moderator) because the hive mind that is a forum knows lots more than you and me individually. You’d have to whip up some sample data to give an idea of the challenge though. It needent be a big dataset.

  14. Gregg says:

    Thanks Jeff, have done that.

  15. William (Bill) Copeland says:

    Hello! I am trying to solve a related problem in a workbook I am assembling. I have two columns Latitude, Longitude – approximately 40K records. I would like to alter the UDF below to compare Lat1 & Lat2 to ALL of the other Lat/Long combinations and return the distance to the closest lat/long.

    Public Function getDistance(latitude1, longitude1, latitude2, longitude2)

    earth_radius = 6371
    Pi = 3.14159265
    deg2rad = Pi / 180

    dLat = deg2rad * (latitude2 – latitude1)
    dLon = deg2rad * (longitude2 – longitude1)

    a = Sin(dLat / 2) * Sin(dLat / 2) + Cos(deg2rad * latitude1) * Cos(deg2rad * latitude2) * Sin(dLon / 2) * Sin(dLon / 2)
    c = 2 * WorksheetFunction.Asin(Sqr(a))

    d = earth_radius * c

    getDistance = d

    End Function

    I know that I will need to use an array in my UDF to accomplish this task – can anyone give me some assistance?
    Thanks in advance,
    -Bill

  16. Dan H says:

    I’m building a excel file to do complex engineering calculations. I’m not skilled at Excel, but I wrote hundreds of fortran programs from ’83 to ’99. I need a little help to understand how to write a UDF that will return a one-dimensional array (always numerical values) as a result of multiple array arguments.

    The calling statement in Excel with look like this:
    =EqK ( $E$3,$F$3,$G3:$G$7,$H$3:$H$7,$I$3:$I$7 )

    The UDF module will look something like this

    FUNCTION EqK( Temperature, Pressure, Cp, Eta, Beta )
    FOR I = 1 TO 5
    EqK(I) = log(Temperature)+Pressure*(1/Cp(I) + Eta(I)^Beta(I))
    NEXT I

    END FUNCTION

    The real functions I’m writing are about 100 times more complex, but if I see how to do this, then I might be able to figure the rest out.

    I want the values EqK(1) through EqK(5) to be placed in cells A11,A12,A13,A14,A15

    I have already learned how to pass arrays into a UDF and have the UDF return a single value. I have not figured out the protocol for calculating a whole array yet.

    • fastexcel says:

      Hi Dan,
      try this: you need to select cells A11:A15 and enter the function formula using Control-Shit-Enter to make it an array formula.
      Note its usually a good idea to always work with 2-dimenionsal arrays even if one of the dimensions is single
      Function EqK(Temperature, Pressure, Cp, Eta, Beta)
      Dim ArrayOut(1 To 5, 1 To 1) As Double
      Dim I As Long
      For I = 1 To 5
      ArrayOut(I, 1) = Log(Temperature) + Pressure * (1 / Cp(I) + Eta(I) ^ Beta(I))
      Next I
      EqK = ArrayOut
      End Function

      • Dan H says:

        My cells (A11:A15) for the results show #Name?

        I have excel 2007.. will that matter?

        This is a cut and paste of what I entered into the UDF. Looks identical to what you suggested.

        Function EqK(Temperature, Pressure, Cp, Eta, Beta)
        Dim ArrayOut(1 To 5, 1 To 1) As Double
        Dim I As Long
        For I = 1 To 5
        ArrayOut(I, 1) = Log(Temperature) + Pressure * (1 / Cp(I) + Eta(I) ^ Beta(I))
        Next I
        EqK = ArrayOut
        End Function

        I used Ctrl-Shift-Enter, and I tried it while just one cell (A11) was highlighted and also while highlighting all 5 cells (A11:A15)

      • Dan H says:

        I forgot to enable macros… seems to be fine now! Thank you

  17. Anton says:

    Accessing ranges on different pages

    Hi

    Thank you very much for your valuable guidance.

    I’m an infrequent Excel programmer and have written an Excel Array Formula function with the following parameters:

    Public Function Test(Params As Range, Dates As Range, Rain_pc As Range, Evap As Range, OutputPeriod As String) As Variant

    Inside the function I have statements like:
    Datesv = Dates.Value2
    i1 = LBound(Datesv, 1)
    i2 = UBound(Datesv, 1)
    ReDim EndDates_plus_startv(0 To i2, 1 To 1)

    It works fine when, for instance, the ranges for the “Dates” parameters are from the same page as the formula but when I use ranges from different pages the program does not work. Must the parameter passing be adjusted to allow the use of parameters from different pages?

    I know this is not the most efficient calculation method but it allows the data and the program results to be on different pages which is convenient in this application

    Many thanks

    Anton

    • fastexcel says:

      Should work OK – the most likely reason is that somewhere in your UDF the VBA is referring directly or indirectly to a range that is not a parameter of the function

Leave a comment