Writing Efficient UDFs Part 9 – An Example – Updated

Pedro wants to know how to speed up his UDF, which needs to calculate results for 35040 cells the minimum difference between the cell and a column of values of unknown length.

Pedro’s UDF


Function MinofDiff(r1 As Long) As Variant
 Dim r2 As Range
 Dim TempDif As Variant
 Dim TempDif1 As Variant
 Dim j As Long
 Dim LastRow As Long
 On Error GoTo FuncFail
 If r1 = 0 Then GoTo skip
 With Sheets("Dados")
 LastRow = .Cells(.Rows.Count, "P").End(xlUp).Row
 Set r2 = .Range("P8", "P" & LastRow)
 End With
 TempDif1 = Application.Max(r2)
 For j = 1 To LastRow – 7
 If r1 >= r2(j) Then
 TempDif = r1 – r2(j)
 Else
 TempDif = r1
 End If
 MinofDiff = Application.Min(TempDif, TempDif1)
 TempDif1 = MinofDiff
 Next j
 skip:
 Exit Function
 FuncFail:
 MinofDiff = CVErr(xlErrNA)
 End Function

There is a fundamental problem with Pedro’s UDF: it is referencing a range in column P without passing it in as a parameter, so if anything changes in column P the UDF could give the wrong answer because Excel will not recalculate it. Pedro has done this so that the UDF can dynamically adjust to the number of entries in column P.

On test data with 60000 entries in column P 20 calls to the UDF take 18.5 seconds on my laptop, so 34K calls would take about 9 hours to calculate! So why is it so slow?

  • Every time the function is called (35K times) it finds the last row and the MAX value in column P: but this only needs to be done once.
  • 35040 calls will hit the VBE refresh slowdown bug: so we need to bypass that.
  • The For loop is referencing each cell value in column P (using R2(j) ) twice. Each reference to a cell is slow because there is a large overhead for each call out to the Excel object model.
  • The UDF uses Worksheetfunction.Min to find out which of 2 values is smaller: its much quicker to compare the values using VBA If than invoking a worksheet function.

The revised UDF

To solve the fundamental problem with the UDF I will pass it an additional parameter: a whole column reference to column P. Then the UDF can resize the range to the last cell containing data. (Another alternative would be to create a Dynamic Named Range for column P and pass that as a parameter.

To solve the first 2 slowdown problems the UDF will be made into an array formula UDF that returns an array of 35040 results.

To avoid referencing each cell in column P twice inside the loop, the UDF will get all the values from column P once, into a variant array and then loop on the variant array.

 Function MinofDiff2(R1 As Range, R2 As Range) As Variant
 Dim R2Used As Range
 Dim vArr2 As Variant
 Dim vArr1 As Variant
 Dim vOut() As Double
 Dim TempDif As Double
 Dim TempDif1 As Double
 Dim D1 As Double
 Dim D2 As Double
 Dim TMax As Double
 Dim j1 As Long
 Dim j2 As Long
 Dim LastRow As Long
 '
 On Error GoTo FuncFail
 '
 ' handle full column
 '
 LastRow = R2.Cells(R2.Rows.Count, 1).End(xlUp).Row
 Set R2Used = R2.Resize(LastRow - 7, 1).Offset(7, 0)
 '
 ' get values into arrays
 '
 vArr2 = R2Used.Value2
 vArr1 = R1.Value2
 '
 ' find max
 '
 TMax = Application.Max(R2Used)
 '
 ' set output array to same size as R1
 '
 ReDim vOut(1 To UBound(vArr1), 1)
 '
 ' loop on R1
 '
 For j1 = 1 To UBound(vArr1)
 TempDif1 = TMax
 D1 = vArr1(j1, 1)
 '
 ' loop on R2
 '
 For j2 = 1 To (LastRow - 7)
 D2 = vArr2(j2, 1)
 If D1 >= D2 Then
 TempDif = D1 - D2
 Else
 TempDif = D1
 End If
 If TempDif < TempDif1 Then
 vOut(j1, 1) = TempDif
 Else
 vOut(j1, 1) = TempDif1
 End If
 TempDif1 = vOut(j1, 1)
 Next j2
 Next j1
 MinofDiff2 = vOut
 skip:
 Exit Function
 FuncFail:
 MinofDiff2 = CVErr(xlErrNA)
 End Function

Because this is an array function you need to select the 35040 cells that you want to contain the answer, then type the formula into the formula bar =MinofDiff2(A1:A35040,P:P) and then press Ctrl/Shift/Enter to enter the formula as an array formula into the 35040 cells.

This revised UDF takes .222 seconds for 20 values, and completes the 35040 UDF calculations in 6.25 minutes, a speedup factor of over 80.

Updated with Harlan Grove’s suggestions

Harlan Grove has pointed out several ways of speeding up the UDF. Here is a revised version implementing most of his suggestions. It is about 17% faster than my original version.

 Function MinofDiff3(R1 As Range, R2 As Range) As Variant
 Dim R2Used As Range
 Dim vArr2 As Variant
 Dim vArr1 As Variant
 Dim vOut() As Double
 Dim TempDif As Double
 Dim TempDif1 As Double
 Dim D1 As Double
 Dim D2 As Double
 Dim TMax As Double
 Dim TMin As Double
 Dim j1 As Long
 Dim j2 As Long
 Dim LastRow As Long
 '
 On Error GoTo FuncFail
 '
 ' handle full column
 '
 LastRow = R2.Cells(R2.Rows.Count, 1).End(xlUp).Row - 7
 Set R2Used = R2.Resize(LastRow, 1).Offset(7, 0)
 '
 ' get values into arrays
 '
 vArr2 = R2Used.Value2
 vArr1 = R1.Value2
 '
 ' find max & Min
 '
 TMax = Application.Max(R2Used)
 TMin = Application.Min(R2Used)
 '
 ' set output array to same size as R1
 '
 ReDim vOut(1 To UBound(vArr1), 1)
 '
 ' loop on R1
 '
 For j1 = 1 To UBound(vArr1)
 TempDif1 = TMax
 D1 = vArr1(j1, 1)
 TempDif = D1 - TMax
 If D1 > TMax Then
 If TempDif < TMax Then
 vOut(j1, 1) = TempDif
 Else
 vOut(j1, 1) = TMax
 End If
 Else
 If D1 < TMin Then
 vOut(j1, 1) = D1
 Else
 '
 ' loop on R2
 '
 For j2 = 1 To LastRow
 D2 = vArr2(j2, 1)
 If D1 >= D2 Then
 TempDif = D1 - D2
 Else
 TempDif = D1
 End If
 If TempDif < TempDif1 Then TempDif1 = TempDif
 vOut(j1, 1) = TempDif1
 Next j2
 End If
 End If
 Next j1
 MinofDiff3 = vOut
 skip:
 Exit Function
 FuncFail:
 MinofDiff3 = CVErr(xlErrNA)
 End Function

Harlan also points out that a version using QuickSort to sort R2 and Binary Search instead of the loop would be an order of magnitude faster!

This entry was posted in Calculation, UDF, Uncategorized, VBA. Bookmark the permalink.

12 Responses to Writing Efficient UDFs Part 9 – An Example – Updated

  1. Jim Cone says:

    Charles,
    I read Pedro’s post as wanting only ONE answer: Determine the difference between r1 and each value in the column and return the smallest difference (with some overriding criteria).
    His main concern was speed.
    I had difficulty visualizing the concept at first, but it looks now like I saw a different movie.
    ‘—
    Jim Cone

  2. fastexcel says:

    Jim,
    I agree: but he has 35040 of these formulas so it makes sense to replace the 35040 formula with a single array formula that returms 35040 results (and his comment is to a post about aarry formulas…)

  3. Harlan Grove says:

    If d1 in range1 MAX(range2), the udf returns MIN(d1-MAX(range2),MAX(range2)).

    That so, I’d also calculate TMin and add special case logic to test whether D1 TMax, and set the vOut entry without iterating through the range2 values.

    If range1 were D5:D1004, ‘minimum difference’ results were to be in E5:E1004, a cell named TMax had the formula =MAX(P8:P65536) and a cell named TMin had the formula =MIN(P8:P65536), the single-cell array formula

    =IF(D5>TMax,MIN(D5-TMax,TMax),IF(D5<TMin,D5,
    D5-MAX(IF(P$8:P$65536<D5,P$8:P$65536))))

    returns the result for cell E5. I'd be surprised if these array formulas didn't recalculate faster than the udf.

    • fastexcel says:

      @Harlan “The XLM Master”

      I tried your array formula – seems to be about twice as slow as the array UDF using hardcoded P8:P60000 in your formula (I have not tried your MAX and MIN improvements in the UDF).

      In XL2007/XL2010 the array formula would have the advantage of calculating multithreaded, but would have to handle the 1 million rows somehow (probably use a Dynamic Range).

  4. Harlan Grove says:

    Ate some angle brackets. My 1st paragraph should have been

    If d1 in range1 were less than MIN(range2), the udf returns d1. If d1 in range1 were greater than MAX(range2), the udf returns MIN(d1-MAX(range2),MAX(range2)).

  5. Eric says:

    “Another alternative would be to create a Dynamic Named Range for column P and pass that as a parameter.”

    Is that really a good idea? My personal experience with named ranges – especially dynamic named ranges – is that they tend to slow things down even more. I’d be interested in seeing you test this idea!

    • fastexcel says:

      Named Ranges are slower than direct references but the slowdown is not noticeable unless you have an extremely large number of formulas using them. Dynamic Named Ranges can be slow if you using the Offset and CountA method, because the Offset makes them volatile and CountA is expensive if used frequently. Using Index rather than Offset is good, and referencing a Counta formula stored in a cell somewhere is also good.

  6. Harlan Grove says:

    Another possible improvement would be sorting r2used and looking up the largest value less than or equal to the current d1 value using binary search.

    • fastexcel says:

      I agree that would probably be the fastest solution. I considered programming the UDF that way but decided that the post would be better if it concentrated on just speeding up the VBA rather than improving the algorithm as well.

      • Pedro says:

        Thank you for all comments and help.. particularly to fastexcel for attention and availability. 🙂 It was a great help!

      • Harlan Grove says:

        Change to

        LastRow = r2.Cells(r2.Rows.Count, 1).End(xlUp).Row – 7

        and save a lot of unnecessary subtractions further on in the code. Change the 2nd If block and following assignment in the For j2 loop to

        If TempDif < TempDif1 Then TempDif1 = TempDif
        vOut(j1, 1) = TempDif1

        With a 1,000 cell 1st range and 60,000 cell 2nd range these changes reduce execution time about 15% on my system.

        FWIW, with the same data, quicksort/binary search reduces execution time by more than a decimal order of magnitude.

      • fastexcel says:

        Thanks Harlan: I have updated the post with a version implementing most of your suggestions.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s