Writing efficient VBA UDFs Part 10 – Volatile Functions and Function Arguments

February 2, 2012

I just realised that none of my previous 9 posts on writing efficient VBA UDFs has discussed when and why you should make Functions Volatile. Since that’s fairly fundamental I really should have covered the topic early in the series … but anyway here goes.

What does Volatile mean?

Normally Excel’s smart recalculation engine only recalculates formulas that either have been changed/entered or depend on a cell or formula that has been changed somewhere higher up the chain of precedents for the formula.

This makes for very efficient calculation speed since in a typical workbook only a small faction of the formulas will be dependent on any particular cell or piece of data.

But some functions need to recalculate at every recalculation. For example NOW() should always give you the current time at the last calculation, and RAND() should give you a different random number each time it is calculated. These functions are called Volatile Functions, and any formula that uses one of them is a Volatile formula.

You can see more discussion of Excel’s built-in volatile functions and the volatile actions that trigger a recalculation at http://www.decisionmodels.com/calcsecretsi.htm.

How does Excel’s smart recalc engine know when to recalculate a function or a formula?

Excel maintains its dependency trees by looking at what other cells a function or a formula refers to, and the smart recalc engine uses these dependency trees to work out which formulas to recalculate.

For Functions Excel only looks at the arguments to the function to determine what the function depends on. So if you write a function like this:


Function Depends(theCell as range)
Depends=ActiveSheet.range("Z9")+theCell + _
theCell.Offset(0,1)
End Function

and call it in a formula =Depends(“A1″)
then Excel will only recalculate your function when A1 changes, and not when B1 or Z9 changes.

This could give you incorrect results.

Note: During a recalculation if Excel does evaluate the UDF it determines which cell references are actually being used inside the function to affect the function result, and if those cells have not yet been finally calculated it will reschedule the Function for later calculation. This is required to make the UDF be finally calculated in the correct dependency sequence.

How to fix this problem

There are several ways to fix this problem, but only one good one!

Make the function Volatile

If you add Application.Volatile to the function it will always recalculate:

Function Depends(theCell as range)
Application.Volatile
Depends=ActiveSheet.range("Z9")+theCell+ _
theCell.Offset(0,1)
End Function

But this will slow down the calculation, so generally its a bad idea unless, like RAND() or NOW() the function really needs to be Volatile.

Use Ctrl/Alt/F9 to trigger a full calculation

If you press Ctrl/Alt/F9 then Excel will recalculate every single formula in all the open workbooks, regardless of what has changed or is volatile.
Of course this can be very slow.

Make sure the Arguments to the UDF refers to ALL the cells the UDF uses.

Change the UDF to

Function Depends(theCell1 as range, theCell2 as range)
Depends = theCell1.Resize(1, 1)+ _
theCell1.Resize(1, 1).Offset(0, 1) + theCell2
End Function

This is the best solution.

Call it using =Depends(A1:B1,Z9) so that Excel knows that B1 is being referenced by theCell1.Offset(0,1).

Now Excel knows all the cells that the function depends on and it will be recalculated correctly and efficiently.

Detecting whether a Function or Formula is Volatile

You can download VolatileFuncs.zip from http://www.DecisionModels.com/Downloads/VolatileFuncs.zip

This contains tests for the volatile Excel built-in functions, using a function to increment a counter each time the referenced cell changes.
Public jCalcSeq As Long ''' calculation sequence counter
'
Public Function CalcSeqCountRef(theRange As Range) As Variant
'
' COPYRIGHT © DECISION MODELS LIMITED 2000. All rights reserved
'
' increment calculation sequence counter at Full Recalc or when theRange changes
' fixed for false dependency
'
jCalcSeq = jCalcSeq + 1
CalcSeqCountRef = jCalcSeq + (theRange = theRange) + 1
End Function

Summary

Make sure that the arguments to your UDF always directly refer to ALL the cells that the UDF uses.


Writing Efficient UDFs Part 9 – An Example – Updated

January 31, 2012

Pedro wants to know how to speed up his UDF, which needs to calculate results for 35040 cells the minimum difference between the cell and a column of values of unknown length.

Pedro’s UDF

Function MinofDiff(r1 As Long) As Variant
Dim r2 As Range
Dim TempDif As Variant
Dim TempDif1 As Variant
Dim j As Long
Dim LastRow As Long
On Error GoTo FuncFail
If r1 = 0 Then GoTo skip
With Sheets("Dados")
LastRow = .Cells(.Rows.Count, "P").End(xlUp).Row
Set r2 = .Range("P8", "P" & LastRow)
End With
TempDif1 = Application.Max(r2)
For j = 1 To LastRow – 7
If r1 >= r2(j) Then
TempDif = r1 – r2(j)
Else
TempDif = r1
End If
MinofDiff = Application.Min(TempDif, TempDif1)
TempDif1 = MinofDiff
Next j
skip:
Exit Function
FuncFail:
MinofDiff = CVErr(xlErrNA)
End Function

There is a fundamental problem with Pedro’s UDF: it is referencing a range in column P without passing it in as a parameter, so if anything changes in column P the UDF could give the wrong answer because Excel will not recalculate it. Pedro has done this so that the UDF can dynamically adjust to the number of entries in column P.

On test data with 60000 entries in column P 20 calls to the UDF take 18.5 seconds on my laptop, so 34K calls would take about 9 hours to calculate! So why is it so slow?

  • Every time the function is called (35K times) it finds the last row and the MAX value in column P: but this only needs to be done once.
  • 35040 calls will hit the VBE refresh slowdown bug: so we need to bypass that.
  • The For loop is referencing each cell value in column P (using R2(j) ) twice. Each reference to a cell is slow because there is a large overhead for each call out to the Excel object model.
  • The UDF uses Worksheetfunction.Min to find out which of 2 values is smaller: its much quicker to compare the values using VBA If than invoking a worksheet function.

The revised UDF

To solve the fundamental problem with the UDF I will pass it an additional parameter: a whole column reference to column P. Then the UDF can resize the range to the last cell containing data. (Another alternative would be to create a Dynamic Named Range for column P and pass that as a parameter.

To solve the first 2 slowdown problems the UDF will be made into an array formula UDF that returns an array of 35040 results.

To avoid referencing each cell in column P twice inside the loop, the UDF will get all the values from column P once, into a variant array and then loop on the variant array.

Function MinofDiff2(R1 As Range, R2 As Range) As Variant
Dim R2Used As Range
Dim vArr2 As Variant
Dim vArr1 As Variant
Dim vOut() As Double
Dim TempDif As Double
Dim TempDif1 As Double
Dim D1 As Double
Dim D2 As Double
Dim TMax As Double
Dim j1 As Long
Dim j2 As Long
Dim LastRow As Long
'
On Error GoTo FuncFail
'
' handle full column
'
LastRow = R2.Cells(R2.Rows.Count, 1).End(xlUp).Row
Set R2Used = R2.Resize(LastRow - 7, 1).Offset(7, 0)
'
' get values into arrays
'
vArr2 = R2Used.Value2
vArr1 = R1.Value2
'
' find max
'
TMax = Application.Max(R2Used)
'
' set output array to same size as R1
'
ReDim vOut(1 To UBound(vArr1), 1)
'
' loop on R1
'
For j1 = 1 To UBound(vArr1)
TempDif1 = TMax
D1 = vArr1(j1, 1)
'
' loop on R2
'
For j2 = 1 To (LastRow - 7)
D2 = vArr2(j2, 1)
If D1 >= D2 Then
TempDif = D1 - D2
Else
TempDif = D1
End If
If TempDif < TempDif1 Then
vOut(j1, 1) = TempDif
Else
vOut(j1, 1) = TempDif1
End If
TempDif1 = vOut(j1, 1)
Next j2
Next j1
MinofDiff2 = vOut
skip:
Exit Function
FuncFail:
MinofDiff2 = CVErr(xlErrNA)
End Function

Because this is an array function you need to select the 35040 cells that you want to contain the answer, then type the formula into the formula bar =MinofDiff2(A1:A35040,P:P) and then press Ctrl/Shift/Enter to enter the formula as an array formula into the 35040 cells.

This revised UDF takes .222 seconds for 20 values, and completes the 35040 UDF calculations in 6.25 minutes, a speedup factor of over 80.

Updated with Harlan Grove’s suggestions

Harlan Grove has pointed out several ways of speeding up the UDF. Here is a revised version implementing most of his suggestions. It is about 17% faster than my original version.


Function MinofDiff3(R1 As Range, R2 As Range) As Variant
Dim R2Used As Range
Dim vArr2 As Variant
Dim vArr1 As Variant
Dim vOut() As Double
Dim TempDif As Double
Dim TempDif1 As Double
Dim D1 As Double
Dim D2 As Double
Dim TMax As Double
Dim TMin As Double
Dim j1 As Long
Dim j2 As Long
Dim LastRow As Long
'
On Error GoTo FuncFail
'
' handle full column
'
LastRow = R2.Cells(R2.Rows.Count, 1).End(xlUp).Row - 7
Set R2Used = R2.Resize(LastRow, 1).Offset(7, 0)
'
' get values into arrays
'
vArr2 = R2Used.Value2
vArr1 = R1.Value2
'
' find max & Min
'
TMax = Application.Max(R2Used)
TMin = Application.Min(R2Used)
'
' set output array to same size as R1
'
ReDim vOut(1 To UBound(vArr1), 1)
'
' loop on R1
'
For j1 = 1 To UBound(vArr1)
TempDif1 = TMax
D1 = vArr1(j1, 1)
TempDif = D1 - TMax
If D1 > TMax Then
If TempDif < TMax Then
vOut(j1, 1) = TempDif
Else
vOut(j1, 1) = TMax
End If
Else
If D1 < TMin Then
vOut(j1, 1) = D1
Else
'
' loop on R2
'
For j2 = 1 To LastRow
D2 = vArr2(j2, 1)
If D1 >= D2 Then
TempDif = D1 - D2
Else
TempDif = D1
End If
If TempDif < TempDif1 Then TempDif1 = TempDif
vOut(j1, 1) = TempDif1
Next j2
End If
End If
Next j1
MinofDiff3 = vOut
skip:
Exit Function
FuncFail:
MinofDiff3 = CVErr(xlErrNA)
End Function

Harlan also points out that a version using QuickSort to sort R2 and Binary Search instead of the loop would be an order of magnitude faster!


Excel 2010 Tables/Listobject: slow update and how to bypass

January 30, 2012

There was an interesting post on Stack Overflow recently about a performance problem when updating Excel 2007/2010 Tables/Listobjects. Certainly something strange is going on!

Duplicating the Problem

1. Create a table by selecting a few cells (I used A1:A3) on a sheet (Sheet1) and using Format as Table on the Home tab

2. On a different sheet (Sheet2) create a few thousand (I used 10000) of the simplest formula you can think of (I used =”A”)

3. Create a VBA Sub that updates a different cell in the same sheet as the table:


Sub Updater()
Dim j As Long
Dim dTime As Double
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
dTime = Timer
For j = 1 To 10000
Worksheets("Sheet1").Range("d5") = j
Next j
MsgBox Timer - dTime
End Sub

4. Select a cell within the Table

5. Run the VBA Sub

On my system that takes 8.8 seconds.

Bypassing the Problem

For this slowdown to occur each of the following conditions must be true:

  • A cell within the Table must be selected
  • The sheet containing the Table must be the Active Sheet
  • The cell being updated must be on the same sheet as the table, but does not have to be within the table
  • There must be a reasonable number of formulas in the workbook.

So change any or all of these conditions or delete all the formulas and the update will only take 0.5 seconds on my system.

Whats actually happening?

The time taken is proportional mostly to the number and slightly to the size of the formulas in the workbook, but none of the formulas are actually being calculated.

So it seems to me that each time the cell is updated Excel is scanning all the formulas in the workbook as though they might need to be changed.

Maybe this has something to do with the automatic extension of formulas within a table when you add a new row or the fact that the Table definition and its associated Name has to change if you add a new row.

But the fact that the slowdown only occurs if the Table is on the active sheet means that I think this is a bug.

Can you think of a better explanation?


Writing efficient VBA UDFs (Part 8) – Getting the previously calculated value from the calling cells

January 8, 2012

If you have a UDF that depends on some slow-calculating resource you may want the UDF to mostly just return the values obtained  at the last calculation from the cells the UDF occupies, and only occasionally go and use the slow-calculating resource.

In Excel 2010 you can create efficient C++ XLL UDFs that execute asynchronously and multithreaded, which can be a very effective solution.

But how do you get the previous value from the cells calling a VBA UDF?

Lets suppose that you want to pass the UDF a parameter for the slow-calculating resource and a switch to tell it when to use the slow resource. You can set the switch (I am using a defined name called “RefreshSlow”)  and refresh the UDFs in a VBA sub like this:

Sub RefreshUDFs()
Dim lCalcMode As Long
lCalcMode = Application.Calculation
Application.Calculation = xlCalculationManual
Names(“RefreshSlow”).RefersTo = True
Calculate
Names(“RefreshSlow”).RefersTo = False
Application.Calculation = lCalcMode
End Sub

I will use a dummy function to simulate getting a slow resource:

Function GetSlowResource(vParam As Variant) As Variant
Dim j As Long
For j = 1 To 10000000
Next j
GetSlowResource = Rnd()
End Function

This function (ignores the parameter) and just (slowly) returns a random number.

There are several ways of getting the previously calculated value for a UDF: they each have advantages and disadvantages.

Application.Caller.Value

You can use Application.Caller.Value, but this causes a circular reference that you have to switch on Iteration to solve. This is slow and can mask other unintentional circular refs, so its not recommended.


Function UDF1(vParam, Refresh)
If Not Refresh Then
UDF1 = Val(Application.Caller.Value2)
Else
UDF1 = GetSlowResource(vParam)
End If
End Function

Application.Caller.Text

If you use Application.Caller.Text you don’t get the circular reference, but it retrieves the formatted value that is displayed in the cell as a string. So if the cell is formatted as a number with 2 decimal places the retrieved value will be truncated to 2 decimal places.

Function UDF2(vParam, Refresh)
If Not Refresh Then
UDF2 = Val(Application.Caller.Text)
Else
UDF2 = GetSlowResource(vParam)
End If
End Function

This solution will work OK if you can control the formatting or the function returns a string.

Application.Caller.ID

You can use the Range.ID property to store and retrieve a string value within the UDF.

Function UDF3(vParam, Refresh)
Dim var As Variant
If Not Refresh Then
UDF3 = Val(Application.Caller.ID)
Else
var = GetSlowResource(vParam)
UDF3 = var
Application.Caller.ID = var
End If
End Function

This works well, except that the Range.ID property is not stored in a Saved workbook, so the next time you open the workbook the retrieved value will be Blank/Zero.

Using an XLM or XLL function to pass the Previous value to the UDF

Using XLM or XLL technology it is possible to create a non-multi-threaded command-equivalent function to retrieve the previous value.
Here is the code for an XLL+ function called PREVIOUS which has a parameter to make it Volatile or not Volatile.
(Command-equivalent functions default to Volatile but when using it to pass the previous value to a VBA UDF you generally want it to be non-volatile).
This function also works for multi-celled array formulae.
Edited following Keith Lewis comment.


CXlOper* PREVIOUS_Impl(CXlOper& xloResult, const CXlOper* Volatile_op)
{
// Input buffers
bool Volatile;
// Validate and translate inputs
static CScalarConvertParams Volatile__params(L"Volatile",
XLA_DEFAULT_ZERO|XLA_DEFAULT_EMPTY|XLA_DEFAULT_NONNUMERIC|
XLA_DEFAULT_BLANK, 0, -1, true);
XlReadScalar(*Volatile_op, Volatile, Volatile__params);
// End of generated code
//}}XLP_SRC
// defined as a macro function defer recalc so that the func gets previous results
CXlOper xloCaller,xlo;
CXlOper arg;
arg=true;
if (!Volatile) arg=false;
// set volatility of this function: 237 is the function number for volatile
xlo.Excel(237,1,&arg);
// Get caller. Fail if it is not a range of cells
if ( ( xloCaller.GetCaller() != 0 ) || !xloCaller.IsRef() ) return CXlOper::RetError(xlerrNA);
//coerce the caller ref
xloResult.Coerce(xloCaller);
return xloResult.Ret();
}

Then you can use this to pass the previous value to the UDF.

Function UDF4(vParam, Refresh, Previous)
Dim var As Variant
If Not Refresh Then
UDF4 = Previous
Else
var = GetSlowResource(vParam)
UDF4 = var
End If
End Function

The UDF is called from a formula like this =UDF4(“AAPL”,RefreshSlow,PREVIOUS(False))

This works well, but requires access to the XLL PREVIOUS function (Laurent Longre’s MOREFUNC addin has a similar function).

Its supposed to be possible to write a similar function using the old XLM Macro language, but I have not tried it.

Conclusion

There are several ways of getting the previous value from the last calculation for a VBA UDF. But the best solution requires using a C++ XLL.

Special prize to the first person to write the XLM Macro Previous UDF!


UK Excel Developer Conference: London January25

December 18, 2011

Simon Murphy (AKA Smurf) has organised another one-day conference for UK Developers in London on January 25.

The last one was a great chance to learn some awesome stuff, meet some fellow Excel developer geeks and consume some beer, and this one promises to be even better

The agenda is here (subject to change of course), and you can sign up here. (its an absolute bargain at £200).

My session is called VBA to C : Pratfalls and Perils

15:30 – 16:30 VBA to C : Pratfalls and Perils
- Stories based on a c++ newby’s efforts to convert 10K lines of VBA UDFs to
C++ XLLs.
- Demonstrations and examples using Visual Studio 2010 and Planatech XLL+

If you want to know why I embarked on this somewhat perilous journey see my post on UDF Technology choices.

The idea of this session is very much to approach C++ XLLs from the viewpoint of a VBA developer.

I aim to start with some simple UDFs, then cover some of the more idiotic mistakes that I keep making,
and progress to slightly more sophisticated stuff that demonstrates how to use some of the XLL+ wizard features.

If there is time I will show how relatively easy it turned out to be (once I figured out HOW) to do multi-threaded UDFs with shared memory …

So sign up and come along for a great time and some really interesting and important stuff!


TEXT vs VALUE vs VALUE2 – Slow TEXT and how to avoid it

November 30, 2011

I was intrigued by a recent post pointing out that using .Text to retrieve data from Excel ranges got slower and slower as you iterated through the rows. So I took some time to explore and compare the three main properties (Range.Value, Range.Value2 and Range.Text)  for getting result values from an Excel Range into VBA.

Of course, as you will see, they each have their own peculiarities and advantages.

Range.Text

This gets the formatted value of a cell. Actually it looks like .Text gets the value from the cell and then formats it, because the more complex the formatting the slower it is.
.Text is a read-only property, so you cannot set it directly.

If you try getting .Text from multiple cells into a variant (varr = Range(“A1:A10″).Text) you do NOT get an array of results. Instead if all the cells in the range contain the same value formatted in the same way you get a single formatted value, but if ANY of the cells has different values or formats you get Null ( this could be a useful trick).

When used inside a UDF you can use .Text to get the formatted value of the calling cell as it was before the UDF was called (you can’t do this with .Value or .Value 2 without causing a circular reference). Here is an example that adds an extra . on each calculation.

Function LastValue()
Dim var As Variant
Application.Volatile
var = Application.Caller.Text
Debug.Print var
var = var & "."
LastValue = var
End Function

The major drawback of .Text when used this way is that it gives you the formatted value, so the value you get could be ### if the user has set an inappropriate zoom or column width, or numbers could be retrieved as 1E+18 or …

Range.Value

This mostly gets the underlying value from the cell.

But if the cell is formatted as a date or currency then Range.Value converts the underlying value into a VBA date or currency variable before passing it to the VBA variable used in the assignment statement. Since VBA currency only has 4 decimal places this can cause a loss of accuracy. Suppose cell G4 has a value of 123.456789 formatted as currency. Then Range(“g4″).Value will retrieve a value of 123.4568 rather than the actual value!
If you assign this to a Variant you get a variant with a subtype of currency, but if you assign it to a Double then the value first gets converted to currency datatype and truncated, and then the currency datatype gets converted to a double.

Maybe using .Value to retrieve cells formatted as dates into a variant is useful because then you can use IsDate() to detect that the cell was a date.

Range.Value is an efficient way of transferring a range of cell values to VBA because you can assign the Range to a variant variable and the you get a variant containing a 2-dimensional array of variants. This is much more efficient that looping on the cells one-by-one.

.Value is (unfortunately) the default property of the Range object.

Range.Value2

This works the same way as Range.Value, except that it does not check the cell format and convert to Date or Currency. And thats probably why its faster than .Value when retrieveing numbers.

So .Value2 really should be the default, and is definitely the one to use 99% of the time.

Performance Comparison

So how do these properties compare for speed? Here is my test code:

Sub textit()
Dim dTime As Double
Dim j As Long
Dim jStart As Long
Dim var As Variant
dTime = MicroTimer()
For jStart = 1 To 40000 Step 5000
dTime = MicroTimer
For j = 1 To 5000
var = Range("a1").Offset(jStart + j - 2, 0).Text
Next j
dTime = MicroTimer - dTime
Debug.Print dTime
Next jStart
End Sub

I ran this using a fresh worksheet with test data of numbers in the first 40000 rows.

The first run showed more-or-less constant time for each block. Then I changed the row-height of 10 rows at random intervals. The next run (.Text(2) is much slower, and the times increase from block to block.

So what’s going on: why so much slower with times increasing?

Well it turns out that once enough row-heights have been changed the time for .Text is a function of the number of rows between the selected visible rows on the screen and the row being processed!!!!

(And No I don’t know why, .Text must be doing some sort of cumulative row height calculation).

So if you add Range(“a1″).Offset(jStart).Select immediately after the For jStart = 1 To 40000 Step 5000 you get a faster and more constant set of times. Note you have to have Application.Screenupdating =True or this trick won’t work.

Finally I replaced .Text with .Value and then with .Value2, and then used a variant array instead of the inner loop to get the full set of timings:

Conclusions

  • .Text is seriously slow even if you bypass the row-height problem.
  • .Value can seriously damage your numbers
  • .Value2 is faster than .value with numbers (no significant difference with text)
  • .Value2 using a variant array is much the fastest way to go

So do you ever use .Text? And if so why?


Writing efficient VBA UDFs (Part 7) – UDFs calculated multiple times

November 25, 2011

There are several circumstances where Excel will calculate a UDF multiple times when you would expect it to only be calculated once. This can be a significant problem if your UDF takes a long time to execute.
The previous posts on writing efficient VBA UDfs (Part1,Part2,Part3,Part4,Part5 Part6 ) have not discussed this problem so I will show how to get around it.

Multiple UDF recalcs caused by uncalculated cells

When Excel recalcs a workbook after changes have been made the calculation engine starts by calculating the most recently changed formulas, and then uses the most recent calculation sequence for the remaining formulas.
If the calculation engine finds a formula that depends on a cell that has been dirtied/changed (or is volatile) but has not yet been calculated, it reschedules the formula to the end of the calculation chain so that it can be recalculated again after the uncalculated cell.
The problem is that the calculation engine only does this rescheduling after the formula/UDF has been calculated, so a formula containing a UDF can be calculated many times in each recalculation.

Here is a very simple example:

  1. Set Calculation to Manual so that its easier to see whats happening.
  2. Enter this UDF into a standard Module in the VBE.Public Function Tracer(theCell As Range)
    Tracer = theCell.Value
    Debug.Print Application.Caller.Address & "-" & Tracer
    End Function
  3. Show the Immediate Window (Ctrl G)
  4. Enter 1 in Cell A1
  5. Enter =Tracer(A1)+1 in cell A2
  6. Enter =Tracer(A2)+1 in cell A3

The Immediate window shows
$A$2-1
$A$3-2
because the formulas were calculated as they were entered.
Now clear the immediate window, return to Excel and press F9 to recalculate. The immediate Window now shows
$A$3-
$A$2-1
$A$3-2
Which shows that cell A3 was calculated first (with the value of its parameter range A2 showing as empty), followed by A2, followed by A3 again, this time with the correct value for its parameter A2.
Now if you clear the Immediate window and calculate the formulas again without changing anything (use Ctrl/Alt/F9). This time A3 only gets recalculated once, because Excel is reusing the final calculation sequence from the previous recalculation.

Handling uncalculated cells

Fortunately its fairly easy for the UDF to detect when its being passed an uncalculated cell because the cell will be empty:

Public Function Tracer2(theCell As Range)
If IsEmpty(theCell) Then Exit Function
Tracer2 = theCell.Value
Debug.Print Application.Caller.Address & "-" & Tracer2
End Function

This version of the UDF checks if the cell is empty and exits immediately. If you need to distinguish between genuinely empty cells and uncalculated cells you can check that the cell contains a formula useing:

=IsEmpty(theCell.Value) and Len(theCell.formula)>0 Then Exit Function

or

=IsEmpty(theCell.Value) and theCell.HasFormula Then Exit Function

If the parameter is a range of cells containing formulae then you need something a bit more complex:

Public Function IsCalced(theParameter As Variant) As Boolean
'
' Charles Williams 9/Jan/2009
'
' Return False if the parameter refers to as-yet uncalculated cells
'
Dim vHasFormula As Variant
IsCalced = True
On Error GoTo Fail
If TypeOf theParameter Is Excel.Range Then
vHasFormula = theParameter.HasFormula
'
' HasFormula can be True, False or Null:
' Null if the range contains a mix of Formulas and data
'
If IsNull(vHasFormula) Then vHasFormula = True
If vHasFormula Then
'
' CountA returns 0 if any of the cells are not yet calculated
'
If Application.WorksheetFunction.CountA(theParameter) = 0 Then IsCalced = False
End If
ElseIf VarType(theParameter) = vbEmpty Then
'
' a calculated parameter is Empty if it references uncalculated cells
'
IsCalced = False
End If
Exit Function
Fail:
IsCalced = False
End Function

This function handles both range references and calculated ranges (array formula expressions etc) and checks if ALL the cells in the parameter contain formulas and ANY of the cells are uncalculated.

Only Variant and Range Parameters can be uncalculated.

Only a UDF parameter defined as a Range or a Variant can be uncalculated. If all your paramters are defined as, for instance, Double then Excel will attempt to coerce the parameter to Double before passing it to the UDF, and if the Paramter actually refers to an uncalculated cell then the UDF will not be called.

Multiple UDF Recalcs caused by the Function Wizard

Whenever you use the Function Wizard with a UDF the UDF gets called lots of times, because the Function Wizard uses Evaluate to dynamically show you the result of the function as you enter the parameters for the function. This is not good if your UDF is slow to execute!
You can detect that the UDF has been called by the function wizard by checking if the Standard commandbar is enabled (this works in Excel 2007 and Excel 2010 even though the commandbars are not visible).

If Not Application.CommandBars("Standard").Controls(1).Enabled Then Exit Function

Multiple UDF Recalcs with multi-cell Array Formula UDFs

Useing an array UDF that returns results to multiple cells can be a very good way of speeding up UDF execution (see Part 5 – Array UDFs go faster), but there is a nasty slowdown bug you should be aware of:
When a multi-cell UDF is entered or modified and depends on a volatile formula: the UDF is evaluated once for each cell it occupies. This does not happen when the UDF is recalculated, only when it is entered or changed.

UDFs in Conditional Formatting formulas.

Formulas in Conditional Formatting rules get evaluated each time the portion of the screen containing the conditional format gets redrawn or recalculated (you can demonstrate this by using a Debug.Print statement in a UDF being used in a conditional formatting rule). So on the whole using UDFs in Conditional formats is probably not a great idea.

Conclusion

If you have UDFs which take a long time to execute it makes sense to add code to check for both uncalculated cells and the UDF being called by the function wizard.


Evaluate Functions and Formulas fun: How to make Excel’s Evaluate method twice as fast

November 2, 2011

Prompted by a comment from Sam on Match vs Find I thought I would take a look at Excel’s rather quirky Evaluate method with Excel 2010 to see how it performed.

The Evaluate method internally uses Excel’s formula parser and calculator, and this makes it surprisingly powerful. You can use it on virtually any kind of formula, range reference or Defined Name. But, as we will see, it does have a number of strange “quirks” that you have to navigate around.

Depending on the context Evaluate will either return an object (for example a Range) or values.

I will be using exactly the same test setup of 100000 rows of randomly generated XY pairs and timing routine as in Match vs Find, so you can directly compare the results.

Using the square brackets [ ] shortcut for Evaluate

Sam’s comment suggested using [COUNTIFS] to see how the timing compared with MATCH and FIND. Of course its not quite the same thing because the loop on Match and Find allows the VBA to do something for each XY pair found. Sam’s VBA looks like this:

Sub FindXY_COUNTIFS1()
Dim j As Long
Dim dTime As Double
dTime = Microtimer
j = [COUNTIFS(A1:A100000,"x",B1:B100000,"y")]
Debug.Print "COUNTIFS1 " & j & " " & (Microtimer - dTime) * 1000
End Sub

It takes 11.6 millisecs to find the 25135 XY pairs generated using a constant of 0.5 in the test data generator.

[ ] is a shortcut for Application.Evaluate. The advantage of using the [ ] brackets is that it is concise and the formula inside the [ ] is exactly the same as when used in a worksheet cell. The disadvantage is that you cannot generate the formula as a string. I tend to only use this notation when evaluating my hidden workbook-scoped defined names, because they are not likely to change. (Of course sometimes I get lazy …)

Using Application.Evaluate instead of [ ]

You can use Evaluate or Application.Evaluate with a string instead of the [ ]

j = Evaluate("COUNTIFS(A1:A100000," & Chr(34) & "x" & Chr(34) & ",B1:B100000," & Chr(34) & "y" & Chr(34) & ")")
The timing is virtually identical to the [ ] shortcut method.

Application.Evaluate and the Activesheet

One trap for the unwary with [ ] , Evaluate and Application.Evaluate is that all references that do not contain a worksheet (unqualified references like A1:A100000) are assumed to refer to whatever the Active sheet currently happens to be.
So if you are going to use Application.Evaluate you should always use a qualified reference (Sheet1!A1:A100000) unless you like your code to live dangerously.

Worksheet.Evaluate

Worksheets and Charts also have an Evaluate Method. When you use these methods unqualified references are assumed to refer to the worksheet or chart.

Sub FindXY_COUNTIFS3()
Dim j As Long
Dim dTime As Double
dTime = Microtimer
j = Worksheets("Sheet1").Evaluate("COUNTIFS(A1:A100000," & Chr(34) & "x" & Chr(34) & ",B1:B100000," & Chr(34) & "y" & Chr(34) & ")")
Debug.Print "COUNTIFS3 " & j & " " & (Microtimer - dTime) * 1000
End Sub

Now for the surprise: Worksheet.Evaluate is twice as fast as Application.Evaluate!
(actually 6.1 millisecs as opposed to 11.6 millisecs)

I am fairly sure that there is a bug in Application.Evaluate that actually does the evaluation twice.
Certainly if you use Application.evaluate on a UDF the UDF will be executed twice.

Chart.Evaluate

In the real world I have never actually used Chart.Evaluate (probably because I hate programming Chart objects), but according to Help it seems you can do interesting things with it:

“Chart Objects. You can specify any chart object name, such as “Legend”, “Plot Area”, or “Series 1″, to access the properties and methods of that object. For example, Charts("Chart1").Evaluate("Legend").Font.Name returns the name of the font used in the legend.”

Evaluating Array Formulas

Amazingly if you give Evaluate an array formula it evaluates it as an array formula:

Sub FindXY_COUNTIFS4()
Dim j As Long
Dim dTime As Double
dTime = Microtimer
j = ActiveSheet.Evaluate("SUM((A1:A100000=" & Chr(34) & "x" & Chr(34) & ")*(B1:B100000=" & Chr(34) & "y" & Chr(34) & "))")
Debug.Print "COUNTIFS4 " & j & " " & (Microtimer - dTime) * 1000
End Sub

This is quite a lot slower than COUNTIFS: it takes nearly 19 milliseconds.
If you are a SUMPRODUCT fan you could use

j = ActiveSheet.Evaluate("SUMPRODUCT(--(A1:A100000=" & Chr(34) & "x" & Chr(34) & "),--(B1:B100000=" & Chr(34) & "y" & Chr(34) & "))")
But its not significantly faster.

Evaluate speed compared to a formula in a cell

You would expect Evaluate to be slower than Excel natively calculating the formula in a cell. And indeed it is, but its quite close for a single formula;

Countifs Formula 6.0
Evaluate Countifs 6.1

Array Sum Formula 16.9
Evaluate Array Sum 18.9

Evaluate speed compared to using Application.Worksheetfunction

It looks like there is a higher overhead to using Evaluate, which is what you would expect.
Here is a version of the FindXY sub using Evaluate with MATCH instead of Worksheetfunction.Match.

Sub FindXYEval()
Dim oRng As Range
Dim oLastRng As Range
Dim j As Long
Dim jRow As Long
Dim n As Long
Dim Rw As Long
Dim dTime As Double
dTime = Microtimer
Set oRng = Range("a1:A100000")
Set oLastRng = oRng(oRng.Rows.Count)
Rw = oLastRng.Row
On Error GoTo finish
With Application.WorksheetFunction
Do
Set oRng = Range(oRng(j + 1), oLastRng) '<<< Rw
End With
finish:
Debug.Print "MatchEval " & n & " " & (Microtimer - dTime) * 1000
End Sub

This takes 3720 milliseconds compared to 478 milliseconds using Worksheetfunction.Match. There are just over 50000 calls to Evaluate or Match so I reckon the additional overhead of using Evaluate is about 65 Microseconds per call.

More Evaluate Limitations: Updated

  • The string being evaluated must be less than 256 characters, even in Excel 2010.
  • A1 style references can be evaluated in both A1 and R1C1 reference mode (Application.ReferenceStyle), but R1C1 style references can only be evaluated in R1C1 mode.
  • Relative references in the string are treated as absolute, unless they are contained in defined names in which case the defined name is evaluated with respect to cell A1.
  • Dates should be in USA format (Month-Day-Year).
  • Evaluate will return an error value if the string formulae contains external references to closed workbooks or XLM functions.
  • Using Evaluate INDEX(rng,rownum,COLUMN()) gives incorrect answers except for the first column. Evaluate 0+INDEX(rng,rownum,COLUMN()) works
  • If the string formula contains a reference to both a UDF and a name it will fail with an error 2029 if the name reference occurs AFTER the UDF reference:
    • If fred is a named range and xyz() is a user defined VBA function then this statement returns error 2029: application.Evaluate(“=xyz(b1)+fred”)
    • This statement returns the correct value: application.Evaluate(“=fred+xyz(b1)”)
    • Microsoft KB article 823604 identifies this problem but does not correctly diagnose the circumstances that cause it.

You can bypass many of these limitations (at the cost of performance) by inserting the formula string into a worksheet cell and then reading the resulting cell value back into a VBA variable.

Error Handling

If Evalaute cannot evaluate the formula string it returns an error rather than raising an error, so the result of Evaluate should always be assigned to a Variant.

Conclusion

  • The Evaluate method adds a lot of power to VBA
  • Always use Worksheet.Evaluate rather than Application.Evaluate: its twice as fast and less error-prone
  • Using Worksheet.Evaluate has comparable speed to a cell formula but a higher overhead
  • Worksheetfunction has a lower overhead than Evaluate
  • Beware the Quirks of Evaluate!

So whats your experience of Evaluate?


MATCH vs FIND vs Variant Array VBA Performance Shootout in Excel 2010

October 26, 2011

When searching unsorted data in VBA I have always tended to use MATCH rather than FIND, mainly because early versions of Excel before Excel 2003 did not allow you to use FIND in UDFs.

But prompted by a discussion in one of the Excel forums I thought it was about time I revisited Find and Match to see which performs better in Excel 2010, and for good measure lets compare them to getting the data into a variant array and looping through the array.

Generating Test Data

Andreas Killer came up with a nice routine for generating test data:

Sub Init()
Dim Data(1 To 100000, 1 To 2)
Dim i As Long
Rnd -5
For i = 1 To UBound(Data)
If Rnd > 0.5 Then Data(i, 1) = "X"
If Rnd > 0.5 Then Data(i, 2) = "Y"
Next
Cells.ClearContents
Range("A1").Resize(UBound(Data), UBound(Data, 2)) = Data
End Sub

This code randomly generates X in column 1 and Y in column 2 in a range of 100 thousand rows, with the number of Xs and Ys controlled by the constant 0.5
Changing the 0.5 to 0.9 will give few Xs and Ys, and changing it to 0.001 will give lots.

Since the X and the Y are using different random numbers there will be rows with X but no Y and Y but no X, as well as rows with both X and Y.

This makes it easy to test how the various methods compare with different densities of data.

Timing

I am using the MicroTimer high-resolution timer API to give timing accuracy at the microsecond (millionths of a second) level, but the timing results will all be in milliseconds.

#If VBA7 Then
Private Declare PtrSafe Function getFrequency Lib "kernel32" Alias _
"QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare PtrSafe Function getTickCount Lib "kernel32" Alias _
"QueryPerformanceCounter" (cyTickCount As Currency) As Long
#Else
Private Declare Function getFrequency Lib "kernel32" Alias _
"QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare Function getTickCount Lib "kernel32" Alias _
"QueryPerformanceCounter" (cyTickCount As Currency) As Long
#End If
Public Function MicroTimer() As Double
'
' returns seconds
' uses Windows API calls to the high resolution timer
'
Dim cyTicks1 As Currency
Dim cyTicks2 As Currency
Static cyFrequency As Currency
'
MicroTimer = 0
'
' get frequency
'
If cyFrequency = 0 Then getFrequency cyFrequency
'
' get ticks
'
getTickCount cyTicks1
getTickCount cyTicks2
If cyTicks2 < cyTicks1 Then cyTicks2 = cyTicks1
'
' calc seconds
'
If cyFrequency Then MicroTimer = cyTicks2 / cyFrequency
End Function

The code uses conditional compilation for the Windows API calls so that it will work for both Excel 2010 32-bit and 64-bit Excel.

Using FIND

Here is the test code using Range.Find

Sub FindXY1()
Dim oRng As Range
Dim n As Long
Dim dTime As Double
Dim FirstAddress As String
dTime = MicroTimer
'
With Range("a1:A100000")
Set oRng = .Find("X", , xlValues, , , xlNext, False)
If Not oRng Is Nothing Then
FirstAddress = oRng.Address
If oRng.Offset(0, 1) = "Y" Then n = n + 1
Do
Set oRng = .FindNext(oRng)
If Not oRng Is Nothing Then
If oRng.Offset(0, 1) = "Y" And oRng.Address <> FirstAddress Then
n = n + 1
End If
End If
Loop While Not oRng Is Nothing And oRng.Address <> FirstAddress
End If
End With
'
Debug.Print "Find " & n & " " & (MicroTimer - dTime) * 1000
End Sub

The Sub does a Range.Find on the 100 thousand rows looking for X, and then checks if column 2 for that cell also contains Y.
Then this is repeated using FindNext until we have looped back to the first range address.
When completed the number of rows with both X and Y (this is less than the number of times an X was found) and the time in milliseconds is shown in the Immmediate window.

Using Match

Here is the code using WorksheetFunction.MATCH

Sub FindXY2()
Dim oRng As Range
Dim j As Long
Dim n As Long
Dim dTime As Double
dTime = MicroTimer
Set oRng = Range("a1:A100000")
On Error GoTo Finish
With Application.WorksheetFunction
Do
j = .Match("X", oRng, 0)
If oRng(j, 2).Value2 = "Y" Then n = n + 1
Set oRng = oRng.Resize(oRng.Rows.Count - j, 1).Offset(j, 0)
Loop
End With
Finish:
Debug.Print "Match " & n & " " & (MicroTimer - dTime) * 1000
End Sub

This Sub works by using Worksheetfunction.Match to find the first occurrence of X within the Range object oRng.
After each X is found oRng is resized to exclude the range already searched.

The loop terminates either when nothing is found (Worksheetfunction.Match raises an error)  or the Resize fails, also raising an error.

Using a Variant Array

Sub FindXY3()
Dim vArr As Variant
Dim j As Long
Dim n As Long
Dim dTime As Double
dTime = MicroTimer
vArr = Range("a1:B100000").Value2
For j = LBound(vArr) To UBound(vArr)
If vArr(j, 1) = "X" Then
If vArr(j, 2) = "Y" Then
n = n + 1
End If
End If
Next j
Debug.Print "Var array " & n & " " & (MicroTimer - dTime) * 1000
End Sub

The code gets the Range into a Variant, creating a 2-dimensional array, and then loops down the array looking for X and Y.

Timing Results

I ran each sub 4 times using different densities of data.

The first is with a single XY pair at row number 100000. This tests the scanning speed per row rather than the calling overhead.

Then I used the Test data generator with constant values of 0.9, 0.5 and 0.001.

Here are the results giving timings in Milliseconds with counts of the XY pairs:

 
 
 
 
 

And here are the ratios of the times:

 
 
 
 
 

  • You can see that Find is significantly slower than Match or using a Variant array, regardless of the number of XY pairs found.
  • The timings for Match increase fast with the number of XY pairs, whereas the Variant array increases much more slowly.
  • This is because there is a much higher overhead for each call to Match than looping from row to row of the array.
  • For small numbers of XY pairs Match is faster than the Variant array.

Conclusions

  • Don’t use Range.Find unless you want to search a large number of columns for the same thing (you would have to do a Match for each column).
  • The Variant array approach is surprisingly effective, particularly when you expect a large number of hits.
  • Match wins easily for a small number of hits.

OK, so who is going to admit to using Range.Find?


Writing efficient VBA UDFs (Part 6) – Faster string handling and Byte arrays

October 18, 2011

None of  the previous posts on writing efficient VBA UDfs (Part1,Part2,Part3,Part4,Part5) talked about handling strings in VBA.
This could be a major omission since string-handling is one of VBAs slowest “features”.

Suppose you want to find the position of the first capital letter in a string.

Array Formula

You could use an array formula like this:

{=MATCH(TRUE,ISERR(FIND(MID(A5,ROW($1:$255),1),
LOWER(A5))),0)}

My test data is 2000 rows, each containing 25 lower-case characters and one randomly placed upper-case character.

2000 calls to this array formula takes 250 milliseconds.

So lets try some VBA UDFs.

Using LIKE

One way is to use the VBA LIKE statement:

Function FirstCap2(Cell As Range)
For FirstCap2 = 1 To Len(Cell.Value)
If Mid(Cell.Value, FirstCap2, 1) Like "[A-Z]" Then
Exit For
End If
Next FirstCap2
End Function

The code loops across the string using Mid to look at each character in turn, and then uses LIKE to see if the character is one of upper-case A to upper-case Z.
2000 calls to this UDF takes 50 milliseconds – a factor of 5 faster, but we can make it faster (of course).


Function FirstCap3(Rng As Range) As Long
Dim theString As String
theString = Rng.Value2
For FirstCap3 = 1 To Len(theString)
If Mid$(theString, FirstCap3, 1) Like "[A-Z]" Then
Exit For
End If
Next FirstCap3
End Function

I changed the code to only get the string out of the cell once, and to use Mid$ rather than Mid. All the VBA string handling functions have 2 versions: versions without the $ work with variant arguments, whereas versions with the $ suffix only work on string arguments, but are slightly faster.
2000 calls to this version of the UDF takes 17 milliseconds, nearly 3 times faster.

Using MID$

But maybe using LIKE is slow? Lets try comparing a lower-case version of the string and stopping when the characters don’t match:

Function FirstCap4(strInp As String) As Long
Dim tmp As String
Dim i As Long
Dim pos As Long
tmp = LCase$(strInp)
pos = -1
For i = 1 To Len(tmp)
If Mid$(tmp, i, 1) <> Mid$(strInp, i, 1) Then
pos = i
Exit For
End If
Next
FirstCap4 = pos
End Function

Well surprisingly this is slower than the optimised version using LIKE:
2000 calls to this version of the UDF takes 36 milliseconds.

Using Byte Arrays

Using Byte arrays with strings is one of VBAs less well known secrets, but its often an efficient way of handling strings when you need to inspect each character in turn.


Public Function FirstCap5(theRange As Range) As Long
Dim aByte() As Byte
Dim j As Long
FirstCap5 = -1
aByte = theRange.Value2
For j = 0 To UBound(aByte, 1) Step 2
If aByte(j) < 91 Then
If aByte(j) > 64 Then
FirstCap5 = (j + 2) / 2
Exit For
End If
End If
Next j
End Function

This version of the UDF is slightly faster: 2000 calls takes 15 milliseconds.

So how does this work?

First create an undimensioned array of Bytes : Dim aByte() as Byte
Then assign a string to it: aByte=”abEfg”
You can use the Locals window to see what the resulting Byte array looks like:

Each character in the string has resulted in 2 bytes which are the Unicode code points for the character. Since I am working in a UK English Locale using the Windows Latin-1 codepage the first byte is the ANSI number for the character and the second byte is always zero.

Unaccented english upper-case characters are ANSI numbers 65 to 90, so I can loop down the byte array, looking at every other byte, and do a numeric test directly on the character to see if it is upper-case. You can see that only the third character is upper-case.

Another surprising feature of  this kind of Byte array is that you can assign a byte array directly back to a string:

Dim str1 as string
str1=aByte

Str1 now contains “abEfg”

Array version of the Byte UDF

As discussed in Part 5 of writing efficient UDFs, Array Formulae go faster. So here is an array formula version of the Byte UDF.

Public Function AFirstCap(theRange As Range) As Variant
Dim aByte() As Byte
Dim j As Long
Dim L As Long
Dim vRange As Variant
Dim jAnsa() As Long
Dim NumCells As Long
vRange = theRange.Value2
NumCells = UBound(vRange, 1)
ReDim jAnsa(NumCells - 1, 0)
For L = 0 To NumCells - 1
jAnsa(L, 0) = -1
aByte = vRange(L + 1, 1)
For j = 0 To UBound(aByte, 1) Step 2
If aByte(j) < 91 Then
If aByte(j) > 64 Then
jAnsa(L, 0) = (j + 2) / 2
Exit For
End If
End If
Next j
Next L
AFirstCap = jAnsa
End Function


This version, entered into 2000 rows as an array formula using Control/Shift/Enter, takes just 4.8 milliseconds.

Conclusion

Here is a table comparing the speed of these different approaches.

Method

Milliseconds

Array Formula

250

LIKE UDF

50

Optimised LIKE UDF

17

MID$ UDF

36

Byte Array UDF

15

Array Formula version of Byte Array UDF

4.8

So the fastest VBA is just over 10 times faster than the slowest VBA solution, and a whopping 52 times faster than the array formula solution.

Using Byte arrays for strings can be a good solution for string handling where you need to inspect or manipulate many individual characters.

So what do you use Byte arrays for?


Follow

Get every new post delivered to your Inbox.