Archive for the ‘Calculation’ Category

Table Extension recalculation bug in Excel 2010 but not 2007

October 11, 2012

gizzzzmo has found an unexpected recalculation bug in Excel 2010.

You can duplicate it using the following steps.

Start Excel 2010 with a new blank workbook in Automatic Calculation mode.

  • in A1 enter =MAX(A4:A5)
  • in A2 enter =A1
  • in A3:A5 create a Table (Insert tab->Table, check My Table has Headers)
  • in A4 enter 1
  • in A5 enter 2

It should now look like this:

Now enter 3 into cell A6 and the screen changes to this:

The Table has been extended to include cell A6 and the formula in cell A1 has automagically been changed to reflect this (  =MAX(A4:A6)  ) , so cell A1 correctly shows 3.

But cell A2 still shows 2, even though its formula references cell A1, which has changed, and we are in automatic calculation mode so it should show 3. Recalc BUG!

If you now change cell A6 to 4 BOTH cells A1 and A2 correctly get recalculated, so the dependency tree is still valid.

This bug does not exists in Excel 2007, but is still there in my installation of Excel 2013 Tech Preview.

If you create the Table before entering the =MAX() formula then Excel creates a structured Table reference =MAX(Table1[Column1])  and recalculation works correctly.

 

Makeing the most of your XIPS Part2 – when 40 MXIPS for AVERAGEIFS is too slow

July 9, 2012

Peter wants to calculate a rolling average over 600K rows. His data consists of a Timestamp in Column A and a Value in column B:

+-Timestamp-+-value-+
| 1340816430|  .02  |
---------------------

He is using an array formula in each of the 600K rows (although it would work as an ordinary non-array formula):

{=AVERAGEIFS(B:B,A:A,"<"&A1+1000,A:A,">"&A1-1000)}
This calculates the average of the values starting at -1000 timestamp units and ending at +1000 timestamp units.

I setup some test data for the full 1048576 rows and found that 500 of the AVERAGEIFS formulas took 25 seconds to calculate on multithreaded 4 core system.

How many MXIPS can AVERAGEIFS do?

AVERAGEIFS is a fast function – its calculating 500 x (2000000 comparisons and averages of matching results) – say 1000 million operations in 25 seconds = 40 MXIPS (Million eXcel Instructions Per Second). The only problem is that Peter wants to do 600K of these, not 500, and it looks like that will take over 8 hours!

A more complicated but faster solution

Lets assume that the data is sorted ascending on Timestamp (anyway Excel is very fast at sorting data). Start by making sure that Excel is in Manual calculation mode!

Then in column C put

=IFERROR(MATCH(A1-1000,$A:$A,1),1)

and copy down. This finds the row that is 1000 timestamp units before the current row. The MATCH function is using its binary search of sorted data algorithm, and thats lightning fast. The IFERROR is to handle the starting condition for the first rows where there are no Timestamps at -1000 from the current Timestamp.

In Column D put

=IFERROR(MATCH(A1+1000,$A:$A,1),1048576)

and copy down. This finds the row that is 1000 timestamp units after the current row, in a similar way to column C

Now we know that start and end row numbers of the block of rows that Peter wants to average. So we can use OFFSET to get just that subset of rows, and feed it to AVERAGE.
So in Column E put

=AVERAGE(OFFSET(B1,C1-ROW(),0,D1-C1+1,1))

and copy down.

Now press Ctrl/Alt/F9 for a full calculation – on my system that takes just 20 seconds – several thousand times faster than AVERAGEIFS!

Designing the SpeedTools function library – my VBA to C Journey part 1

June 19, 2012

This is the first of a series of posts about my perilous voyage from VBA to C++.

I am starting with the (current) design of FastExcel V3 SpeedTools Beta 2 (bit of a mouthful so lets just call it SpeedTools).

SpeedTools contains a library of over 80 additional Excel functions as well as additional calculation modes beyond Automatic and Manual.
SpeedTools has to support all Excel versions from Excel 2000 to Excel 2010 64-bit and Excel VNext (I don’t know what MSoft are going to call it!).

Using both VBA and C++

The functions are mostly implemented in C++ as an XLL (using Planatech XLL Plus), but I also need to handle a variety of application level events and a form for the settings and options. I decided it was simpler to handle these using VBA, as well as some things that are not directly available through the C XLL API. For Beta 1 I implemented this as 2 separate addins (one XLA and one XLL) both loaded using the OPTIONS registry keys.

In Beta 2 implementing XLL functions that handled full columns efficiently meant that I had to access the used range from the XLL functions via COM, which turned out to mean that the XLL had to be loaded after a workbook was opened.
If you load the XLL via registry keys it opens before the first workbook, so I decided to load it from the XLA (using Application.RegisterXLL) at the first non-XLA workbook activate.

The VBA XLA also detects whether Excel is 64 or 32 bit and loads the appropriate XLL.

There are some occasions when data needs to be passed (both directions) between the VBA XLA and the C++ XLL. This requirement is mainly down to the need to store memory lookup persistent data in the workbook. I use either C external declarations or Application.Run which handles tricky stuff like string conversions) for this.

The User Interface

For Excel 2007 and later the user interface is created as 2 groups on the formula tab using Ribbon XML, with help from the Ken Puls Ribbon X book, and PED Version 2.

Clicking one of the buttons launches a userform to allow you to select a function from a list with a short description:

Choosing a function and clicking OK launches the Function Wizard. This is done by entering the function into the selected cell using a comma as the argument to the function, for example =LISTDISTINCTS.SUM(,) , and then calling the function wizard using Application.Dialogs(xlDialogFunctionWizard).Show.

The second group on the Formulas tab allows you to control the SpeedTools calculation settings and time the calculation of ranges:

The ribbon XML and associated action code is contained in a XLAM file, which is launched by the main VBA XLA if the Excel version is 2007 or later. The action code then calls the appropriate code in the main VBA XLA using Application.Run.

For Excel 2003 and earlier the UI is provided by Toolbar code in the main VBA XLA. The toolbar is only loaded for Excel 2003 and earlier.


Getting it all together with an Installer

There are currently 12 files that all need to be copied to a single folder at installation time, so I use a custom script for Setup Factory. The custom script also writes the registry keys to load the main VBA XLA and handles the uninstall. The 12 files currently are:

  • The Custom PNG images for the ribbon (the toolbar custom images are embedded in the XLA).
  • The CHM help files and PDF manuals
  • The 32 bit and 64 bit versions of the XLL
  • The main VBA XLA
  • The VBA XLAM for the ribbon UI.
  • Readme.txt and License.txt

Its essential to have most of these files in a single folder and to avoid Excel’s disastrous attempts to put addins in weird and wonderful places. Using an installer makes all this simple.

Next

I plan the next post in this series to be about using the XLL Plus function wizards to create XLL functions.

Vlookup tricks: the wild side of VLOOKUP

April 5, 2012

Here is another VLOOKUP post for Bill Jelen’s (now extended) VLOOKUP week.

VLOOKUP Week – Brought to You by Mr. Excel
(Click Image for the Official Site of VLOOKUP Week)

Here is a little-used wild VLOOKUP trick that can sometimes get you out of a hole! And its got a great XIPS rating!

Wild-Card Lookup with VLOOKUP

If you are doing exact-match lookup on text data you can use wild-cards in the Lookup Value.

Suppose you have a table of names and ID Numbers called IDNumbers which looks like this:

then

=VLOOKUP(“*n*”,IDNumbers,2,FALSE)

will find the first name that contains an n (happens to be Jan-Karel) and returns his ID which is 2345.

The Range-Lookup argument (4th argument) has to be FALSE for this technique to work: sorting the data in wild-card sequence is not yet something Excel can do!

You can use the wild card characters asterisk (*) and question mark (?). An asterisk matches any sequence of characters, including none and a question mark matches any single character.

If you want to match text that includes asterisks and/or question marks you have to precede them with a tilde character (~). So “*~*” would match any string that ended in *

The Wild-Card XIPS rating

Wild-card VLOOKUPs perform reasonably fast. 10 wild-card VLOOKUPS each looking up 500000 rows and not finding a match takes just over half a second single-threaded on my system. Thats a XIPS rating (eXcel Instructions per Second) of 10 x 500000 / 0.5 = 10000000.

If I switch on Multi-Threaded Calculation it takes just 0.149 seconds which is 33.5 MXIPS (Millions of Excel Instructions per second)

33 million comparisons per second can’t be bad.

VLookup tricks: why 2 VLOOKUPS are better than 1 VLOOKUP

March 29, 2012

OK, time for a post on VLOOKUP for Bill Jelen’s VLOOKUP week.

VLOOKUP Week – Brought to You by Mr. Excel
(Click Image for the Official Site of VLOOKUP Week)

For all you VLOOKUP junkies who can’t get enough of VLOOKUP, here is a wonderful reason to use 2 VLOOKUPs instead of 1 !!

Exact Match VLOOKUP is slow

When you have a lot of data in Excel (think 10,000 or 100,000 or 1,000,000 rows) you often need to detect when a value you are looking for does not exist in the data.
You can do this using VLOOKUP and the formula would look something like this, supposing you were trying to get information about a Part Number from the 4th column in a Parts List:

=VLOOKUP(PartNumber,PartsList,4,FALSE)

If Part Number is not in the Parts List then the formula will return #N/A. The final VLOOKUP argument (FALSE) tells VLOOKUP to do a linear exact match search of the Parts List, starting at the first row and looking at each row in turn until it finds the Part Number or reaches the end of the Parts List.

This works well with a small number of VLOOKUPs and when Parts List has a small number of rows, but for large numbers of VLOOKUPs with a large PartsList it can take minutes to calculate.

This problem is made even worse by the fan-out effect of making a change to the Parts List table.
Because of the way that Excel’s smart recalculation engine tracks dependencies, if you make a change to a particular row the Parts List table ALL the formulas that refer to Parts List will be recalculated, even if the result of the formula does not use the row that was changed.

So how do you speed up this slow calculation?

If you sort the PartsList table on Part Number then it would be nice to be able to use a formula like this:

=VLOOKUP(PartNumber,PartsList,4,TRUE)

The TRUE as the final VLOOKUP argument tells VLOOKUP to do a binary search of the parts list.

Binary Search is lightning fast compared to the linear search.

But it does not care about missing data: if the PartNumber does not exist VLOOKUP will happily find the next largest part number that is less than Part Number.

And it won’t tell you that it could not find the Part Number, so its extremely difficult to spot that you have got the wrong answer!

Thats why many people say you should never use the TRUE argument in VLOOKUP.

VLOOKUP has a major design error

Oh and by the way TRUE is the default if you don’t enter anything for the fourth argument.
So if your data is not sorted, or you have missing data, and you forget to add the FALSE argument you will get the wrong answer.

The magic solution: 2 VLOOKUPs are better than 1

OK, so here is the magic trick you can use to make safe fast VLOOKUPs on sorted data, even when there may be missing or mismatched data.

Suppose you use VLOOKUP to lookup the Part Number and return itself from the Part Number Column (column 1) in the Part Number Table.

=VLOOKUP(PartNumber,PartsList,1,TRUE)

If the PartNumber exists this formula will return the same Part Number as the PartNumber you gave it.

But if the PartNumber does not exist the formula will give you a different part number!

So we can use IF to find missing Part Numbers:

=IF(VLOOKUP(PartNumber,PartsList,1,TRUE)=PartNumber, “Exists”, “Missing”)

And so for the final magic formula you just replace the “Exists” with the VLOOKUP for the data you want!

=IF(VLOOKUP(PartNumber,PartsList,1,TRUE)=PartNumber, VLOOKUP(PartNumber,PartsList,4,TRUE), “Missing”)

Because VLOOKUP binary search is so much faster than VLOOKUP linear search this trick is faster even with small numbers of rows (50 or so).

More LOOKUP Tricks

My website has a page showing many tricks for using Lookups:

http://www.decisionmodels.com/optspeede.htm

And there are 2 earlier posts in this blog on Lookups:

http://fastexcel.wordpress.com/2011/07/20/developing-faster-lookups-part-1-using-excels-functions-efficiently/

http://fastexcel.wordpress.com/2011/07/22/developing-faster-lookups-part-2-how-to-build-a-faster-vba-lookup/

Writing efficient VBA UDFs Part 10 – Volatile Functions and Function Arguments

February 2, 2012

I just realised that none of my previous 9 posts on writing efficient VBA UDFs has discussed when and why you should make Functions Volatile. Since that’s fairly fundamental I really should have covered the topic early in the series … but anyway here goes.

What does Volatile mean?

Normally Excel’s smart recalculation engine only recalculates formulas that either have been changed/entered or depend on a cell or formula that has been changed somewhere higher up the chain of precedents for the formula.

This makes for very efficient calculation speed since in a typical workbook only a small faction of the formulas will be dependent on any particular cell or piece of data.

But some functions need to recalculate at every recalculation. For example NOW() should always give you the current time at the last calculation, and RAND() should give you a different random number each time it is calculated. These functions are called Volatile Functions, and any formula that uses one of them is a Volatile formula.

You can see more discussion of Excel’s built-in volatile functions and the volatile actions that trigger a recalculation at http://www.decisionmodels.com/calcsecretsi.htm.

How does Excel’s smart recalc engine know when to recalculate a function or a formula?

Excel maintains its dependency trees by looking at what other cells a function or a formula refers to, and the smart recalc engine uses these dependency trees to work out which formulas to recalculate.

For Functions Excel only looks at the arguments to the function to determine what the function depends on. So if you write a function like this:


Function Depends(theCell as range)
Depends=ActiveSheet.range("Z9")+theCell + _
theCell.Offset(0,1)
End Function

and call it in a formula =Depends(“A1″)
then Excel will only recalculate your function when A1 changes, and not when B1 or Z9 changes.

This could give you incorrect results.

Note: During a recalculation if Excel does evaluate the UDF it determines which cell references are actually being used inside the function to affect the function result, and if those cells have not yet been finally calculated it will reschedule the Function for later calculation. This is required to make the UDF be finally calculated in the correct dependency sequence.

How to fix this problem

There are several ways to fix this problem, but only one good one!

Make the function Volatile

If you add Application.Volatile to the function it will always recalculate:

Function Depends(theCell as range)
Application.Volatile
Depends=ActiveSheet.range("Z9")+theCell+ _
theCell.Offset(0,1)
End Function

But this will slow down the calculation, so generally its a bad idea unless, like RAND() or NOW() the function really needs to be Volatile.

Use Ctrl/Alt/F9 to trigger a full calculation

If you press Ctrl/Alt/F9 then Excel will recalculate every single formula in all the open workbooks, regardless of what has changed or is volatile.
Of course this can be very slow.

Make sure the Arguments to the UDF refers to ALL the cells the UDF uses.

Change the UDF to

Function Depends(theCell1 as range, theCell2 as range)
Depends = theCell1.Resize(1, 1)+ _
theCell1.Resize(1, 1).Offset(0, 1) + theCell2
End Function

This is the best solution.

Call it using =Depends(A1:B1,Z9) so that Excel knows that B1 is being referenced by theCell1.Offset(0,1).

Now Excel knows all the cells that the function depends on and it will be recalculated correctly and efficiently.

Detecting whether a Function or Formula is Volatile

You can download VolatileFuncs.zip from http://www.DecisionModels.com/Downloads/VolatileFuncs.zip

This contains tests for the volatile Excel built-in functions, using a function to increment a counter each time the referenced cell changes.
Public jCalcSeq As Long ''' calculation sequence counter
'
Public Function CalcSeqCountRef(theRange As Range) As Variant
'
' COPYRIGHT © DECISION MODELS LIMITED 2000. All rights reserved
'
' increment calculation sequence counter at Full Recalc or when theRange changes
' fixed for false dependency
'
jCalcSeq = jCalcSeq + 1
CalcSeqCountRef = jCalcSeq + (theRange = theRange) + 1
End Function

Summary

Make sure that the arguments to your UDF always directly refer to ALL the cells that the UDF uses.

Writing Efficient UDFs Part 9 – An Example – Updated

January 31, 2012

Pedro wants to know how to speed up his UDF, which needs to calculate results for 35040 cells the minimum difference between the cell and a column of values of unknown length.

Pedro’s UDF

Function MinofDiff(r1 As Long) As Variant
Dim r2 As Range
Dim TempDif As Variant
Dim TempDif1 As Variant
Dim j As Long
Dim LastRow As Long
On Error GoTo FuncFail
If r1 = 0 Then GoTo skip
With Sheets("Dados")
LastRow = .Cells(.Rows.Count, "P").End(xlUp).Row
Set r2 = .Range("P8", "P" & LastRow)
End With
TempDif1 = Application.Max(r2)
For j = 1 To LastRow – 7
If r1 >= r2(j) Then
TempDif = r1 – r2(j)
Else
TempDif = r1
End If
MinofDiff = Application.Min(TempDif, TempDif1)
TempDif1 = MinofDiff
Next j
skip:
Exit Function
FuncFail:
MinofDiff = CVErr(xlErrNA)
End Function

There is a fundamental problem with Pedro’s UDF: it is referencing a range in column P without passing it in as a parameter, so if anything changes in column P the UDF could give the wrong answer because Excel will not recalculate it. Pedro has done this so that the UDF can dynamically adjust to the number of entries in column P.

On test data with 60000 entries in column P 20 calls to the UDF take 18.5 seconds on my laptop, so 34K calls would take about 9 hours to calculate! So why is it so slow?

  • Every time the function is called (35K times) it finds the last row and the MAX value in column P: but this only needs to be done once.
  • 35040 calls will hit the VBE refresh slowdown bug: so we need to bypass that.
  • The For loop is referencing each cell value in column P (using R2(j) ) twice. Each reference to a cell is slow because there is a large overhead for each call out to the Excel object model.
  • The UDF uses Worksheetfunction.Min to find out which of 2 values is smaller: its much quicker to compare the values using VBA If than invoking a worksheet function.

The revised UDF

To solve the fundamental problem with the UDF I will pass it an additional parameter: a whole column reference to column P. Then the UDF can resize the range to the last cell containing data. (Another alternative would be to create a Dynamic Named Range for column P and pass that as a parameter.

To solve the first 2 slowdown problems the UDF will be made into an array formula UDF that returns an array of 35040 results.

To avoid referencing each cell in column P twice inside the loop, the UDF will get all the values from column P once, into a variant array and then loop on the variant array.

Function MinofDiff2(R1 As Range, R2 As Range) As Variant
Dim R2Used As Range
Dim vArr2 As Variant
Dim vArr1 As Variant
Dim vOut() As Double
Dim TempDif As Double
Dim TempDif1 As Double
Dim D1 As Double
Dim D2 As Double
Dim TMax As Double
Dim j1 As Long
Dim j2 As Long
Dim LastRow As Long
'
On Error GoTo FuncFail
'
' handle full column
'
LastRow = R2.Cells(R2.Rows.Count, 1).End(xlUp).Row
Set R2Used = R2.Resize(LastRow - 7, 1).Offset(7, 0)
'
' get values into arrays
'
vArr2 = R2Used.Value2
vArr1 = R1.Value2
'
' find max
'
TMax = Application.Max(R2Used)
'
' set output array to same size as R1
'
ReDim vOut(1 To UBound(vArr1), 1)
'
' loop on R1
'
For j1 = 1 To UBound(vArr1)
TempDif1 = TMax
D1 = vArr1(j1, 1)
'
' loop on R2
'
For j2 = 1 To (LastRow - 7)
D2 = vArr2(j2, 1)
If D1 >= D2 Then
TempDif = D1 - D2
Else
TempDif = D1
End If
If TempDif < TempDif1 Then
vOut(j1, 1) = TempDif
Else
vOut(j1, 1) = TempDif1
End If
TempDif1 = vOut(j1, 1)
Next j2
Next j1
MinofDiff2 = vOut
skip:
Exit Function
FuncFail:
MinofDiff2 = CVErr(xlErrNA)
End Function

Because this is an array function you need to select the 35040 cells that you want to contain the answer, then type the formula into the formula bar =MinofDiff2(A1:A35040,P:P) and then press Ctrl/Shift/Enter to enter the formula as an array formula into the 35040 cells.

This revised UDF takes .222 seconds for 20 values, and completes the 35040 UDF calculations in 6.25 minutes, a speedup factor of over 80.

Updated with Harlan Grove’s suggestions

Harlan Grove has pointed out several ways of speeding up the UDF. Here is a revised version implementing most of his suggestions. It is about 17% faster than my original version.


Function MinofDiff3(R1 As Range, R2 As Range) As Variant
Dim R2Used As Range
Dim vArr2 As Variant
Dim vArr1 As Variant
Dim vOut() As Double
Dim TempDif As Double
Dim TempDif1 As Double
Dim D1 As Double
Dim D2 As Double
Dim TMax As Double
Dim TMin As Double
Dim j1 As Long
Dim j2 As Long
Dim LastRow As Long
'
On Error GoTo FuncFail
'
' handle full column
'
LastRow = R2.Cells(R2.Rows.Count, 1).End(xlUp).Row - 7
Set R2Used = R2.Resize(LastRow, 1).Offset(7, 0)
'
' get values into arrays
'
vArr2 = R2Used.Value2
vArr1 = R1.Value2
'
' find max & Min
'
TMax = Application.Max(R2Used)
TMin = Application.Min(R2Used)
'
' set output array to same size as R1
'
ReDim vOut(1 To UBound(vArr1), 1)
'
' loop on R1
'
For j1 = 1 To UBound(vArr1)
TempDif1 = TMax
D1 = vArr1(j1, 1)
TempDif = D1 - TMax
If D1 > TMax Then
If TempDif < TMax Then
vOut(j1, 1) = TempDif
Else
vOut(j1, 1) = TMax
End If
Else
If D1 < TMin Then
vOut(j1, 1) = D1
Else
'
' loop on R2
'
For j2 = 1 To LastRow
D2 = vArr2(j2, 1)
If D1 >= D2 Then
TempDif = D1 - D2
Else
TempDif = D1
End If
If TempDif < TempDif1 Then TempDif1 = TempDif
vOut(j1, 1) = TempDif1
Next j2
End If
End If
Next j1
MinofDiff3 = vOut
skip:
Exit Function
FuncFail:
MinofDiff3 = CVErr(xlErrNA)
End Function

Harlan also points out that a version using QuickSort to sort R2 and Binary Search instead of the loop would be an order of magnitude faster!

Excel 2010 Tables/Listobject: slow update and how to bypass

January 30, 2012

There was an interesting post on Stack Overflow recently about a performance problem when updating Excel 2007/2010 Tables/Listobjects. Certainly something strange is going on!

Duplicating the Problem

1. Create a table by selecting a few cells (I used A1:A3) on a sheet (Sheet1) and using Format as Table on the Home tab

2. On a different sheet (Sheet2) create a few thousand (I used 10000) of the simplest formula you can think of (I used =”A”)

3. Create a VBA Sub that updates a different cell in the same sheet as the table:


Sub Updater()
Dim j As Long
Dim dTime As Double
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
dTime = Timer
For j = 1 To 10000
Worksheets("Sheet1").Range("d5") = j
Next j
MsgBox Timer - dTime
End Sub

4. Select a cell within the Table

5. Run the VBA Sub

On my system that takes 8.8 seconds.

Bypassing the Problem

For this slowdown to occur each of the following conditions must be true:

  • A cell within the Table must be selected
  • The sheet containing the Table must be the Active Sheet
  • The cell being updated must be on the same sheet as the table, but does not have to be within the table
  • There must be a reasonable number of formulas in the workbook.

So change any or all of these conditions or delete all the formulas and the update will only take 0.5 seconds on my system.

Whats actually happening?

The time taken is proportional mostly to the number and slightly to the size of the formulas in the workbook, but none of the formulas are actually being calculated.

So it seems to me that each time the cell is updated Excel is scanning all the formulas in the workbook as though they might need to be changed.

Maybe this has something to do with the automatic extension of formulas within a table when you add a new row or the fact that the Table definition and its associated Name has to change if you add a new row.

But the fact that the slowdown only occurs if the Table is on the active sheet means that I think this is a bug.

Can you think of a better explanation?

Writing efficient VBA UDFs (Part 8) – Getting the previously calculated value from the calling cells

January 8, 2012

If you have a UDF that depends on some slow-calculating resource you may want the UDF to mostly just return the values obtained  at the last calculation from the cells the UDF occupies, and only occasionally go and use the slow-calculating resource.

In Excel 2010 you can create efficient C++ XLL UDFs that execute asynchronously and multithreaded, which can be a very effective solution.

But how do you get the previous value from the cells calling a VBA UDF?

Lets suppose that you want to pass the UDF a parameter for the slow-calculating resource and a switch to tell it when to use the slow resource. You can set the switch (I am using a defined name called “RefreshSlow”)  and refresh the UDFs in a VBA sub like this:

Sub RefreshUDFs()
Dim lCalcMode As Long
lCalcMode = Application.Calculation
Application.Calculation = xlCalculationManual
Names(“RefreshSlow”).RefersTo = True
Calculate
Names(“RefreshSlow”).RefersTo = False
Application.Calculation = lCalcMode
End Sub

I will use a dummy function to simulate getting a slow resource:

Function GetSlowResource(vParam As Variant) As Variant
Dim j As Long
For j = 1 To 10000000
Next j
GetSlowResource = Rnd()
End Function

This function (ignores the parameter) and just (slowly) returns a random number.

There are several ways of getting the previously calculated value for a UDF: they each have advantages and disadvantages.

Application.Caller.Value

You can use Application.Caller.Value, but this causes a circular reference that you have to switch on Iteration to solve. This is slow and can mask other unintentional circular refs, so its not recommended.


Function UDF1(vParam, Refresh)
If Not Refresh Then
UDF1 = Val(Application.Caller.Value2)
Else
UDF1 = GetSlowResource(vParam)
End If
End Function

Application.Caller.Text

If you use Application.Caller.Text you don’t get the circular reference, but it retrieves the formatted value that is displayed in the cell as a string. So if the cell is formatted as a number with 2 decimal places the retrieved value will be truncated to 2 decimal places.

Function UDF2(vParam, Refresh)
If Not Refresh Then
UDF2 = Val(Application.Caller.Text)
Else
UDF2 = GetSlowResource(vParam)
End If
End Function

This solution will work OK if you can control the formatting or the function returns a string.

Application.Caller.ID

You can use the Range.ID property to store and retrieve a string value within the UDF.

Function UDF3(vParam, Refresh)
Dim var As Variant
If Not Refresh Then
UDF3 = Val(Application.Caller.ID)
Else
var = GetSlowResource(vParam)
UDF3 = var
Application.Caller.ID = var
End If
End Function

This works well, except that the Range.ID property is not stored in a Saved workbook, so the next time you open the workbook the retrieved value will be Blank/Zero.

Using an XLM or XLL function to pass the Previous value to the UDF

Using XLM or XLL technology it is possible to create a non-multi-threaded command-equivalent function to retrieve the previous value.
Here is the code for an XLL+ function called PREVIOUS which has a parameter to make it Volatile or not Volatile.
(Command-equivalent functions default to Volatile but when using it to pass the previous value to a VBA UDF you generally want it to be non-volatile).
This function also works for multi-celled array formulae.
Edited following Keith Lewis comment.


CXlOper* PREVIOUS_Impl(CXlOper& xloResult, const CXlOper* Volatile_op)
{
// Input buffers
bool Volatile;
// Validate and translate inputs
static CScalarConvertParams Volatile__params(L"Volatile",
XLA_DEFAULT_ZERO|XLA_DEFAULT_EMPTY|XLA_DEFAULT_NONNUMERIC|
XLA_DEFAULT_BLANK, 0, -1, true);
XlReadScalar(*Volatile_op, Volatile, Volatile__params);
// End of generated code
//}}XLP_SRC
// defined as a macro function defer recalc so that the func gets previous results
CXlOper xloCaller,xlo;
CXlOper arg;
arg=true;
if (!Volatile) arg=false;
// set volatility of this function: 237 is the function number for volatile
xlo.Excel(237,1,&arg);
// Get caller. Fail if it is not a range of cells
if ( ( xloCaller.GetCaller() != 0 ) || !xloCaller.IsRef() ) return CXlOper::RetError(xlerrNA);
//coerce the caller ref
xloResult.Coerce(xloCaller);
return xloResult.Ret();
}

Then you can use this to pass the previous value to the UDF.

Function UDF4(vParam, Refresh, Previous)
Dim var As Variant
If Not Refresh Then
UDF4 = Previous
Else
var = GetSlowResource(vParam)
UDF4 = var
End If
End Function

The UDF is called from a formula like this =UDF4(“AAPL”,RefreshSlow,PREVIOUS(False))

This works well, but requires access to the XLL PREVIOUS function (Laurent Longre’s MOREFUNC addin has a similar function).

Its supposed to be possible to write a similar function using the old XLM Macro language, but I have not tried it.

Conclusion

There are several ways of getting the previous value from the last calculation for a VBA UDF. But the best solution requires using a C++ XLL.

Special prize to the first person to write the XLM Macro Previous UDF!

Evaluate Functions and Formulas fun: How to make Excel’s Evaluate method twice as fast

November 2, 2011

Prompted by a comment from Sam on Match vs Find I thought I would take a look at Excel’s rather quirky Evaluate method with Excel 2010 to see how it performed.

The Evaluate method internally uses Excel’s formula parser and calculator, and this makes it surprisingly powerful. You can use it on virtually any kind of formula, range reference or Defined Name. But, as we will see, it does have a number of strange “quirks” that you have to navigate around.

Depending on the context Evaluate will either return an object (for example a Range) or values.

I will be using exactly the same test setup of 100000 rows of randomly generated XY pairs and timing routine as in Match vs Find, so you can directly compare the results.

Using the square brackets [ ] shortcut for Evaluate

Sam’s comment suggested using [COUNTIFS] to see how the timing compared with MATCH and FIND. Of course its not quite the same thing because the loop on Match and Find allows the VBA to do something for each XY pair found. Sam’s VBA looks like this:

Sub FindXY_COUNTIFS1()
Dim j As Long
Dim dTime As Double
dTime = Microtimer
j = [COUNTIFS(A1:A100000,"x",B1:B100000,"y")]
Debug.Print "COUNTIFS1 " & j & " " & (Microtimer - dTime) * 1000
End Sub

It takes 11.6 millisecs to find the 25135 XY pairs generated using a constant of 0.5 in the test data generator.

[ ] is a shortcut for Application.Evaluate. The advantage of using the [ ] brackets is that it is concise and the formula inside the [ ] is exactly the same as when used in a worksheet cell. The disadvantage is that you cannot generate the formula as a string. I tend to only use this notation when evaluating my hidden workbook-scoped defined names, because they are not likely to change. (Of course sometimes I get lazy …)

Using Application.Evaluate instead of [ ]

You can use Evaluate or Application.Evaluate with a string instead of the [ ]

j = Evaluate("COUNTIFS(A1:A100000," & Chr(34) & "x" & Chr(34) & ",B1:B100000," & Chr(34) & "y" & Chr(34) & ")")
The timing is virtually identical to the [ ] shortcut method.

Application.Evaluate and the Activesheet

One trap for the unwary with [ ] , Evaluate and Application.Evaluate is that all references that do not contain a worksheet (unqualified references like A1:A100000) are assumed to refer to whatever the Active sheet currently happens to be.
So if you are going to use Application.Evaluate you should always use a qualified reference (Sheet1!A1:A100000) unless you like your code to live dangerously.

Worksheet.Evaluate

Worksheets and Charts also have an Evaluate Method. When you use these methods unqualified references are assumed to refer to the worksheet or chart.

Sub FindXY_COUNTIFS3()
Dim j As Long
Dim dTime As Double
dTime = Microtimer
j = Worksheets("Sheet1").Evaluate("COUNTIFS(A1:A100000," & Chr(34) & "x" & Chr(34) & ",B1:B100000," & Chr(34) & "y" & Chr(34) & ")")
Debug.Print "COUNTIFS3 " & j & " " & (Microtimer - dTime) * 1000
End Sub

Now for the surprise: Worksheet.Evaluate is twice as fast as Application.Evaluate!
(actually 6.1 millisecs as opposed to 11.6 millisecs)

I am fairly sure that there is a bug in Application.Evaluate that actually does the evaluation twice.
Certainly if you use Application.evaluate on a UDF the UDF will be executed twice.

Chart.Evaluate

In the real world I have never actually used Chart.Evaluate (probably because I hate programming Chart objects), but according to Help it seems you can do interesting things with it:

“Chart Objects. You can specify any chart object name, such as “Legend”, “Plot Area”, or “Series 1″, to access the properties and methods of that object. For example, Charts("Chart1").Evaluate("Legend").Font.Name returns the name of the font used in the legend.”

Evaluating Array Formulas

Amazingly if you give Evaluate an array formula it evaluates it as an array formula:

Sub FindXY_COUNTIFS4()
Dim j As Long
Dim dTime As Double
dTime = Microtimer
j = ActiveSheet.Evaluate("SUM((A1:A100000=" & Chr(34) & "x" & Chr(34) & ")*(B1:B100000=" & Chr(34) & "y" & Chr(34) & "))")
Debug.Print "COUNTIFS4 " & j & " " & (Microtimer - dTime) * 1000
End Sub

This is quite a lot slower than COUNTIFS: it takes nearly 19 milliseconds.
If you are a SUMPRODUCT fan you could use

j = ActiveSheet.Evaluate("SUMPRODUCT(--(A1:A100000=" & Chr(34) & "x" & Chr(34) & "),--(B1:B100000=" & Chr(34) & "y" & Chr(34) & "))")
But its not significantly faster.

Evaluate speed compared to a formula in a cell

You would expect Evaluate to be slower than Excel natively calculating the formula in a cell. And indeed it is, but its quite close for a single formula;

Countifs Formula 6.0
Evaluate Countifs 6.1

Array Sum Formula 16.9
Evaluate Array Sum 18.9

Evaluate speed compared to using Application.Worksheetfunction

It looks like there is a higher overhead to using Evaluate, which is what you would expect.
Here is a version of the FindXY sub using Evaluate with MATCH instead of Worksheetfunction.Match.

Sub FindXYEval()
Dim oRng As Range
Dim oLastRng As Range
Dim j As Long
Dim jRow As Long
Dim n As Long
Dim Rw As Long
Dim dTime As Double
dTime = Microtimer
Set oRng = Range("a1:A100000")
Set oLastRng = oRng(oRng.Rows.Count)
Rw = oLastRng.Row
On Error GoTo finish
With Application.WorksheetFunction
Do
Set oRng = Range(oRng(j + 1), oLastRng)    '<<<<= Rw
j = ActiveSheet.Evaluate("Match(" & Chr(34) & "X" & Chr(34) & "," & oRng.Address & ", 0)")
jRow = jRow + j
If oRng(j, 2).Value2 = "Y" Then n = n + 1
Loop Until jRow + 1 > Rw
End With
finish:
Debug.Print "MatchEval " & n & " " & (Microtimer - dTime) * 1000
End Sub

This takes 3720 milliseconds compared to 478 milliseconds using Worksheetfunction.Match. There are just over 50000 calls to Evaluate or Match so I reckon the additional overhead of using Evaluate is about 65 Microseconds per call.

More Evaluate Limitations: Updated

  • The string being evaluated must be less than 256 characters, even in Excel 2010.
  • A1 style references can be evaluated in both A1 and R1C1 reference mode (Application.ReferenceStyle), but R1C1 style references can only be evaluated in R1C1 mode.
  • Relative references in the string are treated as absolute, unless they are contained in defined names in which case the defined name is evaluated with respect to cell A1.
  • Dates should be in USA format (Month-Day-Year).
  • Evaluate will return an error value if the string formulae contains external references to closed workbooks or XLM functions.
  • Using Evaluate INDEX(rng,rownum,COLUMN()) gives incorrect answers except for the first column. Evaluate 0+INDEX(rng,rownum,COLUMN()) works
  • If the string formula contains a reference to both a UDF and a name it will fail with an error 2029 if the name reference occurs AFTER the UDF reference:
    • If fred is a named range and xyz() is a user defined VBA function then this statement returns error 2029: application.Evaluate(“=xyz(b1)+fred”)
    • This statement returns the correct value: application.Evaluate(“=fred+xyz(b1)”)
    • Microsoft KB article 823604 identifies this problem but does not correctly diagnose the circumstances that cause it.

You can bypass many of these limitations (at the cost of performance) by inserting the formula string into a worksheet cell and then reading the resulting cell value back into a VBA variable.

Error Handling

If Evalaute cannot evaluate the formula string it returns an error rather than raising an error, so the result of Evaluate should always be assigned to a Variant.

Conclusion

  • The Evaluate method adds a lot of power to VBA
  • Always use Worksheet.Evaluate rather than Application.Evaluate: its twice as fast and less error-prone
  • Using Worksheet.Evaluate has comparable speed to a cell formula but a higher overhead
  • Worksheetfunction has a lower overhead than Evaluate
  • Beware the Quirks of Evaluate!

So whats your experience of Evaluate?


Follow

Get every new post delivered to your Inbox.

Join 34 other followers