Threading and Hyper-Threading: Optimizing Excel Calculation speed by changing the number of threads

I have just implemented measuring multi-threaded calculation efficiency in FastExcel V3 Profile Workbook: so I thought it would be interesting to see the effect on calculation speed of varying the number of threads, and switching off hyper-threading.

Hyperthreading

Almost all of todays PCs have multiple cores. My desktop has an Intel I7 870 chip. This contains 4 hyper-threaded cores. Hyper-threading is an Intel hardware feature that allows each core to look to the software as though it is 2 cores. If one of the hyperthreads on a core gets stalled waiting for data the other hyperthread takes control of the resources and tries to continue. How efficient this is is highly dependent on the software application, and I had no idea whether Excel made good use of hyper-threading or not.

Because hyper-threading is a hardware feature you can only switch it off or on using the BIOS.

Excel Multi-threaded Calculation

Microsoft did a great job of implementing multi-threaded calculation in Excel 2007. The way this works is

  • Excel analyzes the calculation chains of formula dependencies looking for lengths of chain that can be processed in parallel
  • Starts a number of separate calculation threads
  • Assigns the lengths of chain to the calculation threads for processing
  • And finally gathers everything together

How successful this process is depends very much on the structure and linkages between your formulas, whether the functions you are using are multi-threaded etc, and this can vary enormously between workbooks.

You can control the number of threads that Excel assigns to the calculation process using File–>Options–>Advanced Options–>Formulas. The number of threads assigned can be larger or smaller than the number of logical cores available.

The Tests

I picked three largeish workbooks:

  • TTG – 4.7 million formulas using 704 Megabytes of workbook memory
  • GL – 466 thousand formulas using 114 Megabytes of workbook memory
  • PF2 – 284 thousand formulas using 700 Megabytes of workbook memory

I used my main desktop machine with Windows 7 32 bit and Excel 2010 32-bit.
This PC has an Intel Core I7 870 2.93 GHZ with 4 hyper-threaded physical cores, so 8 logical cores.

I ran a full calculation on each of the 3 workbooks using 1 to 8, 12, 24, 64 and 256 calculation threads with hyper-threading enabled. Then I switched off hyper-threading and ran a full calculation with 4 threads (so matching the physical cores) and 8 threads.

Timing Results with Hyper-threading enabled

Thread_TableTimings are in seconds for the second or third successive full calculation (to enable comparable optimization of the calculation process).

TTGGLPF2As you can see, all three workbooks show different characteristics:

  • TTG and PF2 are both fastest with 8 threads but GL is fastest with only 3 threads.
  • Increasing the number of threads beyond the number of logical cores available slows down calculation, but the increase is much less marked with PF2 than with the other 2 workbooks.
  • Increasing the number of threads beyond the number of physical cores up to the number of logical cores improves speed for TTG and PF2: so hyperthreading is successfully exploited by Excel. But physical cores are much more effective than logical cores.

The effect of disabling Hyperthreading

When I disabled hyperthreading using the BIOS:

  • Calculation using 4 threads ran slightly faster for TTG (88 vs 93) and PF2(7.6 vs 7.7) but slower for GL (0.49 vs 0.48)
  • Calculation using 8 threads ran slightly slower for all 3 workbooks (TTG 89 vs 73, PF2 7.7 vs 6.5, GL 0.58 vs 0.55)

So the effect of hyperthreading is noticeable but not large.

Running out of memory

There have been threads in the newsgroups about multi-threaded calculation causing Excel to give warning messages about running out of resources whilst calculating. I have not been able to duplicate this problem even with a large workbook like TTG running with 256 threads. The suggested solutions are either to turn off hyper-threading using the BIOS or to reduce the number of threads from 8 to 6.

Conclusions

  • Excel’s multi-threaded calculation can be very successful at reducing calculation times.
  • The effect of hyper-threading is not as large as that from multiple physical cores, but its still worth having.
  • The effectiveness of multi-threading is very dependent on the workbook. There will be workbooks where the overhead of analyzing the calculation chains outweighs the gain in calculation speed.
  • Excel’s default setting to assign a calculation thread to all available logical cores seems sensible.
Posted in Calculation | Tagged , | 9 Comments

Inserting a UDF and Launching the Function Wizard from VBA

In a previous post I suggested you could do this by entering the function using VBA into the selected cell using a comma as the argument to the function, for example =LISTDISTINCTS.SUM(,) , and then calling the function wizard using Application.Dialogs(xlDialogFunctionWizard).Show.

This method mostly works but has 2 major drawbacks:

  • It won’t allow you to use F4 within the function wizard to change from relative to absolute.
  • You get #Value or #Name when launching an XLL-based UDF with only 1 argument.

After a lot of trial and error I eventually found a way that seems to work in all cases:

Sub GoFuncWiz(strFunc As String, blArray As Boolean)
On Error Resume Next
If blArray Then
Selection.FormulaArray = "=" & strFunc & "()"
Else
Selection.Formula = "=" & strFunc & "()"
End If
Application.OnTime Now + TimeValue("00:00:01") / 4, "RangeFuncWiz"
End Sub
Sub RangeFuncWiz()
On Error Resume Next
Selection.FunctionWizard
End Sub

The simple (once you know how) solution is to launch the Range.FunctionWizard method, but with a quarter-second delay!
Oh, and you don’t need the dummy comma argument either.

Posted in UDF, VBA, XLL | Leave a comment

2013 in review

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 85,000 times in 2013. If it were an exhibit at the Louvre Museum, it would take about 4 days for that many people to see it.

Click here to see the complete report.

Posted in Uncategorized | Leave a comment

Excel 2013 SDI Bug: “Calculate” in Status Bar strikes again

In the old days (I’m talking Excel 5 to Excel 2003 here) there was a worrying situation you could find yourself in where, no matter what you did, Excel would show you “Calculate” in the statusbar.
Even when actually nothing needed calculating.
You could press F9, Ctrl/Alt/F9 or even Shift/Ctr/Alt/F9 until the cows came home, but you could not get rid of that pesky “Calculate”.

Excel 2007 to the rescue

Then along came Excel 2007 and solved the problem.

For details of the circumstances causing “Calculate” to appear in the statusbar see http://www.decisionmodels.com/calcsecretsf.htm

Excel 2013 and Calculate in the statusbar

Having just spent the best part of 2 days sorting out what looked like a problem with a lot of complicated VBA FastExcel code I have just discovered that its not a FastExcel problem at all: its an Excel 2103 SDI Bug (Single Document Interface – each window is separate from the others. As opposed to MDI -Multiple Document Interface where all the windows are within a parent window – as in all previous Excel versions).

Basically what should happen is very simple:

  • When Excel detects that something needs to be calculated it puts the “Calculate” message in the statusbar. (This corresponds to VBA Application.CalculationState=xlPending)
  • Whilst Excel is calculating it shows “Calculating x%” in the statusbar (Application.CalculationState=xlCalculating)
  • When Excel has finished calculating it removes the messages from the statusbar (Application.CalculationState=xlDone)

There are a few exceptions to this:

  • If Workbook.ForceFullCalculation is true then the statusbar always shows Calculate
  • If there are circular references and Iteration is switched on then the statusbar always shows Calculate
  • If your VBA has taken control of the statusbar then you don’t see this happening until you relinquish control
  • If you are using Automatic Calculation mode then the switch into and out of “Calculate” status does not happen.

Unfortunately the Excel 2013 SDI implementation currently only does all this to the active window: so if you have multiple workbooks open things can rapidly get in a mess:
just follow this simple recipe:

Start Excel 2013 with an empty workbook (Book1) and in automatic calculation mode.

Add a (non-volatile) formula to cell A1 (=99)

Open another empty workbook (Book2)

Switch to Manual calculation Mode

Add a (non-volatile) formula to cell a1 in Book2 (=55)

The statusbar in book2 now shows “calculate” but the one in Book1 does not (Bug)

statusbar1Now switch to the Book1 window and press F9. The Calculate message in Book2 should go away: but it does not (Bug)

Statusbar2Now you can go back to the Book2 window and press F9 again, but that pesky “calculate” will NOT go away.
If you change the formula in cell A1 (or do anything else that makes the workbook uncalculated) and then press F9 the “calculate” goes away!

You can even make this happen in Automatic mode.

Repeat the steps above, but when you have switched to Book1 with book2 showing calculate, instead of pressing F9 switch to Automatic calculation mode.
Now switch back to Book2 and no matter what you do you cannot get rid of “Calculate” (except by going back to Manual, dirtying Book2 and recalculating).

Conclusion

Moving Excel 2013 to SDI is the only way to get sensible support for multiple screens, and so was probably neccessary.
But it has caused quite a lot of breakage of existing applications (Toolbars, Modeless Forms etc)

I hope Microsoft can fix this Calculate problem in the forthcoming SP1 release before it bites too many people!

Posted in Calculation, VBA | 5 Comments

Finding out if a function is Volatile or Multithreaded using VBA: UDFs for UDFs

Part of my new Profiling Formulas and Functions command requires the code to determine whether a Function is  a native built-in Excel function, or an XLL function, or some other kind (VBA, Automation). And I also want to know if its multi-threaded or volatile because that can have a significant effect on calculation performance.

So here is how to do it!

Built-in Native Functions

You can get a list of the functions that are built-in to Excel from the Excel XLL SDK. I added 2 columns to the list showing whether the functions are Multi-Threaded or Volatile:

Native_FuncsThen I can use VLOOKUP to find out if the function is a built-in Excel function, and if so whether it is Volatile or Multi-threaded.

XLL Functions

XLL functions have to be registered using the REGISTER C-API command. When registering the function you have to include a typestring that declares what type each of the function arguments is, and whether the function is multi-threaded or volatile.

And it turns out there is a VBA method to get the typestrings for all registered XLL Functions.
Application.RegisteredFunctions returns a 3-column array containing the name of the XLL file, the Function name and the function typestring.

But of course it is not quite so easy as that. The Function name it returns is the internal function name in the XLL code, which is usually not the same as the name of the UDF function as used by Excel!
So for example a UDF name like REVERSE.TEXT could have an XLL internal name of f1.

Using Register.ID to match up the Names

The way to find out which XLL function name (if any) corresponds to the Excel UDF Name is to find the internal number of the function (its Register ID).

For an XLL UDF this is the number returned if you enter the Excel UDF name in a formula without any () after it (for example =REVERSE.TEXT ). And you can get this in VBA using EVALUATE:

vExcelFuncID = Evaluate(strFunc)

Getting the Register Id number of the internal XLL function name requires calling an XLM Macro command called REGISTER.ID using VBA Application.ExecuteExcel4Macro.

Then if both Register.IDs match you have an XLL function and you can look in the typestring to see if its multi-threaded (the typestring contains $) or volatile (typestring contains #).

Other UDF Types (VBA, Automation …)

If the UDF is not built-in and not a registered XLL function it must be either a VBA or an Automation UDF (or an XLM UDF!). None of these can be multi-threaded.

But they can be volatile and I have not found a straight-forward way of determining this programmatically. The simplest way to find out is to put the UDF in a formula and see if the formula recalculates with every F9: but you can’t do that from a VBA UDF!

The VBA Code

You can download the code as an XLAM from here.

There are 2 Subroutines to get the Native function lists and registered XLL functions into module level arrays:

Option Explicit
Option Base 1
Dim vFuncRegister As Variant
Dim vNative As Variant

Private Sub GetNative()
'
' get the list of native built-in functions and their attributes
'
If IsEmpty(vNative) Then
vNative = ThisWorkbook.Worksheets("NativeFuncs").Range("A2").Resize(ThisWorkbook.Worksheets("NativeFuncs").Range("A1000").End(xlUp).Row - 1, 3)
End If
End Sub
Private Sub GetFuncRegister()
'
' get list of registered functions and their funcIDs
'
Dim sCmd As String
Dim vRes As Variant
Dim j As Long
'
If Not IsEmpty(vFuncRegister) Then Exit Sub
'
vFuncRegister = Application.RegisteredFunctions     ''' get data on XLL registered functions
'
' add column for funcid
'
ReDim Preserve vFuncRegister(LBound(vFuncRegister) To UBound(vFuncRegister), 1 To 4) As Variant
'
For j = LBound(vFuncRegister) To UBound(vFuncRegister)
'
' get funcids
'
If vFuncRegister(j, 1) Like "*xll" Then
sCmd = "REGISTER.ID(""" & CStr(vFuncRegister(j, 1)) & """,""" & CStr(vFuncRegister(j, 2)) & """)"
vRes = Application.ExecuteExcel4Macro(sCmd)
If Not IsError(vRes) Then vFuncRegister(j, 4) = vRes
End If
Next j
End Sub

The main function checks the function name against the native functions array, then if not found, checks it against the registered XLL functions array and if not found assumes it must be a VBA or Automation (or XLM) UDF.


Private Function CheckFunc(strFunc As String, blMulti As Boolean, blVolatile As Variant) As String
'
' returns
' type = B if built-in, X if XLL else O for Other (VBA or Automation)
' blMulti is true if multithreaded
' BlVolatile is True if Volatile, False if not volatile and ? if don't know
'
Dim strType As String
Dim vFound As Variant
Dim j As Long
Dim strTypeString As String
Dim vExcelFuncID As Variant
'
blMulti = True
blVolatile = False
'
' check for native xl function
'
On Error Resume Next
vFound = Application.VLookup(strFunc, vNative, 1, False)
On Error GoTo 0
If Not IsError(vFound) Then
strType = "B"
If Application.VLookup(strFunc, vNative, 3, False) = "V" Then blVolatile = True
If Application.VLookup(strFunc, vNative, 2, False) = "S" Or Val(Application.Version) < 12 Then blMulti = False
End If
'
If Len(strType) = 0 Then
'
' get xlfuncid - if not error then its an XLL func
'
vExcelFuncID = Evaluate(strFunc)
If Not IsError(vExcelFuncID) Then
strType = "X"
For j = LBound(vFuncRegister) To UBound(vFuncRegister)
If strFunc = vFuncRegister(j, 2) Or vExcelFuncID = vFuncRegister(j, 4) Then
strTypeString = vFuncRegister(j, 3)
If InStr(strTypeString, "!") > 0 Or _
(InStr(strTypeString, "#") > 0 And (InStr(strTypeString, "R") > 0 Or InStr(strTypeString, "U") > 0)) _
Then blVolatile = True
If InStr(strTypeString, "$") = 0 Or Val(Application.Version) < 12 Then blMulti = False
Exit For
End If
Next j
End If
End If
'
If Len(strType) = 0 Then
'
' else its Other (VBA or Automation)
'
strType = "O"
blMulti = False     ''' cant be multi
blVolatile = "?"    ''' don't know if volatile
End If
'
CheckFunc = strType
End Function

Then there are 3 UDFs to find out if the UDF is Volatile, is Multi-threaded and what type of UDF it is.

Public Function IsMultiThreaded(strFuncName As String) As Variant
'
' check if a function is Multi-Threaded
' Returns true or false
'
Dim blMulti As Boolean
Dim blVolatile As Variant
Dim strType As String
'
GetNative
GetFuncRegister
strType = CheckFunc(strFuncName, blMulti, blVolatile)
'
IsMultiThreaded = blMulti
End Function
Public Function IsVolatile(strFuncName As String) As Variant
'
' check if a function is volatile
' returns True or False or ? if don't know
'
Dim blMulti As Boolean
Dim blVolatile As Variant
Dim strType As String
'
GetNative
GetFuncRegister
strType = CheckFunc(strFuncName, blMulti, blVolatile)
'
IsVolatile = blVolatile
End Function
Public Function FuncType(strFuncName As String) As Variant
'
' get type of function: B for built-in Excel, X for XLL, O for Other

Dim blMulti As Boolean
Dim blVolatile As Variant
Dim strType As String
'
GetNative
GetFuncRegister
FuncType = CheckFunc(strFuncName, blMulti, blVolatile)
End Function

Note: Yes – the production version of this code is more optimised for speed, but this version is easier to understand!

Summary

When you are trying to optimise Excel calculation speed its important to know which functions are multi-threaded or volatile.

This post demonstrates a way of doing this programmatically.

Limitations:

There are some limitations of this method:

  • Cannot determine if VBA and Automation UDFs are volatile.
  • Will not detect XLL functions that internally change volatility programmatically.

Any ideas on how to determine the volatility of VBA UDFs will be gratefully received!

Posted in Calculation, UDF, VBA, XLL | Leave a comment

Bordeaux Wine Tasting: the ones that got away

Saturday evening (after I got back from Seattle: the Microsoft MVP Summit ) there was a major Bordeaux wine tasting event planned. My ex-neighbour Joe had initiated this, explaining that he had this bottle of Haut Brion that needed drinking.

But just as I arrived back on Saturday afternoon, somewhat jet-lagged (8 hours time difference), we heard that Joe had had to go into hospital, (he is out now) so the main event has had to be postponed to next year.

The ones that got away

Joe of course was being a bit economical with the truth: actually he had a whole case of Haut Brion and a bottle of Pichon-Longueville 1995.

DSCF4189DSCF4191My contribution was a bottle of Palmer 1996 and a bottle of Phelan Segur 1996.
This would give us a Graves, Pauillac, Margaux and St Estephe to compare!

DSCF4190So we left those bottles for another day and went with a somewhat lesser selection:

The wines we actually tasted

We were down to the four of us, me and Jane plus our son Ben and his wife Jo.
Ben and Jo got married in 1996 so their contribution was a bottle of 1996 Champagne:

DSCF4184So we started the evening with that, accompanied by some smoked salmon. The champagne was a strong golden colour with a delicious rich taste.

DSCF4183Then we tasted the 1996 Phelan Segur against a 2002 Segla (the second wine of Chateau Rausan-Segla) and a 2002 Art Series Leeuwin Estate cabernet sauvignon.

The Phelan Segur was disappointing: tasting notes say things like – dry, thin, lacking fruit, although the colour and clarity were good. We had decanted it 2 hours before, which I think was a mistake: we will try opening and immediately tasting the next bottle. It scored 13 out of 20.

The Segal was nice but not exceptional: tasting notes say things like – pleasant nose, nice fruit, a little thin. It scored 14 out of 20.

Everyone really liked the Leeuwin Estate. (well of course its actually a bit of a cheat since its from Margaret River in Western Australia rather than Bordeaux). Tasting notes say things like – subtle, nice finish, red currants. It scored 16 out of 20.
Unfortunately that was my last bottle!

So it was a great evening (I crashed at 10.30 pm) but we are really looking forward to the rematch next year!

Posted in Wine | 1 Comment

Summit Winter 2013

On Saturday I got back from the winter 2013 MVP summit at the Microsoft campus in Redmond. The summit is a great chance to talk to the Microsoft Excel teams and to meet up with other Excel MVPs.
I also spent some time with the Excel Mac team and the Mac MVPs trying to understand the Excel development environment on the Mac, and time with the BI teams and SQL server MVPs.
Everything discussed is under strict NDA, so I can’t give you any details but it was all very interesting!

Surface 2

All the attending MVPs got a special offer on a 32Gb Surface 2 (RT). I must admit I had not planned on buying this because it does not support the development environment (VBA and Visual Studio) I need, but the offer was too good to refuse!

DSCF4206But after playing with it for a few hours I am very impressed. It comes with Excel, Outlook, Word and Powerpoint, 200 GB of Skydrive and a years worth of Skype. I added a 64GB micro SD card as the D: drive and a Bluetooth mouse.

For office use, with Email, web browsing and photos this is a great machine: you very quickly get using to touching the screen (and wishing my main laptop worked like that).

Geek Meet

Here are some snaps from the monday evening reception:

DSCF4166Roger Govier and Zak Barrese discussing the finer points of Apps for Office.

DSCF4151Patrick Matthews, Bob Umlas and Bill Jelen: Pivot Tables or Array Formulas?

DSCF4136Kevin (Zorvek) Jones does not look convinced by Zak(FireFyter)’s explanation of the Office 365 SKUs

DSCF4141Zak (and another thing …) Barrese ignores Sam Rad and Ingeborg (Teylyn) Hawighorst showing off an orange something!

DSCF4152Felipe Gualberto and Ken Puls: which DAX function would you use to compare apples with oranges? Ken looks horrified: someone must have suggested VLOOKUP instead of CALCULATE!

DSCF4143Jacob Hildebrand and Jordan Goldmeier: Jordan shows what happens when you unwind Clippy – Jacob patiently waits for another seat to become free.

DSCF4156Roger Govier and Sam Rad say cheese for the camera whilst Ingeborg and Liam Bastick discuss the idea of arranging an Excel User Conference in OZ/NZ in 2015 with lots of MVP speakers (hopefully including me, Bill Jelen, Ken Puls, Bob Umlas …)

Mystery Photo

DSCF4002Here is this year’s mystery photo: exactly where can you find this dragon?

Posted in Uncategorized | 6 Comments

Formula References between Sheets versus within Sheets shootout: Which calculates faster and uses more Memory

I thought I would revisit the differences between formulas that reference other worksheets and formulas that only reference their own worksheet. Referencing other worksheets always used to be a memory hog, but so much has changed between Excel 2003 and Excel 2013 that I wanted to see the current status.

The Test Workbooks

The test workbooks are all generated by simple VBA code contained in thw MakeInterLinkedSheets.xlsb workbook, which you can down load from here.

Generating Many Linked Worksheets

The code to generate the interlinked worksheets is shown below. You can choose how many worksheets to generate, and then each worksheet will contain a column of constants and a column of formulas that refer to each of the other worksheets. So if you choose 1500 worksheets each worksheet will contain 1500 formulas with every formula referring to a different worksheet (you can’t get much more linked than that!). Thats a total of 2.25 million formulas.


Sub MakeManyLinkedSheets()
 '
 ' make a large number of worksheets, each of which links to all of the others
 '
 Dim j As Long
 Dim k As Long
 Dim varSheets As Variant
 Dim nSheets As Long
 Dim nRequest As Long
 Dim nAdd As Long
 Dim var() As Variant
 '
 varSheets = Application.InputBox("Enter the Number of Interlinked Sheets to Generate", "Inter-Linked Sheets", 1500)
 If Not IsNumeric(varSheets) Then
 MsgBox "Input must be a number, MakeManyLinkedSheets cancelled", vbOKOnly + vbCritical
 Exit Sub
 Else
 Application.ScreenUpdating = False
 Application.Calculation = xlManual
 '
 nRequest = CLng(varSheets)
 '
 ' add sheets: cannot add more than 255 in one .Add statement
 '
 nSheets = ActiveWorkbook.Worksheets.Count
 nRequest = nRequest - nSheets
 Application.StatusBar = "Adding Sheets"
 Do While nRequest > 0
 nAdd = nRequest
 If nAdd > 255 Then nAdd = 255
 ActiveWorkbook.Worksheets.Add before:=ActiveWorkbook.Worksheets(nSheets), Count:=(nAdd)
 nSheets = ActiveWorkbook.Worksheets.Count
 nRequest = nRequest - nAdd
 Loop
 '
 ' add constant and linkage formula
 '
 For j = 1 To ActiveWorkbook.Worksheets.Count
 Application.StatusBar = "Generating Linkages on Sheet " & CStr(j)
 ReDim var(ActiveWorkbook.Worksheets.Count, 2)
 For k = 1 To ActiveWorkbook.Worksheets.Count
 var(k, 1) = j * k
 var(k, 2) = "=Sheet" & CStr(k) & "!a" & CStr(k)
 Next k
 Worksheets(j).Range("a1").Resize(ActiveWorkbook.Worksheets.Count, 2).Formula = var
 Next j
 Application.StatusBar = False
 Application.Calculation = xlAutomatic
 End If
 End Sub

Since you still (even in XL 2013) cannot create more than 255 sheets in a single Worksheets.Add command the code creates the worksheets in blocks of 255.

Memory Used & File Size

In old versions of Excel (97/2000) this code hit the memory wall at about 200 worksheets.
In Excel 2013 32-bit you can get up to over 2500 worksheets but 4000 fails at about 1.4 gigabytes.
In Excel 2013 64-bit I got to 5 gigabytes of memory trying for 4000 sheets but it was so slow I gave up.

For 1500 sheets:

  • Excel 2010 32 uses about 430 Megabytes of memory for the workbook
  • Excel 2013 32 uses about 540 Megabytes of memory for the workbook
  • Excel 2013 64 uses about 770 Megabytes of memory for the workbook
  • The workbook takes about 40 Megabytes when saved as an XLSB

Comparing Within Sheet References and Between Sheets References

Generating the Formulas

I used 3 different methods for generating the within-sheet reference formulas (in R1C1 mode):

Method 1 uses a formula that refers to the previous column on this row:
=Sheet1000!RC[-1]

Method 2 uses a formula that always refers to column 1 for this row:
=Sheet1000!RC1

Method 3 uses a formula that randomly refers to rows and columns (nRequest is the number of sheets requested):
"=Sheet1000!R" & Int(Rnd() * nRequest + 1) & "C" & Int(Rnd() * nRequest + 1)

The reason for using 3 different formulas is to see what effect different kinds of references have on memory and calculation speed.

The complete code for method 3 looks like this:

</pre>
Sub MakeManyFormulas3()
 '
 ' make a single worksheet that refers to itself
 ' generate pairs of columns:
 ' numeric constant followed by formula that refers to a random row and column
 '
 Dim j As Long
 Dim k As Long
 Dim varSheets As Variant
 Dim nSheets As Long
 Dim nRequest As Long
 Dim nAdd As Long
 Dim var() As Variant
 '
 varSheets = Application.InputBox("Enter the Number of Formulas to Generate", "Inter-Linked Sheets", 200)
 If Not IsNumeric(varSheets) Then
 MsgBox "Input must be a number, MakeManyLinkedFormulas cancelled", vbOKOnly + vbCritical
 Exit Sub
 Else
 Application.ScreenUpdating = False
 Application.Calculation = xlManual
 Application.ReferenceStyle = xlR1C1
 nRequest = CLng(varSheets)
 '
 ' add constant and linkage formula
 '
 For j = 1 To nRequest
 ReDim var(nRequest, 2)
 For k = 1 To nRequest
 var(k, 1) = j * k
 '
 ' refers to random row and column
 '
 var(k, 2) = "=Sheet1000!R" & Int(Rnd() * nRequest + 1) & "C" & Int(Rnd() * nRequest + 1)
 Next k
 Worksheets("Sheet1000").Range("a1").Resize(nRequest, 2).Offset(0, (j - 1) * 2).FormulaR1C1 = var
 Next j
 Application.StatusBar = False
 Application.Calculation = xlAutomatic
 Application.ReferenceStyle = xlA1
 End If
 End Sub

Timing and Memory Results

The memory used is the difference between before and after memory (Private Working Set) as measured by Windows 7 Task Manager.

The Full Calculate Time is the time taken for the second or third multi-threaded (4 cores – 8 threads) full calculation of all 2.25 million formulas as measured by FastExcel.

Conclusions

I must admit I was surprised about the calculation times: I thought they would be larger for the between sheets references than for the within sheets references. But there is no real noticeable difference: a larger factor is where the within-sheet formulas refer to, or more likely the total number of unique formulas used.
(Random >column to the left> always first column)

XL Version Formula Method Memory (MB)
Full Calc Time
2010 32 Interlink Sheets

426 MB

0.30 Seconds

2010 32 Previous Column

320 MB

0.27 Seconds

2010 32 First Column

214 MB

0.26 Seconds

2010 32 Random

286 MB

0.60 Seconds

 
2013 32 Interlink Sheets

538 MB

0.29 Seconds

2013 32 Previous Column

402 MB

0.29 Seconds

2013 32 First Column

300 MB

0.26 Seconds

2013 32 Random

369 MB

0.66 Seconds

2013 64 Interlink Sheets

835 MB

0.33 Seconds

My conclusions from all this are:

  • 64-bit Excel uses more memory than 32-bit Excel
  • Interlinking sheets uses more memory than within-sheet references.
  • There is no significant calculation time penalty in using inter-sheet references
  • Excel 2013 uses more memory than Excel 2010
  • The more unique formulas there are the more memory and calculation time is needed
Posted in Calculation, Memory, VBA | Tagged | 1 Comment

Exploring Range.Calculate and Range.CalculateRowMajorOrder: fast but quirky formula calculation

The Range.Calculate methods are very useful additions to Excel’s other calculation methods (Application level Calculate, CalculateFull, CalculateFullRebuild and Worksheet.calculate: the missing one is Workbook.Calculate!).

You can use the Range Calculate methods to:

  • Force calculation of a block of formulas or a single formula
  • See how long the variations of a particular formula take to calculate
  • Speed up repeated calculations

Download my RangeCalc Addin

You can download my RangeCalc addin from my website’s downloads page (xla password is dm).

This adds a button to the addins tab which uses Range.Calculate to time calculation of the currently selected cells.

Inspecting the RangeCalc code: different problems with different versions

You can unlock the xla to view the code using a password of dm.
The code in the RangeCalc sub bypasses a number of Range.calculate quirks in various Excel versions:


Sub RngTimer()
 '
 ' COPYRIGHT © DECISION MODELS LIMITED 2000,2001. All rights reserved
 '
 ' timed calculation of selected Range
 '
 ' bypass grouped and interactive problem 17/10/00
 ' remove interactive=false: Excel 97 Hangs when UDF error 14/2/01
 ' fix for application.iteration and array formulae with Excel2002 29/10/2001
 '
 Dim dRangeTime As Double
 Dim iMsg As Integer
 Dim blIter As Boolean
 Dim oCalcRange As Range ''' range to calculate
 Dim dOvhd As Double
 Dim strMessage As String
 '
 ' store iteration property
 '
 blIter = Application.Iteration
 '
 If ActiveWorkbook Is Nothing Or ActiveSheet Is Nothing Or ActiveWindow Is Nothing Or Selection Is Nothing Then
 Exit Sub
 Else
 If TypeName(Selection) = "Range" Then
 '
 ' if Excel2002 or greater handle iteration problem
 '
 If Left(Application.Version, 1) = "1" Then
 '
 ' switch off iteration
 '
 Application.Iteration = False
 End If
 '
 ' expand selected range to include all of any multicell array formula
 ' - makes Excel 2002 behave like earlier versions
 ' - allows notification if range has been expanded
 '
 Call ExpandRange(Selection, oCalcRange)
 '
 On Error GoTo errhandl
 '
 dOvhd = MicroTimer ''' ensure frequency is initialised
 dOvhd = MicroTimer ''' get time
 dOvhd = MicroTimer - dOvhd ''' calc microtimer overhead
 '
 dRangeTime = MicroTimer
 oCalcRange.Calculate
 dRangeTime = MicroTimer - dRangeTime - dOvhd
 '
 On Error GoTo 0
 '
 dRangeTime = Int(dRangeTime * 100000) / 100
 '
 ' 16/11/2009 - bypass multi-cell array formula problem
 '
 If Val(Application.Version) > 9 And Val(Application.Version) < 12 Then
 oCalcRange.Dirty
 End If
 '
 ' change message if array formula caused expansion of selection
 '
 If oCalcRange.Count = Selection.Count Then
 strMessage = CStr(Selection.Count) & " Cell(s) in Selected Range "
 Else
 strMessage = CStr(oCalcRange.Count) & " Cell(s) in Expanded Range "
 End If
 iMsg = MsgBox(strMessage & CStr(dRangeTime) & " Milliseconds", vbOKOnly + vbInformation, "RangeCalc")
 End If
 End If
 Application.Iteration = blIter ''' restore setting
 Set oCalcRange = Nothing
 Exit Sub
 errhandl:
 On Error GoTo 0
 Application.Iteration = blIter ''' restore setting
 Set oCalcRange = Nothing
 iMsg = MsgBox("Unable to Calculate Range", vbOKOnly + vbCritical, "RangeCalc")
 End Sub

Circular References

Using Range.Calculate on ranges that contain circular references within the range fails in Excel versions before Excel 2007.
In Excel 2007 and later Range.calculate only does a single iteration of the circular reference in Manual calculation mode, regardless of the Iteration settings.
So the RangeCalc addin switches iteration off whilst doing the Range.Calculate.

Multiple Sheets Selected

If you have multiple sheets selected Range.Calculate fails with a 1004 error, so the RangeCalc code has an error trap and message for any failure in Range.Calculate.

Multiple Areas selected  on a single Sheet

Range.Calculate will happily calculate a multi-area selection as long as all the areas are on the same sheet.

Multi-Cell Array formulas

If you do not select all the cells in a multi-cell array formula Range.Calculate will fail. My RangeCalc addin solves this problem by:

  • Automatically expanding the range to calculate to include all the cells in any array formula which intersects the selected range
  • Notifying the user that the range has been expanded

The VBA code to exapnd the range looks like this:


Sub ExpandRange(oStartRange As Range, oEndRange As Range)
 '
 ' COPYRIGHT © DECISION MODELS LIMITED 2000,2001. All rights reserved
 '
 ' Input: oStartRange, a range object that may or may not contain array formulae
 ' Output: oEndRange, a range object that has been expanded -
 ' to include all the cells in any array formula that is partly in the range
 '
 Dim oCell As Range
 Dim oArrCell As Range
 '
 ' loop on cells in oStartRange
 ' and expand range to include all the cells in any array formulae
 '
 On Error Resume Next
 '
 Set oEndRange = oStartRange
 For Each oCell In oStartRange
 If oCell.HasArray = True Then
 For Each oArrCell In oCell.CurrentArray
 '
 ' add any extra array cells
 '
 If Intersect(oEndRange, oArrCell) Is Nothing Then
 '
 ' if this cell is not in the expanded range then add it
 '
 Set oEndRange = Union(oEndRange, oArrCell)
 End If
 Next oArrCell
 End If
 Next oCell
 Set oCell = Nothing
 Set oArrCell = Nothing
 End Sub

There is also another problem with multi-cell array formulas and Range.Calculate, but it only exists in Excel 2002 and 2003 (after a Range.Calculate the array formula gets evaluated once for each cell it occupies in all subsequent recalculations). This problem is bypassed by using Range.Dirty on the range!

Note: The bug in Range.Dirty  is still there in Excel 2013. (it always works on the active sheet even when the range refers to another sheet!)

Range.Calculate and Range.CalculateRowMajorOrder – different handling of within-range dependencies

In early Excel versions (Excel 97 and 2000) Range.Calculate used a very simple calculation method: calculate the cells in each row in turn from left to right and ignore any forward references or within range dependencies. This method is fine as long as you know thats what it does and arrange your formulas accordingly (otherwise you may get incorrect results)!

But some people thought this was a bug, so it got fixed in Excel 2002 and 2003 (and later versions): Range.Calculate now starts by doing the left-to right calculation on each row in turn, and then starts recalculating any cells that refer to uncalculated cells within the range. In other words it achieves the same result as the standard Excel recalculation method.

The only problem was that this made Range.Calculate slower than in previous versions: and so some customers refused to upgrade because they could not run their bump runs fast enough!

So in Excel 2007 Microsoft solved the problem by introducing Range.CalculateRowMajorOrder. This method worked exactly the same way as the Excel 97 versions of Range.Calculate and was faster than the new Range.Calculate, and so everyone was happy except the VBA coders who had to work out when to use which method.

Some more Range.Calculate Limitations

Whilst the 2 Range Calculate methods are very useful, they do have some limitations:

  • They are both single-threaded calculation methods (In todays world this a serious limitation)
  • There is no keystroke sequence to initiate them from the UI (FastExcel uses Alt-F9 for this)
  • Re-entrant use of Range.Calculate is not allowed: for instance you can’t call Range.Calculate from inside a UDF
  • Range.Calculate works in US english dates etc.

Summary

  • Range.Calculate and Range.CalculateRowMajorOrder can be fast calculation methods
  • But they are not multi-threaded
  • For me they are essential tools for comparing formula speed
  • They need a bit of wrapping code, as in my RangeCalc addin, to make them generally useful.
Posted in Calculation, VBA | Tagged , | 1 Comment

Parsing Functions from Excel Formulas using VBA: Is MID or a Byte array the best method?

As part of extending the performance profiling abilities of FastExcel, I wanted to develop a Function Profiler Map. A key component of this is to extract the names of the functions embedded in Excel formulas.

So I experimented with some different approaches:

  • using Rob Van Gelder’s AudXL formula parser
  • using the MID function to scan through character by character
  • using byte arrays

The Test Data

For performance testing I am using a large worksheet with 638K used cells, 103K constants and 413K formulas. There are 3 functions on this sheet (710K INDEX, 5 SUM and 51K IF).

For validity testing I am using a small worksheet with some tricky formulas:

ParseFormulas1

Rob Van Gelder’s AudXL formula parser

Rob converted a javascript-based Excel formula parser  written by Eric Bachtal to VBA. You can download AudXL.xla from here. It is a useful tool for breaking formulas apart so that they are more readable.

So I started with that as a performance baseline: it takes 130 seconds to parse the large test worksheet.

But of course a general purpose formula parser should be slower than a purpose-built parser that only finds the functions.

Rules for Finding Functions in Formula strings

  • Function names are always followed by open bracket “(“
  • Function Names are always preceded by an operator
    +-,/*(=><& :!^
  • Formula text enclosed in single or double quotes cannot contain a function

Assumptions

  • Formulas to be parsed will be obtained using Range.FormulaR1C1.
  • This means that the formula string will will use American English settings and native function names.
  • And I only have to parse formulas that have been validated by Excel.
  • The full range of Unicode characters are allowed, so I can’t just use ASCII codes.

Download the VBA Code

You can download the routines containing the VBA from here

The First attempt: using Byte arrays

In a previous post I discussed using Byte arrays as an efficient method for processing all the characters in a string.
This is a good method for handling Unicode strings – each character gets converted into 2 separate bytes (0 to 255) which uniquely define the character and cover all the possible National Language characters in the world.

So “(” for instance is always 40 and 0 and


Dim abFormula() as Byte
Dim str1 as string
str1="=NA()"
abFormula=str1


produces:

ParseFormulas2

Note that the Byte arrays produced by assigning a string to a byte array are always zero-based.

You can also do the reverse: assigning a byte array to a string converts back.

The First Algorithm

So here is an algorithm that follows the rules:

Scan each character in the byte array from left to right:
If the character is a single or double quote ignore all subsequent characters until there is another single or double quote.
if the character is in the list of operators (abStartChars) then set a flag
if the character is NOT in the list of operators check if its a “(” and
if it IS a “(” and there are one or more non-operator characters preceding it we have found a function name.

The code looks like this:


Function GetFunc1(abFormula() As Byte, abStartChars() As Byte, abEndChar() As Byte, _
 abQuotes() As Byte, abSQ() As Byte, jStart As Long, jEnd As Long) As String
 '
 ' search for a function name within a byte array starting at the jStart byte position
 ' byte array is zero based pairs of bytes
 ' abFormula is a byte array version of the formula
 ' abStartChars is a byte array containing characters that can precede the function name
 ' abEndChar is a byte array containing "("
 ' abQuotes is a byte array contianing a double quote
 ' abSQ is a byte array containing a single quote
 '
 ' returns name of function as a string and jEnd as the byte position of "("
 '
 Dim j As Long
 Dim k As Long
 Dim jStartChar As Long
 Dim jFirst As Long
 Dim blStart As Boolean
 Dim blString As Boolean
 Dim abFunc() As Byte
 '
 jFirst = jStart + 2
 blString = False
 For j = jStart + 2 To (UBound(abFormula) - 2) Step 2
 '
 ' skip text strings
 '
 If (abFormula(j) = abQuotes(0) And abFormula(j + 1) = abQuotes(1)) _
 Or (abFormula(j) = abSQ(0) And abFormula(j + 1) = abSQ(1)) Then
 blString = Not blString
 End If
 If Not blString Then
 '
 ' look for non startchar
 '
 blStart = False
 For jStartChar = 0 To UBound(abStartChars) Step 2
 If abFormula(j) = abStartChars(jStartChar) _
 And abFormula(j + 1) = abStartChars(jStartChar + 1) Then
 blStart = True
 jFirst = j + 2
 Exit For
 End If
 Next jStartChar
 If Not blStart Then
 If abFormula(j) = abEndChar(0) And abFormula(j + 1) = abEndChar(1) Then
 '
 ' we have a (
 '
 If j > jFirst Then
 '
 ' we have a function
 '
 jEnd = j
 '
 ' jend points to first byte of the ( character
 ' jfirst points to the first byte of the function name
 ' convert slice of formula to function name string
 '
 ReDim abFunc(0 To (jEnd - jFirst - 1)) As Byte
 For k = 0 To UBound(abFunc)
 abFunc(k) = abFormula(jFirst + k)
 Next k
 GetFunc1 = abFunc
 Exit Function
 '
 ElseIf abFormula(jFirst) = abEndChar(0) _
 And abFormula(jFirst + 1) = abEndChar(1) Then
 jFirst = jFirst + 2
 End If
 End If
 End If
 End If
 Next j
 End Function

And the driver routine just loops on all the formulas in the worksheet, calling the function parsing routine and storing/counting the functions found in a dictionary.


Sub testing1()
 Dim strFormula As String
 Dim strFunc As String
 Dim abFormula() As Byte
 Dim abStartChars() As Byte
 Dim abEndChar() As Byte
 Dim abQuotes() As Byte
 Dim abSQ() As Byte
 Dim jStart As Long
 Dim jEnd As Long
 Dim oFormulas As Range
 Dim oCell As Range
 Dim dTime As Double
 Dim dicFuncs As New Dictionary
 '
 ''' characters that can come before the start of a Function name
 Const strStartChars1 As String = "+-,/*=><& :!^" & vbLf
 '
 dTime = MicroTimer
 '
 abStartChars = strStartChars1
 abEndChar = "("
 abQuotes = Chr(34)
 abSQ = Chr(39)
 '
 dicFuncs.CompareMode = TextCompare
 '
 Set oFormulas = ActiveSheet.UsedRange.SpecialCells(xlCellTypeFormulas, 23)
 '
 For Each oCell In oFormulas
 strFormula = oCell.FormulaR1C1
 abFormula = strFormula
 jStart = 0
 jEnd = 0
 Do
 strFunc = UCase(GetFunc1(abFormula, abStartChars, abEndChar, abQuotes, abSQ, jStart, jEnd))
 If LenB(strFunc) = 0 Then Exit Do
 jStart = jEnd
 '
 ' add Function to dictionary and count occurrences
 '
 If dicFuncs.Exists(strFunc) Then
 dicFuncs(strFunc) = dicFuncs(strFunc) + 1
 Else
 dicFuncs.Add strFunc, 1
 End If
 Loop
 Next oCell
 '
 MsgBox MicroTimer - dTime
 End Sub

This version takes 48 seconds: better but with room for improvement.

The (nearly) Final Algorithm

  • I don’t need to worry about upper-lower case Function names, since Excel already takes care of that, so I can use the default Binary Compare for the dictionary
  • All the special characters for operators etc always have byte 2 =0 so I only need to test that once per character rather on each comparison.
  • If I first search left to right for the “(” character and then work backwards looking for an operator character the function does many fewer comparison operations, because there is only 1 end character but many start characters.

The resulting VBA code looks like this:


Function GetFunc3(abFormula() As Byte, abStartChars() As Byte, jStart As Long, jEnd As Long) As String
 '
 ' search for a function name within a byte array bstarting at the jStart byte position
 ' byte array is zero based pairs of bytes
 ' abFormula is a byte array version of the formula
 ' abstartchars is a byte array containing characters that can precede the function name
 '
 ' returns a string name of function and jEnd as the position of "("
 '
 Dim j As Long
 Dim k As Long
 Dim jj As Long
 Dim jStartChar As Long
 Dim jFirst As Long
 Dim blStart As Boolean
 Dim blDoubleQ As Boolean
 Dim blSingleQ As Boolean
 Dim abFunc() As Byte
 '
 jFirst = jStart + 2
 blDoubleQ = False
 For j = jStart + 2 To (UBound(abFormula) - 2) Step 2
 '
 ' start and end characters always have byte 2 =0
 '
 If abFormula(j + 1) = 0 Then
 '
 ' skip text strings
 '
 If abFormula(j) = 39 Then blSingleQ = Not blSingleQ
 If Not blSingleQ Then
 If abFormula(j) = 34 Then blDoubleQ = Not blDoubleQ
 If Not blDoubleQ Then
 '
 ' look for (
 '
 If abFormula(j) = 40 Then
 '
 ' we have a (
 ' look backwards for a startchar
 '
 blStart = False
 For jj = j - 2 To jStart Step -2
 For jStartChar = 0 To UBound(abStartChars) Step 2
 If abFormula(jj) = abStartChars(jStartChar) Then
 blStart = True
 jFirst = jj + 2
 Exit For
 End If
 Next jStartChar
 If blStart Then Exit For
 Next jj
 If blStart Then
 If j > jFirst Then
 '
 ' we have a function
 '
 jEnd = j
 Exit For
 ElseIf abFormula(jFirst) = 40 Then
 jFirst = jFirst + 2
 End If
 End If
 End If
 End If
 End If
 End If
 Next j
 If blStart And jEnd > jFirst Then
 '
 ' convert slice of formula to function name string
 ' jend points to first byte of the ( character
 ' jfirst points to the first byte of the function name
 '
 ReDim abFunc(0 To (jEnd - jFirst - 1)) As Byte
 For k = 0 To UBound(abFunc)
 abFunc(k) = abFormula(jFirst + k)
 Next k
 GetFunc3 = abFunc
 End If
 End Function

And the corresponding driver routine looks like this:


Sub testing3()
 Dim strFunc As String
 Dim abFormula() As Byte
 Dim abStartChars() As Byte
 Dim jStart As Long
 Dim jEnd As Long
 Dim oFormulas As Range
 Dim oCell As Range
 Dim dTime As Double
 Dim dicFuncs As New Dictionary
 '
 ''' characters that can come before the start of a Function name
 Const strStartChars2 As String = "+-,/*(=><& :!^" & vbLf
 '
 dTime = MicroTimer
 '
 abStartChars = strStartChars2
 Set oFormulas = ActiveSheet.UsedRange.SpecialCells(xlCellTypeFormulas, 23)
 '
 For Each oCell In oFormulas
 abFormula = oCell.FormulaR1C1
 jStart = 0
 jEnd = 0
 Do
 strFunc = GetFunc3(abFormula, abStartChars, jStart, jEnd)
 If LenB(strFunc) = 0 Then Exit Do
 jStart = jEnd
 If dicFuncs.Exists(strFunc) Then
 dicFuncs(strFunc) = dicFuncs(strFunc) + 1
 Else
 dicFuncs.Add strFunc, 1
 End If
 Loop
 Next oCell
 MsgBox MicroTimer - dTime
 End Sub

This version takes 11.6 seconds: but maybe I can make it faster?

Using MID$() instead of a Byte array

What happens if I forget about all this Byte array stuff and just implement the same algorithm using MID$() to extract each character from the formula and INSTR to check against the list of start operators (start characters)?

The GetFunc VBA code looks like this:


Function GetFunc4(strFormula As String, strStartChars As String, strEndChar As String, strQuotes As String, strSQ As String, jStart As Long, jEnd As Long) As String
 '
 ' search for a function name within a formula string starting at the jStart character position
 '
 ' strStartChars is a string containing characters that can precede the function name
 '
 ' returns a string name of function and jEnd as the position of "("
 '
 Dim j As Long
 Dim k As Long
 Dim jj As Long
 Dim jStartChar As Long
 Dim jFirst As Long
 Dim blStart As Boolean
 Dim blDoubleQ As Boolean
 Dim blSingleQ As Boolean
 Dim abFunc() As Byte
 Dim strChar As String
 Dim iStartChar As Long
 '
 jFirst = jStart + 1
 blDoubleQ = False
 For j = jStart + 1 To (LenB(strFormula) - 1)
 strChar = Mid$(strFormula, j, 1)
 '
 ' skip text strings
 '
 If strChar = strSQ Then blSingleQ = Not blSingleQ
 If Not blSingleQ Then
 If strChar = strQuotes Then blDoubleQ = Not blDoubleQ
 If Not blDoubleQ Then
 '
 ' look for (
 '
 If strChar = strEndChar Then
 '
 ' we have a (
 ' look backwards for a startchar
 '
 blStart = False
 For jj = j - 1 To jStart Step -1
 strChar = Mid$(strFormula, jj, 1)
 iStartChar = InStrB(strStartChars, strChar)
 If iStartChar > 0 Then
 blStart = True
 jFirst = jj + 1
 Exit For
 End If
 Next jj
 If blStart Then
 If j > jFirst Then
 '
 ' we have a function
 '
 jEnd = j
 '
 ' convert slice of formula to function name string
 ' jend points to first byte of the ( character
 ' jfirst points to the first byte of the function name
 GetFunc4 = Mid$(strFormula, jFirst, jEnd - jFirst)
 Exit Function
 ElseIf Mid$(strFormula, jFirst, 1) = strEndChar Then
 jFirst = jFirst + 1
 End If
 End If
 End If
 End If
 End If
 Next j
 End Function

Using MID$ and INSTR is slower: it takes 21.8 seconds.

Optimising the Driver Routine.

OK so I am reasonably happy with the parsing routine: I have gone from 130 seconds for a generic parsing routine to 11.6 seconds for the specialised Byte array routine.

But the driver routine looks at every single formula on the worksheet, and we know that on most large worksheets a large percentage of the formulas are copied.

So I can use the dictionary approach to find the distinct formulas (use R1C1 rather than A1 mode because copied formulas are identical in R1C1) and just parse those:

Testing3C also makes some more speed improvements by looping on each area and getting the formulas into a variant array rather than looping directly on the cells.


Sub testing3c()
 Dim strFunc As String
 Dim abFormula() As Byte
 Dim abStartChars() As Byte
 Dim jStart As Long
 Dim jEnd As Long
 Dim oFormulas As Range
 Dim oCell As Range
 Dim oArea As Range
 Dim vArr As Variant
 Dim vF As Variant
 Dim dTime As Double
 Dim dtotTime As Double
 Dim dicFuncs As New Dictionary
 Dim dicFormulas As New Dictionary
 Dim j As Long
 Dim k As Long
 '
 Const strStartChars2 As String = "+-,/*(=><& :!^" & vbLf ''' characters that can come before the start of a Function name
 '
 dTime = MicroTimer
 '
 abStartChars = strStartChars2
 Set oFormulas = ActiveSheet.UsedRange.SpecialCells(xlCellTypeFormulas, 23)
 '
 ' find distinct formulas in areas and count occurrences
 '
 For Each oArea In oFormulas.Areas
 '
 ' get a block of formulas
 '
 vArr = oArea.FormulaR1C1
 If IsArray(vArr) Then
 For k = 1 To UBound(vArr, 2)
 For j = 1 To UBound(vArr)
 If dicFormulas.Exists(vArr(j, k)) Then
 dicFormulas(vArr(j, k)) = dicFormulas(vArr(j, k)) + 1
 Else
 dicFormulas.Add vArr(j, k), 1
 End If
 Next j
 Next k
 Else
 If dicFormulas.Exists(vArr) Then
 dicFormulas(vArr) = dicFormulas(vArr) + 1
 Else
 dicFormulas.Add vArr, 1
 End If
 End If
 Next oArea
 '
 ' parse only the distinct formulas
 '
 For Each vF In dicFormulas
 abFormula = vF
 jStart = 0
 jEnd = 0
 Do
 strFunc = GetFunc3(abFormula, abStartChars, jStart, jEnd)
 If LenB(strFunc) = 0 Then Exit Do
 jStart = jEnd
 If dicFuncs.Exists(strFunc) Then
 dicFuncs(strFunc) = dicFuncs(strFunc) + dicFormulas(vF)
 Else
 dicFuncs.Add strFunc, dicFormulas(vF)
 End If
 Loop
 Next vF
 MsgBox MicroTimer - dTime
 End Sub

Now this takes only 3.5 seconds: adding a formula string to a dictionary is much faster than parsing it!

Conclusion

  • Byte Arrays can be significantly faster for character by character operations rather than using MID and INSTR.
  • Looking at how the first attempt at an algorithm works can give you clues about how to improve it.
  • Even using Byte arrays string parsing operations are slow in VBA.
  • Using a dictionary its really fast to find the distinct formulas even on a large worksheet.

Challenge

OK guys: who can write a faster VBA function parsing routine?

Download the routines from here

Posted in Calculation, VBA | Leave a comment