MATCH vs FIND vs Variant Array VBA Performance Shootout in Excel 2010

October 26, 2011

When searching unsorted data in VBA I have always tended to use MATCH rather than FIND, mainly because early versions of Excel before Excel 2003 did not allow you to use FIND in UDFs.

But prompted by a discussion in one of the Excel forums I thought it was about time I revisited Find and Match to see which performs better in Excel 2010, and for good measure lets compare them to getting the data into a variant array and looping through the array.

Generating Test Data

Andreas Killer came up with a nice routine for generating test data:

Sub Init()
Dim Data(1 To 100000, 1 To 2)
Dim i As Long
Rnd -5
For i = 1 To UBound(Data)
If Rnd > 0.5 Then Data(i, 1) = "X"
If Rnd > 0.5 Then Data(i, 2) = "Y"
Next
Cells.ClearContents
Range("A1").Resize(UBound(Data), UBound(Data, 2)) = Data
End Sub

This code randomly generates X in column 1 and Y in column 2 in a range of 100 thousand rows, with the number of Xs and Ys controlled by the constant 0.5
Changing the 0.5 to 0.9 will give few Xs and Ys, and changing it to 0.001 will give lots.

Since the X and the Y are using different random numbers there will be rows with X but no Y and Y but no X, as well as rows with both X and Y.

This makes it easy to test how the various methods compare with different densities of data.

Timing

I am using the MicroTimer high-resolution timer API to give timing accuracy at the microsecond (millionths of a second) level, but the timing results will all be in milliseconds.

#If VBA7 Then
Private Declare PtrSafe Function getFrequency Lib "kernel32" Alias _
"QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare PtrSafe Function getTickCount Lib "kernel32" Alias _
"QueryPerformanceCounter" (cyTickCount As Currency) As Long
#Else
Private Declare Function getFrequency Lib "kernel32" Alias _
"QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare Function getTickCount Lib "kernel32" Alias _
"QueryPerformanceCounter" (cyTickCount As Currency) As Long
#End If
Public Function MicroTimer() As Double
'
' returns seconds
' uses Windows API calls to the high resolution timer
'
Dim cyTicks1 As Currency
Dim cyTicks2 As Currency
Static cyFrequency As Currency
'
MicroTimer = 0
'
' get frequency
'
If cyFrequency = 0 Then getFrequency cyFrequency
'
' get ticks
'
getTickCount cyTicks1
getTickCount cyTicks2
If cyTicks2 < cyTicks1 Then cyTicks2 = cyTicks1
'
' calc seconds
'
If cyFrequency Then MicroTimer = cyTicks2 / cyFrequency
End Function

The code uses conditional compilation for the Windows API calls so that it will work for both Excel 2010 32-bit and 64-bit Excel.

Using FIND

Here is the test code using Range.Find

Sub FindXY1()
Dim oRng As Range
Dim n As Long
Dim dTime As Double
Dim FirstAddress As String
dTime = MicroTimer
'
With Range("a1:A100000")
Set oRng = .Find("X", , xlValues, , , xlNext, False)
If Not oRng Is Nothing Then
FirstAddress = oRng.Address
If oRng.Offset(0, 1) = "Y" Then n = n + 1
Do
Set oRng = .FindNext(oRng)
If Not oRng Is Nothing Then
If oRng.Offset(0, 1) = "Y" And oRng.Address <> FirstAddress Then
n = n + 1
End If
End If
Loop While Not oRng Is Nothing And oRng.Address <> FirstAddress
End If
End With
'
Debug.Print "Find " & n & " " & (MicroTimer - dTime) * 1000
End Sub

The Sub does a Range.Find on the 100 thousand rows looking for X, and then checks if column 2 for that cell also contains Y.
Then this is repeated using FindNext until we have looped back to the first range address.
When completed the number of rows with both X and Y (this is less than the number of times an X was found) and the time in milliseconds is shown in the Immmediate window.

Using Match

Here is the code using WorksheetFunction.MATCH

Sub FindXY2()
Dim oRng As Range
Dim j As Long
Dim n As Long
Dim dTime As Double
dTime = MicroTimer
Set oRng = Range("a1:A100000")
On Error GoTo Finish
With Application.WorksheetFunction
Do
j = .Match("X", oRng, 0)
If oRng(j, 2).Value2 = "Y" Then n = n + 1
Set oRng = oRng.Resize(oRng.Rows.Count - j, 1).Offset(j, 0)
Loop
End With
Finish:
Debug.Print "Match " & n & " " & (MicroTimer - dTime) * 1000
End Sub

This Sub works by using Worksheetfunction.Match to find the first occurrence of X within the Range object oRng.
After each X is found oRng is resized to exclude the range already searched.

The loop terminates either when nothing is found (Worksheetfunction.Match raises an error)  or the Resize fails, also raising an error.

Using a Variant Array

Sub FindXY3()
Dim vArr As Variant
Dim j As Long
Dim n As Long
Dim dTime As Double
dTime = MicroTimer
vArr = Range("a1:B100000").Value2
For j = LBound(vArr) To UBound(vArr)
If vArr(j, 1) = "X" Then
If vArr(j, 2) = "Y" Then
n = n + 1
End If
End If
Next j
Debug.Print "Var array " & n & " " & (MicroTimer - dTime) * 1000
End Sub

The code gets the Range into a Variant, creating a 2-dimensional array, and then loops down the array looking for X and Y.

Timing Results

I ran each sub 4 times using different densities of data.

The first is with a single XY pair at row number 100000. This tests the scanning speed per row rather than the calling overhead.

Then I used the Test data generator with constant values of 0.9, 0.5 and 0.001.

Here are the results giving timings in Milliseconds with counts of the XY pairs:

 
 
 
 
 

And here are the ratios of the times:

 
 
 
 
 

  • You can see that Find is significantly slower than Match or using a Variant array, regardless of the number of XY pairs found.
  • The timings for Match increase fast with the number of XY pairs, whereas the Variant array increases much more slowly.
  • This is because there is a much higher overhead for each call to Match than looping from row to row of the array.
  • For small numbers of XY pairs Match is faster than the Variant array.

Conclusions

  • Don’t use Range.Find unless you want to search a large number of columns for the same thing (you would have to do a Match for each column).
  • The Variant array approach is surprisingly effective, particularly when you expect a large number of hits.
  • Match wins easily for a small number of hits.

OK, so who is going to admit to using Range.Find?


Writing efficient VBA UDFs (Part 6) – Faster string handling and Byte arrays

October 18, 2011

None of  the previous posts on writing efficient VBA UDfs (Part1,Part2,Part3,Part4,Part5) talked about handling strings in VBA.
This could be a major omission since string-handling is one of VBAs slowest “features”.

Suppose you want to find the position of the first capital letter in a string.

Array Formula

You could use an array formula like this:

{=MATCH(TRUE,ISERR(FIND(MID(A5,ROW($1:$255),1),
LOWER(A5))),0)}

My test data is 2000 rows, each containing 25 lower-case characters and one randomly placed upper-case character.

2000 calls to this array formula takes 250 milliseconds.

So lets try some VBA UDFs.

Using LIKE

One way is to use the VBA LIKE statement:

Function FirstCap2(Cell As Range)
For FirstCap2 = 1 To Len(Cell.Value)
If Mid(Cell.Value, FirstCap2, 1) Like "[A-Z]" Then
Exit For
End If
Next FirstCap2
End Function

The code loops across the string using Mid to look at each character in turn, and then uses LIKE to see if the character is one of upper-case A to upper-case Z.
2000 calls to this UDF takes 50 milliseconds – a factor of 5 faster, but we can make it faster (of course).


Function FirstCap3(Rng As Range) As Long
Dim theString As String
theString = Rng.Value2
For FirstCap3 = 1 To Len(theString)
If Mid$(theString, FirstCap3, 1) Like "[A-Z]" Then
Exit For
End If
Next FirstCap3
End Function

I changed the code to only get the string out of the cell once, and to use Mid$ rather than Mid. All the VBA string handling functions have 2 versions: versions without the $ work with variant arguments, whereas versions with the $ suffix only work on string arguments, but are slightly faster.
2000 calls to this version of the UDF takes 17 milliseconds, nearly 3 times faster.

Using MID$

But maybe using LIKE is slow? Lets try comparing a lower-case version of the string and stopping when the characters don’t match:

Function FirstCap4(strInp As String) As Long
Dim tmp As String
Dim i As Long
Dim pos As Long
tmp = LCase$(strInp)
pos = -1
For i = 1 To Len(tmp)
If Mid$(tmp, i, 1) <> Mid$(strInp, i, 1) Then
pos = i
Exit For
End If
Next
FirstCap4 = pos
End Function

Well surprisingly this is slower than the optimised version using LIKE:
2000 calls to this version of the UDF takes 36 milliseconds.

Using Byte Arrays

Using Byte arrays with strings is one of VBAs less well known secrets, but its often an efficient way of handling strings when you need to inspect each character in turn.


Public Function FirstCap5(theRange As Range) As Long
Dim aByte() As Byte
Dim j As Long
FirstCap5 = -1
aByte = theRange.Value2
For j = 0 To UBound(aByte, 1) Step 2
If aByte(j) < 91 Then
If aByte(j) > 64 Then
FirstCap5 = (j + 2) / 2
Exit For
End If
End If
Next j
End Function

This version of the UDF is slightly faster: 2000 calls takes 15 milliseconds.

So how does this work?

First create an undimensioned array of Bytes : Dim aByte() as Byte
Then assign a string to it: aByte=”abEfg”
You can use the Locals window to see what the resulting Byte array looks like:

Each character in the string has resulted in 2 bytes which are the Unicode code points for the character. Since I am working in a UK English Locale using the Windows Latin-1 codepage the first byte is the ANSI number for the character and the second byte is always zero.

Unaccented english upper-case characters are ANSI numbers 65 to 90, so I can loop down the byte array, looking at every other byte, and do a numeric test directly on the character to see if it is upper-case. You can see that only the third character is upper-case.

Another surprising feature of  this kind of Byte array is that you can assign a byte array directly back to a string:

Dim str1 as string
str1=aByte

Str1 now contains “abEfg”

Array version of the Byte UDF

As discussed in Part 5 of writing efficient UDFs, Array Formulae go faster. So here is an array formula version of the Byte UDF.

Public Function AFirstCap(theRange As Range) As Variant
Dim aByte() As Byte
Dim j As Long
Dim L As Long
Dim vRange As Variant
Dim jAnsa() As Long
Dim NumCells As Long
vRange = theRange.Value2
NumCells = UBound(vRange, 1)
ReDim jAnsa(NumCells - 1, 0)
For L = 0 To NumCells - 1
jAnsa(L, 0) = -1
aByte = vRange(L + 1, 1)
For j = 0 To UBound(aByte, 1) Step 2
If aByte(j) < 91 Then
If aByte(j) > 64 Then
jAnsa(L, 0) = (j + 2) / 2
Exit For
End If
End If
Next j
Next L
AFirstCap = jAnsa
End Function


This version, entered into 2000 rows as an array formula using Control/Shift/Enter, takes just 4.8 milliseconds.

Conclusion

Here is a table comparing the speed of these different approaches.

Method

Milliseconds

Array Formula

250

LIKE UDF

50

Optimised LIKE UDF

17

MID$ UDF

36

Byte Array UDF

15

Array Formula version of Byte Array UDF

4.8

So the fastest VBA is just over 10 times faster than the slowest VBA solution, and a whopping 52 times faster than the array formula solution.

Using Byte arrays for strings can be a good solution for string handling where you need to inspect or manipulate many individual characters.

So what do you use Byte arrays for?


Follow

Get every new post delivered to your Inbox.