When searching unsorted data in VBA I have always tended to use MATCH rather than FIND, mainly because early versions of Excel before Excel 2003 did not allow you to use FIND in UDFs.
But prompted by a discussion in one of the Excel forums I thought it was about time I revisited Find and Match to see which performs better in Excel 2010, and for good measure lets compare them to getting the data into a variant array and looping through the array.
Generating Test Data
Andreas Killer came up with a nice routine for generating test data:
Sub Init() Dim Data(1 To 100000, 1 To 2) Dim i As Long Rnd -5 For i = 1 To UBound(Data) If Rnd > 0.5 Then Data(i, 1) = "X" If Rnd > 0.5 Then Data(i, 2) = "Y" Next Cells.ClearContents Range("A1").Resize(UBound(Data), UBound(Data, 2)) = Data End Sub
This code randomly generates X in column 1 and Y in column 2 in a range of 100 thousand rows, with the number of Xs and Ys controlled by the constant 0.5
Changing the 0.5 to 0.9 will give few Xs and Ys, and changing it to 0.001 will give lots.
Since the X and the Y are using different random numbers there will be rows with X but no Y and Y but no X, as well as rows with both X and Y.
This makes it easy to test how the various methods compare with different densities of data.
I am using the MicroTimer high-resolution timer API to give timing accuracy at the microsecond (millionths of a second) level, but the timing results will all be in milliseconds.
#If VBA7 Then Private Declare PtrSafe Function getFrequency Lib "kernel32" Alias _ "QueryPerformanceFrequency" (cyFrequency As Currency) As Long Private Declare PtrSafe Function getTickCount Lib "kernel32" Alias _ "QueryPerformanceCounter" (cyTickCount As Currency) As Long #Else Private Declare Function getFrequency Lib "kernel32" Alias _ "QueryPerformanceFrequency" (cyFrequency As Currency) As Long Private Declare Function getTickCount Lib "kernel32" Alias _ "QueryPerformanceCounter" (cyTickCount As Currency) As Long #End If Public Function MicroTimer() As Double ' ' returns seconds ' uses Windows API calls to the high resolution timer ' Dim cyTicks1 As Currency Dim cyTicks2 As Currency Static cyFrequency As Currency ' MicroTimer = 0 ' ' get frequency ' If cyFrequency = 0 Then getFrequency cyFrequency ' ' get ticks ' getTickCount cyTicks1 getTickCount cyTicks2 If cyTicks2 < cyTicks1 Then cyTicks2 = cyTicks1 ' ' calc seconds ' If cyFrequency Then MicroTimer = cyTicks2 / cyFrequency End Function
The code uses conditional compilation for the Windows API calls so that it will work for both Excel 2010 32-bit and 64-bit Excel.
Here is the test code using Range.Find
Sub FindXY1() Dim oRng As Range Dim n As Long Dim dTime As Double Dim FirstAddress As String dTime = MicroTimer ' With Range("a1:A100000") Set oRng = .Find("X", , xlValues, , , xlNext, False) If Not oRng Is Nothing Then FirstAddress = oRng.Address If oRng.Offset(0, 1) = "Y" Then n = n + 1 Do Set oRng = .FindNext(oRng) If Not oRng Is Nothing Then If oRng.Offset(0, 1) = "Y" And oRng.Address <> FirstAddress Then n = n + 1 End If End If Loop While Not oRng Is Nothing And oRng.Address <> FirstAddress End If End With ' Debug.Print "Find " & n & " " & (MicroTimer - dTime) * 1000 End Sub
The Sub does a Range.Find on the 100 thousand rows looking for X, and then checks if column 2 for that cell also contains Y.
Then this is repeated using FindNext until we have looped back to the first range address.
When completed the number of rows with both X and Y (this is less than the number of times an X was found) and the time in milliseconds is shown in the Immmediate window.
Here is the code using WorksheetFunction.MATCH
Sub FindXY2() Dim oRng As Range Dim j As Long Dim n As Long Dim dTime As Double dTime = MicroTimer Set oRng = Range("a1:A100000") On Error GoTo Finish With Application.WorksheetFunction Do j = .Match("X", oRng, 0) If oRng(j, 2).Value2 = "Y" Then n = n + 1 Set oRng = oRng.Resize(oRng.Rows.Count - j, 1).Offset(j, 0) Loop End With Finish: Debug.Print "Match " & n & " " & (MicroTimer - dTime) * 1000 End Sub
This Sub works by using Worksheetfunction.Match to find the first occurrence of X within the Range object oRng.
After each X is found oRng is resized to exclude the range already searched.
The loop terminates either when nothing is found (Worksheetfunction.Match raises an error) or the Resize fails, also raising an error.
Using a Variant Array
Sub FindXY3() Dim vArr As Variant Dim j As Long Dim n As Long Dim dTime As Double dTime = MicroTimer vArr = Range("a1:B100000").Value2 For j = LBound(vArr) To UBound(vArr) If vArr(j, 1) = "X" Then If vArr(j, 2) = "Y" Then n = n + 1 End If End If Next j Debug.Print "Var array " & n & " " & (MicroTimer - dTime) * 1000 End Sub
The code gets the Range into a Variant, creating a 2-dimensional array, and then loops down the array looking for X and Y.
I ran each sub 4 times using different densities of data.
The first is with a single XY pair at row number 100000. This tests the scanning speed per row rather than the calling overhead.
Then I used the Test data generator with constant values of 0.9, 0.5 and 0.001.
Here are the results giving timings in Milliseconds with counts of the XY pairs:
And here are the ratios of the times:
- You can see that Find is significantly slower than Match or using a Variant array, regardless of the number of XY pairs found.
- The timings for Match increase fast with the number of XY pairs, whereas the Variant array increases much more slowly.
- This is because there is a much higher overhead for each call to Match than looping from row to row of the array.
- For small numbers of XY pairs Match is faster than the Variant array.
- Don't use Range.Find unless you want to search a large number of columns for the same thing (you would have to do a Match for each column).
- The Variant array approach is surprisingly effective, particularly when you expect a large number of hits.
- Match wins easily for a small number of hits.
OK, so who is going to admit to using Range.Find?