MATCH vs FIND vs Variant Array VBA Performance Shootout in Excel 2010

When searching unsorted data in VBA I have always tended to use MATCH rather than FIND, mainly because early versions of Excel before Excel 2003 did not allow you to use FIND in UDFs.

But prompted by a discussion in one of the Excel forums I thought it was about time I revisited Find and Match to see which performs better in Excel 2010, and for good measure lets compare them to getting the data into a variant array and looping through the array.

Generating Test Data

Andreas Killer came up with a nice routine for generating test data:

Sub Init()
Dim Data(1 To 100000, 1 To 2)
Dim i As Long
Rnd -5
For i = 1 To UBound(Data)
If Rnd > 0.5 Then Data(i, 1) = "X"
If Rnd > 0.5 Then Data(i, 2) = "Y"
Next
Cells.ClearContents
Range("A1").Resize(UBound(Data), UBound(Data, 2)) = Data
End Sub

This code randomly generates X in column 1 and Y in column 2 in a range of 100 thousand rows, with the number of Xs and Ys controlled by the constant 0.5
Changing the 0.5 to 0.9 will give few Xs and Ys, and changing it to 0.001 will give lots.

Since the X and the Y are using different random numbers there will be rows with X but no Y and Y but no X, as well as rows with both X and Y.

This makes it easy to test how the various methods compare with different densities of data.

Timing

I am using the MicroTimer high-resolution timer API to give timing accuracy at the microsecond (millionths of a second) level, but the timing results will all be in milliseconds.

#If VBA7 Then
Private Declare PtrSafe Function getFrequency Lib "kernel32" Alias _
"QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare PtrSafe Function getTickCount Lib "kernel32" Alias _
"QueryPerformanceCounter" (cyTickCount As Currency) As Long
#Else
Private Declare Function getFrequency Lib "kernel32" Alias _
"QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare Function getTickCount Lib "kernel32" Alias _
"QueryPerformanceCounter" (cyTickCount As Currency) As Long
#End If
Public Function MicroTimer() As Double
'
' returns seconds
' uses Windows API calls to the high resolution timer
'
Dim cyTicks1 As Currency
Dim cyTicks2 As Currency
Static cyFrequency As Currency
'
MicroTimer = 0
'
' get frequency
'
If cyFrequency = 0 Then getFrequency cyFrequency
'
' get ticks
'
getTickCount cyTicks1
getTickCount cyTicks2
If cyTicks2 < cyTicks1 Then cyTicks2 = cyTicks1
'
' calc seconds
'
If cyFrequency Then MicroTimer = cyTicks2 / cyFrequency
End Function

The code uses conditional compilation for the Windows API calls so that it will work for both Excel 2010 32-bit and 64-bit Excel.

Using FIND

Here is the test code using Range.Find


Sub FindXY1()
Dim oRng As Range
Dim n As Long
Dim dTime As Double
Dim FirstAddress As String
dTime = MicroTimer
'
With Range("a1:A100000")
Set oRng = .Find("X", , xlValues, , , xlNext, False)
If Not oRng Is Nothing Then
FirstAddress = oRng.Address
If oRng.Offset(0, 1) = "Y" Then n = n + 1
Do
Set oRng = .FindNext(oRng)
If Not oRng Is Nothing Then
If oRng.Offset(0, 1) = "Y" And oRng.Address <> FirstAddress Then
n = n + 1
End If
End If
Loop While Not oRng Is Nothing And oRng.Address <> FirstAddress
End If
End With
'
Debug.Print "Find " & n & " " & (MicroTimer - dTime) * 1000
End Sub

The Sub does a Range.Find on the 100 thousand rows looking for X, and then checks if column 2 for that cell also contains Y.
Then this is repeated using FindNext until we have looped back to the first range address.
When completed the number of rows with both X and Y (this is less than the number of times an X was found) and the time in milliseconds is shown in the Immmediate window.

Using Match

Here is the code using WorksheetFunction.MATCH

Sub FindXY2()
Dim oRng As Range
Dim j As Long
Dim n As Long
Dim dTime As Double
dTime = MicroTimer
Set oRng = Range("a1:A100000")
On Error GoTo Finish
With Application.WorksheetFunction
Do
j = .Match("X", oRng, 0)
If oRng(j, 2).Value2 = "Y" Then n = n + 1
Set oRng = oRng.Resize(oRng.Rows.Count - j, 1).Offset(j, 0)
Loop
End With
Finish:
Debug.Print "Match " & n & " " & (MicroTimer - dTime) * 1000
End Sub

This Sub works by using Worksheetfunction.Match to find the first occurrence of X within the Range object oRng.
After each X is found oRng is resized to exclude the range already searched.

The loop terminates either when nothing is found (Worksheetfunction.Match raises an error)  or the Resize fails, also raising an error.

Using a Variant Array

Sub FindXY3()
Dim vArr As Variant
Dim j As Long
Dim n As Long
Dim dTime As Double
dTime = MicroTimer
vArr = Range("a1:B100000").Value2
For j = LBound(vArr) To UBound(vArr)
If vArr(j, 1) = "X" Then
If vArr(j, 2) = "Y" Then
n = n + 1
End If
End If
Next j
Debug.Print "Var array " & n & " " & (MicroTimer - dTime) * 1000
End Sub

The code gets the Range into a Variant, creating a 2-dimensional array, and then loops down the array looking for X and Y.

Timing Results

I ran each sub 4 times using different densities of data.

The first is with a single XY pair at row number 100000. This tests the scanning speed per row rather than the calling overhead.

Then I used the Test data generator with constant values of 0.9, 0.5 and 0.001.

Here are the results giving timings in Milliseconds with counts of the XY pairs:

And here are the ratios of the times:

  • You can see that Find is significantly slower than Match or using a Variant array, regardless of the number of XY pairs found.
  • The timings for Match increase fast with the number of XY pairs, whereas the Variant array increases much more slowly.
  • This is because there is a much higher overhead for each call to Match than looping from row to row of the array.
  • For small numbers of XY pairs Match is faster than the Variant array.

Conclusions

  • Don’t use Range.Find unless you want to search a large number of columns for the same thing (you would have to do a Match for each column).
  • The Variant array approach is surprisingly effective, particularly when you expect a large number of hits.
  • Match wins easily for a small number of hits.

OK, so who is going to admit to using Range.Find?

Advertisements
This entry was posted in VBA. Bookmark the permalink.

25 Responses to MATCH vs FIND vs Variant Array VBA Performance Shootout in Excel 2010

  1. I use Find almost exclusively (until now, that is). I would say that the vast majority of times I use it is on limited, single-column ranges. I will be using Match from now on.

  2. Jon says:

    I’ve found the find function to be overly complex so I never really used from the get go. I usually just use an array (I’ve made functions to deal with using arrays).

  3. Jim Cone says:

    Charles,
    I have some rather long Find code to do a multi sheet, multi word search and it completes generally before the start button rebounds.
    So your test results showing that Match beats Find surprised me.
    I tried your code on 50,000 rows of data and got approximately the same relative results that you did. I am going to continue reading your posts.

    Another surprise was the speed increase when I replaced your range.resize code line with a different range callout.
    This came up about 18% faster…

    ‘— code starts
    Sub FindXY333()
    Dim oRng As Range
    Dim oLastRng As Range
    Dim j As Long
    Dim n As Long
    Dim Rw As Long
    Dim dTime As Long

    dTime = timeGetTime
    Set oRng = Range(“a1:A50000”)
    Set oLastRng = oRng(oRng.Rows.Count)
    Rw = oLastRng.Row
    On Error GoTo Finish
    With Application.WorksheetFunction
    Do
    j = .Match(“X”, oRng, 0)
    If oRng(j, 2).Value2 = “Y” Then n = n + 1
    Set oRng = Range(oRng(j + 1), oLastRng) ‘<<<<= Rw
    End With

    Finish:
    MsgBox TellMeHowLong(dTime) & vbCr & n & ” found”
    End Sub
    ‘— code ends

    There were about 25,000 XY occurances.
    I use the Resize property frequently because of its ease of use.
    Wondering if I still should.

    Jim Cone

  4. Jim Cone says:

    Somebody did me wrong – probably me.
    The code I posted is incomplete. The Do / Loop should look like this…
    ‘—
    With Application.WorksheetFunction
    Do
    j = .Match(“X”, oRng, 0)
    If oRng(j, 2).Value2 = “Y” Then n = n + 1
    Set oRng = Range(oRng(j + 1), oLastRng)
    Loop Until oRng.Row >= Rw
    End With
    ‘—

    • fastexcel says:

      Jim,
      Yes your method is definitely faster:
      Equivalent timings on my system are 1.3, 97, 478, 950

      Had to change the loop a bit to avoid end-condition problems:
      Do
      Set oRng = Range(oRng(j + 1), oLastRng)
      j = .Match("X", oRng, 0)
      jRow = jRow + j
      If oRng(j, 2).Value2 = "Y" Then n = n + 1
      Loop Until jRow + 1 > Rw

  5. sam says:

    I think we have to keep the end application in mind.

    The advantage of Find is that it returns a Range Object.

    So Lets assume you want to delete rows in a table where we find the X,Y pair, or highlight the X,Y rows with red we would do this easily in the end

    oRng.entirerow.delete
    or
    oRng.interior.colorindex = 3

    Also it would be interesting to see how
    Application.WorksheetFunction.CountIf would perform over MATCH

    • Jon says:

      I find the union function works well for that, pretty speeding and I can still avoid the complex find function (that always seems to be buggy – I know, it’s my own user error, but it happens enough that I prefer my own.

  6. sam says:

    Sub FindXY_COUNTIFS()
    Dim j As Long
    Dim dTime As Double
    dTime = MicroTimer
    j = [COUNTIFS(A1:A100000,”x”,B1:B100000,”y”)]
    Debug.Print “COUNTIFS ” & j & ” ” & (MicroTimer – dTime) * 1000
    End Sub

    below are the timings
    COUNTIFS 25145 – 51.3873271629564
    COUNTIFS 25145 – 51.1837398335047
    COUNTIFS 25145 – 51.3968255727377

  7. jeff Weir says:

    I could be missing the point here, but It seems to me that this post covers the use of FIND in the case where you would want to use it exactly like MATCH: that is to check whether the ENTIRE text in your range/array element exactly matches your search string.

    But FIND does something MATCH does not: tests whether ANY text WITHIN your range/array exactly matches your test string.

    So doesn’t this post only cover the specific situation where you want to use FIND exactly like MATCH i.e. find an entire text string?

    That is –
    with FIND:
    * searching for “Excel Zero” will return a positive hit in “Excel Zero”; or
    * searching for “Zero” will return a positive hit in “Excel Zero”

    With MATCH:
    * searching for “Excel Zero” will return a positive hit in “Excel Zero”; but
    * searching for “Zero” will NOT return a positive hit in “Excel Zero”

    So you can use FIND to emulate all the circumstances in which you might use MATCH. But you cannot use MATCH to emulate all the circumstances in which you might use FIND. Ergo, you cannot say that MATCH is faster than FIND. You can say that MATCH is faster than FIND when you are using FIND to emulate MATCH.

  8. jeff Weir says:

    Sam just pointed out to me that MATCH accepts wildcards…I’d forgotten about that, assuming I ever knew it in the first place. I wonder if using wildcards in Match would change your results at all?

    • fastexcel says:

      I don’t know what using wildcards does to MATCH performance. LIKE in VBA is pretty fast these days (It was slow in early Excel versions). I guess I should do a MATCH vs LIKE vs FIND vs REGEX shootout..

  9. Bryan Metz says:

    Bah, I hate reading articles like this. With loops and type declarations and noncontiguous ranges to contend with when using the variant array method, I want the find method to be faster so I can be lazy!!!

  10. Pingback: VBA searching shootout between WorksheetFunction.Match: Performance Pros and Cons | Excel and UDF Performance Stuff

  11. Luca B says:

    One caveat about Match though… it gives a worksheet error if no match is found which either means running another function as an error trap or needing to call the Match twice….

    Unless you have a good trick for an Application.Match error trap up your sleeve?
    Please?

    😉

    • fastexcel says:

      Using Worksheetfunction.Match from VBA raises an error you can trap with On Error. See the example code in the Post.

      • David Zemens says:

        `Application.Match` *returns* an error which you can trap with the `IfError` function. The caveat is that you’ll need to assign it to a `Variant` type (otherwise Type Mismatch) or making a redundant call to the function in order to test the return value for error *before* assigning to a strongly-typed variable.

  12. Peter carr says:

    For Each works remarkably well: ForEach 25145 200.417874322738
    Dim cell As Range
    Dim dTime As Double
    Dim n As Long
    dTime = MicroTimer

    For Each cell In Range(“a1:B100000”).Columns(1).Cells
    If cell.Value2 = “X” Then
    If cell.Offset(, 1).Value2 = “Y” Then
    n = n + 1
    End If
    End If
    Next
    Debug.Print “ForEach ” & n & ” ” & (MicroTimer – dTime) * 1000

  13. Jim T says:

    I know it is possible to move a worksheet range into a 2-dimensional array with a single statement without having to step through each row/column, i.e.
    Option Base 1
    Dim ShtArray (1000,20)
    ShtArray = Range(“A1:T1000)

    Is it possible to move 2 worksheets into a 3-dimensional array with just 2 statements where the 1st dimension of the 3-dimensional array is the worksheet #, in a similar way, i.e.
    Option Base 1
    Dim ShtArray(2,1000,20)
    ShtArray(1,?,?) =Worksheet(“Sheet1”).Range(“A1:T1000)
    ShtArray(2,???)=Worksheet(“Sheet2”).Range(“A1:T1000)

  14. fastexcel says:

    Well you can sort of do it like this: define a variant array and then assign a range to a specific element of the array.
    Option Explicit
    Option Base 1
    Sub ThreedVar()
    Dim var(2) As Variant
    var(1) = Worksheets(“Sheet1”).Range(“A1:B2”)
    var(2) = Worksheets(“Sheet2”).Range(“A1:B4”)
    ‘ this is the syntax for addressing elements of the array
    MsgBox var(1)(2, 2)
    Worksheets(“Sheet1”).Range(“A10:B11”) = var(1)
    Worksheets(“Sheet2”).Range(“A10:B13”) = var(2)

    End Sub

    Note that VBA does not understand 3-dimensional range references

  15. Box says:

    I can verify a portion of this with respect to a different application. I am building a tool that is importing data from another excel based tool. The tool being imported has a massive pivot (2 years of day by day data with multiple rows of data for each individual date). All I was trying to do was find a specific date in this data set to use as a starting point. Using find it took a little over 45 seconds to return the row number of the date in question. Using match it was almost instantaneous!

  16. This blog is indeed one of the best online resources for learning especially for a newbie like me. Thanks for sharing your knowledge. Anyway, in the code using WorksheetFunction.MATCH, why don’t we just assign oRng(j, 2) with vbNullString? I mean to replace the line

    Set oRng = oRng.Resize(oRng.Rows.Count – j, 1).Offset(j, 0)

    with

    oRng(j, 2) = vbNullString

    so that you don’t have to resize the range oRng. I think your code will still work and I guess it would be faster than resizing the range. I use vbNullString instead of zero length string (“”) because VBA process it faster than ZLS. What do you think?

  17. alanelston says:

    Hello
    I have done some very extensive comparisons of .Match .Find as well as Array alternatives for a complex Merge requirement of mine involving data in two Workbooks with long rows of data up to about 3500 columns.
    The data is built up from sources where different spellings or different languages may be used for the column Headings. To allow for merging, each workbook has 20 Heading rows allowing for the different wording of the same basic column heading.
    A major part of the code is therefore comparing the headings in each Workbook to determine a column heading match.
    The common part of the code versions is: Consider one of the workbooks. Consider at/ For each column go “down” For 1 to 20 of the headings. For each one these headings I check for a match in the second Workbook headings.
    So I have considered full Array analysis along with doing all combinations of Row and Column .Match and .Find code versions to find a match in the second Workbook Heading range.

    Previously I had been a great fan of exclusively do#ng captured Variant Array analysis for these sort of thing.
    Various limitations forced me into considering the Worksheet Functions of Match and Find.
    I was surprised and impressed of the improved performance of considering a Find or Match as an alternative to finding a match “along a long row”.
    I had some interesting results, for example:
    _ that looping down the 20 rows in 20 separate long single row Finds was slightly quicker than a single Find for a 20 row range
    _ there is little difference using Find with arguments xlByRows or xlByColumns for the 20 row single range. ( This despite that, as expected , doing a Match of Find for the single column 20 rows range in a loop for the many columns is very slow. – I was expecting therefore an increased time for xlByColumns ? ( I also find that if I had multiple cells as a valid to be found, then the argument of xlByRows or xlByColumns makes no difference and the cell found first is that as if a search was done across the first row, then next row down.. etc.. ). I wonder if that is a bug.. ?? )
    _ I note a strange slowness of WorksheetsFunction.Match compared to Application.Match for XL 2007 ( but not for XL 2010 )

    In any case the Match across a long row was by far the best to the point of , on my somewhat older computers and XL versions, it being the only practical way for a close to instant merge of one row from one workbook to the other.

    I note the point Jeff Weir said a few posts back…. – I am also doing a comparison of Find in a situation to compare exactly with Match. I would expect Match has been optimised to do this limited version of what Find can do.
    So it could be said that the comparison I did was of limited worth……. but…
    BUT, I am glad I did: Here is a very important point: Unless you have a fairly fast 64 Bit machine with a recent XL version, you can forget doing any such Match operation of this sort for more than a handful of loops. Excel has a Weird Memory of memory problem for Range objects used in Match and that soon freezes up Excel , and if you are unlucky , your whole computer freezes and either an Excel restart or pulling the plug for a full computer reboot is necessary to recover the situation
    Find does not appear to have these limitations.

    Alan

  18. jeffrey Weir says:

    ” To allow for merging, each workbook has 20 Heading rows allowing for the different wording of the same basic column heading.”

    Yikes! Does the user fill these in? Or are they pre-assigned? Do you know what country and spelling would be required for a particular user ahead of time? Are the columns in the same order in each template with just the headings being different? Or can anything be anywhere?

    Reason I ask is that if I had control over the templates, I’d have just one column header, and do a lookup on that against a translation table to confirm “yes, this is the column that denotes x”. And that’s assuming that columns can appear in any order. If columns are in the same order in each template, no lookup required of course. But I assume that isn’t the case, and that you have no control over the templates.

  19. alanelston says:

    Hi Jeff
    Thanks for taking an interest.
    I am not a computer professional so I expect I am making a lot of mistakes. I am just learning Excel VBA for a couple of years to speed up a personal project. This involves keeping a record and analysing diets where just about every conceivable constituent nutrient from food products is taken into account. I expect I have now the most extensive nutritional list anywhere on this Earth. I collect data from most published and not published tables some of which themselves are massive.
    Basically as long as I get a table with Food products listed down vertically and the various Nutritional values in columns with the headings along the top then I can merge with my codes to my main list.
    For example the very simplest would be a PDF or text file from the Table you get from McDonalds.
    At the other extreme I have Government lists that break down the Products into thousands of different chemical, mineral stuff , “Phyto” bits and God knows what other stuff.
    I invent my own magical mixtures as well that is all included.
    I just need to keep track of what my patient ( my wife ) is eating in the finest detail.

    So I have no control, the headings can be any order, and can change occasionally from the same source. So it is convenient to just add the heading used when I collect any new tables. I am restricting myself for now to English and German and up until now 20 rows is sufficient. I tend to go back and forth merging different tables together as I try to get an ultimate list, so at this stage the multiple headings is convenient. Hopefully I will finally have so much data I can keep single sane definitive list, even if it is very large.
    But I also scrape information from internet sites. I was really impressed how quick that can be so my final data in my Diet protocol also calls up real time information as I use it. I never dreamed initially that could be an alternative. That of course also leads to different headings. Or I have codes that just recognise a Nutritional value, Nutritional name And the Units for a food product from any source, then it presents to me effectively a two column list based on normalised units which again I merge into my main File ( or messy FileS as they are still currently ! )

    I was quite impressed with the speed improvement I managed using .Match in a version of my merge codes. Then I was really naffed off when I hit this brick wall due to a “Thrashingly Weird Memory of memory leakage Bug” as I call it ( without really knowing what I am talking about.. lol ).
    If you have a very new fast machine and Excel version you are probably blissfully unaware of this problem. If you use older 32 Bit systems and Excel version then you can virtually forget .Match in any code that uses it more than a few times on a fairly long ( or deep ) second argument range/ array

    It is a really mental problem, It kicks in as soon as you use more than one Range object in any .Match. It does not then make a big difference if you use more than that. After you use a second one you are snookered

    I could drop off a few links to the ExcelForum Posts where I tend to develop my codes and you will soon get the point.. But I am not sure if that is allowed to post links in these Blog sites..
    I expect if you saw some of the Posts your face would look even more astonished as it does already in your Avatar.. lol.. It may not be advisable, I really did my head in on this one, and I don’t look forward myself to reading some of the Epic Posts I did on that and related problems.. 🙂
    ( I am Doc.AElstein AT most Places, DocAElstein at a few others, but I don‘t do any of that twittery and facebook stuff… )
    Alan

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s