VBA searching shootout between WorksheetFunction.Match and variant arrays: Performance Pros and Cons

It is a rainy day here in Norfolk so …

Prompted by a claim that searching a variant array was much faster than calling MATCH from VBA, I thought it was time for a detailed performance analysis.
I shall try to discover when it is better to use the array method rather than MATCH, and whether the performance penalty of a wrong choice is significant.

An earlier post, match-vs-find-vs-variant-array-vba-performance-shootout, looked at a slightly more complex search. This time I will use a simpler test case and concentrate on determining the influence of data size, the depth of the search within the data, and the overhead of each method.

Test Setup

The test setup is very simple: A1:A100000 contains a sequence of numbers from 1 to 100000.
Each method is tested using a variety of row counts and a variety of search depth percentages.

Match_Array1

So, for example, a 25% search depth of 400 rows means that the search will look 100 rows deep into the 400.

I am testing mainly with Excel 2010 32-bit (with some Excel 2013 32-bit comparisons) under Windows 7 Pro 64-bit.
My PC is a Dell Latitude E6530 with 8GB memory and a quad-core i7-3720QM at 2.60 GHz. However, since all the tests are VBA-based, only one core will be used.

Methods Tested

I am testing 6 different methods.

  1. Linear search using a For loop on a variant array created from the resized range in Column A.
  2. Linear search using WorksheetFunction.MATCH with the unsorted option directly on a range created from Column A.
  3. Linear search using Application.MATCH with the unsorted option directly on a range created from Column A.
  4. Linear search using WorksheetFunction.MATCH with the unsorted option on a variant array created from the resized range in Column A.
  5. Binary search using WorksheetFunction.MATCH with the sorted option directly on a range created from Column A.
  6. Cell-by-cell loop directly on a range created from Column A.

The VBA Code

The VBA code is designed as a main subroutine returning a variant array of results to a driver sub.

Each search method is embedded in 3 nested loops:

  • Loop on Range Size (number of rows)
    • Loop on % Search Depth (how far to traverse within the range)
      • Loop on Trials (each test is timed 5 times and the median of the 5 times is used)

Timing is done using the MicroTimer Windows high-resolution timer.
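MicroTimer itself is not listed in the code below. The usual implementation (the pattern from the MSDN "Improving Performance in Excel" material) wraps the Windows high-resolution performance counter; a sketch for 32-bit VBA looks like this:

```vba
' Windows high-resolution timer wrappers (32-bit VBA declarations;
' VBA7/64-bit needs Declare PtrSafe)
Private Declare Function getFrequency Lib "kernel32" _
    Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare Function getTickCount Lib "kernel32" _
    Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long

Function MicroTimer() As Double
' Returns seconds, with microsecond resolution
    Dim cyTicks1 As Currency
    Static cyFrequency As Currency
    MicroTimer = 0
    If cyFrequency = 0 Then getFrequency cyFrequency   ' get ticks/second once
    getTickCount cyTicks1                              ' get current tick count
    If cyFrequency Then MicroTimer = cyTicks1 / cyFrequency
End Function
```

The Currency type is used because the counter is a 64-bit integer, which pre-64-bit VBA cannot hold natively.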


Sub DoSearchTests()
    Dim j As Long
    Dim vResults() As Variant
    '
    ' call each method in turn and write its block of results to the sheet
    '
    For j = 1 To 6
        SearchTimer vResults(), j
        Worksheets("Sheet1").Range("E4").Offset((j - 1) * 14, 0) _
            .Resize(UBound(vResults), UBound(vResults, 2)) = vResults
    Next j
End Sub

Sub SearchTimer(vResults() As Variant, jMethod As Long)
    '
    ' 6 Search Methods:
    '
    ' 1 Linear search of variant array
    ' 2 Linear WorksheetFunction.Match on Range
    ' 3 Linear Application.Match on Range
    ' 4 Linear WorksheetFunction.Match on Array
    ' 5 Binary search Match on Range
    ' 6 Cell-by-cell search
    '
    Dim vArr As Variant
    Dim j As Long
    Dim i As Long
    Dim jF As Long
    Dim jRow As Long
    Dim jTrial As Long
    Dim dTime As Double
    Dim NumRows As Variant
    Dim SearchDepth As Variant
    Dim vTrials() As Variant
    Dim rng As Range
    Dim rCell As Range
    Dim SearchVal As Variant

    ''' initialise: row counts and search depths come from two named ranges
    NumRows = Names("nRows").RefersToRange.Value2
    SearchDepth = Names("SearchDepth").RefersToRange.Value2
    ReDim vResults(LBound(NumRows) To UBound(NumRows), _
                   LBound(SearchDepth, 2) To UBound(SearchDepth, 2))
    dTime = MicroTimer    ' prime the timer
    With Worksheets("Sheet1")

        ''' loop on number of rows
        For i = LBound(NumRows) To UBound(NumRows)

            ''' loop on % search depth
            For jF = LBound(SearchDepth, 2) To UBound(SearchDepth, 2)

                ''' derive the search value as a % of the number of rows
                SearchVal = Int(SearchDepth(1, jF) * NumRows(i, 1))
                If SearchVal < 1 Then SearchVal = 1

                ''' find the median time of 5 trials
                ReDim vTrials(1 To 5)
                For jTrial = 1 To 5

                    ''' timing loop
                    If jMethod = 1 Then

                        ''' get array and loop for search
                        dTime = MicroTimer
                        Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                        vArr = rng.Value2
                        For j = LBound(vArr) To UBound(vArr)
                            If vArr(j, 1) = SearchVal Then
                                jRow = j
                                Exit For
                            End If
                        Next j
                        dTime = MicroTimer - dTime

                    ElseIf jMethod = 2 Then

                        ''' use linear WorksheetFunction.Match on the range
                        dTime = MicroTimer
                        Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                        jRow = WorksheetFunction.Match(SearchVal, rng, 0)
                        dTime = MicroTimer - dTime

                    ElseIf jMethod = 3 Then

                        ''' use linear Application.Match on the range
                        dTime = MicroTimer
                        Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                        jRow = Application.Match(SearchVal, rng, 0)
                        dTime = MicroTimer - dTime

                    ElseIf jMethod = 4 Then

                        ''' use linear WorksheetFunction.Match on an array from the range
                        ''' (Match on an array fails above 65536 rows, so skip those cases)
                        dTime = 0#
                        If NumRows(i, 1) <= 65536 Then
                            dTime = MicroTimer
                            Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                            vArr = rng.Value2
                            jRow = WorksheetFunction.Match(SearchVal, vArr, 0)
                            dTime = MicroTimer - dTime
                        End If

                    ElseIf jMethod = 5 Then

                        ''' use binary search Match on the range (data must be sorted)
                        dTime = MicroTimer
                        Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                        jRow = WorksheetFunction.Match(SearchVal, rng, 1)
                        dTime = MicroTimer - dTime

                    ElseIf jMethod = 6 Then

                        ''' loop cell by cell, counting rows as we go
                        dTime = MicroTimer
                        Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                        j = 0
                        For Each rCell In rng
                            j = j + 1
                            If rCell.Value2 = SearchVal Then
                                jRow = j
                                Exit For
                            End If
                        Next rCell
                        dTime = MicroTimer - dTime
                    End If

                    ''' store this trial's timing in microseconds
                    vTrials(jTrial) = dTime * 1000000
                Next jTrial

                ''' get the median of the trials
                vResults(i, jF) = WorksheetFunction.Median(vTrials)
            Next jF
        Next i
    End With
End Sub

Test Results

All timings are in microseconds (millionths of a second).

Looping the Variant Array from a Range (Method 1)

XL2010_Method1

The timings for this method show:

  • An overhead of about 10 microseconds per call.
  • The 0% column is a good approximation to the time taken to read the range into a variant array; this increases with the size of the range.
  • Search time can be calculated by subtracting the 0% column from the other columns and, as expected, it increases with the number of cells being traversed.

Using WorksheetFunction.Match on Ranges (Method 2)

XL2010_Method2

The timings for this method show:

  • An overhead of about 13 microseconds per call (larger than the array method).
  • The 0% column is constant because no data is transferred from Excel to VBA.
  • Search time less overhead increases with the number of cells being traversed but is lower than for the array method.

Using Application.Match on Ranges (Method 3)

XL2010_Method3

I added this method to compare using Application.MATCH with WorksheetFunction.MATCH.
The timings for this method show:

  • An overhead of about 16 microseconds per call (larger than the WorksheetFunction method).
  • The 0% column is constant because no time is taken to transfer the data from Excel to VBA.
  • Search time less overhead increases with the number of cells being traversed and is very similar to the WorksheetFunction method.

Using WorksheetFunction.Match on an array derived from a Range (Method 4)

XL2010_Method4

The timings for this method show:

  • An overhead of about 15 microseconds per call (larger than both the array and Match methods).
  • The 0% column increases sharply with data size and is larger than for Method 1 because the data is transferred from Excel to VBA and then passed back to the MATCH function.
  • Search time less overhead increases with the number of cells being traversed but is lower than for the array method.
  • The 100000 row is missing because this method only allows a maximum of 65536 rows before the MATCH function fails.

Using the Binary Search (sorted) option of WorksheetFunction.Match on Ranges (Method 5)

XL2010_Method5

WOW!!! That is seriously fast.

  • The overhead is comparable to linear-search WorksheetFunction.MATCH (Method 2).
  • The 0% column is constant because no data is transferred from Excel to VBA.
  • Search time is close to zero even over 100000 rows because of the efficiency of the binary search.
  • The data has to be sorted for this to work.
  • If there is missing data it would need to be handled with the double VLOOKUP/MATCH trick.
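The point of the double-lookup trick is that a binary-search MATCH never returns "not found" for values above the first entry, so you follow it with an exactness check. A sketch (FastExactMatch is a hypothetical helper name; it assumes the data in rng is sorted ascending):

```vba
' Fast exact lookup on sorted data: binary-search MATCH plus an exactness check.
' Returns the row position of SearchVal within rng, or 0 if it is not present.
Function FastExactMatch(SearchVal As Variant, rng As Range) As Long
    Dim vPos As Variant
    vPos = Application.Match(SearchVal, rng, 1)       ' fast binary search
    If IsError(vPos) Then
        FastExactMatch = 0                            ' below the first value
    ElseIf rng.Cells(vPos, 1).Value2 = SearchVal Then
        FastExactMatch = vPos                         ' genuine hit
    Else
        FastExactMatch = 0                            ' nearest-below match only
    End If
End Function
```

Application.Match (rather than WorksheetFunction.Match) is used here so the miss comes back as an error value to test with IsError instead of raising a runtime error.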

Using Cell by Cell search (Method 6)

XL2010_Method6

The timings for this method show:

  • The overhead is comparable to Method 1, but the 0% column does not increase with data size because only the minimum amount of data is transferred from Excel to VBA.
  • For large amounts of data this method is extremely inefficient, but for small volumes (20-30 rows) you will not usually notice it.

Breakeven between the Array Method (1) and the WorksheetFunction.MATCH method (2)

Excel2010Compare

This table shows the ratio of Method 1 timings to Method 2 timings, so anything >1 shows that Method 2 is faster.

  • For large amounts of data MATCH is significantly faster than the array method.
  • The breakeven point for Excel 2010 32-bit is around 40 rows.
  • The traversing search speed of MATCH is faster than the array method, and the data does not need to be transferred from Excel to VBA.
  • But the overhead of the array method is lower than that of WorksheetFunction.MATCH.
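Putting the breakeven result to work, a wrapper could pick the cheaper method by range size. FindRow is a hypothetical helper of mine, and the 40-row threshold comes from the Excel 2010 32-bit tests above:

```vba
' Returns the 1-based row position of SearchVal within rng, or 0 if absent.
' Chooses the search method by size, per the breakeven measured above (~40 rows).
Function FindRow(SearchVal As Variant, rng As Range) As Long
    Dim vArr As Variant
    Dim vRes As Variant
    Dim j As Long
    If rng.Rows.Count >= 40 Then
        ' large range: MATCH on the range avoids transferring data to VBA
        vRes = Application.Match(SearchVal, rng, 0)
        If IsError(vRes) Then FindRow = 0 Else FindRow = vRes
    ElseIf rng.Rows.Count = 1 Then
        ' single cell: Value2 is a scalar, not a 2-D array
        If rng.Value2 = SearchVal Then FindRow = 1
    Else
        ' small range: the variant array loop's lower overhead wins
        vArr = rng.Value2
        For j = LBound(vArr) To UBound(vArr)
            If vArr(j, 1) = SearchVal Then
                FindRow = j
                Exit Function
            End If
        Next j
    End If
End Function
```

For a one-off lookup the choice barely matters; this kind of wrapper only pays off when the search is called many thousands of times.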

Comparing Excel 2013 with Excel 2010

Excel2013Compare

With Excel 2013 the breakeven point is slightly more in favour of MATCH than with Excel 2010.

Excel 2013 Method 1 (Variant array search)

XL2013_Method1

Excel 2013 Method 2 (WorksheetFunction.Match)

XL2013_Method2

Conclusions

  • Both the array method and the WorksheetFunction.MATCH method are fast.
  • When you just want to do a search, the MATCH method is the way to go.
  • When you want to combine a search with processing the data, the array method is often more efficient. (See match-vs-find-vs-variant-array-vba-performance-shootout.)
  • For small amounts of data either method is good, because the timing differences will not be noticeable unless you are iterating the method thousands of times.
  • There does not seem to be a major difference in these tests between Excel 2010 and Excel 2013.

7 Responses to VBA searching shootout between WorksheetFunction.Match and variant arrays: Performance Pros and Cons

  1. Gary says:

    I agree with your point that there does not seem to be a difference between 2010 and 2013, but there is a huge difference compared to 2003.
    I have a query, typically >20,000 records. I open another workbook and use Find to loop through the 20k+ records and add some data. It actually does 2 Finds per record; one Find is in a local worksheet.
    Here are the results, and I wish I could find a way to speed it up in 2010+:
    excel 2003 18 seconds
    excel 2010 47 seconds
    excel 2013 47 seconds
    That’s significant. All tests were run 3 times in each version.
    I have no idea why a few operations in 2007+ are so much slower, but they are.

  2. fastexcel says:

    I did a comparison of VBA read/write speed between versions here: https://fastexcel.wordpress.com/2011/06/13/vba-readwrite-speed-formula-benchmarking-excel-versions/ The major difference is a higher overhead per call, so getting all the data into a variant array in one step is the way to go.
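    (The "one step" read looks like this; the sheet and range names are illustrative only:

```vba
' One COM call instead of 20,000+ individual cell reads
Dim vData As Variant
vData = Worksheets("Data").Range("A1:C20000").Value2
' vData is now a 1-based 2-D variant array: vData(row, column)
```

    Each touch of a Range object crosses the VBA/Excel boundary, so one bulk transfer amortises that per-call overhead.)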

  3. Gary says:

    Thanks for the idea, Charles. I commented out all of the cell writes and it still took 38 seconds. So I put all of the values I search for in an array, instead of going cell by cell, and it still took 38 seconds.

    Commenting out the cell writes in Excel 2003 yields a better result, 15 seconds rather than 18.
    Then adding the values to find to an array was even faster, 13 seconds vs 15 seconds.

    I think Excel’s Find operation is slower in newer versions of Excel.

  4. Rob says:

    Hi,
    I was wondering after looking at your results how to proceed to do my string comparisons. I have one worksheet1 with about 60,000 strings (str1) of approx length 400 characters each stuffed in column A. In worksheet2 I have about 100 strings (str2) of about 20-40 characters. I want to find each occurrence of str2 found in str1 then copy str1 to worksheet3. If there are multiple occurrences of str2 in str1 then break up str1 into separate strings. For what it is worth, my data of 60,000 strings are sorted but the sorted value is located after str2. Hope that is clear.
    Thks,
    Rob

    • fastexcel says:

      That sounds tough: I guess I would try a BFMI approach first (Brute Force and Massive Ignorance). Get the 60,000 strings into one variant array and the 100 strings into another, then loop to compare using INSTR and store the row numbers when found.
      If that is way too slow you would have to break each row of str1 into separate words, store them in an array/collection with a row index, and then look for hits using str2.
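      A sketch of that brute-force loop (the sheet names and sizes come from Rob's description; how the hits are collected is illustrative):

```vba
' Brute-force scan: test every short string against every long string with InStr
Sub BruteForceScan()
    Dim v1 As Variant, v2 As Variant
    Dim i As Long, j As Long
    Dim colHits As Collection
    Set colHits = New Collection
    v1 = Worksheets("worksheet1").Range("A1:A60000").Value2   ' the 60,000 long strings
    v2 = Worksheets("worksheet2").Range("A1:A100").Value2     ' the 100 search strings
    For i = 1 To UBound(v1)
        For j = 1 To UBound(v2)
            If InStr(1, v1(i, 1), v2(j, 1), vbBinaryCompare) > 0 Then
                colHits.Add Array(i, j)   ' remember which str1 row matched which str2
            End If
        Next j
    Next i
End Sub
```

      That is 6 million InStr calls, but all in memory, so it may well be fast enough before trying anything cleverer.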
