It is a rainy day here in Norfolk so …
Prompted by a claim that searching using a variant array was much faster than calling MATCH from VBA I thought it was time for a detailed performance analysis.
I shall try to discover when it is better to use the array method rather than MATCH and whether the performance penalty of a wrong choice is significant.
An earlier post match-vs-find-vs-variant-array-vba-performance-shootout looked at doing a slightly more complex search. This time I will use a simpler test case and concentrate on determining the influence of data size, the length of the search within the data and the overhead of each method.
Test Setup
The test setup is very simple: column A1:A100000 contains a sequence of numbers from 1 to 100000.
Each different method is tested using a variety of sizes of rows and a variety of search depth percentages.
So for example 25% search depth of 400 rows means that the search will look 100 rows deep in the 400.
I am testing with Excel 2010 32-bit mainly (with some Excel 2013 32-bit comparisons) under Windows 7 Pro 64 bit.
My PC is a Dell Latitude E6530 with 8GB memory and a quad core i7-3720QM 2.60 GHZ. However since all the tests are VBA based only one core will be used.
Methods Tested
I am testing 6 different methods.
- Linear search using a For loop on a variant array created from the resized range in Column A
- Linear search using WorksheetFunction.MATCH with the unsorted option directly on a range created from Column A.
- Linear search using Application.MATCH with the unsorted option directly on a range created from Column A.
- Linear search using WorksheetFunction.MATCH with the unsorted option on a variant array created from the resized range in Column A
- Binary search using WorksheetFunction.MATCH with the sorted option directly on a range created from Column A.
- Cell by Cell loop directly on a range created from Column A.
The VBA Code
The VBA code is designed as a main subroutine returning a variant array of results to a driver sub.
Each search method is embedded in 3 loops:
- Loop on Range Size (Number of Rows)
- Loop on % Search Depth (How far to traverse within the range)
- Loop on Trails (each test is timed 5 times and the median of the 5 times is used)
Timing is done using the MicroTimer Windows high-resolution timer.
Sub DoSearchTests()
Dim j As Long
Dim vResults() As Variant
'
' call each method in turn
'
For j = 1 To 6
SearchTimer vResults(), j
Worksheets("Sheet1").Range("E4").Offset((j - 1) * 14, 0).Resize(UBound(vResults), UBound(vResults, 2)) = vResults
Next j
End Sub
Sub SearchTimer(vResults() As Variant, jMethod As Long)
'
' 6 Search Methods:
'
' 1 Linear search of variant array
' 2 Linear WorksheetFunction.MATCH on Range
' 3 Linear Application.MATCH on Range
' 4 Linear WorkSheetFunction.Match on Array
' 5 Binary Search Match on Range
' 6 Cell by Cell search
'
Dim vArr As Variant
Dim j As Long
Dim i As Long
Dim jF As Long
Dim jRow As Long
Dim jTrial As Long
Dim dTime As Double
Dim NumRows As Variant
Dim SearchDepth As Variant
Dim vTrials() As Variant
Dim rng As Range
Dim SearchVal As Variant
''' initialise
NumRows = Names("nRows").RefersToRange.Value2
SearchDepth = Names("SearchDepth").RefersToRange.Value2
ReDim vResults(LBound(NumRows) To UBound(NumRows), LBound(SearchDepth, 2) To UBound(SearchDepth, 2))
dTime = MicroTimer
With Worksheets("Sheet1")
''' loop on number of rows
For i = LBound(NumRows) To UBound(NumRows)
''' loop on % search depth
For jF = LBound(SearchDepth, 2) To UBound(SearchDepth, 2)
''' derive search value as a % of the number of rows
SearchVal = Int(SearchDepth(1, jF) * NumRows(i, 1))
If SearchVal < 1 Then SearchVal = 1
''' find the median time of 5 trials
ReDim vTrials(1 To 5)
For jTrial = 1 To 5
''' timing loop
If jMethod = 1 Then
''' get array and loop for search
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
vArr = rng.Value2
For j = LBound(vArr) To UBound(vArr)
If vArr(j, 1) = SearchVal Then
jRow = j
Exit For
End If
Next j
dTime = MicroTimer - dTime
ElseIf jMethod = 2 Then
''' use linear WorksheetFunction.Match on the range
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
jRow = WorksheetFunction.Match(SearchVal, rng, 0)
dTime = MicroTimer - dTime
ElseIf jMethod = 3 Then
''' use linear Application.Match on the range
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
jRow = Application.Match(SearchVal, rng, 0)
dTime = MicroTimer - dTime
ElseIf jMethod = 4 Then
''' use linear WorksheetFunction.Match on an array from the range
dTime = 0#
If NumRows(i, 1) <= 65536 Then
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
vArr = rng.Value2
jRow = WorksheetFunction.Match(SearchVal, vArr, 0)
dTime = MicroTimer - dTime
End If
ElseIf jMethod = 5 Then
''' use binary search Match on the range
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
jRow = WorksheetFunction.Match(SearchVal, rng, 1)
dTime = MicroTimer - dTime
ElseIf jMethod = 6 Then
''' get cell value and loop
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
For Each vArr In rng
If vArr = SearchVal Then
jRow = j
Exit For
End If
Next vArr
dTime = MicroTimer - dTime
End If
''' store timings for trials
vTrials(jTrial) = dTime * 1000000
Next jTrial
''' get median of the trials
vResults(i, jF) = WorksheetFunction.Median(vTrials)
Next jF
Next i
End With
End Sub
Test Results
All timings are in Microseconds (millionths of a second).
Looping the Variant Array from a Range (Method 1)
The timings for this method show:
- An overhead of 10 Microseconds per call
- The 0% column is a good approximation to the time taken to read the range into a variant array. This increases with the size of the range.
- Search time could be calculated by subtracting the 0% column from the other columns and as expected increases with the number of cells being traversed.
Using WorksheetFunction.Match on Ranges (Method 2)
The timings for this method show:
- An overhead of 13 Microseconds per call (larger than the array method)
- The 0% column is constant because no data is transferred from Excel to VBA..
- Search time less overhead increases with the number of cells being traversed but is lower than the array method.
Using Application.Match on Ranges (Method 3)
I added this method to compare using Application.MATCH with WorksheetFunction.MATCH.
The timings for this method show:
- An overhead of 16 Microseconds per call (larger than the WorkSheetFunction method)
- The 0% column is constant because no time is taken to transfer the data from Excel to VBA
- Search time less overhead increases with the number of cells being traversed and is very similar to the WorkSheetFunction method.
Using WorksheetFunction.Match on an array derived from a Range (Method 4)
The timings for this method show:
- An overhead of 15 Microseconds per call (larger than both the array and Match methods)
- The 0% column increases sharply with data size and is larger than Method 1 because the data is transferred from Excel to VBA and back to the MATCH function.
- Search time less overhead increases with the number of cells being traversed but is lower than the array method.
- The 100000 row is missing because this method only allows a maximum of 65536 rows before the MATCH function fails.
Using the Binary Search (sorted) option of WorksheetFunction.Match on Ranges (Method 5)
WOW!!! That is seriously fast.
- The overhead is comparable to Linear Search WorksheetFunction.MATCH (Method 2)
- The 0% column is constant because no data is transferred from Excel to VBA.
- Search time is close to zero even over 100000 rows because of the efficiency of the Binary Search method being used.
- The data has to be sorted for this to work
- If there is missing data it would need to be handles with the double VLOOKUP/MATCH trick.
Using Cell by Cell search (Method 6)
The timings for this method show:
- The overhead is comparable to Method 1 but the 0% column does not increase with increasing data size because only the minimum amount of data is being transferred from Excel to VBA.
- For large amounts of data this method is extremely inefficient, but for small (20-30 roes) volumes you will not usually notice it.
Breakeven between the Array Method (1) and the WorksheetFunction.MATCH method (2)
This table shows the ratio of method 1 timings to method timings, so anything >1 shows that method 2 is faster.
- For large amounts of data MATCH is significantly faster than the array method.
- The breakeven point for Excel 2010 32-bit is around 40 rows.
- The traversing search speed of MATCH is faster than the array method and data does not need to be transferred from Excel to VBA.
- But the overhead of the Array method is lower than that of Worksheetfunction.Match
Comparing Excel 2013 with Excel 2010
With Excel 2013 the breakeven point is slightly more in favour of MATCH than with Excel 2010.
Excel 2013 Method 1 (Variant array search)
Excel 2013 Method 2 (WorksheetFunction.Match)
Conclusions
- Both the Array method and WorksheetFunction method are fast.
- When you just want to do a search the MATCH method is the way to go.
- When you want to do a search combined with processing the data the Array method is often more efficient. (See match-vs-find-vs-variant-array-vba-performance-shootout )
- For small amounts of data either method is good because the timing differences will not be noticeable unless you are iterating the method thousands of times.
- There does not seem to be a major difference in these tests between Excel 2010 and Excel 2013.