It is a rainy day here in Norfolk so …
Prompted by a claim that searching using a variant array was much faster than calling MATCH from VBA, I thought it was time for a detailed performance analysis.
I shall try to discover when it is better to use the array method rather than MATCH and whether the performance penalty of a wrong choice is significant.
An earlier post match-vs-find-vs-variant-array-vba-performance-shootout looked at doing a slightly more complex search. This time I will use a simpler test case and concentrate on determining the influence of data size, the length of the search within the data and the overhead of each method.
Test Setup
The test setup is very simple: cells A1:A100000 in column A contain a sequence of numbers from 1 to 100000.
Each different method is tested using a variety of sizes of rows and a variety of search depth percentages.
So for example 25% search depth of 400 rows means that the search will look 100 rows deep in the 400.
I am testing with Excel 2010 32-bit mainly (with some Excel 2013 32-bit comparisons) under Windows 7 Pro 64 bit.
My PC is a Dell Latitude E6530 with 8GB memory and a quad core i7-3720QM 2.60 GHZ. However since all the tests are VBA based only one core will be used.
Methods Tested
I am testing 6 different methods.
- Linear search using a For loop on a variant array created from the resized range in Column A.
- Linear search using WorksheetFunction.MATCH with the unsorted option directly on a range created from Column A.
- Linear search using Application.MATCH with the unsorted option directly on a range created from Column A.
- Linear search using WorksheetFunction.MATCH with the unsorted option on a variant array created from the resized range in Column A.
- Binary search using WorksheetFunction.MATCH with the sorted option directly on a range created from Column A.
- Cell by Cell loop directly on a range created from Column A.
The VBA Code
The VBA code is designed as a main subroutine returning a variant array of results to a driver sub.
Each search method is embedded in 3 loops:
- Loop on Range Size (Number of Rows)
- Loop on % Search Depth (How far to traverse within the range)
- Loop on Trials (each test is timed 5 times and the median of the 5 times is used)
Timing is done using the MicroTimer Windows high-resolution timer.
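For reference, a minimal high-resolution timer can be built on the Windows QueryPerformanceCounter API. This sketch uses 32-bit Declare statements (on 64-bit Office you would use Declare PtrSafe); it returns elapsed seconds as a Double with sub-microsecond resolution:

```vba
Private Declare Function getFrequency Lib "kernel32" _
    Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare Function getTickCount Lib "kernel32" _
    Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long

Function MicroTimer() As Double
' Returns seconds since an arbitrary origin, with high resolution
    Dim cyTicks1 As Currency
    Static cyFrequency As Currency
    MicroTimer = 0
    If cyFrequency = 0 Then getFrequency cyFrequency   ' ticks per second, read once
    getTickCount cyTicks1                              ' current tick count
    If cyFrequency Then MicroTimer = cyTicks1 / cyFrequency
End Function
```

Subtracting two MicroTimer readings, as the test code below does, gives an elapsed time accurate enough to resolve calls that take only a few microseconds.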
```vba
Sub DoSearchTests()
    Dim j As Long
    Dim vResults() As Variant
    '
    ' call each method in turn
    '
    For j = 1 To 6
        SearchTimer vResults(), j
        Worksheets("Sheet1").Range("E4").Offset((j - 1) * 14, 0) _
            .Resize(UBound(vResults), UBound(vResults, 2)) = vResults
    Next j
End Sub

Sub SearchTimer(vResults() As Variant, jMethod As Long)
    '
    ' 6 Search Methods:
    '
    ' 1 Linear search of variant array
    ' 2 Linear WorksheetFunction.MATCH on Range
    ' 3 Linear Application.MATCH on Range
    ' 4 Linear WorksheetFunction.Match on Array
    ' 5 Binary Search Match on Range
    ' 6 Cell by Cell search
    '
    Dim vArr As Variant
    Dim j As Long
    Dim i As Long
    Dim jF As Long
    Dim jRow As Long
    Dim jTrial As Long
    Dim dTime As Double
    Dim NumRows As Variant
    Dim SearchDepth As Variant
    Dim vTrials() As Variant
    Dim rng As Range
    Dim SearchVal As Variant

    ''' initialise
    NumRows = Names("nRows").RefersToRange.Value2
    SearchDepth = Names("SearchDepth").RefersToRange.Value2
    ReDim vResults(LBound(NumRows) To UBound(NumRows), _
                   LBound(SearchDepth, 2) To UBound(SearchDepth, 2))
    dTime = MicroTimer
    With Worksheets("Sheet1")
        ''' loop on number of rows
        For i = LBound(NumRows) To UBound(NumRows)
            ''' loop on % search depth
            For jF = LBound(SearchDepth, 2) To UBound(SearchDepth, 2)
                ''' derive search value as a % of the number of rows
                SearchVal = Int(SearchDepth(1, jF) * NumRows(i, 1))
                If SearchVal < 1 Then SearchVal = 1
                ''' find the median time of 5 trials
                ReDim vTrials(1 To 5)
                For jTrial = 1 To 5
                    ''' timing loop
                    If jMethod = 1 Then
                        ''' get array and loop for search
                        dTime = MicroTimer
                        Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                        vArr = rng.Value2
                        For j = LBound(vArr) To UBound(vArr)
                            If vArr(j, 1) = SearchVal Then
                                jRow = j
                                Exit For
                            End If
                        Next j
                        dTime = MicroTimer - dTime
                    ElseIf jMethod = 2 Then
                        ''' use linear WorksheetFunction.Match on the range
                        dTime = MicroTimer
                        Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                        jRow = WorksheetFunction.Match(SearchVal, rng, 0)
                        dTime = MicroTimer - dTime
                    ElseIf jMethod = 3 Then
                        ''' use linear Application.Match on the range
                        dTime = MicroTimer
                        Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                        jRow = Application.Match(SearchVal, rng, 0)
                        dTime = MicroTimer - dTime
                    ElseIf jMethod = 4 Then
                        ''' use linear WorksheetFunction.Match on an array from the range
                        dTime = 0#
                        If NumRows(i, 1) <= 65536 Then
                            dTime = MicroTimer
                            Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                            vArr = rng.Value2
                            jRow = WorksheetFunction.Match(SearchVal, vArr, 0)
                            dTime = MicroTimer - dTime
                        End If
                    ElseIf jMethod = 5 Then
                        ''' use binary search Match on the range
                        dTime = MicroTimer
                        Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                        jRow = WorksheetFunction.Match(SearchVal, rng, 1)
                        dTime = MicroTimer - dTime
                    ElseIf jMethod = 6 Then
                        ''' get cell value and loop
                        dTime = MicroTimer
                        Set rng = .Range("A1:A" & CStr(NumRows(i, 1)))
                        j = 0
                        For Each vArr In rng
                            j = j + 1
                            If vArr = SearchVal Then
                                jRow = j        ' count cells to track the row
                                Exit For
                            End If
                        Next vArr
                        dTime = MicroTimer - dTime
                    End If
                    ''' store timings for trials
                    vTrials(jTrial) = dTime * 1000000
                Next jTrial
                ''' get median of the trials
                vResults(i, jF) = WorksheetFunction.Median(vTrials)
            Next jF
        Next i
    End With
End Sub
```
Test Results
All timings are in Microseconds (millionths of a second).
Looping the Variant Array from a Range (Method 1)
The timings for this method show:
- An overhead of 10 Microseconds per call
- The 0% column is a good approximation to the time taken to read the range into a variant array. This increases with the size of the range.
- Search time could be calculated by subtracting the 0% column from the other columns and as expected increases with the number of cells being traversed.
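Stripped of the timing harness, the core of Method 1 reduces to a small helper like this (FindRowArray is an illustrative name, not from the original code; it assumes rng spans more than one cell, since Value2 on a single cell returns a scalar rather than an array):

```vba
Function FindRowArray(rng As Range, SearchVal As Variant) As Long
' Returns the 1-based row within rng, or 0 if not found
    Dim vArr As Variant
    Dim j As Long
    vArr = rng.Value2                       ' one bulk transfer Excel -> VBA
    For j = LBound(vArr, 1) To UBound(vArr, 1)
        If vArr(j, 1) = SearchVal Then      ' linear scan of the in-memory array
            FindRowArray = j
            Exit For
        End If
    Next j
End Function
```

The 10-microsecond overhead and the size-dependent 0% column both come from the single `rng.Value2` transfer; the loop itself only pays for the cells it actually traverses.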
Using WorksheetFunction.Match on Ranges (Method 2)
The timings for this method show:
- An overhead of 13 Microseconds per call (larger than the array method)
- The 0% column is constant because no data is transferred from Excel to VBA.
- Search time less overhead increases with the number of cells being traversed but is lower than the array method.
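One practical point when using this method outside a benchmark: WorksheetFunction.Match raises run-time error 1004 when the value is not found, so real code needs an error trap. A minimal sketch (FindRowMatch is an illustrative name):

```vba
Function FindRowMatch(rng As Range, SearchVal As Variant) As Long
' Returns the 1-based row within rng, or 0 if not found
    On Error Resume Next        ' Match raises error 1004 when there is no match
    FindRowMatch = WorksheetFunction.Match(SearchVal, rng, 0)
    On Error GoTo 0             ' function name keeps its default 0 on failure
End Function
```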
Using Application.Match on Ranges (Method 3)
I added this method to compare using Application.MATCH with WorksheetFunction.MATCH.
The timings for this method show:
- An overhead of 16 Microseconds per call (larger than the WorkSheetFunction method)
- The 0% column is constant because no time is taken to transfer the data from Excel to VBA.
- Search time less overhead increases with the number of cells being traversed and is very similar to the WorkSheetFunction method.
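The two flavours also differ in how they report failure: unlike WorksheetFunction.Match, Application.Match does not raise a run-time error when the value is missing. It returns a Variant that holds an error value, which you test with IsError (a sketch; FindRowAppMatch is an illustrative name):

```vba
Function FindRowAppMatch(rng As Range, SearchVal As Variant) As Long
' Returns the 1-based row within rng, or 0 if not found
    Dim vRow As Variant
    vRow = Application.Match(SearchVal, rng, 0)   ' no raise on failure
    If IsError(vRow) Then
        FindRowAppMatch = 0                       ' vRow holds a CVErr value
    Else
        FindRowAppMatch = CLng(vRow)
    End If
End Function
```

That difference in error handling, rather than raw speed, is usually the reason to pick one over the other.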
Using WorksheetFunction.Match on an array derived from a Range (Method 4)
The timings for this method show:
- An overhead of 15 Microseconds per call (larger than both the array and Match methods)
- The 0% column increases sharply with data size and is larger than Method 1 because the data is transferred from Excel to VBA and back to the MATCH function.
- Search time less overhead increases with the number of cells being traversed but is lower than the array method.
- The 100000 row is missing because this method only allows a maximum of 65536 rows before the MATCH function fails.
Using the Binary Search (sorted) option of WorksheetFunction.Match on Ranges (Method 5)
WOW!!! That is seriously fast.
- The overhead is comparable to Linear Search WorksheetFunction.MATCH (Method 2)
- The 0% column is constant because no data is transferred from Excel to VBA.
- Search time is close to zero even over 100000 rows because of the efficiency of the Binary Search method being used.
- The data has to be sorted for this to work.
- If there is missing data it would need to be handled with the double VLOOKUP/MATCH trick.
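The double-MATCH check can be sketched like this: use the fast sorted MATCH to find the candidate row, then verify that the value at that row really equals the search value (FindRowSorted is an illustrative name; it assumes rng is a single sorted-ascending column):

```vba
Function FindRowSorted(rng As Range, SearchVal As Variant) As Long
' Returns the 1-based row within rng, or 0 if SearchVal is absent
    Dim jRow As Long
    On Error Resume Next    ' Match raises 1004 if SearchVal < first value
    jRow = WorksheetFunction.Match(SearchVal, rng, 1)   ' fast binary search
    On Error GoTo 0
    If jRow > 0 Then
        ' sorted Match returns the last row <= SearchVal, so confirm the hit
        If rng.Cells(jRow, 1).Value2 = SearchVal Then FindRowSorted = jRow
    End If
End Function
```

The verification read costs one extra cell access, so you keep almost all of the binary-search speed advantage.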
Using Cell by Cell search (Method 6)
The timings for this method show:
- The overhead is comparable to Method 1 but the 0% column does not increase with increasing data size because only the minimum amount of data is being transferred from Excel to VBA.
- For large amounts of data this method is extremely inefficient, but for small volumes (20-30 rows) you will not usually notice it.
Breakeven between the Array Method (1) and the WorksheetFunction.MATCH method (2)
This table shows the ratio of Method 1 timings to Method 2 timings, so anything >1 means that Method 2 is faster.
- For large amounts of data MATCH is significantly faster than the array method.
- The breakeven point for Excel 2010 32-bit is around 40 rows.
- The traversing search speed of MATCH is faster than the array method and data does not need to be transferred from Excel to VBA.
- But the overhead of the Array method is lower than that of WorksheetFunction.Match.
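If you want to exploit the breakeven point in general-purpose code, a simple dispatcher can pick the method by size. The 40-row threshold is the Excel 2010 figure from the tests above and is only a rough guide; FindRow is an illustrative name and the sketch assumes a single-column rng with at least 2 rows:

```vba
Function FindRow(rng As Range, SearchVal As Variant) As Long
' Returns the 1-based row within rng, or 0 if not found
    Const BREAKEVEN As Long = 40    ' approximate crossover from the tests above
    Dim vArr As Variant
    Dim j As Long
    If rng.Rows.Count >= BREAKEVEN Then
        ''' large range: linear MATCH on the range avoids the bulk transfer
        On Error Resume Next
        FindRow = WorksheetFunction.Match(SearchVal, rng, 0)
        On Error GoTo 0
    Else
        ''' small range: the variant array's lower call overhead wins
        vArr = rng.Value2
        For j = LBound(vArr, 1) To UBound(vArr, 1)
            If vArr(j, 1) = SearchVal Then FindRow = j: Exit For
        Next j
    End If
End Function
```

In practice either branch is fast at this size, so the dispatcher only matters when the search is called many thousands of times.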
Comparing Excel 2013 with Excel 2010
With Excel 2013 the breakeven point is slightly more in favour of MATCH than with Excel 2010.
Excel 2013 Method 1 (Variant array search)
Excel 2013 Method 2 (WorksheetFunction.Match)
Conclusions
- Both the Array method and WorksheetFunction method are fast.
- When you just want to do a search the MATCH method is the way to go.
- When you want to do a search combined with processing the data the Array method is often more efficient. (See match-vs-find-vs-variant-array-vba-performance-shootout )
- For small amounts of data either method is good because the timing differences will not be noticeable unless you are iterating the method thousands of times.
- There does not seem to be a major difference in these tests between Excel 2010 and Excel 2013.
I agree with your point that there does not seem to be a difference between 2010 and 2013, but there is a huge difference compared to 2003.
I have a query, typically >20,000 records. I open another workbook and use Find to loop through the 20k+ records and add some data. It actually does 2 Finds per record; one Find is in a local worksheet.
Here are the results, and I wish I could find a way to speed it up in 2010+:
- Excel 2003: 18 seconds
- Excel 2010: 47 seconds
- Excel 2013: 47 seconds
That's significant. All tests were run 3 times in each version.
I have no idea why a few operations in 2007+ are so much slower, but they are.
I did a comparison of VBA read/write speed between versions here: https://fastexcel.wordpress.com/2011/06/13/vba-readwrite-speed-formula-benchmarking-excel-versions/ The major difference is a higher overhead per call, so getting all the data into a variant array in one step is the way to go.
Thanks for the idea, Charles. I commented out all of the cell writes and it still took 38 seconds. So I put all of the values I search for in an array, instead of going cell by cell, and it still took 38 seconds.
Commenting out the cell writes in Excel 2003 yields a better result: 15 seconds rather than 18.
Then adding the values to find to an array was even faster: 13 seconds vs 15 seconds.
I think Excel's Find operation is slower in newer versions of Excel.
Find is usually much slower than getting the data into a variant array and looping the array:
see https://fastexcel.wordpress.com/2011/10/26/match-vs-find-vs-variant-array-vba-performance-shootout/
OK, I used some arrays and Application.WorksheetFunction.Match() to find elements in the arrays to build new arrays, then wrote the final arrays all at once instead of cell by cell, and now the routine takes about 4 or 5 seconds.
Hopefully the data is correct (lol); I'll have to look it over more closely.
thanks for your help
Hi,
I was wondering, after looking at your results, how to proceed with my string comparisons. I have one worksheet (worksheet1) with about 60,000 strings (str1) of approx length 400 characters each, stuffed in column A. In worksheet2 I have about 100 strings (str2) of about 20-40 characters. I want to find each occurrence of str2 found in str1, then copy str1 to worksheet3. If there are multiple occurrences of str2 in str1 then break up str1 into separate strings. For what it is worth, my 60,000 strings are sorted, but the sorted value is located after str2. Hope that is clear.
Thks,
Rob
That sounds tough: I guess I would try a BFMI approach first (Brute Force Massive Ignorance). Get the 60000 strings into one variant array and the 100 strings into another. Then loop to compare using INSTR and store row numbers when found.
If that is way too slow you would have to break each row of str1 into separate words and store in an array/collection with a row index, then look for hits using str2.
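A sketch of the BFMI approach, with illustrative sheet names and ranges (the real sizes and layout would come from Rob's workbook):

```vba
Sub BruteForceScan()
    Dim vLong As Variant, vShort As Variant
    Dim vHits() As Long
    Dim i As Long, k As Long, nHits As Long
    ''' one bulk transfer for each set of strings
    vLong = Worksheets("Sheet1").Range("A1:A60000").Value2   ' 60,000 long strings
    vShort = Worksheets("Sheet2").Range("A1:A100").Value2    ' 100 search strings
    ReDim vHits(1 To UBound(vLong, 1))
    For i = 1 To UBound(vLong, 1)
        For k = 1 To UBound(vShort, 1)
            ''' case-insensitive substring test
            If InStr(1, vLong(i, 1), vShort(k, 1), vbTextCompare) > 0 Then
                nHits = nHits + 1
                vHits(nHits) = i        ' remember the row of each hit
                Exit For                ' drop Exit For to record every str2 match
            End If
        Next k
    Next i
    ''' vHits(1 To nHits) now lists the rows to copy to worksheet3
End Sub
```

That is 6 million InStr calls at worst, but because everything stays in VBA arrays it may still be tolerable; only if it proves too slow is the word-index approach worth the extra complexity.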
Thank you for that test, it proved to be very useful in dealing with large arrays of data!