In this post I will look at 2 Binary Search UDFs. Both UDFs are the equivalent of MATCH with sorted ascending data, except that when there are multiple identical values that could be the answer:
- MATCH returns the LAST of the identical values
- BSearchFirstV and BSearchFirstR return the FIRST of the identical values
Binary search works by looking at the midpoint of the range of lookup values and comparing it to the value being looked up.
If the lookup value is less than the midpoint then the upper half of the range is discarded and Binary search looks at the midpoint of the lower half of the range. So on each successive iteration half the data gets discarded. This makes the Binary Search algorithm extremely efficient with anything more than 10-20 rows of data.
To find the first of multiple sorted matches we want to find the consecutive pair of values where the first is less than the lookupvalue and the second is >= the lookupvalue.
When the pair has been found we have to check whether its the first value we want or the second. And then we have to handle the boundary conditions:
- The lookup value is less than the first value in the lookup table
- The lookup value is equal to the first value in the lookup table
- The lookup value is greater then the last value in the lookup table
The algorithm is very simple and concise, but surprisingly difficult to program correctly. Many variations exist and its often useful to create your own variation for particular circumstances.
I have coded 2 variations: (these are revised versions – the first versions I posted had bugs!)
BSearchFirstR only works when you pass the LookupTable as a Range. BSearchFirstV is designed to work withe the LookupTable being passed as a 2-Dimensional 1 column N rows variant array (although if you give it a Range it converts it to a variant array anyway).
Here is the code:
Option Explicit Option Base 1 Option Compare Text ' ' COPYRIGHT © DECISION MODELS LIMITED 2006, 2007, 2011. All rights reserved ' May be redistributed for free but ' may not be sold without the author's explicit permission. ' Public Function BSearchFirstR(LookupValue As Variant, LookupArray As Range) As Variant ' ' find the position of the largest value in LookupArray <= LookupValue ' LookupArray must be a Range ' LookupArray must be sorted ascending ' if multiple LookupValues exits in LookupArray, the position of the first will be returned ' Dim low As Long Dim high As Long Dim middle As Long Dim varMiddle As Variant BSearchFirstR = CVErr(xlErrNA) With LookupArray low = 0 high = .Rows.Count Do While high - low > 1 middle = (low + high) \ 2 varMiddle = .Cells(middle, 1).Value2 If varMiddle >= LookupValue Then high = middle Else low = middle End If Loop If (low > 0 And high <= .Rows.Count) Then If .Cells(high, 1) > LookupValue Then BSearchFirstR = low Else BSearchFirstR = high End If ElseIf low = 0 Then If .Cells(1, 1).Value2 <= LookupValue Then BSearchFirstR = 1 End If End If End With End Function ' ' ' Public Function BSearchFirstV(LookupValue As Variant, LookupArray As Variant) As Variant ' ' find the position of the largest value in LookupArray <= LookupValue ' LookupArray must be a Range or a 2-dimensional variant array of N rows and 1 column ' LookupArray must be sorted ascending ' if multiple LookupValues exits in LookupArray, the position of the first will be returned ' Dim low As Long Dim high As Long Dim middle As Long Dim varMiddle As Variant BSearchFirstV = CVErr(xlErrNA) If TypeName(LookupArray) = "Range" Then LookupArray = LookupArray.Value2 low = 0 high = UBound(LookupArray) Do While high - low > 1 middle = (low + high) \ 2 varMiddle = LookupArray(middle, 1) If varMiddle >= LookupValue Then high = middle Else low = middle End If Loop If (low > 0 And high <= UBound(LookupArray)) Then If LookupArray(high, 1) > LookupValue Then BSearchFirstV = low Else BSearchFirstV = high End If ElseIf low = 0 Then If LookupArray(1, 1).Value2 <= LookupValue Then BSearchFirstV = 1 End If End If End Function
- Most error checking has been omitted to simplify the code
- There is no checking for uncalculated cells
- There is no allowance for whole-column references or ranges containing empty cells.
- The UDFs assume the data is sorted ascending
- The UDFs do not allow for wild-card text matching
- BSearchFirst V is faster than BSearchFirstR when the Lookup Table contains less than about 500 rows.
- For large ranges BSearchFirstR is considerable faster because the whole of the Lookup Table data is not transferred to VBA.
- Both variations are slower than using approximate MATCH, but are faster than using Linear Search (unsorted) MATCH to find the first exact match for large ranges, and are faster than Array formulae for finding the first of multiple matches
Using a VBA coded Binary Search routine will not improve the speed of the FLOOKUP UDF.
However when you want a variation of MATCH that is not in the built-in Excel MATCH VBA Binary Search is a useful tool.