Developing Faster Lookups – Part 3 – A Binary Search UDF – Updated

In part 1 and part 2 of Developing Faster Lookups I side-stepped the question of whether its faster to use your own VBA Binary Search routine than to use Excel’s Worksheet Functions.

In this post I will look at 2 Binary Search UDFs. Both UDFs are the equivalent of MATCH with sorted ascending data, except that when there are multiple identical values that could be the answer:

  • MATCH returns the LAST of the identical values
  • BSearchFirstV and BSearchFirstR return the FIRST of the identical values

Binary search works by looking at the midpoint of the range of lookup values and comparing it to the value being looked up.
If the lookup value is less than the midpoint then the upper half of the range is discarded and Binary search looks at the midpoint of the lower half of the range. So on each successive iteration half the data gets discarded. This makes the Binary Search algorithm extremely efficient with anything more than 10-20 rows of data.

To find the first of multiple sorted matches we want to find the consecutive pair of values where the first is less than the lookupvalue and the second is >= the lookupvalue.

When the pair has been found we have to check whether its the first value we want or the second. And then we have to handle the boundary conditions:

  • The lookup value is less than the first value in the lookup table
  • The lookup value is equal to the first value in the lookup table
  • The lookup value is greater then the last value in the lookup table

The algorithm is very simple and concise, but surprisingly difficult to program correctly. Many variations exist and its often useful to create your own variation  for particular circumstances.

I have coded 2 variations: (these are revised versions – the first versions I posted had bugs!)

BSearchFirstR only works when you pass the LookupTable as a Range. BSearchFirstV is designed to work withe the LookupTable being passed as a 2-Dimensional 1 column N rows variant array (although if you give it a Range it converts it to a variant array anyway).

Here is the code:

Option Explicit
Option Base 1
Option Compare Text
'
' COPYRIGHT © DECISION MODELS LIMITED 2006, 2007, 2011. All rights reserved
' May be redistributed for free but
' may not be sold without the author's explicit permission.
'
Public Function BSearchFirstR(LookupValue As Variant, LookupArray As Range) As Variant
'
' find the position of the largest value in LookupArray <= LookupValue
' LookupArray must be a Range
' LookupArray must be sorted ascending
' if multiple LookupValues exits in LookupArray, the position of the first will be returned
'
Dim low As Long
Dim high As Long
Dim middle As Long
Dim varMiddle As Variant
BSearchFirstR = CVErr(xlErrNA)
With LookupArray
low = 0
high = .Rows.Count
Do While high - low > 1
middle = (low + high) \ 2
varMiddle = .Cells(middle, 1).Value2
If varMiddle >= LookupValue Then
high = middle
Else
low = middle
End If
Loop
If (low > 0 And high <= .Rows.Count) Then
If .Cells(high, 1) > LookupValue Then
BSearchFirstR = low
Else
BSearchFirstR = high
End If
ElseIf low = 0 Then
If .Cells(1, 1).Value2 <= LookupValue Then
BSearchFirstR = 1
End If
End If
End With
End Function
'
'
'
Public Function BSearchFirstV(LookupValue As Variant, LookupArray As Variant) As Variant
'
' find the position of the largest value in LookupArray <= LookupValue
' LookupArray must be a Range or a 2-dimensional variant array of N rows and 1 column
' LookupArray must be sorted ascending
' if multiple LookupValues exits in LookupArray, the position of the first will be returned
'
Dim low As Long
Dim high As Long
Dim middle As Long
Dim varMiddle As Variant
BSearchFirstV = CVErr(xlErrNA)
If TypeName(LookupArray) = "Range" Then LookupArray = LookupArray.Value2
low = 0
high = UBound(LookupArray)
Do While high - low > 1
middle = (low + high) \ 2
varMiddle = LookupArray(middle, 1)
If varMiddle >= LookupValue Then
high = middle
Else
low = middle
End If
Loop
If (low > 0 And high <= UBound(LookupArray)) Then
If LookupArray(high, 1) > LookupValue Then
BSearchFirstV = low
Else
BSearchFirstV = high
End If
ElseIf low = 0 Then
If LookupArray(1, 1).Value2 <= LookupValue Then
BSearchFirstV = 1
End If
End If
End Function

Notes:

  • Most error checking has been omitted to simplify the code
  • There is no checking for uncalculated cells
  • There is no allowance for whole-column references or ranges containing empty cells.
  • The UDFs assume the data is sorted ascending
  • The UDFs do not allow for wild-card text matching

Timing Results

  • BSearchFirst V is faster than BSearchFirstR when the Lookup Table contains less than about 500 rows.
  • For large ranges BSearchFirstR is considerable faster because the whole of the Lookup Table data is not transferred to VBA.
  • Both variations are slower than using approximate MATCH, but are faster than using Linear Search (unsorted) MATCH to find the first exact match for large ranges, and are faster than Array formulae for finding the first of multiple matches

Conclusion

Using  a VBA coded Binary Search routine will not improve the speed of the FLOOKUP UDF.

However when you want a variation of MATCH that is not in the built-in Excel MATCH VBA Binary Search is a useful tool.

Advertisements
This entry was posted in UDF, VBA. Bookmark the permalink.

9 Responses to Developing Faster Lookups – Part 3 – A Binary Search UDF – Updated

  1. Steve says:

    Your 2nd function is missing the ‘With’ statement

  2. sam says:

    @ Charles

    “MATCH returns the LAST of the identical values”

    Well this not entirely true.

    If your data is sorted in Ascending order and you use the 1 type it returns the position of the last of the identical values

    If you data is sorted in Descending order and you use the -1 type it returns the position of the 1st of the identical values

  3. fastexcel says:

    Sam,
    Yes: the whole post is written for sorted ascending (“the UDFs are the equiovalent of MATCH with sorted ascending”). For sorted descending you need another small variation of Binary search to get the Last value: I will leave that as an exercise for the readers!

  4. Ludgy says:

    Hi
    Thanks for the functions.
    I’m experiencing a problem with BSearchFirstR though.

    I’m looking up a value which I can find in the LookupArray by doing a manual search but the function returns an #NA.

    I have duplicate values in the lookuparray. Could this throw the function off?

  5. Ludgy says:

    Oh hang on…
    Now it returns the wrong line number.
    My value is sitting in line 338 but the function returns 343

  6. fastexcel says:

    If you can send me (Charles at DecisionModels.com) your data I will take a look

  7. fastexcel says:

    Ludgy’s problem was that the data contained a mixture of upper and lower case keys sorted in a case-insensitive way, but the binary search was coded in a case-sensitive way: I have added Option Compare Text to the code to fix this.

  8. Ludgy says:

    Thanks for the speedy reply and for the code.
    It’s made my life a whole lot easier.

    Ludgy

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s