Developing Faster Lookups – Part 2 – How to build a faster VBA Lookup

In part 1 I looked at some of the difficulties with making efficient use of Excel’s built-in Lookup and Match functions:

  • Lookups are often recalculated unecessarily even when all data actually relevant to the Lookup is unchanged.
  • Binary Search Lookup requires a double lookup to handle missing data
  • Linear Search requires either a double lookup or IFERROR to handle the #N/A from missing data
  • Excel’s Lookup and Match functions are not designed to handle multiple results
  • Whole-column arguments are handled well as long as the used-range is OK

An ideal Lookup/Match function would handle these problems and also

  • Allow Named column headers as well as numbers
  • Allow more than one column as the Lookup Value
  • Allow alternative sets of Lookup Values
  • VLOOKUP would handle both ascending and descending sorted data

And of course it has to be faster than Excel’s functions in most circumstances!
What I finally developed was the AVLOOKUP/AMATCH/AVLOOKUPS/AMATCHS series of UDFs in FastExcel Version2 and 3
This post describes the design approach and shows some sample code from these UDFs.

The Design Approach

Solution for No Relevant Data Changed

The solution I chose was to store the row number in the TableRange where the answer was last found, and then before doing the actual lookup check whether that row number still gives the right answer. If it does then short-circuit the UDF by returning the answer, else go do the full lookup.

There is a small problem with this approach in that a VBA UDF is not supposed to be able to alter anything except by returning values to its calling cells, so how do you store the Row number? The solution I chose is to to use the Range.ID property for the calling cell. This works well (althought it only persists for the duration of the Excel workbook session). In the multi-threaded C++ FastExcel V3 version I use lockable global memory mapped structures stored within the workbook.

The great thing about this approach is that it is fail-safe and avoids having to handle some tricky situations: if there are 2 UDF calls using this technique within a formula the first call will get the row number stored by the second call, decide that this gives the wrong answer and then do the full lookup to get the right answer.

Binary Search with Missing Data

Part 1 showed how to make this work by making a lookup lookup itself to see if it has found an approximate or an exact match. So you need to have 2 arguments for the function (Data_Is_Sorted and Do_Exact_Match) instead of just one (Data_Is_Sorted). Then if the data is sorted and an Exact Match is requested the UDF can do the Binary Search exact match trick.

Handling the #N/A from Missing Data

This also requires another optional parameter for the UDF, giving the Value to return if an exact match has been requested but not found (defaults to #N/A).

Optimizing Data Transfer

To minimise data transfer time if the Lookup_Table is a range then the UDF keeps it as a range and passes the range object to Application.WorksheetFunction.MATCH, instead of transferring all the data to a Variant array and then either searching that using VBA or passing the variant array to MATCH. If the input is a calculated range or array constant rather than a Range then the UDF passes that directly to MATCH.
When checking if the stored row number gives the right answer the UDF does need to get a value, but it only gets a single cell.

Handling Whole Column arguments

Because the UDF is passing Range Objects rather than Variant arrays to Excel worksheet functions it does not really need to bother about this. In other cases the trick would be to use Application.Intersect(UsedRange,InputRange).

Handling Multiple Results & Alternative sets of Lookup Values

This is handled by writing the UDF so that it loops on the inputs and lookups and returns an array of results rather than a single value.

Named Column Headers

If the Answer Column parameter is passed as a string then the UDF searches for it in the first row of the data, but if its passed as a number then that is used as the column index.

Ascending and descending Sorted data

Internally the UDF uses Excel’s MATCH function which handles both sort types.

Example code – the FLOOKUP UDF

FLOOKUP is a simplified version of the FastExcel V3 AVLOOKUP2 function.
FLOOKUP does not contain these AVLOOKUP2 features:

  • Calculated columns/ranges or constant array arguments
  • Named Column headers
  • Multiple Answers
  • More than one column as the Lookup Value
  • Alternative sets of Lookup Values
  • Function Wizard Help
  • Multi-threaded execution
  • Lookup memory persisted in saved workbook
  • Named Lookup Memory sets
  • Written in C++ as an XLL

FLOOKUP VBA Code

Option Explicit
Option Base 1
Option Compare Text
'
' COPYRIGHT © DECISION MODELS LIMITED 2006, 2007, 2011. All rights reserved
' May be redistributed for free but
' may not be sold without the author's explicit permission.
'
Public Function FLOOKUP(lookup_value As Variant, _
Lookup_Range As Range, _
Answer_ColNum As Variant, _
Optional Sort_Type As Variant, _
Optional Exact_Match As Variant)
'
' FLOOKUP UDF: Fast Lookup
'
' find the Lookup_Value in the first column of Lookup_Range
' and return the corresponding row value from Answer_Colnum
'
' Sort_Type may be:
' -1 = sorted descending
' 1 = sorted ascending
' 0 = not sorted
'
' Exact_Match may be: True or False or any Non-Boolean value or string
' Exact_Match is returned if there is no exact match
'
' Sort_Type is optional and defaults to 1
' Exact_Match is optional and defaults to False if sorted, or #N/A if unsorted
'
Dim vAnsa As Variant ''' output to be assigned to function
Dim vRow As Variant ''' the row number found in Lookup_Range else error
Dim vNoMatch As Variant ''' the value to return if no exact match found
Dim lSorted As Long ''' -1 =descending, 0= not sorted, 1=ascending
Dim jAnswerColumn As Long ''' the answer column number in Lookup_Range
Dim blExact As Boolean ''' exact match flag
Dim blFound As Boolean ''' true if a match found
'
' setup error handler
'
On Error GoTo FuncFail
'
' skip execution if empty or uncalculated inputs
'
If IsEmpty(lookup_value) Or IsEmpty(Lookup_Range(1, 1)) Or IsEmpty(Answer_ColNum) Then
Exit Function
End If
'
' convert defaults etc
'
SetDefaults Sort_Type, lSorted, Exact_Match, vNoMatch, blExact, _
Answer_ColNum, jAnswerColumn
'
' now try to find the lookup_value
'
blFound = False
If Not blExact Then
'
' approx match sorted requested
'
On Error Resume Next
vRow = Application.WorksheetFunction.Match(lookup_value, Lookup_Range.Columns(1), lSorted)
If Err = 0 Then blFound = True
On Error GoTo FuncFail
Err.Clear
Else
'
' exact match requested
'
' first check memory, If matches then done
'
vRow = lMemoryId(Application.Caller)
If vRow > 0 Then
'
' check value in column 1 against Lookup_Value
'
If vRow <= Lookup_Range.Rows.Count Then
If Lookup_Range(vRow, 1) = lookup_value Then
blFound = True
End If
End If
End If
'
' no memory or memory does not match
'
If Not blFound Then
'
' look for the row
'
vRow = Application.Match(lookup_value, Lookup_Range.Columns(1), lSorted)
'
' if unsorted and nomatch vRow contains error
'
If Not IsError(vRow) Then
If lSorted = 0 Then
blFound = True
Else
'
' we have a potential row: check against Lookup_Value
'
If Lookup_Range(vRow, 1) = lookup_value Then blFound = True
End If
End If
End If
End If
'
' return the value
'
If Not blFound Then
'
' unsuccessful Exact Match: return the given error value
'
FLOOKUP = vNoMatch
Else
'
' successful: get answer from column
'
FLOOKUP = Lookup_Range(vRow, jAnswerColumn)
End If
'
' store memory
'
If blExact Then
Call StoreID(Application.Caller, vRow)
End If
Exit Function
FuncFail:
FLOOKUP = CVErr(xlErrValue)
Exit Function
End Function
Private Sub SetDefaults(Sort_Type As Variant, lSorted As Long, Exact_Match As Variant, _
vNoMatch As Variant, blExact As Boolean, _
Answer_ColNum As Variant, jAnswerColumn As Long)
'
' sorted defaults to sort ascending : 1
'
lSorted = 1
If Not IsMissing(Sort_Type) Then lSorted = CLng(Sort_Type)
'
' set blExact and vNoMatch
'
vNoMatch = CVErr(xlErrNA)
blExact = True
'
If IsMissing(Exact_Match) Then
'
' Exact_Match defaults to False if Sorted, #N/A if unsorted
'
If lSorted <> 0 Then blExact = False
Else
'
' do exact match unless sorted and exact match=false
'
vNoMatch = Exact_Match
If VarType(vNoMatch) = vbBoolean And lSorted <> 0 Then
If Not vNoMatch Then blExact = False
End If
End If
'
' answer column must be convertible to long
'
jAnswerColumn = CLng(Answer_ColNum)
End Sub
Private Function lMemoryId(oCaller As Range) As Long
'
' get the ID from the calling cell and convert to long
'
Dim vID As Variant
lMemoryId = 0
On Error GoTo FuncFail
'
' ID property not available in Excel 97
'
If Val(Left(Application.Version, 2)) > 8 And Not oCaller Is Nothing Then
vID = oCaller.ID
If vID <> "" And IsNumeric(vID) Then
lMemoryId = CLng(vID)
End If
End If
FuncFail:
End Function
Private Sub StoreID(oCaller As Range, vRowNum As Variant)
'
' store the id in the calling cell
'
If Not oCaller Is Nothing And Val(Left(Application.Version, 2)) > 8 Then
oCaller.ID = CStr(vRowNum)
End If
End Sub

Performance Comparison of FLOOKUP and VLOOKUP

These comparisons are done using Excel 2010 32-bit, but with multi-threading  calculation switched off.
The data range is 32000 rows and timings are for 10 function calls.
Vlookup Error handling is using the double lookup method rather than IFERROR.

Case 1: Approx Match on sorted data

  • VLOOKUP 0.53 Millisecs
  • FLOOKUP 1.01 Millisecs

Case 2: Exact Match on Unsorted Data (Midpoint)

  • VLOOKUP 2.65 Millisecs
  • FLOOKUP 1.28 Millisecs (second call)

Case 3: Exact Match – Unsorted Data – No Match found – return #N/A

  • VLOOKUP 4.64 Millisecs
  • FLOOKUP 5.43 Millisecs

Case 4: Exact Match – Unsorted Data – with Error Handling (Midpoint)

  • VLOOKUP 4.5 Millisecs
  • FLOOKUP 1.29 Millisecs (second call)

Case 5: Exact Match – Unsorted Data – with Error Handling (No Match found)

  • VLOOKUP 4.66 Millisecs
  • FLOOKUP 5.48 Millisecs

Case 6: Exact Match – Sorted Data – With Error Handling (Midpoint)

  • VLOOKUP 4.56 millisecs (Double Exact Lookup)
  • VLOOKUP 0.6 Millisecs (Double Sorted Lookup trick)
  • FLOOKUP 1.35 Millisecs

Case 6: Exact Match – Sorted Data – With Error Handling (No Match Found)

  • VLOOKUP 4.63 Millisecs (Double Exact Lookup)
  • VLOOKUP 0.58 Millisecs (Double Sorted Lookup trick)
  • FLOOKUP 1.49 Millisecs

Summary

  • Although VBA UDFs have a higher overhead than Excel’s native functions carefully written functions can perform well.
  • Using memory with unsorted exact matches can significantly improve LOOKUP performance.
  • The double sorted lookup trick is significantly faster than Linear search on sorted data with missing values.
  • FLOOKUP and the double sorted lookup trick win 4 out of the 6 tests!

  I have now rewriten the AVLOOKUP family of functions into C++ to exploit multi-threading, improved performance, the lower UDF overhead of the XLL interface and to persist the lookup memory within a saved workbook.

This entry was posted in Calculation, UDF, VBA. Bookmark the permalink.

20 Responses to Developing Faster Lookups – Part 2 – How to build a faster VBA Lookup

  1. Steve says:

    You have an error in this line.

    If vRow If Lookup_Range(vRow, 1) = lookup_value Then

  2. Govert says:

    Hi Charles – I wonder if I could ask a big favour – for you to post a workbook with the VBA, your test data and some timing code? I’d like to see if I can translate your function to VB.NET with Excel-DNA and compare the speed with the VBA, and with your future C++ version…

  3. Jim Cone says:

    Charles,
    The Range.ID property is part of Excel vba that I was not really aware of. It appears intriguing.
    Can you list any other general areas where use of this could offer advantages?
    Thanks for keeping me awake,
    ‘—
    Jim Cone

    • fastexcel says:

      Jim,
      The problem with Range.ID is that it does not persist in a saved workbook (actually not sure I have tried it in an xlsx file). The FLOOKUP trick is the only way I use it, but there must be other uses for it.

  4. Jim Cone says:

    Charles,
    For what it is worth, I’ve come up with another use for the Range,ID property.
    In my Special Sort add-in, one sort method requires the position of blank cells within the data be maintained. To do this now, blank cells are filled with the value from cell above and a Range.ID is attached to the cell…
    rCell.Value2 = rCell(0, 1).Value2
    rCell.ID = “Blank”

    When the sort completes the ID’s are removed…
    For Each rCell in RangeToSort.Columns(num).Cells
    If Len(rCell.ID) Then
    rCell.ID = vbNullString
    rCell.ClearContents
    End if
    Next

    Checking the length of a non-existent Range.ID does not generate an error.
    As I said, FWIW.

  5. excelfeast says:

    Hi Charles. I love these articles. I’m planning to point readers of my book here, and wonder if you can unmangle the damage WordPress has done to the code in terms of the usual <> issue.

    (I’m also going to suggest they seriously look at the commercial option of FastExcel, which will be a stepchange again in efficiency. But pointing to these two posts is the prelude to that).

    Cheers

    Jeff

  6. Jeff Weir says:

    Sorry, another question: what does “Double Exact Lookup” mean in relation to the times under cases 6 and 7 above? You mean doing a double lookup on unsorted data?

  7. Jeff Weir says:

    Hmm. I understand what you mean by the Double Sorted Lookup trick, and why users would use it (Because it’s damn fast, that’s why). But I can’t see anything here that explaines what the Double Exact Lookup / Double Linear Search Lookup is, or when a user would use it.

    Can you elaborate further? I’m curious now…

    • Jeff Weir says:

      Hi Charles. I’m still curious about what a Double Exact Lookup / Double Linear Search Lookup is, and when a user would use it. Thought I’d give this thread a nudge in case you’re able to elaborate further. Apologies if you can’t spare the time just now… I expect you’re every bit as busy as I am. But if you can, there’s a glass of Central Otago Rabbit Ranch in it for you, provided you can pick it up from Wellington. Otherwise I’ll just toast your answer from here as I drain it in your honor. 🙂

  8. fastexcel says:

    Jeff, If you can’t use IFERROR with linear exact search for compatibility reasons (less likely now than when this post was written) then you have to use IF wrapping 2 exact match lookups.

  9. Jeff Weir says:

    Ahhh. Gotcha. Prost!

  10. Jeff Weir says:

    Charles, your SetDefaults function needs a slight tweak, because as coded it is impossible for blExact to be set to TRUE.

    Here’s the code you have. As you can see, blExact will remain False even if Exact_Match is true.

    ‘ do exact match unless sorted and exact match=false

    vNoMatch = Exact_Match
    If VarType(vNoMatch) = vbBoolean And lSorted 0 Then
    If Not vNoMatch Then blExact = False
    End If
    End If

  11. Jeff Weir says:

    Apologies, I’ve pointed at the wrong bit. (I hadn’t realised that earlier you’d set blExact = true)

    The bit that needs a tweak is to do with this:

    ‘ set blExact and vNoMatch
    vNoMatch = CVErr(xlErrNA)
    blExact = True

    …because just a little bit further below, you overwrite that vNoMatch variable in all cases:

    ' do exact match unless sorted and exact match=false
    '
    vNoMatch = Exact_Match
    If VarType(vNoMatch) = vbBoolean And lSorted 0 Then
    If Not vNoMatch Then blExact = False
    End If


    …meaning that even if the user passed in TRUE, if there is not an exact match they won’t see an error…they’ll just see the word TRUE.

    I think your intent was for something like this:

    vNoMatch = Exact_Match
    If VarType(vNoMatch) = vbBoolean And lSorted 0 Then
    If vNoMatch Then
    vNoMatch = CVErr(xlErrNA)
    Else
    blExact = False
    End If
    End If

    …and I’d probably be tempted to go one further and let the user pass in either TRUE or 1 just like the other functions do:


    ' do exact match unless sorted and exact match=false
    '
    vNoMatch = Exact_Match
    If VarType(vNoMatch) = vbBoolean Or vNoMatch = 1 Then
    If lSorted 0 Then
    If vNoMatch Then
    vNoMatch = CVErr(xlErrNA)
    Else
    blExact = False
    End If
    Else: blExact = False
    End If
    End If

    …although I’m sure that extra IF could be restructured out with a slight tweak.

  12. fastexcel says:

    @Jeff, Exact_Match is what to return if no exact match is found: for example “Not Found” (so that you don’t need IFERROR). You COULD pass in TRUE for Exact_Match but it would not make much sense to return TRUE if no match was found.

  13. Jeff Weir says:

    Yeah, I understand that if the data is sorted and an Exact Match is requested the UDF can do the Binary Search exact match trick. I’m just (still) a little confused about the intent behind the line If VarType(vNoMatch) = vbBoolean .

    Given the lookup array {1;2;3;5}, I see that:
    =FLOOKUP(4,$A$1:$B$4,1,1) returns 3 i.e. inexact match
    =FLOOKUP(4,$A$1:$B$4,1,1,”Not Found” returns “Not Found”
    =FLOOKUP(4,$A$1:$B$4,1,1,NA()) returns NA()

    I also see in terms of that If VarType(vNoMatch) = vbBoolean And lSorted 0 Then line that:
    =FLOOKUP(4,$A$1:$B$4,1,1,FALSE) returns 3, but that FALSE argument is redundant in this case, because =FLOOKUP(4,$A$1:$B$4,1,1) also returns 3
    =FLOOKUP(4,$A$1:$B$4,1,1,TRUE) returns “TRUE”, so you’re clearly not expecting TRUE to mean anything in the context of that VarType test.

    At the same time, in your post also says under the “Handling the #N/A from Missing Data” heading This also requires another optional parameter for the UDF, giving the Value to return if an exact match has been requested but not found (defaults to #N/A).

    And indeed you set vNoMatch = CVErr(xlErrNA). But it gets overwritten 4 instructions later with vNoMatch = Exact_Match

    So as coded, the only way I can get it to return #N/A for exact binary search is like this:
    =FLOOKUP(4,$A$1:$B$4,1,1,NA())

    If I understand correctly, then as coded, both that vNoMatch = CVErr(xlErrNA) line and that If VarType(vNoMatch) = vbBoolean And lSorted 0 Then line are redundant, and the statement if an exact match has been requested but not found (defaults to #N/A) is incorrect.

    Which is why I proposed the coding change above – because given what you say in the post I think that’s the intent. But I’m doing some serious second guessing.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s