VLookup tricks: why 2 VLOOKUPS are better than 1 VLOOKUP

OK, time for a post on VLOOKUP for Bill Jelen’s VLOOKUP week.

VLOOKUP Week – Brought to You by Mr. Excel
(Click Image for the Official Site of VLOOKUP Week)

For all you VLOOKUP junkies who can’t get enough of VLOOKUP, here is a wonderful reason to use 2 VLOOKUPs instead of 1 !!

Exact Match VLOOKUP is slow

When you have a lot of data in Excel (think 10,000 or 100,000 or 1,000,000 rows) you often need to detect when a value you are looking for does not exist in the data.
You can do this using VLOOKUP and the formula would look something like this, supposing you were trying to get information about a Part Number from the 4th column in a Parts List:

=VLOOKUP(PartNumber,PartsList,4,FALSE)

If Part Number is not in the Parts List then the formula will return #N/A. The final VLOOKUP argument (FALSE) tells VLOOKUP to do a linear exact match search of the Parts List, starting at the first row and looking at each row in turn until it finds the Part Number or reaches the end of the Parts List.

This works well with a small number of VLOOKUPs and when Parts List has a small number of rows, but for large numbers of VLOOKUPs with a large PartsList it can take minutes to calculate.

This problem is made even worse by the fan-out effect of making a change to the Parts List table.
Because of the way that Excel’s smart recalculation engine tracks dependencies, if you make a change to a particular row the Parts List table ALL the formulas that refer to Parts List will be recalculated, even if the result of the formula does not use the row that was changed.

So how do you speed up this slow calculation?

If you sort the PartsList table on Part Number then it would be nice to be able to use a formula like this:

=VLOOKUP(PartNumber,PartsList,4,TRUE)

The TRUE as the final VLOOKUP argument tells VLOOKUP to do a binary search of the parts list.

Binary Search is lightning fast compared to the linear search.

But it does not care about missing data: if the PartNumber does not exist VLOOKUP will happily find the next largest part number that is less than Part Number.

And it won’t tell you that it could not find the Part Number, so its extremely difficult to spot that you have got the wrong answer!

Thats why many people say you should never use the TRUE argument in VLOOKUP.

VLOOKUP has a major design error

Oh and by the way TRUE is the default if you don’t enter anything for the fourth argument.
So if your data is not sorted, or you have missing data, and you forget to add the FALSE argument you will get the wrong answer.

The magic solution: 2 VLOOKUPs are better than 1

OK, so here is the magic trick you can use to make safe fast VLOOKUPs on sorted data, even when there may be missing or mismatched data.

Suppose you use VLOOKUP to lookup the Part Number and return itself from the Part Number Column (column 1) in the Part Number Table.

=VLOOKUP(PartNumber,PartsList,1,TRUE)

If the PartNumber exists this formula will return the same Part Number as the PartNumber you gave it.

But if the PartNumber does not exist the formula will give you a different part number!

So we can use IF to find missing Part Numbers:

=IF(VLOOKUP(PartNumber,PartsList,1,TRUE)=PartNumber, “Exists”, “Missing”)

And so for the final magic formula you just replace the “Exists” with the VLOOKUP for the data you want!

=IF(VLOOKUP(PartNumber,PartsList,1,TRUE)=PartNumber, VLOOKUP(PartNumber,PartsList,4,TRUE), “Missing”)

Because VLOOKUP binary search is so much faster than VLOOKUP linear search this trick is faster even with small numbers of rows (50 or so).

More LOOKUP Tricks

My website has a page showing many tricks for using Lookups:

http://www.decisionmodels.com/optspeede.htm

And there are 2 earlier posts in this blog on Lookups:

http://fastexcel.wordpress.com/2011/07/20/developing-faster-lookups-part-1-using-excels-functions-efficiently/

http://fastexcel.wordpress.com/2011/07/22/developing-faster-lookups-part-2-how-to-build-a-faster-vba-lookup/

About these ads
This entry was posted in Calculation, Lookups. Bookmark the permalink.

12 Responses to VLookup tricks: why 2 VLOOKUPS are better than 1 VLOOKUP

  1. Pingback: Charles Williams: “Why 2 VLOOKUPS are better than 1 VLOOKUP” « VLOOKUP WEEK

  2. T. Valko says:

    If the data is going to be sorted why not just use the LOOKUP function which defaults to a binary search?

    =IF(LOOKUP(A1,E2:E1000)=A1,LOOKUP(A1,E2:H1000),”Missing”)

    • fastexcel says:

      Yes, could have used LOOKUP which , like VLOOKUP, defaults to MicroSofts implementation of Binary Search. But then it would have ruined the title of my Post!

  3. Ram says:

    =IF(VLOOKUP(PartNumber,PartsList,1,TRUE)=PartNumber, VLOOKUP(PartNumber,PartsList,4,TRUE), “Missing”)…

    Shouldn’t the second Vlookup have FALSE instead of TRUE?

  4. @Ram, No. The idea is to to use fast binary search lookup rather than slow linear search lookup, so it has to be TRUE

  5. Doug Glancy says:

    Thanks! After watching the calculation percent message in the status bar for way too long today, I remembered this post. I used this technique with a two-dimensional Index/Match (only checking the row dimension, of course). Calculation on 50,000 rows went from several minutes to a few seconds. I was also able to remove the IFERROR wrapper for #N/A’s since this accomplishes the same thing more cleanly.

  6. evandro says:

    Wooowwww….thanks for your post. This tips really help me faster my work, before i took me 76 second vlookup 100000 entry using 1 vlookup, using your tips (2 vlookup) it took only 1 second…AMAZINGGG!!!!

  7. Andrew says:

    Totally lost with what you are saying here, maybe I don’t understand because I have limited knowledge of vlookup:

    “Suppose you use VLOOKUP to lookup the Part Number and return itself from the Part Number Column (column 1) in the Part Number Table.”

    (Does this mean use vlookup to look at the same column you are returning the value from i.e. column 1?? You are returning column 1 right, but would this not always return an exact match like saying does x=x, y=y?)

    (If this doesn’t make sense it’s because I am totally misunderstanding the data you are using – perhaps a small display of the data would help as I need to see things before I understand it)

    In my own project I have two data sets, both columns 1 in each set has a part ID, lets use that, and one book has more rows than the other.

    I want to say in the book with more rows, look this value up (workbook1.A2:A700,200) and tell me if it does or does not exist in (workbook2.A2:A680,000)

    When it shows the ones that do not exist i.e. N/A Then I know these are the ones that do not exist in the second workbook.

    I used vlookup but it’s taking about half an hour to complete! Got about 700,000 rows in each workbook.

  8. fastexcel says:

    You can only do this VLOOKUP trick if your data is sorted. Then using True as the last argument in VLOOKUP is fast but won’t tell you if you have data that does not exist in the lookup table, unless you use this double lookup trick.
    Alternatively you could use the SpeedTools COMPARE.LIST function which is very fast and does not care if the data is sorted or not.

  9. Pingback: Does A-Z Sort Speed VLOOKUP? | Learn Excel from MrExcel

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s