VLookup tricks: why 2 VLOOKUPS are better than 1 VLOOKUP

OK, time for a post on VLOOKUP for Bill Jelen’s VLOOKUP week.

VLOOKUP Week – Brought to You by Mr. Excel
(Click Image for the Official Site of VLOOKUP Week)

For all you VLOOKUP junkies who can’t get enough of VLOOKUP, here is a wonderful reason to use 2 VLOOKUPs instead of 1 !!

Exact Match VLOOKUP is slow

When you have a lot of data in Excel (think 10,000 or 100,000 or 1,000,000 rows) you often need to detect when a value you are looking for does not exist in the data.
You can do this using VLOOKUP and the formula would look something like this, supposing you were trying to get information about a Part Number from the 4th column in a Parts List:

=VLOOKUP(PartNumber,PartsList,4,FALSE)

If Part Number is not in the Parts List then the formula will return #N/A. The final VLOOKUP argument (FALSE) tells VLOOKUP to do a linear exact match search of the Parts List, starting at the first row and looking at each row in turn until it finds the Part Number or reaches the end of the Parts List.

This works well with a small number of VLOOKUPs and when Parts List has a small number of rows, but for large numbers of VLOOKUPs with a large PartsList it can take minutes to calculate.

This problem is made even worse by the fan-out effect of making a change to the Parts List table.
Because of the way that Excel’s smart recalculation engine tracks dependencies, if you make a change to a particular row the Parts List table ALL the formulas that refer to Parts List will be recalculated, even if the result of the formula does not use the row that was changed.

So how do you speed up this slow calculation?

If you sort the PartsList table on Part Number then it would be nice to be able to use a formula like this:

=VLOOKUP(PartNumber,PartsList,4,TRUE)

The TRUE as the final VLOOKUP argument tells VLOOKUP to do a binary search of the parts list.

Binary Search is lightning fast compared to the linear search.

But it does not care about missing data: if the PartNumber does not exist VLOOKUP will happily find the next largest part number that is less than Part Number.

And it won’t tell you that it could not find the Part Number, so its extremely difficult to spot that you have got the wrong answer!

Thats why many people say you should never use the TRUE argument in VLOOKUP.

VLOOKUP has a major design error

Oh and by the way TRUE is the default if you don’t enter anything for the fourth argument.
So if your data is not sorted, or you have missing data, and you forget to add the FALSE argument you will get the wrong answer.

The magic solution: 2 VLOOKUPs are better than 1

OK, so here is the magic trick you can use to make safe fast VLOOKUPs on sorted data, even when there may be missing or mismatched data.

Suppose you use VLOOKUP to lookup the Part Number and return itself from the Part Number Column (column 1) in the Part Number Table.

=VLOOKUP(PartNumber,PartsList,1,TRUE)

If the PartNumber exists this formula will return the same Part Number as the PartNumber you gave it.

But if the PartNumber does not exist the formula will give you a different part number!

So we can use IF to find missing Part Numbers:

=IF(VLOOKUP(PartNumber,PartsList,1,TRUE)=PartNumber, “Exists”, “Missing”)

And so for the final magic formula you just replace the “Exists” with the VLOOKUP for the data you want!

=IF(VLOOKUP(PartNumber,PartsList,1,TRUE)=PartNumber, VLOOKUP(PartNumber,PartsList,4,TRUE), “Missing”)

Because VLOOKUP binary search is so much faster than VLOOKUP linear search this trick is faster even with small numbers of rows (50 or so).

More LOOKUP Tricks

My website has a page showing many tricks for using Lookups:

http://www.decisionmodels.com/optspeede.htm

And there are 2 earlier posts in this blog on Lookups:

https://fastexcel.wordpress.com/2011/07/20/developing-faster-lookups-part-1-using-excels-functions-efficiently/

https://fastexcel.wordpress.com/2011/07/22/developing-faster-lookups-part-2-how-to-build-a-faster-vba-lookup/

This entry was posted in Calculation, Lookups. Bookmark the permalink.

44 Responses to VLookup tricks: why 2 VLOOKUPS are better than 1 VLOOKUP

  1. Pingback: Charles Williams: “Why 2 VLOOKUPS are better than 1 VLOOKUP” « VLOOKUP WEEK

  2. T. Valko says:

    If the data is going to be sorted why not just use the LOOKUP function which defaults to a binary search?

    =IF(LOOKUP(A1,E2:E1000)=A1,LOOKUP(A1,E2:H1000),”Missing”)

    • fastexcel says:

      Yes, could have used LOOKUP which , like VLOOKUP, defaults to MicroSofts implementation of Binary Search. But then it would have ruined the title of my Post!

  3. Ram says:

    =IF(VLOOKUP(PartNumber,PartsList,1,TRUE)=PartNumber, VLOOKUP(PartNumber,PartsList,4,TRUE), “Missing”)…

    Shouldn’t the second Vlookup have FALSE instead of TRUE?

  4. @Ram, No. The idea is to to use fast binary search lookup rather than slow linear search lookup, so it has to be TRUE

  5. Doug Glancy says:

    Thanks! After watching the calculation percent message in the status bar for way too long today, I remembered this post. I used this technique with a two-dimensional Index/Match (only checking the row dimension, of course). Calculation on 50,000 rows went from several minutes to a few seconds. I was also able to remove the IFERROR wrapper for #N/A’s since this accomplishes the same thing more cleanly.

    • fastexcel says:

      Thanks Doug – a good example

    • Col Delane says:

      Does the following not achieve all of the stated objectives in a more elegant and efficient manner?
      =IFERROR(VLOOKUP(PartNumber,PartsList,4,TRUE), “Missing”)

      • fastexcel says:

        Col, VLOOKUP … TRUE does not return an error if what you are looking up does not exist, so using IFERROR does not work.

      • Col Delane says:

        Replace the TRUE with FALSE for an exact match and this formula will save Excel from having to do two VLOOKUPS if the required value is found.

      • fastexcel says:

        @Col – But 1 exact match lookup on 500K rows will take on average at least 4 orders of magnitude more time (factor of at least 10000) than 2 approx. match VLOOKUPS on sorted data. Try it and see!

      • Col Delane says:

        Ok Charles, I accept that justification for an application where there is sorted data and a humungous number of rows. However, for the majority of Excel users wanting to return a value from a much smaller list (which is often not sorted), there must be a tipping point prior to which a formula such as that I’ve suggested is just as fast (and where both Fx are probably so fast it doesn’t matter) AND is simpler and lower risk.

      • fastexcel says:

        If the data is unsorted you have to use exact match. If the data is sorted you can use either. The timing crossover between 2 approx. match Vlookups and 1 exact match Vlookup is somewhere around 15-20 rows of data, but as you say for that amount of data it will not matter unless you are doing a very large number of lookups.

      • jeffrey Weir says:

        Col Delane: I see an awful lot of spreadsheets from an awful lot of people that could really do with this approach. If only they knew it existed. You’re right that this is a select few users as a proportion, but with so many hundred million users stuffing data into Excel, the absolute numbers of folk who could use this tip to save their users (and themselves) real pain is significant. Two seconds to do a lookup is two seconds wasted, in my opinion…never mind something ridiculous like two minutes. This double vlookup combination will shave seconds to minutes off your recalc time in the blink of an eye.

        And the number at which the speed savings become apparent may be lower than you think. It’s a function of how many VLOOKUPs in the workbook times how large the ranges are. I wrote a blog post sometime back at http://dailydoseofexcel.com/archives/2015/04/23/how-much-faster-is-the-double-vlookup-trick/

        At one extreme, if your lookup table has 10,000 things in it, the double VLOOKUP trick on sorted data is 28 times faster than the standard VLOOKUP on unsorted data. At the other, if your lookup table has 1,000,000 things in it, the double VLOOKUP trick on sorted data is 3,600 times faster than the standard VLOOKUP on unsorted data.

        Kinda puts the ‘hassle’ of sorting your lookup table ascending into perspective, don’t it!

  6. evandro says:

    Wooowwww….thanks for your post. This tips really help me faster my work, before i took me 76 second vlookup 100000 entry using 1 vlookup, using your tips (2 vlookup) it took only 1 second…AMAZINGGG!!!!

  7. Andrew says:

    Totally lost with what you are saying here, maybe I don’t understand because I have limited knowledge of vlookup:

    “Suppose you use VLOOKUP to lookup the Part Number and return itself from the Part Number Column (column 1) in the Part Number Table.”

    (Does this mean use vlookup to look at the same column you are returning the value from i.e. column 1?? You are returning column 1 right, but would this not always return an exact match like saying does x=x, y=y?)

    (If this doesn’t make sense it’s because I am totally misunderstanding the data you are using – perhaps a small display of the data would help as I need to see things before I understand it)

    In my own project I have two data sets, both columns 1 in each set has a part ID, lets use that, and one book has more rows than the other.

    I want to say in the book with more rows, look this value up (workbook1.A2:A700,200) and tell me if it does or does not exist in (workbook2.A2:A680,000)

    When it shows the ones that do not exist i.e. N/A Then I know these are the ones that do not exist in the second workbook.

    I used vlookup but it’s taking about half an hour to complete! Got about 700,000 rows in each workbook.

  8. fastexcel says:

    You can only do this VLOOKUP trick if your data is sorted. Then using True as the last argument in VLOOKUP is fast but won’t tell you if you have data that does not exist in the lookup table, unless you use this double lookup trick.
    Alternatively you could use the SpeedTools COMPARE.LIST function which is very fast and does not care if the data is sorted or not.

  9. Pingback: Does A-Z Sort Speed VLOOKUP? | Learn Excel from MrExcel

  10. Pingback: Alternative to Vlookup with hundred thousand lines

  11. My says:

    I tried that and it didn’t work for all the formulas. I sorted both worksheets to test out.
    In my file, I have this in formula. Result is missing. Col H is the suppose to be the output if column 1 matches. I copy the formula down and some of the matches find the name others show up as Missing when the name is the exact same.
    Most of the data I work with are generally not sorted so dont think it will work.

    =IF(VLOOKUP(C38,'[SAMPLE.xlsx]2014′!$A:$H,8,TRUE)='[SAMPLE.xlsx]2014′!$H:$H,VLOOKUP(C:C,'[SAMPLE.xlsx]2014′!$A:$H,8,TRUE),”MISSING”)
    I tried to download the speedtools but was able too.

  12. Pingback: Daily Dose of Excel » Blog Archive » How much faster is the double-VLOOKUP trick?

  13. Pingback: The Analyst Cave | Excel, analytics, programming and more

  14. Pingback: CP051: VLOOKUP FAQs – Most frequently asked questions about VLOOKUP – Answered | Chandoo.org - Learn Microsoft Excel Online

  15. Jason S says:

    I am confused by the sample formula. So I have my data in 2 areas Columns A & B have my new data I want to find a match on…columns F & G have my source data that I am trying to match.

    I am not sure if I should be replacing the “4” you have in your code.
    =IF(VLOOKUP(A2,F:G,1,TRUE)=A2, VLOOKUP(A2,F:G,4,TRUE), “Missing”)

    When I tried this it did not seem to help with speed. Now I am running a HUGE dataset of over 500k rows in each group.

    • fastexcel says:

      If what you want to do is to compare 2 lists to find out what the miss-matches are then the fastest way is to use FastExcel’s COMPARE.LISTS function.
      If you want to use the double VLOOKUP trick then make sure columns F:G are sorted ascending on column F and do not have a header row.
      Don’t think you should have a 4 in your VLOOKUP when you are only using 2 columns, surely it should be a 2 ?

  16. Wow, thanks for the trick ! I gave a shot to the double Vlookup function, and excel calculated my spreadsheet in about 10 seconds, while it had took 3 hours to complete for the same set of data with a simple Vlookup!
    You changed my life today 😀

  17. Irina says:

    The performance improvements using the binary vs linear search are monumental. I had a speadsheet with about 500,000 rows. It was barely crawling. The binary VLOOKUP brought it back to life. A linear VLOOKUP does each lookup in about 1/2 of the row number, requiring 250,000 compares. By contrast, binary search VLOOKUP performs the same taks in log 2 500,000 = 19 steps. That means 12,500 times improvement. For 1 million rows file, the number of linear searches is about 1/2 of the row size = 500,000 compares; whereas binary search does the same taks in log 2 1,000,000 = 20 searches, with the 25,000 times improvement.

  18. Pingback: More on Lookups | Newton Excel Bach, not (just) an Excel Blog

  19. victorng8 says:

    Fantastic formula. However what happens if I need to VLookup many comma separated values in one cell? This question perfectly sums up the problem I’m having. http://stackoverflow.com/q/43253677/1186038

    After doing that I need to drag it down to ~30k records but it would take minutes .

    • fastexcel says:

      @victorng8 – I would look for a way of refactoring the lookup table to separate out each row that contains multiple CSVs into individual rows (probably do this with PowerQuery) then sort the table and use the double lookup trick.

  20. Helen C. says:

    I tried the double vlookup against a regular vlookup as a test with my data sets. I compared asset numbers from one year to another–over 808K rows in one, almost 800K rows in the other–so you can imagine how long a regular vlookup takes. The double vlookup took 3 seconds whereas the regular one was calculating at 1% when I walked away at the 3 min mark. When I came back to compare the results, I found that all but 5 were the same between the two formulas. According to the double vlookup, 5 assets were not found whereas they were found with the regular vlookup. I checked manually–the latter was correct. This scares me. Any idea why the difference?

  21. fastexcel says:

    Helen’s problem is that there are duplicate asset numbers in the first year but with different labels, so sorted lookup gives the last label of the duplicates but non-sorted lookup gives the first label of the duplicates – so the labels don’t match but the asset numbers do.

  22. middlefastexcel says:

    Hello fastexcel & community,

    I found your blog and I’m really happy for all the Info, thanks a lot!
    I have a question concerning that topic and would be really happy for an answer.

    So that the formula works the table must be sorted. If ascending or descending doesn’t care? And does it play a role if the looked up Column is a mix between numbers and characters how many characters the cell has? For example I want to look up:
    exammpleword.6584587.48754585477-554774458-66658-5547854

    Thanks a lot

    • fastexcel says:

      For VLOOKUP the data needs to be sorted ascending and it does not care how many characters the cell has. You can of course use a double INDEX-MATCH combination instead of VLOOKUP and the you can specify in the MATCH whether the data is sorted ascending or descending.

  23. VS Rawat says:

    Good one. I had to think it over to comprehend what happens when partial match is found, but exact match may be there later down in the table. But no, if entire PartNumber matches, then it has to be an exact match even with an approximate search, even though PartNumber1, PartNumber2, etc. further variations of PartNumber exist down below. Insightful. Thanks.

  24. Peter says:

    Hi, yes I agree the double vlookup is great but it has its limitations, because I have found instances where no matches are returned but there is a match. For example numbers such as these – P055AZOR01A0002, P055AZOR01A0002DS & P055AZOR01C0001, P055AZOR01C0001DW. I would be grateful if you can advise how to remedy this. Thanks

  25. Peter says:

    Yeah, the columns for both files are sorted in ascending order A-Z etc. and the DvLookUp I’m using keeps missing the examples above. =IF(VLOOKUP(B2,[MIMS_Docs_BoD_011320.xlsx]MIMS_Docs!$B$2:$B$238592,1,TRUE)=B2,VLOOKUP(B2,[MIMS_Docs_BoD_011320.xlsx]MIMS_Docs!$B$2:$B$238592,1,TRUE),”No Match”). I have around 1500 numbers that have been missed out of 250000, however on using the standard vLookUp it returns a match? However, it does take 30mins to do it. Therefore, any suggestions would be grateful.

  26. fastexcel says:

    Can you send me two example files that do not work? ( preferably a subset of the data, and I only need column B!)

  27. Peter says:

    Sure, what’s the best way for sending the data to you?

  28. Andrew DL says:

    Hi Sir, I am working on 495406 rows lookup on 478649 rows. 39 rows returned #N/A. Lookup values are unique and have been sorted in ascending order. They are composed of alphanumeric with * and – as special characters but it has worked before and does not seem to affect other results.

    • Andrew DL says:

      =IF(VLOOKUP(H2,$A$2:$A$478650,1,TRUE)=H2,VLOOKUP(H2,$A$2:$D$478650,4,TRUE),0)

      This is the formula i used, the 4 in the second vlookup only has 1 as value for every row so that it returns 1 for exact matches and then 0 otherwise.

  29. fastexcel says:

    The * is treated as a wildcard character for exact match lookups on text, but is treated as a * (not wildcard) in approximate match.

Leave a comment