In part 1 of Writing efficient VBA UDFs I looked at more efficient ways for the UDF to process a Range of data by reading it all into a Variant array. In this post I look at a case (using Excel Functions from within the UDF) where the most efficient way is somewhat different.
Linear interpolation is a commonly used technique for finding missing values or calculating a value that lies between the values given in a table.
Suppose you have a table of values like this:
Level |
Flow1 |
Flow2 |
64.00 |
0.10 |
2.59 |
64.50 |
0.77 |
3.18 |
65.00 |
2.19 |
3.73 |
65.50 |
4.02 |
4.28 |
66.00 |
6.19 |
6.88 |
66.50 |
8.64 |
12.04 |
67.00 |
11.36 |
13.85 |
67.50 |
13.45 |
14.84 |
68.00 |
15.00 |
16.37 |
68.50 |
16.41 |
21.12 |
69.00 |
17.71 |
21.68 |
And you want to find out what the value of Flow1 would be for a level of 66.25. Assuming that the value is on a straight line between the Flow for 66.0, which is 6.19, and the flow for 66.5, which is 8.64, you can calculate it like this:
The difference between 8.64 and 6.19 is 2.45, and 66.25 is half-way between 66.0 and 66.5, so add half of 2.45 to 6.19=7.415.
As a formula this becomes =6.19+(8.64-6.19)*(66.25-66.0)/(66.5-66.0)
So writing a UDF the same way as in Writing VBA UDFs Efficiently Part 1 you get this (ignoring error handling etc. for the sake of simplicity):
Function VINTERPOLATEA(Lookup_Value as Variant, Table_Array as Range, Col_Num as long) Dim vArr As Variant Dim j As Long ' get values vArr = Table_Array.Value2 ' find first value greater than lookup value For j = 1 To UBound(vArr) If vArr(j, 1) > Lookup_Value Then Exit For End If Next j ' Interpolate VINTERPOLATEA = (vArr(j - 1, Col_Num) + _ (vArr(j, Col_Num) - vArr(j - 1, Col_Num)) * _ (Lookup_Value - vArr(j - 1, 1)) / (vArr(j, 1) - vArr(j - 1, 1))) End Function
Where the Lookup_value is the value to find in the first column of the range Table_Array and Col_Num gives the column number index of the data to be interpolated (in this example 2).
This is reasonably efficient: 20 formulas interpolating on a table 10000 rows long takes 323 millisecs.
But of course we can do better!
When you look at this UDF you can see that the actual calculation only uses 2 rows of data, but to get that 2 rows it has to:
- import 10000 rows by 3 columns of data into an array
- do a linear search on the first column.
This sounds suspiciously like a LOOKUP or MATCH.
So lets try using Excel’s MATCH function instead: you can call MATCH from inside your VBA UDF using Application.WorksheetFunction.MATCH. And since the data is sorted we can use approximate MATCH. Once we have got the row number from MATCH we can get the 2 rows we are interested in.
The new UDF looks like this (again I am ignoring error handling for the sake of simplicity):
Function VINTERPOLATEB(Lookup_Value As Variant, Table_Array As Range, Col_Num As Long) Dim jRow As Long Dim rng As Range Dim vArr As Variant ' ' create range for column 1 of input ' Set rng = Table_Array.Columns(1) ' ' lookup Row number using MATCH ' Approx Match finds row for the largest value < the lookup value jRow = Application.WorksheetFunction.Match(Lookup_Value, rng, 1) ' ' get 2 rows of data ' vArr = Table_Array.Resize(2).Offset(jRow - 1, 0).Value2 ' ' Interpolate ' VINTERPOLATEB = (vArr(1, Col_Num) + _ (vArr(2, Col_Num) - vArr(1, Col_Num)) * _ (Lookup_Value - vArr(1, 1)) / (vArr(2, 1) - vArr(1, 1))) End Function
Once MATCH has found the row we can use Resize and Offset to subset the range to just the 2 required rows. (Thanks to Jim Cone for reminding me that you need to do the Resize before the Offset to make sure that the Offset does not cause an error by moving off the boundaries of the Worksheet).
This UDF takes 2 milliseconds on the test data, 160 times faster!
Update: As Peter Sestoft points out, much of the improvement is due to approximate Match using binary search rather than the linear search used in VINTERPOLATEA. In fact if you program the binary search in VBA inside the UDF it takes 3.3 milliseconds which is only 1.7 times slower than using MATCH.
Note: there are 2 ways of calling Excel Functions such as MATCH from VBA: Application.Match and Application.WorksheetFunction.Match. The differences are mostly in error handling (for instance when no match is found for the exact match option):
- Application.Match returns a Variant containing an error, which allows the use of IsError: If IsError(Application.Match …)
- Application.WorksheetFunction.Match raises a VBA error which requires an On Error handler.
Also WorksheetFunction.Match is somewhat faster.
So we need to add some error handling and boundary case handling:
- Use On Error to trap non-numeric data
- Check for the lookup value being outside the range of data in the table
- Check if the lookup value is the last value in the table
Then the UDF looks like this:
Function VINTERPOLATEC(Lookup_Value As Variant, Table_Array As Range, Col_Num As Long) Dim jRow As Long Dim rng As Range Dim vArr As Variant Dim vValue As Variant On Error GoTo FuncFail Set rng = Table_Array.Columns(1) ' Check for case if val = last value in rng vValue = rng.Cells(rng.Rows.Count, 1).Value2 If Lookup_Value = vValue Then VINTERPOLATEC = Table_Array.Cells(rng.Rows.Count, Col_Num).Value2 Exit Function End If ' Return an error if lookup_value is not within rng If Lookup_Value > vValue Or Lookup_Value < rng.Cells(1).Value2 Then VINTERPOLATEC = CVErr(xlErrNA) Exit Function End If ' lookup Row number using MATCH jRow = Application.WorksheetFunction.Match(Lookup_Value, rng, 1) ' get 2 rows of data vArr = Table_Array.Resize(2).Offset(jRow - 1, 0).Value2 ' Interpolate VINTERPOLATEC = (vArr(1, Col_Num) + _ (vArr(2, Col_Num) - vArr(1, Col_Num)) * _ (Lookup_Value - vArr(1, 1)) / (vArr(2, 1) - vArr(1, 1))) Exit Function FuncFail: VINTERPOLATEC = CVErr(xlErrValue) End Function
Conclusion:
The only thing faster than bringing all the data across to VBA in one lump is to use Excel functions to bring across only the minimum data your function needs.
Charles,
Very glad to see you are now sharing your insights on this blog. It should be of benefit to many, including myself.
FWIW, I bet there is a whole subset of Excel users who place their tables at the bottom of the worksheet? If so, that would cause problems with the use of Offset.
Jim,
Well spotted: I forgot to do the resize first – I will correct it
Excel’s built-in MATCH uses binary search which in this case should take log2(10000) or roughly 14 comparisons, rather than on average 5,000 for your linear search. It would be interesting to separate the effect of this improvement from the effect of “bringing across the data” since it is not obvious that the latter need take any time at all.
I agree that the Binary search is much faster than linear search, and that this is responsible for a significant part of the speedup. But in fact there is a high overhead associated with each call to the Excel object model to transfer data to VBA. You can see this effect in Part 1 where reading the entire block of data in one operation is much faster than reading data cell by cell.
If you program the UDF to use a VBA binary search it is 1.7 times slower than using MATCH, so most of the speedup is down to using a better algorithm but its still faster to only transport the minimum of data across to VBA
Here is another way of using an Excel function in VBA
[B1].Value2 = [Index(Grade,Match(Score,Marks,-1))]
Not sure if this is slower than Application. but it is certainly more concise…
Sam,
the [ ] syntax is short for Application.Evaluate.
Evaluate is slower than using WorksheetFunction and has a number of “quirks” that you need to be aware of: see http://www.decisionmodels.com/calcsecretsh.htm
“In fact if you program the binary search in VBA…”
Charles… Could you kindly share the routine for this….
Sam,
I will try to cover Binary search and Evaluate in future Posts.
I haven’t had time to test but would be interested to see how the following worksheet function equivalents compare:
=FORECAST(D2,OFFSET(B:B,MATCH(D2,A:A),,-2),OFFSET(A:A,MATCH(D2,A:A),,-2))
=PERCENTILE(B:B,PERCENTRANK(A:A,D2,20))
The second formula assumes an increasing relationship (use 1-percentrank if relationship is decreasing) and I’m guessing is linear as opposed to log linear for binary search. Using full columns can be more convenient and doesn’t seem to add much overhead.
The Forecast formula takes 26 milliseconds so its quite a lot slower (and its volatile). I think the PERCENTILE would be even slower.
Seems a little slower than expected, so i tried on my 2010 setup and found Forecast quicker by a factor of at least ten (0.03 vs 0.42 filling down 10000 adjacent formulas and running t=timer:selection.dirty:?timer-t). However, I’m not an expert and there are clearly other factors in play.
This is shaping up to be a great blog: VBA refresh bug and read-write analysis very useful – many thanks!
I retried it using Excel 2010 and got 1.1 compared to 2 millisecs, so you are right, it is faster. But using Excel 2003 it takes 22 millisecs! I have no idea why its so much faster in Excel 2010.
The easiest way to compare the time of formulas is to use my free Rangecalc addin (usually in Manual Calc mode).
You can download it from my Downloads Page.
One reason why Excel 2010 is so much faster than Excel 2003 with Lori’s FORECAST formula is that Excel 2010 is much better at handling full-column references with MATCH.
When I shorten the full column references to just the 10200 rows of data the Forecast formula calculates in 1.0 millisecs compared to 22 with a full column.
Don’t yet know if this added cleverness of Excel 2010 applies generally or just to MATCH.
How do you go about speed testing? I’m trying to speed test my UDF’s and can’t seem to get a good way to report a millisecond timestamp.
I use the Windows high-res API wrapped around Range.Calculate: you can download a RangeCalc addin from my website that does this: download from http://www.decisionmodels.com/downloads.htm
What do you think about creating an array version? I mean I interpolate 14,000 query points at a time in a table that 20 breakpoints. I think an array version may be the best method in this case.
Yes array versions can be fast because your UDF only needs to read the input data once and the overhead of parsing the formula and function syntax only happens once. See https://fastexcel.wordpress.com/2011/06/20/writing-efiicient-vba-udfs-part5-udf-array-formulas-go-faster/
Thank you! This was helpful!