Excel 2007/2010 Used Range Voodoo – Big Grid used range woes

June 27, 2011

I have been struggling with cleaning the used range on large workbooks with Excel 2007 and 2010.

Besides the UsedRange.Rows.Count problem discussed in the previous post I have met:

  • out of memory
  • Excel hangs
  • performance problems
  • Statusbar not updating

Out of Memory

One of the basic operations in cleaning the used range is to find the real last row and column (the last row and column containing data as opposed to formatting) and then to delete or clear the excess rows and columns beyond the real last row and column. Its better to use clear rather delete because deleting will truncate any references to the rows and columns being deleted, which may cause problems or errors. But sometimes (particularly with Excel 2007/2010) clear does not reset the used range so you may have to use delete.

With the big grid of 2007 and 2010 if the real last row and column are a long way from the used range last row and column you may have to delete a very large range of cells, and guess what: you get an Out of Memory message. So I started to try and find out what sized block of rows or columns would bypass this. But then I hit another problem:

Deleting rows not resetting the used range.

Deleteing the rows was not resetting the used range! And that was not the only problem:

Excel hung on Sheet.UsedRange

After several delete operations when trying to check the used range Excel hung.

Statusbar not updating

Trying to show the progress of the cleaning operation in the statusbar was frustrating: the statusbar message did not update after I set it from VBA.

The solution

After a lot of trial and error(s) I eventually found an instruction sequence that seems to work:

  • Use DoEvents before and after each call to Worksheet.UsedRange
  • Delete rows before deleting columns
  • Check the UsedRange after each clear or delete operation
  • Use blocksizes of 4096 rows and 512 64 columns.

The cleaning operation can still be very slow on large complex workbooks, but at least it seems to work!


Excel 2007/2010 range.count too many cells bug

June 22, 2011

Ian Bennett just sent me details of a bug in build 501 of FastExcel: the error message he got is -

VBA Error 6 is Overflow and line 1630 says

nCells = Worksheets(j).UsedRange.Count

nCells is dimmed as a Long and the maximum number a Long can hold is 2,147,483,647, and in the Excel 2007/2010 big grid you can have 16384 columns x 1048576 rows = 17,179,869,184 cells.
So it won’t fit in a Long!
No problem I thought, I will dim nCells as Currency (Currency data type can hold 922,337,203,685,477.5807).

Well it does not work: still gives overflow. The reason is a fundamental problem in the Excel Object model:
Range.Count is defined as a long so it will always overflow (before it returns the answer to your variable) if the Range is too large .

So it seems that the only answer is to roll your own function:

EDIT: Eric points out that you need to cope with multi-area disjoint ranges. Originally I was thinking of the UsedRange which is always rectangular. But Eric is right, so here is a revised function that handles multi-area ranges.

Function RangeCount(theRange As Range) As Currency
Dim nRows As Currency
Dim nCols As Currency
Dim rng As Range
Dim UnionRange As Range
If Not theRange Is Nothing Then

‘ eliminate area overlaps

Set UnionRange = theRange.Areas(1)
If theRange.Areas.Count > 1 Then
For Each rng In theRange.Areas
Set UnionRange = Union(UnionRange, rng)
Next rng
End If

‘ count each area separately

For Each rng In UnionRange.Areas
nRows = rng.Rows.Count
nCols = rng.Columns.Count
RangeCount = RangeCount + nRows * nCols
Next rng
End If
End Function

This works! So now all I have to do is fix all the places in FastExcel where this problem might occur, create a new build, test it and upload it …


Writing efficient VBA UDFs (Part5) – UDF Array Formulas go faster!

June 20, 2011

Just in case you thought the previous posts on writing efficient VBA UDfs (Part1, Part2, Part3, Part 4) meant we had finished making UDFs run faster, think again – its time to explore UDF Array Formulas.

Single and Multi-Cell Array Formulas

Excel array formulas can do amazing things. They are like ordinary formulas except that you enter them with Control/Shift/Enter rather than just enter.

There are two kinds of array formulae:

  • Single cell array formulae are entered into a single cell, loop through their arguments (which are often calculated arguments) and return a single answer.
  • Multi-cell array formulae are entered into multiple cells and return an answer to each of the cells.

With this power comes a cost: because array formulae are doing a lot of work they can be slow to calculate (particularly single-cell array formulas).

UDF Multi-cell Array Formulas go Faster!

You can break down the time taken by a VBA UDF into these components:

  • Overhead time to call the UDF.
  • Time to fetch the data thats going to be used by the UDF.
  • Time to do the calculations.
  • Overhead time to return the answer(s).

In the post on Excel VBA Read/Write timeings you could see that there was quite a significant overhead on each VBA read and write call, so that its usually much faster to read and write large blocks of data at a time.

So it sounds like a good idea to make your VBA UDF read as much data as possible in a single block and return data to Excel in as large a block as possible.
Enter the Multi-cell array formula – it does exactly that – and also minimises the calling overhead – and often it can read the data once and re-use it lots of times.

So how do you make a Multi-Cell Array formula?

Lets create an array version of the AverageTolE function shown in the first Writing Efficient VBA UDFs post.
The scenario is that you want to find the Averages of the data excluding a number of different tolerances rather just one tolerance.
To keep things simple I am assuming that

  • the tolerances are all in one row
  • both the data and the tolerances will be supplied as ranges
  • error-handling is largely ignored
  • the function returns a row of answers that correspond to the row of tolerances.

Public Function AverageTolM(theRange As Range, theTols As Range) As Variant
Dim vArr As Variant
Dim vArrTols As Variant
Dim v As Variant
Dim d As Double
Dim r As Double
Dim k As Long
Dim vOut() As Variant
Dim dTol As Double
Dim lCount As Long
On Error GoTo FuncFail
vArr = theRange.Value2
vArrTols = theTols.Value2
ReDim vOut(1 To 1, 1 To UBound(vArrTols, 2))
On Error GoTo skip
For k = 1 To UBound(vArrTols, 2)
dTol = CDbl(vArrTols(1, k))
r = 0#
lCount = 0
For Each v In vArr
d = CDbl(v)
If Abs(d) > dTol Then
r = r + d
lCount = lCount + 1
End If
skip:
Next v
vOut(1, k) = r / lCount
Next k
AverageTolM = vOut
Exit Function
FuncFail:
AverageTolM = CVErr(xlErrNA)
End Function

The changes to the UDF are quite simple:

  • theTols range is coerced into a variant array: vArrTols = theTols.Value2
  • an output array of the same size is created: ReDim vOut(1 To 1, 1 To UBound(vArrTols, 2))
  • The UDF loops on the tolerance array and populates the output array
  • The output array is assigned to the function variable: AverageTolM = vOut

Note that the Function is declared as returning a variant (which will contain an array) rather than being declared as returning an array of variants.

Assuming that the data is in H27:AA27 then enter the array function with Ctrl/Shift/Enter as

{=AVERAGETOLM(Data!$A$1:$A$32000,$H$27:$AA$27)} into 20 rows (so we will get 20 x 20 = 400 cells of answers).

Calculating this 20 formulas takes 975 milliseconds.

Using the original AVERAGETOLE formula for the 400 cells takes 1660 milliseconds, an improvement factor of 1.7

Summary

  • In many real-life cases using multi-cell array UDFs can be the fastest way to calculate.
  • Converting a conventional UDF to a multi-cell array UDf is straightforward.

Writing efficient VBA UDFs (Part 4) – Variants, References, Arrays, Calculated Expressions, Scalars

June 20, 2011

In part 1 and part 2 of “Writing efficient UDFs” I used parameters defined as Range to get data from Excel.

Function VINTERPOLATEB(Lookup_Value As Variant, Table_Array As Range, Col_Num As Long)

This works OK if the function is called from a formula using a range:

=VINTERPOLATEB($E5,$A$10:$C$10200,2)

but results in #Value if you use a calculated expression or an array of constants:

{=VINTERPOLATEB($E5,($A$10:$C$10200*1),2)}

This formula has to be entered as an array formula using Control/Shift/Enter (don’t enter the { … } , Excel will add them).

=VINTERPOLATEB(4.5,{1,3,3.5;4,4,4.5;5,4.5,5},2)

This uses a 3 column 3 row array constant. You do have to enter the { … } surrounding the constants, but it does NOT have to be entered as an array formula. The , separates the columns and the ; separates the rows.

Excel detects that these parameters are not Ranges before even calling the function.

You can fix this by defining the parameter as a variant rather than a range: a variant parameter can hold virtually anything! But the UDF now has to handle all the different types of data that the Variant might contain.

One simple approach is to assign the parameter to a Variant: this will coerce everything to values:
Function TestFunc(theParameter As Variant)
Dim vArr As Variant
vArr = theParameter
TestFunc = vArr
End Function

=TestFunc($A$10:$A$15*1)

In the VBE putting a breakpoint (use F9) on the return line and showing the Locals window results in this: you can see that Varr contains Error 2015 which is #Value

Thats because I forgot to array-enter the formula, here is the Locals for the array-entered formula:Now you can see in the Locals window that the vArr variant contains a 2-dimensional array of variants with a sub-type of double.

Entering =testfunc({1,2,3;5,6,7}) also results in a 2 dimensional array:

but =testfunc({1,2,3}) results in a 1-dimensional array!:

whereas =testfunc({1;2;3}) gives a 2-dimensional array!;and =testfunc(45) gives a scalar, not an array;

If you give a range as the parameter =testfunc($A$10:$A$15) then you get this

Notice that theParameter variant contains an object of sub-type Range, which means you have to treat it as a Range Variable, whereas the vArr contains the values extracted from the Range.

Determining Type and Dimensions for a Variant parameter

So in a general purpose UDF you want to use Variant parameters, and you often need to determine the type and upper and lower bounds of the variant.

For maximum efficiency you cannot just use vArr=theVariant, because:

  • You cannot use .Value2 because it might not be a range.
  • In many cases you want to manipulate the Range object before/instead of just coercing all its values.

So here is a function to determine what has been passed, and how large it is:

Function Variant_Type(theVariant As Variant)
Dim jRowL As Long
Dim jRowU As Long
Dim jColL As Long
Dim jColU As Long
Dim jType As Long
Dim varr As Variant
'
' theVariant could contain a scalar, an array, or a range
' find the upper and lower bounds and type
' type=1 range, 2 2-d variant array, 3 1-d variant array (single row of columns), 4 scalar
'
On Error GoTo FuncFail
jType = 0
jRowL = 0
jColL = 0
jRowU = -1
jColU = -1
If TypeName(theVariant) = "Range" Then
jRowL = 1
jColL = 1
jRowU = theVariant.Rows.Count
jColU = theVariant.Columns.Count
jType = 1
ElseIf IsArray(theVariant) Then
jRowL = LBound(theVariant, 1)
jRowU = UBound(theVariant, 1)
On Error Resume Next
jColL = LBound(theVariant, 2)
jColU = UBound(theVariant, 2)
On Error GoTo FuncFail
If jColU < 0 Then
jType = 3
jColL = jRowL
jColU = jRowU
jRowL = 0
jRowU = -1
Else
jType = 2
End If
Else
jRowL = 1
jRowU = 1
jColL = 1
jColU = 1
jType = 4
End If
Variant_Type = jType
Exit Function
FuncFail:
Variant_Type = CVErr(xlErrValue)
jType = 0
jRowU = -1
jColU = -1
End Function

Note that the first test is whether the variant contains a Range. This is to avoid inadvertently coercing a Range to its values. Also there are several ways in VBA in determining the sub-type of a variant:

  • If TypeOf theVariant Is Range Then
  • If TypeName(theVariant) = “Range” Then

Beware of trying to use VarType(theVariant) : this does an under-the-covers coerce of a Range and then throws the resulting values away! (Expensive for large ranges).

Conclusion:

In a general purpose UDF you have to use Variant type parameters rather than Range type.
You can handle this efficiently by determining what the variant contains before processing it.


Multi-threaded UDFs – Technology, Locking, Race conditions and Deadlocks

June 19, 2011

One of the major disadvantages of VBA (and VB6) UDFs is that they cannot be multi-threaded. And since everyone now has PCs with multiple cores and Excel 2007 or Excel 2010 (well except for my better half Jane who is currently struggling with a rather ancient laptop) this is starting to be a problem for us Excel speed freaks/geeks.

Thats one of the reasons why I decided to rewrite my next generation of fast UDFs as C++ XLLs.
I am using XLL+ Version 7 and VS 2010: using these tools to make a UDF multithreaded is easy: just tick the multi-threaded option and away you go.

I am now about 6 months into learning how to write these C++ UDFs things and its mostly going OK. Of course there are minor hiccups along the way: like when it took me half a day to figure out why this numeric parameter always arrived in the UDF as 1 regardless of whatever value was passed in! (If you really want to know its because Excel and the Oper data structure don’t really know about integers: they insist on thinking they are primitive booleans. I knew that happened on output – always use doubles – but stupidly had not considered that it would also happen on input …).

Anyway I have now written most of my simple functions: they all work OK and calculate fast, so obviously its now time to get overconfident and go for a real challenge.

Multi-threaded UDFs sharing a global resource.

One of the tricks involved in writing faster Lookup and Matches is to be able to store the row number in the input range where the answer was found and use it the next time the Lookup or Match is executed. This technique is fast with unsorted data.

For unthreaded UDFs this is conceptually straightforward: just store the row number in a global container of some kind. But for multithreaded UDFs its not so simple because you can have multiple instances of the UDF being executed on different threads and all trying to update or read the container at the same time – a definite no-no for multi-threading. The answer is to be able to set a lock on the global container.

After a lot of Googling I decided to use a shared lock from the Boost library and either a Map container from the Standard Template Library (STL), or an unordered Map from Boost. (Both of these libraries contain an enormous framework of containers and algorithms and other excellent stuff). The shared lock allows only 1 thread to update but multiple threads to read.

There were a few glitches getting VS2010 to acknowledge the Boost libraries, but once I had figured that out it all works and seems quite efficient (half a million updates and half a million reads in about 1 second using 8 cores). The unordered Map using integer keys is fastest.

The next thing to get my head around is how to handle multiple global containers at the same time. Simples! I thought, you just have a separate Lock for each container. But I suspect this could lead to the dreaded “race condition”  “deadlock” where one thread holds one lock and is waiting for another lock, and another thread holds that lock and is waiting for the first thread: the corresponding image is 2 meerkats chasing each other’s tails.
(Mike Woodhouse points out that a “race condition” is having 2 threads trying to update the same thing at the same time.)

Time for a cold towel wrapped round my head, swiftly followed by a large glass of Central Otago Pinot Noir (fantastic stuff).

Afterthought:

Having slept on it I think I need a timeout on the locks as well (I wondered why there was a timeout parameter on the shared lock). The problem is that even if all MY udfs depend on a single lock the user could be using someone elses UDFs at the same time that could have their own lock:

If

  • MyXLL creates MyLockUDF and OtherXLL creates OtherLockUDF
  • Formula1 =MyLockUDF()+OtherLockUDF()
  • Formula2 =OtherLockUDF()+MyLockUDF()
  • Formula1 and Formula2 are on separate threads being calculated at the same time

then there could be a deadlock.


VBA Read/write Speed Formula – Benchmarking Excel Versions

June 13, 2011

Previous posts on Writing Efficient UDFs Part 1 and Part 2 have looked at the importance of optimising the way you transfer data between Excel and VBA.

This is often a major factor in the speed (or lack of it) of VBA Macros. So this post shows you how the transfer time varies according to the amount of data you transfer, compares these times for recent Excel versions and derives a simple formula that enables you to predict the times.

The timings in this post were done using my Dell Latitude D830 laptop, which has a 2.39 GHz Intel Core 2 T7700.
The method used is to read a block of data from Excel cells into a Variant array and write a block a block of data from a Variant array into Excel cells. The size of the blocks is varied, and the time is measured for each read and write block.

When you plot time versus block size for this data you get a straight line, and you can use Excel’s LINEST function to find out what the slope and intercept of the line is:

Read/Write Times versus block size


This chart is for Excel 2010 32-bit (Excel 14). The blue line is for Read and the red line is for Write, and you can see that Reading data is much faster than writing data.
The formulae are:

  • Read: Time in Milliseconds=0.0023 + 0.00042 * Number of cells
  • Write: Time in Millseconds=0.028 + 0.00174 * Number of cells

The straight line is a good fit (Rsquared=0.996).

The overhead time associated with each read/write operation is significant: this is why it is so much faster to read/write large blocks of cells rather than read/write one cell at a time.

The formulae predict that reading and then writing a Range of 20 columns by 200 rows would take:

  • Single read and write of the entire range: 9 Millisecs
  • Reading and writing each cell in the range one-by-one: 132 Millisecs

Looking at how these times have varied by Excel version gives this chart:

Read Write times by Excel Version

I have left out the timings for Excel 2007 before the SP2 update: they were extremely slow!

You can download a workbook from here that contains all the code you need to run these benchmarks, charts and derive the equations on your system.


Writing efficient VBA UDFs (Part 3) – Avoiding the VBE refresh bug – Updated

June 13, 2011

In Part 1 and Part 2 of Writing Efficient VBA UDFs I looked at some simple ways of changing the VBA code you write to make it run massively faster. In this post I look at a bug in Excel that slows down your UDFs and show you how to avoid it.

A few years back I was having trouble working out why VBA UDFs ran so much faster on my PC than on someone else’s system (I think it was Jan Karel Pieterse’s). Eventually I tracked it down to the fact that I had FastExcel installed. And further investigation showed that FastExcel made the VBA UDFs ran faster because the calculation was initiated by FastExcel using VBA.

It turned out that the underlying reason was a little bug in the EXCEL Visual Basic Editor (the VBE) : each formula that contains a UDF changes the VBE title bar to say “Running” whenever a UDF is being executed during the calculation, and then switches it back again when the UDF has finished.

The VBE Title Bar

The word [Running] gets inserted into the title bar after the name of the workbook. To make this happen requires the VBE to send a message to the Windows screen handler and the window containing the VBE gets refreshed:

VBE Title Bar with [Running}

This is a CPU intensive operation: on my system I created a volatile do-nothing UDF (using Excel 2010 32-bit):

Function VolUDF()
Application.Volatile
End Function

And entered it A1:A10000 (ten thousand cells). Then I resized the windows so that I could see both the Excel window and the VBE windows at the same time and triggered a calculation: I could see the VBE title bar flashing.

The calculation took 7.3 seconds.

Then I closed the VBE window and triggered another calculation.
This time it took 3.6 seconds.

Then I saved the workbook with the VBE window closed, closed Excel, reopened Excel and the workbook and triggered another calculation:
This time it took 1.1 seconds.

Then I initiated the calculation from VBA using Application.Calculate:
This time it took 0.058 seconds. That’s an improvement factor of 125.

And it still takes 0.058 seconds using Application.Calculate even with the VBE window open and visible.

Of course if you only have a few formulae using UDFs you will never notice this slowdown in calculation time, but on my fast system even 1000 formulas using UDFs will take an additional 0.7 seconds to calculate.

Bypassing the refresh bug

OK, so now you know what’s happening how do you avoid this problem? (By the way this bug is present in all versions of Excel from Excel 97 to Excel 2010 (both 32 bit and 64 bit).)

Trapping Calculation in Manual calculation mode.

If Excel is in Manual Calculation mode you can trap all the keystrokes that trigger a calculation, and initiate the calculation from your VBA code.

You need calculation subs and keytraps for

  • Shift/F9 – Activesheet.Calculate
  • F9 – Application.Calculate
  • Ctrl/Alt/F9 – Application.CalculateFull
  • Ctrl/Alt/Shift/F9 – Application.CalculateFullRebuild

In the ThisWorkbook module add the key trapping subs:

Private Sub Workbook_Open()
Application.OnKey "+{F9}", "SheetCalc"
Application.OnKey "{F9}", "ReCalc"
Application.OnKey "^%{F9}", "FullCalc"
Application.OnKey "+^%{F9}", "FullDependCalc"
End Sub

and in an ordinary module add the corresponding calculation subs:


Sub SheetCalc()
ActiveSheet.Calculate
End Sub
Sub ReCalc()
Application.Calculate
End Sub
Sub FullCalc()
Application.CalculateFull
End Sub
Sub FullDependCalc()
Application.CalculateFullRebuild
End Sub

(The equivalent of these procedures is built-in to the FastExcel add-in)

In Automatic Calculation Mode

Unfortunately I do not know of a way of bypassing the problem using VBA in Automatic Calculation Mode.
Closing the VBE window and reopening Excel will improve things substantially, but if you are creating workbooks with UDFs for other people to use you cannot control whether they will have the VBE open or not.

The only solution seems to be to use some non-vba technology such as a VB6 Automation Addin, .NET addin, or XLL addin, but each of these approaches has their own difficulties, which I plan to cover in a future post comparing Excel UDF Technologies.

Update

A useful idea about handling the problem in Automatic Mode comes from Francisco Aller Labandeira.

He suggests adding this code to the Workbook module.

Private Sub Workbook_SheetChange(ByVal Sh As Object, ByVal Target As Range)
If Application.Calculation = xlCalculationAutomatic Then
Application.Calculation = xlCalculationManual
Exit Sub
End If
Calculate
End Sub

This has the drawback that it won’t stop the first automatic calculation, but will catch subsequent calculations. You could also perhaps add some code to the Workbook BeforeSave event to switch back to Automatic if appropriate.

Conclusion:

If you need to use a large number (>1000) of formulas referencing VBA UDFs you will need to use Manual Calculation mode and add calculation key-trappers and handlers to your workbooks.


Writing efficient VBA UDFs (Part 2) – using Excel Functions inside a UDF

June 6, 2011

In part 1 of Writing efficient VBA UDFs I looked at more efficient ways for the UDF to process a Range of data by reading it all into a Variant array. In this post I look at a case (using Excel Functions from within the UDF) where the most efficient way is somewhat different.

Linear interpolation is a commonly used technique for finding missing values or calculating a value that lies between the values given in a table.
Suppose you have a table of values like this:

Level

Flow1

Flow2

64.00

0.10

2.59

64.50

0.77

3.18

65.00

2.19

3.73

65.50

4.02

4.28

66.00

6.19

6.88

66.50

8.64

12.04

67.00

11.36

13.85

67.50

13.45

14.84

68.00

15.00

16.37

68.50

16.41

21.12

69.00

17.71

21.68

And you want to find out what the value of Flow1 would be for a level of 66.25. Assuming that the value is on a straight line between the Flow for 66.0, which is 6.19, and the flow for 66.5, which is 8.64, you can calculate it like this:
The difference between 8.64 and 6.19 is 2.45, and 66.25 is half-way between 66.0 and 66.5, so add half of 2.45 to 6.19=7.415.

As a formula this becomes =6.19+(8.64-6.19)*(66.25-66.0)/(66.5-66.0)

So writing a UDF the same way as in Writing VBA UDFs Efficiently Part 1 you get this (ignoring error handling etc. for the sake of simplicity):

Function VINTERPOLATEA(Lookup_Value as Variant, Table_Array as Range, Col_Num as long)
Dim vArr As Variant
Dim j As Long
' get values
vArr = Table_Array.Value2
' find first value greater than lookup value
For j = 1 To UBound(vArr)
If vArr(j, 1) > Lookup_Value Then
Exit For
End If
Next j
' Interpolate
VINTERPOLATEA = (vArr(j - 1, Col_Num) + _
(vArr(j, Col_Num) - vArr(j - 1, Col_Num)) * _
(Lookup_Value - vArr(j - 1, 1)) / (vArr(j, 1) - vArr(j - 1, 1)))
End Function

Where the Lookup_value is the value to find in the first column of the range Table_Array and Col_Num gives the column number index of the data to be interpolated (in this example 2).

This is reasonably efficient: 20 formulas interpolating on a table 10000 rows long takes 323 millisecs.

But of course we can do better!
When you look at this UDF you can see that the actual calculation only uses 2 rows of data, but to get that 2 rows it has to:

  • import 10000 rows by 3 columns of data into an array
  • do a linear search on the first column.

This sounds suspiciously like a LOOKUP or MATCH.

So lets try using Excel’s MATCH function instead: you can call MATCH from inside your VBA UDF using Application.WorksheetFunction.MATCH. And since the data is sorted we can use approximate MATCH. Once we have got the row number from MATCH we can get the 2 rows we are interested in.

The new UDF looks like this (again I am ignoring error handling for the sake of simplicity):


Function VINTERPOLATEB(Lookup_Value As Variant, Table_Array As Range, Col_Num As Long)
Dim jRow As Long
Dim rng As Range
Dim vArr As Variant
'
' create range for column 1 of input
'
Set rng = Table_Array.Columns(1)
'
' lookup Row number using MATCH
' Approx Match finds row for the largest value < the lookup value
jRow = Application.WorksheetFunction.Match(Lookup_Value, rng, 1)
'
' get 2 rows of data
'
vArr = Table_Array.Resize(2).Offset(jRow - 1, 0).Value2
'
' Interpolate
'
VINTERPOLATEB = (vArr(1, Col_Num) + _
(vArr(2, Col_Num) - vArr(1, Col_Num)) * _
(Lookup_Value - vArr(1, 1)) / (vArr(2, 1) - vArr(1, 1)))
End Function

Once MATCH has found the row we can use Resize and Offset to subset the range to just the 2 required rows. (Thanks to Jim Cone for reminding me that you need to do the Resize before the Offset to make sure that the Offset does not cause an error by moving off the boundaries of the Worksheet).

This UDF takes 2 milliseconds on the test data, 160 times faster!

Update: As Peter Sestoft points out, much of the improvement is due to approximate Match using binary search rather than the linear search used in VINTERPOLATEA. In fact if you program the binary search in VBA inside the UDF it takes 3.3 milliseconds which is only 1.7 times slower than using MATCH.

Note: there are 2 ways of calling Excel Functions such as MATCH from VBA: Application.Match and Application.WorksheetFunction.Match. The differences are mostly in error handling (for instance when no match is found for the exact match option):

  • Application.Match returns a Variant containing an error, which allows the use of IsError: If IsError(Application.Match …)
  • Application.WorksheetFunction.Match raises a VBA error which requires an On Error handler.

Also WorksheetFunction.Match is somewhat faster.

So we need to add some error handling and boundary case handling:

  • Use On Error to trap non-numeric data
  • Check for the lookup value being outside the range of data in the table
  • Check if the lookup value is the last value in the table

Then the UDF looks like this:


Function VINTERPOLATEC(Lookup_Value As Variant, Table_Array As Range, Col_Num As Long)
Dim jRow As Long
Dim rng As Range
Dim vArr As Variant
Dim vValue As Variant
On Error GoTo FuncFail
Set rng = Table_Array.Columns(1)
' Check for case if val = last value in rng
vValue = rng.Cells(rng.Rows.Count, 1).Value2
If Lookup_Value = vValue Then
VINTERPOLATEC = Table_Array.Cells(rng.Rows.Count, Col_Num).Value2
Exit Function
End If
' Return an error if lookup_value is not within rng
If Lookup_Value > vValue Or Lookup_Value < rng.Cells(1).Value2 Then
VINTERPOLATEC = CVErr(xlErrNA)
Exit Function
End If
' lookup Row number using MATCH
jRow = Application.WorksheetFunction.Match(Lookup_Value, rng, 1)
' get 2 rows of data
vArr = Table_Array.Resize(2).Offset(jRow - 1, 0).Value2
' Interpolate
VINTERPOLATEC = (vArr(1, Col_Num) + _
(vArr(2, Col_Num) - vArr(1, Col_Num)) * _
(Lookup_Value - vArr(1, 1)) / (vArr(2, 1) - vArr(1, 1)))
Exit Function
FuncFail:
VINTERPOLATEC = CVErr(xlErrValue)

End Function

Conclusion:
The only thing faster than bringing all the data across to VBA in one lump is to use Excel functions to bring across only the minimum data your function needs.


Follow

Get every new post delivered to your Inbox.