Writing Efficient UDFs Part 11 – Full-Column References in UDFs: Used Range is Slow

Posted on December 2, 2012 by fastexcel

Excel users often find it convenient to use full-column references in formulas to avoid having to adjust the formulas every time new data is added. So when you write a User Defined Function (UDF) you can expect that sooner or later someone will try to use it with a full-column reference:

=MyUDF(A:A,42)

When Excel 2007 introduced the “Big Grid” with just over 1 million rows it became even more important to handle these full-column references efficiently. The standard way to handle this in a VBA UDF is to get the INTERSECT of the full-column reference and the used-range so that the UDF only has to process the part of the full-column that has actually been used. The example VBA code below does this intersection and then returns the smaller of the number of rows in the input range and the number of rows in the used range.


Public Function GetUsedRows(theRng As Range)
 Dim oRng As Range
 Set oRng = Intersect(theRng, theRng.Parent.UsedRange)
 GetUsedRows = oRng.Rows.Count
 End Function

The parent of theRng is the worksheet that contains it, so theRng.Parent.UsedRange gets the used range of the worksheet you want.

Two problems with this technique are:

Getting the Used Range can be slow.
The XLL interface does not have a direct way to access the Used Range, so you have to get it via a single-thread-locked COM call. (More on this later).

So just how slow is it to get the used Range?

I created a very simple UDF and timed the calculation of 1000 calls to this UDF for filled used ranges of between 10K rows and 640K rows.


Public Function CountUsedRows()
 CountUsedRows = ActiveSheet.UsedRange.Rows.Count
 End Function

It turns out that the time taken to execute this UDF is a linear function of the number of used rows in the used range.

And its quite slow, 1000 calls to this UDF with 640K rows of data takes 33 seconds!

When the used range is small you won’t notice the time taken, but for large used ranges with the big grid you certainly will. And the problem is that your UDF will do this check on every range that is passed to the UDF, even if its not really needed.

Colin points out that what affects the time is actually the number of cells containing data or formatting (or that previously contained data or formatting) rather than the last cell in the used range.

Speeding up finding the used range.

So you could start by only doing the used-range check when theRng parameter has a large number of rows:


Public Function GetUsedRows2(theRng As Range)
 Dim oRng As Range
 If theRng.Rows.Count > 500000 Then
 Set oRng = Intersect(theRng, theRng.Parent.UsedRange)
 GetUsedRows = oRng.Rows.Count
 Else
 GetUsedRows = theRng.Rows.Count
 End If
 End Function

This example only does the check if the user gives the UDF a range referring to more than half a million rows.

Another, more complicated, way of minimising the time is to store the number of rows in the used range in a cache somewhere and retrieve it from the cache when needed. The tricky part of this is to make sure that the used-range row cache always is either empty (in which case go and get the number) or contains an up-to-date number.

One way of doing this would be to use the Application AfterCalculate event (which was introduced in Excel 2007) to empty the cache. Then only the first UDF that requested the used range for each worksheet would use time to find the used range, and (assuming that the calculation itself did nothing to alter the used range) the correct number would always be retrieved.

The equivalent for Excel versions before Excel 2007 would be to use the Application SheetCalculate event to empty the cache for that particular worksheet. This technique would be less efficient since a worsheet may well be calculated several times in each calculation cycle.

As Colin points out, if you want to find the last row containing data it is faster to use Range.Find when you have many cells containing data.
Note that you can only use Range.Find in UDFS in Excel 2002 and later, and you cannot use the Find method at all from an XLL except in a command macro or via COM.


Public Function CountUsedRows2()
 CountUsedRows2 = ActiveSheet.Cells.Find(What:="*", LookIn:=xlFormulas, SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
 End Function

So have you got any better ideas on how to process full-column references efficiently?

This entry was posted in Calculation, UDF, VBA, XLL. Bookmark the permalink.

17 Responses to Writing Efficient UDFs Part 11 – Full-Column References in UDFs: Used Range is Slow

John says:

December 3, 2012 at 1:57 pm

If the data is continuous (ie does not have any gaps in the data in the range), a binary check for a cell which is empty and for which the previous cell has data in it was a lot quicker for me by several orders of magnitude (my code was a bit rubbish but seemed sound enough).

Reply
- fastexcel says:
  
  December 3, 2012 at 2:43 pm
  
  @John, Good idea, but too risky in the general case.
  
  Reply
JAM says:

December 4, 2012 at 11:09 am

Would using a (dynamic) named range be a sensible way round this? Or are you specifically thinking of situations where you don’t have that level of control over the underlying source?

Reply
- fastexcel says:
  
  December 4, 2012 at 11:23 am
  
  @JAM, I was thinking of UDFs in a library being used by a variety of end-users. So yes, the user could create a dynamic range (or have one created for them in a template/solution), and in many cases this is an excellent solution, but is not suitable for a general-purpose UDF.
  
  Reply
Colin says:

December 4, 2012 at 11:32 am

Hi Charles,

It seems that the UsedRange property calculation speed is hugely affected not by the size of the used range but by how much of the used range actually contains data.

For example, if I start with an empty sheet and put values in A1 and A640000 only, 100 calls to the CountUsedRows UDF runs in 0.06 seconds. Then, if I put values in all the cells in A1:A640000, 100 calls to the CountUsedRows UDF runs in 9 seconds. A huge increase even though the usedrange address itself has not changed.

Conversely, if I put values in A1 and XFD1048576 only (so the usedrange is now the whole sheet), 100 calls to the CountUsedRows UDF runs in 0.09 seconds – not much of a change considering how much the usedrange has increased in size.

I’d speculate that this is because the UsedRange property is reading from the internal cells record, which only holds references to cells which actually have values/formatting etc.

For large ranges containing data and depending on what the UDF actually is meant to do, it might be better to use Range.Find() to determine the last used row, and then resize based on that. Range.Find() can be used in worksheet-called UDF’s from Excel 2002 and later.

Regards,
Colin

Reply
- fastexcel says:
  
  December 4, 2012 at 12:02 pm
  
  Hi Colin,
  
  Thank you: I kept meaning to do that test but never did! I think you are correct that its reading the Cell table which contains entries for all cells that have been touched even if the formatting and values etc have been cleared. If you create a large range in a1:a640000 with each cell containing data and then delete all the data except the last cell it is still slow.
  Range.Find could well be faster (although it won’t work in my XLL UDFs using the old XLM interface).
  
  Incidentally, have you noticed that since Excel 2007 referring to the usedrange does not reset it like it does in Excel 2003?
  Regards
  Charles
  
  Reply
Colin says:

December 4, 2012 at 1:12 pm

Hi Charles,

I hadn’t noticed that referring to the usedrange in XL 2007+ does not reset it like it did in XL 2003. I’ll have to investigate that when I have some time!

I forgot to mention on my previous comment that there is a drawback to using Range.Find() like that – it won’t find the last used row of data if that row has been hidden by filtering.

Regards,
Colin

Reply
Pingback: Count Distinct Or Unique Values – VBA UDF | RAD Excel
Johny Why says:

July 22, 2014 at 11:21 am

I needed to find the used range of a single column (ignoring other columns). Assuming the column data is contiguous, i find it by:
lastRow = worksheetfunction.countif(columnRange, “*”).

To get the last cell ref, i use:
columnRange.cells(lastRow).

And get the used range of the column with:
Range(columnRange.Cells(1), columnRange.Cells(iLastCel))

If there’s a faster way to accomplish this, i’d love to hear it!

Reply
- Johny Why says:
  
  July 22, 2014 at 11:23 am
  
  *i meant “lastRow”, not “iLastCell”
  
  Reply
- Johny Why says:
  
  July 22, 2014 at 11:45 am
  
  Another method to get the last cell containing data in a single column is:
  
  columnRange.Cells(1).End(xlDown)
  
  In 2010, with the column filled to row 1,048,576, the .End method took .03 seconds, and the CountIf method took .07 seconds.
  
  Reply
  - Johny Why says:
    
    July 22, 2014 at 11:48 am
    
    (times based just on finding the last cell containing data, and not including the extra step of getting the range of all the filled cells.)
Johny Why says:

July 22, 2014 at 12:28 pm

Regarding getting the used range of the entire sheet, this method appears to be orders of magnitude faster than .UsedRange:

Set rLast = Range(“a1”).SpecialCells(xlCellTypeLastCell)
Set LastUsed = Range(Cells(1), rLast)

Sadly, it does not work in a UDF 😦 Why?

Reply
- Johny Why says:
  
  July 22, 2014 at 12:32 pm
  
  is it related to the fact that a UDF cannot alter a worksheet? just guessing….
  
  Reply
Landon Ostraff says:

April 7, 2015 at 6:54 pm

In a similar vein to Johny Why’s use of End…

Public Function GetUsedRows(theRng As Range)
With theRng.Parent
GetUsedRows = .Cells(.Rows.Count, theRng.Column).End(xlUp).Row
End With
End Function

I’d also note that you might want to use CountLarge instead of Count to avoid an overflow error when more than 2.1 billion cells are in the passed range (that’s 2047 full columns). It depends on if you can trust your users not to try “A:BZT”.

Reply
fastexcel says:

April 7, 2015 at 10:07 pm

Using End(xlUp) does not find the used range, it finds the last VISIBLE cell containing data. So it does not work in the general case where stuff may be hidden or filtered, and in any case it does not find the used range. But yes, it is very fast. and often finding the last cell containing data is what you want to do anyway.

Reply
- Landon Ostraff says:
  
  April 7, 2015 at 10:24 pm
  
  Ah, good call. My code also makes the assumption that the left-most column in the passed range has the same non-blank visible cell as the other columns in the range (which is where doing a Find for a wildcard in the previous direction comes in).
  
  Reply

	Jim on VBA Corruption Registry F…
	fastexcel on VBA Corruption Registry F…
	Jean-Michel DITNER on VBA Corruption Registry F…
	Sandeep Kothari on Short-circuiting Excel Formula…
	Sandeep Kothari on Short-circuiting Excel Formula…
	Veit Stimpfl on Excel Memory Checking Tool: Us…
	fastexcel on Short-circuiting in Array…
	Sergei on Short-circuiting in Array…