Excel Full Column References and Used Range: Good Idea or Bad Idea?

Some people love using full column references such as A:A, other people say they slow down your workbook to a crawl and lead to workbook and memory bloat.

The attraction of using whole column references is that you don’t have to adjust your formulas when additional rows of data are added or deleted.

So who is right?

Let’s run some simple tests to find out.

We need to test a variety of functions and formulas. I picked

  • SUM(A:A)
  • MATCH(99,A:A,0)
  • SUMPRODUCT(A:A)
  • COUNTIF(A:A,99)
  • {IF(A:A=1,1,0)} (array entered into each cell)
  • {IF(A:A=1,1,0)} (one formula array entered into D1:D100)
  • IF(A:A=1,1,0) (not array entered)

All the test cases have cell A1 set to a value of 1, and the formula is entered into D1:D1000.

To explore the effect of changing the used range and the number of cells containing formatting or data I used a variety of test cases:

  • Test Case 1 – Base case: no other data or formatting in the worksheet.

All the other test cases are derived from the base case, making only the changes mentioned for each case.

Formatting test cases:

  • Test Case 2 – entire column A is colored yellow
  • Test Case 3 – cells A2000:A1048576 are colored yellow
  • Test Case 4 – cell XFD1048576 empty but colored yellow
  • Test Case 5 – every other row in column A colored yellow

Cells with data test cases:

  • Test Case 6 – cell A1048576 contains 98
  • Test Case 7 – cells A2000:A1048576 contain 1
  • Test Case 8 – every other row in column A contains 1

The calculation times are given in Milli-seconds for the median of 3 single-threaded Range.Calculates on the 1000 formulas in D1:D1000. Screen updating time is excluded.

Test Results

FullColInterpretation of Test Results

1. Base Case: No Additional Formatting or Data

The most efficient formula is =IF(A:A=0,1,0)
This formula uses Excel’s implicit intersection method so that each cell containing the formula in D1:D1000 only looks at a single cell in the same row in column A.

The SUM, MATCH and COUNTIF formulas are efficient because each formula only looks at the 1000 cells in column A bounded by the used range (row 1000).

The SUMPRODUCT and single-cell array formula version of =IF(A:A=0,1,0) force Excel to look at every single one of the million cells in column A, so there are 1000*10000000 cells looked at. The IF formula is slower than the SUMPRODUCT because it has to create additional virtual columns.

The multi-cell array version of the IF is a single formula array-entered into the 1000 cells in column D. This single formula looks at 1000000 cells and then returns 1000 results. Because it looks at 1000 times fewer cells it is a lot faster.

2. Column A:A formatted Yellow

Formatting the entire column does not increase the used range and so has virtually no effect on the timings.

3. Format over a Million cells from from A2000:A1048576 Yellow

Formatting a large but contiguous block of cells does increase the used range if the start cell address is beyond the previous last used cell. But (surprisingly) it does not extend the used range to the last formatted row!
The timing results show a small increase for those formulas that are dependent on the used range, and the file size increases slightly.

4. Format the Last Cell in the worksheet (XFD1048576) Yellow.

This does dramatically extend the used range: the last used row is now the last row in the worksheet.
The timing results that depend on the size of the used range increase significantly, but the file size does not change.

The results are the same if you color the entire row 1048576 or just the single cell A1048576.

5. Format every other row in column A Yellow.

This creates a much more complex used range, even though the last used row is actually 1 less than test case 4. Excel has to store formatting information for each of half a million rows!
All the timings increase significantly
(not just the formulas that are dependent on the used range) because of the additional work need to traverse the much larger cell table, and the file size shoots up.

6. Set the last cell in column A to 98.

This extends the used range in the same way as formatting the last cell, and the timings and file size are virtually identical to test case 4.

7. Set cells A2000:A1048576 to 1.

You might expect this to behave the same way as test case 3 which just formats this block of cells rather than inserting a constant value.

But the time taken by the formulas dependent on the used range increases by a factor of around 1000, and even the timings for SUMPRODUCT and the array formulas about double. The file size also increases significantly.

Clearly adding data to a contiguous block of cells has a much bigger impact than just formatting a contiguous block of otherwise empty cells.

8. Set every other cell in column A to 1.

This results in about half as many cells containing data as test case 7, but the last used row is only 1 less.
The timings for formulas dependent on the used range increases by about a factor of 2 compared to test case 5, which only formatted the same cells, but decreases by about  a factor of 2 compared to test case 7 (which had about double the number of cells containing data). The file size is not much larger than test case 5.

This seems to show that formula calculation time is dependent on both the complexity of the used range that formula has to traverse and the number of cells containing data.

Conclusions

Excel’s sparse storage methods are efficient and allow for the use of whole column references and contiguous blocks of identical formats. But you need to be careful to minimize both the extent and complexity of the used range.

  • Formulas other than array formulas and SUMPRODUCT handle whole column references efficiently.
  • But avoid array formula and SUMPRODUCT usage of whole column references.
  • Non-array formulas using whole column references are much slower with large used ranges, and even slower with complex used ranges.
  • Formulas handle empty cells more efficiently than cells containing data.
  • Excel is optimized to handle contiguous blocks of identical formatting efficiently.
  • File size is not dependent on the size of the used range.
  • File size is largely dependent on the number of cells containing data/formulas and also on the number of non-contiguous formatted cells.

 

 

 

This entry was posted in arrays, Calculation, Formatting, Memory. Bookmark the permalink.

5 Responses to Excel Full Column References and Used Range: Good Idea or Bad Idea?

  1. Michael Tollefson says:

    Hi Charles — Has it changed in newer versions of Excel? In older versions, SUMPRODUCT() couldn’t handle whole-column references.

  2. MF says:

    Hi CHarles,
    Very insightful articles. Thanks for sharing.
    May I ask if there is significant impact of whole column reference on VLOOKUP?
    Cheers,

Leave a comment