In the previous post The Xips challenge – How fast does Excel Calculate I managed to make Excel calculate 6.6 million simple formulae in under a second, and promised to come back in future posts to try to answer my question:
If Excel can calculate formulae this fast how come my spreadsheet takes several seconds to calculate?
And of course the follow-up to that is:
And how do I make it go faster?
So this is the first in what will hopefully be a series of posts trying to answer both these questions.
Lets start with a simple example:
Calculating cumulative sums.
Suppose you have a series of dated payments and you want to calculate the cumulative sum of all the prior payments for each date.
The formula in D6 is =SUM($B$6:$B6) and this is then copied down for all the 10000 rows, so the formula in row 5000 is =SUM($B$6:$B5000), and the formula in row 10005 is =SUM($B$6:$B10005): so each formula sums all the previous rows.
So how do you count the XIPS?
There are only 10000 formulas, but they take 1.2 seconds to calculate (just over 8000 formulas per second: thats a long way off the potential of 6.6 million XIPS).
Well of course the reason this is so slow is that each formula refers, on average, to a large number of cells.
On average each formula sums 5000 cells, so thats a total of 10000 x 5000 = 50 million operations Excel has to do. Its probably time to change the terminology here: lets call this 50 MXOPS (Millions of eXcel Operations per Second). This is much faster than the 6.6 Million XIPS in the previous post, because we only have 10000 formulas instead of 6.6 million formulas and the overhead of interpreting the formulas is making the difference.
Counting the number of operations Excel has to do on cells (XOPS) (or virtual cells when using array formulae) can be a useful proxy predictor of calculation time, and one that should make you start thinking about how you might improve things.
OK, but do you make Excel calculate this faster?
If you look at what the formula are doing you soon discover that the cells near the beginning of column B are each being summed thousands of times. So if you can avoid this duplication of XOPS things should go faster. This is easy, just make each formula add the value from this row to the cumulative sum result from the previous row:
The formula in F6 becomes =B6, and the formula in F7 becomes =F6+B7. The F7 formula is then copied down the remaining rows.
Now each formula references just 2 cells, so thats 2 x 10000 = 20000 operations. And now the calculation takes just 0.011 seconds, an improvement factor of 110. Notice that the XOPS went from 50 million to 20 thousand, and improvement factor of 2500, but the calculation time only improved by a facor of 110. So there are other things besides XOPS that consume calculation time, such as:
- interpreting the formulae
- calling the SUM function
- returning the results to the cells
- reformatting the results
- repainting the screen
but counting the cell operations is still a good way of thinking about calculation time.
Conclusion
If you have got slow formulae its usually because the formulas are making Excel do a lot of work.
A useful way of quantifying that is to estimate the the number of real or virtual cell operations Excel has to do (XOPS).
Because Excel formulas are so flexible there is usually a different way of achieveing the same result, but using much fewer XOPS.
You just have to think creatively!