2014 in review

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 120,000 times in 2014. If it were an exhibit at the Louvre Museum, it would take about 5 days for that many people to see it.

Click here to see the complete report.

Posted in Uncategorized | Leave a comment

Stacking (Appending) Arrays/Ranges in Excel: Formulas versus Functions Shootout

This post is a follow up to my last post Using Constant Arrays and Expressions in Excel Formulas.
I will explore how some of the general purpose array-handling functions in FastExcel SpeedTools Extras compare with formulas using native Excel functions.
I asked Sam (sgbhide@gmail.com who frequently comments on these blog posts) to collaborate by creating the formulas using native Excel functions. He has done a great job!

Why would you want to stack/append Arrays/Ranges anyway?

A few functions allow you to input multiple arguments that effectively get stacked together. For example this formula works well:

=SUM(Sheet1!$A$1:$B$10, Sheet2!$Y$43:$Z$99)

But if you need to do a Lookup across multiple sheets this formula does not work:

=VLOOKUP($C$32,(Sheet1!$A$1:$B$10, Sheet2!$Y$43:$Z$99),2,False)

And you cannot use SUMIF like this:

=SUMIF((A2:A5,D7:D10),”<3″,(B2:B5,E7:E10))

Similarly if you are using a separate worksheet for each time period (week, month …) you often want to stack the time periods alongside one another.

In many cases there are ways of handling these situations using standard Excel Formulas, but often these methods are slow or complex.

Creating arrays containing a mixture of Array Constants and Ranges, stacking row-wise or column-wise.

As the last post showed you cannot put range references directly between the { … } of an array constant.
But Sam has some cunning formulas using the array version of CHOOSE:

Stacking Rows:

arrays21

Stacking Columns:

arrays22

The first argument for CHOOSE indicates which of the following arguments to return. But if you give an array as the first argument all the value choices are returned.
If you want to stack rows above one another you use a semi-colon (;) separator in the first CHOOSE argument, and if you want to stack columns alongside one another you use a comma (,), (if you use the wrong separator CHOOSE returns weird answers).

Note: you have to enter these formulas as multi-cell array formulas by selecting multiple cells, typing the formula and then pressing Control+Shift+Enter.

Unfortunately this method only works with either single columns or single rows: it does not handle 2 dimensional ranges.

So Sam came up with this method of handling 2-D ranges instead (but I have not got space to explain it!):

Suppose we have 2 ranges with defined names of Arry1 and Arry2:

Arrays24

And we have some more defined names containing formulas like this:

Arry1 =Sheet1!$D$6:$E$7
Arry2 =Sheet1!$D$10:$F$13
PadArry1 =(–(ROW(INDIRECT(“1:”&TotRows))*IF(COLUMN(INDIRECT(“RC1:RC”&TotCols,0)),1)<=rRows))
PadArry2 =(–(ROW(INDIRECT(“1:”&TotRows))*IF(COLUMN(INDIRECT(“RC1:RC”&TotCols)),1)>rRows))
rRows =ROWS(Arry1)
TotCols =MAX(COLUMNS(Arry1),COLUMNS(Arry2))
TotRows =ROWS(Arry1)+ROWS(Arry2)

Then this formula, entered as a multi-cell array formula into 6 rows by 3 columns:

{=PadArry1*OFFSET(Arry1,,,TotRows,TotCols)+PadArry2*OFFSET(Arry2,-rRows,0,TotRows)}

Produces this:

arrays26

Notice that Sam’s cunning set of formulas has padded the gaps in the top 2 rows of column 3 with zeros, rather than leaving them as #N/A.
This approach only works when all the cells in the ranges are numeric because Sam has used + to add the numbers together.

If some of the columns in the ranges are alphabetic and some are numeric life gets a lot more difficult.

Using FastExcel SpeedTools ROW.ARRAY and COL.ARRAY Functions

Well Sam’s formulas are difficult for many users so I developed some FastExcel SpeedTools array functions to make life easier.
To stack ranges below one another and pad out the gaps use ROW.ARRAY. The equivalent to Sam’s set of formulas is

{=ROW.ARRAY(0,Arry1,Arry2)}

The first argument (0 in this case) dictates what to use to pad out missing values.
To do a lookup on both these arrays use

=VLOOKUP(95,ROW.ARRAY(0,Arry1,Arry2),3,False) returns 7 (this does not need to be array entered).

You can use COL.ARRAY in a similar way to stack ranges alongside one another.

Note: ROW.ARRAY and COL.ARRAY are limited in total to the Excel row and column limits.

More FastExcel V3 Array functions

The FastExcel Arrays functions include:

  • ROW.ARRAY
  • COL.ARRAY
  • VECTOR
  • PAD.ARRAY
  • REVERSE.ARRAY

Conclusion

It is useful to have a set of efficient functions to stack/append ranges and arrays.

Challenge: can you provide more efficient methods using standard Excel formulas?

Posted in arrays, Calculation, Lookups, Uncategorized, XLL | Tagged , , , , , , , | 4 Comments

Using Constant Arrays and Array Expressions in Excel Formulas

How Excel handles array expressions and constants in single-cell and multi-cell formulas is not well documented.
So here is my attempt to shed some light on the subject.

What are Constant Arrays?

Constant arrays are Excel formula expressions that allow you to create arrays containing numbers, text, logical values or error values. Constant arrays are enclosed in braces ({}).
{1,”Q”,#N/A,TRUE}
Columns are separated from one another by , and rows are separated by ; so
{1,2,3;”A”,”B”,45.6;#N/A,”Z”,99} creates

Arrays1

You have to enter all the items in a row first before adding ; to move to the next row.
Constant arrays must be rectangular: each row must contain the same number of columns.

Constant arrays do not allow you to create arrays containing Dates or Ranges or Defined Names or Expressions or other Constant Arrays.

What are Array Expressions?

Array expressions are parts of Excel formulas that produce a result array of values when they are evaluated. Some examples are:

A3:A6 – returns a column array of the 4 values in cells A3 to A6
A3:A6*10 – returns a column array of the 4 values in cells A3 to A6, each multiplied by 10
A3:A6*{1;2;3;4} – returns a column array of the 4 values A3*1 , A4*2, A5*3, A6*4
Each element gets multiplied by the corresponding element in the other array

A3:A6+B1:B4 – returns a 4 value column array – A3+B1, A4+B2, A5+B3, A6+B4

What happens when the Arrays in an expression are  different sizes or shapes?

When combining 2 or more component array expressions that are different sizes or shapes Excel expands each of the component arrays so that they are all the same size.

  • The expanded arrays will have the same number of rows as the greatest number of rows across all the components.
  • The expanded arrays will have the same number of columns as the greatest number of columns across all the components.

Suppose you have 2 ranges A1:B4 and C1:D1

Arrays3     Arrays2

Then the array expression A1:B4*C1:D1 will expand each of the component arrays to 4 rows by 2 columns:

Arrays4 and  Arrays5

Then, since the array expression says multiply, each corresponding element in the first array gets multiplied by the corresponding element in the second array, like this:

Arrays7 which creates this 4 row 2 column result array  Arrays6

The Expansion Rules

Expanding two component arrays is straightforward if one component is only a single value or a single cell: the single value/cell gets duplicated as required.
This method is also used when one component array is a single row and the other component is a single column.

But Excel does something different when one or more of the components is a 2-dimensional array and the other component array has a miss-matching but greater than 1 number of rows or columns.
Rather than repeat it fills the missing cells with #N/A.

So if you have a range A1:C4 with 3 columns and 4 rows multiplied by a range E1:F1 with 1 row and 2 columns (A1:C4*E1:F1), the E1:F1 gets expanded across by adding a #N/A:

Arrays8 and Arrays9 you get this expansion

Arrays10 which gives this result  arrays11

Note that Excel expanded E1:F1 downwards because it was a single row.

Multi-cell Array Formulas

To return the resulting array from an array expression to Excel you need to enter the formula as a multi-cell array formula:

  • Select the cells required to match the dimensions of the result array, for instance J1:L4
  • Type the formula in the formula bar: Arrays12
  • Enter the formula by holding down Control and Shift and then press Enter.
  • The formula bar now shows Arrays13 Excel has added the curly braces to the formula to show you that its an array formula: don’t add the curly braces yourself.

If selected cells smaller than the result array

If the range selected as the output for the multi-cell array formula has less rows or columns than the result array then Excel still calculates the complete result array but only returns the subset of the array values that match the selected cells positions.

If selected cells larger than the result array

If the range selected as the output for the multi-cell array formula has more rows or columns than the result array then Excel pads out the excess cells with #N/A.

Built-in Excel functions designed to return Arrays

Some of the native Excel functions are designed to return multi-cell arrays, including:

  • TRANSPOSE
  • MODE.MULT
  • TREND
  • FREQUENCY
  • LINEST
  • MMULT
  • MUNIT
  • MINVERSE

Single-cell Array Formulas

Many of Excel’s built-in functions such as SUM, AGGREGATE, MAX, MIN, AVERAGE etc. can calculate array expressions  when entered into a single cell using Ctrl+Shift+Enter.

For example =SUM(A1:A4) can be entered as an ordinary formula, but SUM(A1:A4*10) has to be entered using Control+Shift+Enter, otherwise it just returns #Value.

A few of Excel’s built-in functions such as SUMPRODUCT can calculate array expressions without being entered as an array formula.

Posted in arrays, Calculation | Tagged , , , , | 8 Comments

Calling XLAM/XLL/Automation UDFs from VBA: Evaluate, Run, or Reference?

I have this nagging feeling that I do not know the best way to call UDFs from VBA.

So after a couple of days of research here is what I have found out so far.

The Available Methods

Application.ExecuteExcel4Macro

This method requires converting your UDF call and its parameters to strings. I have not explored it further.


Dim FuncString as String

FuncString="XLLFunction(" & FirstParam
FuncString=FuncString & "," & SecondParam

FuncString=FuncString & ")"

Result=Application.ExecuteExcel4Macro(FuncString)

Declare

You can use Declare to make functions in an external DLL available to VBA. Declare works well with XLL UDFs that only require simple parameters but I don’t know how to use Declare if you want to pass something like a variant array as a parameter to an XLL.


Declare Function DoNothingX_12 Lib "MarshalTest.xll" () As Long

dNothing = DoNothingX_12()

References

You can add references (VBE->Tools->References) to any COM based UDF (XLAM, Automation …) to your VBA project and then CALL the UDFs from VBA as though they were included in your project. This is very efficient, but does not work with XLL UDFs.

References

Here I have added references to 2 files – TestProj.xlam and AutoProj.dll (a VB6 automation addin).

Evaluate

You can use Application.Evaluate or Worksheet.Evaluate to get the result of an Excel formula string. Like ExecuteExcel4Macro you have to convert your UDF call and its parameters to strings. This makes it difficult to pass something like a variant array but easy to pass the address of a range using Excel formula syntax.
Evaluate has some quirks that you need to be aware of: see this post for details.


Sub TimeEval2()
Dim dTime As Double
Dim j As Long
Dim dNothing As Double
Dim str As String
str = "=DoSomething(A1:F10301)"
dTime = microtimer()
For j = 1 To 100000
dNothing = ActiveSheet.Evaluate(str)
Next j
dTime = microtimer - dTime
End Sub

Application.Run

This method is the easiest one to use and works with both COM-based and XLL-based UDFs. It will convert your function parameters to an appropriate type as required.
The only major drawback to Application.Run is that the parameters are handled By Value as opposed to By Reference, which means that each parameter is copied before being passed to the function. This is fine for scalar values and objects such as a Range object but is slow for arrays (objects get passed as pointers so your UDF has to know how to handle whatever data structure or object the pointer points to).


Sub TimeRunVarrX2()
Dim dTime As Double
Dim j As Long
Dim dNothing As Double
Dim str As String
Dim rng As Range
Dim varr As Variant
Dim jFunc As Long

str = "DoSomethingVarrX"
Set rng = Range("A1:F10301")
' get the register ID of the XLL function
jFunc = Evaluate(str)

dTime = microtimer()
dTime = microtimer()
For j = 1 To 100
' pass range object to be converted
dNothing = Application.Run(jFunc, rng)
Next j
dTime = microtimer - dTime

End Sub

You can pass Application.Run either the name of the function/sub as a string or as a Register ID if its an XLL function. And you can use Evaluate to convert the name of the XLL function to its Register ID, by calling Evaluate(“FunctionName”) rather than Evaluate(“FunctionName()”).

If you are repeatedly calling the function using Application.Run its much faster to use the Register ID rather than the function name.

Performance

I ran a series of performance tests comparing Application.Run, Call (using References or Declare) and Evaluate for VBA XLAM UDFs, C++ XLL UDFs and VB6 Automation addin UDFs.

Each function was called 100000 times.

The DoNothing UDF

This UDF takes no parameters and returns a single digit. I used this to measure the calling overhead. The VBA version looks like this:


Public Function DoNothing() As Double
DoNothing = 1#
End Function

Here are the timings:
DoNothingTimesApplication.Run shows 2 timings for both the XLL and the VB6 Automation UDFs

  • The first XLL time is using the Name of the function, the second (nearly 50 times faster) is using the Register ID
  • The first Automation time is using the simple name of the function, the second is using the fully qualified name ( Application.Run(“AutoProj.TestFuncs.DoNothingA”) )

Considering that these timings are for 100K calls all the times are fast, but Evaluate is definitely the slowest.

The DoSomething UDF

This UDF takes a single parameter of a range object and just returns the number of rows in the range. The objective is to findout how the various methods handle an object.


Public Function DoSomething(theRange As Range) As Long
DoSomething = theRange.Rows.Count
End Function

DoSomethingTimesThe timings for passing a Range object (which is just a pointer) or a reference by value are very similar to the Do Nothing case.

The DoSomethingVArr UDF

This UDF takes a single parameter as a variant and can handle either a Range object or a variant array. If passed a Range object it coerces it to values.
The UDF returns the number of rows in the range or array.

The objective is to compare the time taken to pass a copy of a variant array with the time taken to pass a Range Object and then coerce it.

I ran the tests using a range which is 10301 rows by 6 columns conatining a mixture of text and numbers.

Public Function DoSomethingVArr(theArray As Variant) As Long
Dim theArray2 As Variant
If IsObject(theArray) Then
theArray2 = theArray.Value2
DoSomethingVArr = UBound(theArray2)
Else
DoSomethingVArr = UBound(theArray)
End If
End Function

Note that because Application.Run passes parameters by value the UDF has to coerce the range object to a local Variant array:  the code below does not work because you cannot change theArray: theArray does not get coerced to its values.


theArray = theArray.Value2

There are 3 sets of timings for RUN and CALL:

  • Variant Array – this shows the time to pass the large variant array as a parameter.
  • Range – coerce in function – this shows the time tp pass the range as an aobject or reference and then coerce it to its values inside the function.
  • Get Variant Array and Pass – this shows the time taken by VBA to convert the range to values and then to pass the resulting array to the function.

DoSomethingVarrThe results seem to show that it takes about 9.5 milliseconds (or 950 seconds for 100K calls) to coerce the range to its values or to pass the resulting array of values to the function.

But using Application.Run to pass the variant array to an XLL function takes 3 times as long as to pass it to a VBA or VB6 function (2700 versus 957). Presumably this is because the variant array needs to be converted to an xlOper.

If you just pass the range object/reference using Application.Run and then coerce inside the function the times are nearly the same (983 versus 1023).

Conclusions

  •  By far the most efficient method of calling UDFs is by using a Reference or Declare.
  • The least efficient method is generally EVALUATE.
  • Application.Run automatically handles converting UDF parameters between VBA and XLL, but this conversion can be relatively expensive for large arrays.
  • Application.Run passes parameters to the UDF by Value. This means that the UDF is not allowed to modify the parameters even locally within the scope of the UDF.
  • Objects are passed by Application.Run as read-only pointers by Value.
  • All these methods have acceptable performance unless they are used a large number of times.

OK – so what have I missed?

Posted in Calculation, UDF, VBA, XLL | Tagged , , , , | 4 Comments

Once in-a-lifetime Bordeaux wine Tasting: This time we made it!

Last year we had this once-in-a-lifetime wine tasting scheduled: but it did not happen.

(see the ones that got away)

This year we finally got our various acts together and had a terrific evening!

The Wines

BottlesThe line-up was fantastic: from left to right in increasing date order:

  • Haut-Batailley 1990 – the very last bottle from my late cousin Gary’s Wodehouse Wines.
  • Baron Pichon-Longueville 1995
  • Palmer 1996 – bought at the chateau several years ago
  • Phelan-Segur 1996
  • Haut Brion 1998 – the main excuse for the evening
  • Montrose 2003

I don’t expect to ever taste such a line-up again.

The Layout

We had decided that we would taste the wines blind so that we were not prejudiced in our evaluations. I had decanted 4 of the wines 2 hours before and these would be our first flight of 4.

DecantedWe always taste in parallel so that required 4 glasses labelled A B C D (and a diagram of which decanter was which!)

GlassesThe last 2 wines (the 1990 Haut Batailley (E) and the 1996 Phelan Segur (F)) I did not decant because I thought (correctly as it turned out) they needed to be opened only a short time before drinking. And anyway we only had 4 decanters!

The Tasters

There were 6 of us –

Tasters

The Tasting

We use detailed tasting sheets for each wine so that we can evaluate colour, taste, smell etc and build up a rating score for each wine. But the bottom line questions are really very simple:

  • Which wine do you like best?
  • Which wine do you like least?
  • Then repeat for the other 2 wines.

Tasters were also asked which of the 4 they thought was the oldest and which was the youngest. Nearly everyone got this right!

The first 4 wines were:

A – Montrose 2003 St Estephe
B – Baron Pichon Longueville 1995 Pauillac
C – Haut Brion 1998 Pessac – Graves
D – Palmer 1996 Margaux

I actually think a better sequence would have been oldest to youngest: everyone tastes left to right and starting with the youngest and most tannic was a bit of a challenge.

The top wine for everone was undoubtedly the Haut Brion. Still in its youth really I think it will continue to improve over the next 10 to 20 years.

The Palmer had a distinctive Brett smell (farmyard) which divided opinion so that it either came second or last. (I rated it last).

The Pichon-Longueville was well integrated and well into its drinking window.

The Montrose was the most powerful of the wines, but in the opinion of most of us will benefit from a few more years in bottle. But it got better and better as the evening wore on and its tannins softened.

After we had finished scoring the first 4 (accompanied by pate and biscuits) I opened the last 2 and we continued with a lovely whole fillet of beef.

E- Haut-Batailley 1990 Pauillac

The Haut-Batailley was terrific, although much lighter than the Montrose or the Haut Brion. Excellent colour and a great demonstration of what a well-aged (24 years!) Bordeaux should be. If only we had another bottle!

F- Phelan Segur 1996 St Estephe

The last bottle we drank of this wine was disappointing probably because it had been opened too long. This time we got it right and it was good: lovely perfume and finesse.

The evening finished at about 3am!

It really was the wine-tasting of a lifetime: many thanks to Jo for providing the Haut Brion, Pichon-Longueville and Haut-Batailley.

 

Posted in Uncategorized, Wine | Leave a comment

2014 MVP Summit

In the first week of November I was privileged to attend the MVP Summit at Microsoft in Redmond. I think it was the best Summit I have been to, with much more open interchange with the Excel team about things that might or might not make it into Excel one day.

Anyway I can’t tell you anything more about that, but I can say that all the Excel MVPs are a really nice bunch of people, who like to sit around in Rock Bottom, drink beer and talk Excel geek stuff (which I also like to do to!). Here are some photos of my fellow MVP geeks (you will have met many of these on various Excel forums):

Kevin1 - CopyFrom left to right: Jon Peltier, Roger Govier, Kevin “Zorvek” Jones, Brad Yundt, Jon Acampora, Mynda Treacy

Jan_Karel - Copy

Jan-Karel Pieterse pours the beer (of course he never drinks any himself!) for new Excel team member Joe McDaid

Claire - Copy

UK MVP Lead Claire (what does this button do?)

Mynda_Jon

Mynda Treacy and Jon Peltier ham it up for the camera with Ingeborg “Teylyn” Hawighorst in the background.

Zak_Smitty

Zack “Tables go this way” Barresse (aka FireFytr), Chris “Smitty” Smith and Frédéric (if its not Power Query I’m bored) Le Guen

Ron_Wim_Mandy

Ron (Wrong Bus) Coderre, Wim Gielis, Mandy Govier

XL_Beer

Excel beer (courtesy of Frederic)

And now for this year’s mystery photos:

Mystery1_2014

Mystery Photo 1: Where is this?

Mystery_2

Mystery Photo 2: What does this have to do with Excel?

 

Posted in Uncategorized | Tagged , | 6 Comments

Timing the Ins and Outs of User Defined Functions: Multi-Cell array formulas can be slow

I was looking at some multi-cell array formula UDFs with John and Rich and could not understand why they seemed so much slower than I expected.

Each UDF (written as C++ XLLs) read data from around 200 cells using around 40 parameters, did some very simple logic and arithmetic and then returned values to a different 200 cells. Each of these UDF was taking around 16-20 milliseconds and the workbook contained several thousand calls to these UDFs.

So I started to investigate:

First Hypothesis: Maybe marshalling input data from multiple parameters is slow?

We know that the way to speed up reading data from Excel Cells is to read data in as large a block as possible rather than in many small chunks. So maybe the number of parameters was the problem. To test this I wrote a couple of XLL functions.
MarshallAll takes a single parameter of a range of cells and returns the number of columns. The parameter is type Q so arrives in the function as values rather than a reference.
MarshallMany takes 40 parameters (also type Q) and returns a constant value of 88.


CXlOper* MarshallAll_Impl(CXlOper& xloResult, const CXlOper* Arg1)
{
long nCols=0;
nCols=Arg1->GetWidth2();
xloResult=(double)nCols;
return xloResult.Ret();
}

CXlOper* MarshallMany_Impl(CXlOper& xloResult, const CXlOper* Arg1, const
CXlOper* Arg2, const CXlOper* Arg3, const CXlOper* Arg4, const CXlOper*
Arg5, const CXlOper* Arg6, const CXlOper* Arg7, const CXlOper* Arg8, const
CXlOper* Arg9, const CXlOper* Arg10, const CXlOper* Arg11, const CXlOper*
Arg12, const CXlOper* Arg13, const CXlOper* Arg14, const CXlOper* Arg15,
const CXlOper* Arg16, const CXlOper* Arg17, const CXlOper* Arg18, const
CXlOper* Arg19, const CXlOper* Arg20, const CXlOper* Arg21, const CXlOper*
Arg22, const CXlOper* Arg23, const CXlOper* Arg24, const CXlOper* Arg25,
const CXlOper* Arg26, const CXlOper* Arg27, const CXlOper* Arg28, const
CXlOper* Arg29, const CXlOper* Arg30, const CXlOper* Arg31, const CXlOper*
Arg32, const CXlOper* Arg33, const CXlOper* Arg34, const CXlOper* Arg35,
const CXlOper* Arg36, const CXlOper* Arg37, const CXlOper* Arg38, const
CXlOper* Arg39, const CXlOper* Arg40)
{
xloResult=88.0;
return xloResult.Ret();
}

But when I compared the execution times of these functions they were both fast and there was not much difference in timings, although it was faster to read as many cells as possible with each parameter.

So hypothesis 1 failed.

I checked how the time taken varied with the number of cells read and their datatype.

Marshall3 Marshall4

As expected large strings take longer than small strings which take more time than numbers. And it is more efficient to read as many cells as possible for each parameter.

Second Hypothesis: Maybe returning results to multiple cells is slow?

We know that writing data back to Excel cells from VBA is significantly slower than reading data from cells.

(see VBA read-write speed and Getting cell data with VBA and C++)

I wrote another simple XLL UDF: MarshallOut. This took a single argument of a range and returned it.


CXlOper* MarshallOut_Impl(CXlOper& xloResult, const CXlOper* Arg1)
{
xloResult=*Arg1;
return xloResult.Ret();
}

Bingo: returning data to multiple cells is comparatively slow.

I used FastExcel Profile Formulas to time the tests:

Marshall6Reading and returning a 255 character string to each of 100 cells takes 13 milliseconds.

Marshall1Marshall2Notice that the time taken is NOT linear with respect to the number of cells.

Multi-Threading Effects

I also noticed that FastExcel Workbook Profiler showed that these functions were making inefficient use of multi-threaded recalculation. Presumably this is because they need an exclusive lock on Excel’s results table whilst writing out the results, and most of the time used is doing exactly that.

By contrast, the first set of “read-many cells but write one cell”  functions made efficient use of multi-threading.

Comparing XLL functions with VBA Functions.

I did a small number of comparison with VBA equivalents of these XLL functions. The VBA functions showed the same kind of performance behaviour as the XLL functions and were slightly slower. Of course VBA functions cannot multi-thread.

Conclusions

  • Large numbers of array formulas that return results to multiple cells can be slow.
  • Multi-cell array formulas do not multi-thread efficiently whilst writing their results back.
  • It is more efficient to have larger ranges and fewer parameters than many parameters referencing small ranges.

 

Posted in Calculation, UDF, Uncategorized, VBA, XLL | Tagged , , , , , | Leave a comment

Automatic or Manual Calculation?: Grouped Sheets causes Calculation Confusion.

Thanks to Simon Hurst and Paul Wakefield for telling me about this calculation weirdo, and credit to Chatman at Accounting Web for discovering it.

Grouped Sheets and Calculation Mode

Try these experiments:

Experiment 1

  1. Start Excel
  2. Open a workbook in automatic calculation mode and with only 1 sheet selected (so there are no grouped sheets).
  3. Add  =NOW() to a couple of sheets and format the cells to show seconds so that you can see when it calculates.
  4. Check that Calculation shows Automatic both in File-> Options->Formulas and in the Formulas Tab->Calculation Options.
  5. Set Calculation to Manual (does not matter how you do this)
  6. Select 2 or more Sheets (hold down Control and select 2 sheet tabs) so that the sheets are grouped (The Workbook title bar should show [Grouped] ).
  7. Calculation is still Manual
  8. Set the Calculation to automatic using File->Options->Formulas
  9. Notice that the workbook calculates, but Calculation immediately returns to manual (check both Formulas Tab->Calculation Options and File->Options->Formulas)
  10. Now set the Calculation to automatic using Formulas Tab->Calculation Options
  11. Now the workbook calculates and stays in Automatic Calculation Mode, but File->Options->Formulas says its Manual and Formulas Tab->Calculation Options correctly says its Automatic!
  12. Now ungroup the sheets and everything works normally

Experiment 2

  1. Start Excel
  2. Open a workbook in automatic calculation mode and with only 1 sheet selected (so there are no grouped sheets).
  3. Add  =NOW() to a couple of sheets and format the cells to show seconds so that you can see when it calculates.
  4. Check that Calculation shows Automatic both in File-> Options->Formulas and in the Formulas Tab->Calculation Options.
  5. Select 2 or more Sheets (hold down Control and select 2 sheet tabs) so that the sheets are grouped (The Workbook title bar should show [Grouped] ).
  6. Excel should still be in Automatic calculation mode, but File->Options->Formulas shows Manual and Formulas->Calculation Options shows Automatic.
  7. Calculation is actually Automatic.

Confused? Well so is Excel!

Conclusions

  • Looks like it is safer to use Formulas->Calculation Options
  • If you are using Grouped Sheets be careful about your Calculation Mode!
Posted in Uncategorized | Leave a comment

Excel 2007 multi-threading bug with Range.SubTotal and XLL

OK: this is a fairly obscure Excel 2007 only bug but I thought I should place it on record.

The conditions required for the bug are:

  • Excel 2007 (tested with 12.0.6683.5002 SP3)
  • Automatic calculation Mode
  • Multi-threaded calculation is on and the system has multiple cores
  • XLL multi-threaded worksheet functions are being used
  • A VBA routine uses Range.SubTotal to create subtotals

The bug symptoms are

  • Either Range.subtotal fails and Excel becomes unstable
  • Or Range.subtotal seems to work but Excel becomes unstable

This bug seems to have been fixed in Excel 2010 and later versions.

Bypassing the bug

If your VBA switches off Multi-threaded calculation (Application.MultiThreadedCalculation.Enabled = False) just before doing the Range.SubTotal and then switches it back on again the bug is bypassed.

Whats causing the problem

Using Range.subtotal is triggering a calculation event. VBA always runs on the main thread, but the XLL multi-threaded functions can run on any thread.

So presumably the problem happens when the XLL function is not calculated on the main thread and tries to return a result to Excel, but Excel is not ready to accept it, thus a portion of memory gets overwritten.

Posted in Uncategorized | Leave a comment

Extracting Digits from Text: Using Formulas and Designing a Missing Excel Function: GROUPS

An Excel problem that crops up quite often is how to extract digits (0-9) from text. The text might be part numbers, or web addresses, or currency values or …

Some cases are easy to handle with formulas:

  • A fixed number of digits at the start or end of the text-string (Use LEFT or RIGHT)
  • A fixed number of digits starting at a fixed point within the string (Use MID)
  • Groups of n digits separated by a single separator character (Use MID)

But in real life things are often not so simple:

  • No fixed position for the start
  • A variable number of digits
  • A variable number of separators
  • Extract all the digits or only one group
  • Need to locate a particular text string as separator
  • Extract the nth group of digits
  • Work from left to right or right to left
  • Extract only the first or the last n digits from a group

Most of these more complex cases can still be solved using formulas, but the required formulas are often long, complicated, hard to understand and do not adapt well to changes in the data.

Some Formula Examples

The first example is from an article by MVP Ashish Mathur

The Data and results look like this:

Digits1

Ashish’s formula, entered using Control/Shift/Enter as an array formula, looks like this (see his article for an explanation).{=MID(A2,MATCH(TRUE,ISNUMBER(1*MID(A2,ROW($1:$9),1)),0), COUNT(1*MID(A2,ROW($1:$9),1)))}

The next example is from Chandoo

Digits2

This one is probably impossible for all the entries just using a formula, but here is Chandoo’s best attempt.{=MID(B4,MIN(IFERROR(FIND(lstNumbers,B4),””)), SUMPRODUCT(COUNTIF(lstDigits,MID(B4,ROW($A$1:$A$200),1))))+0}

Where lstNumbers is a range containing the digits 0-9 and listDigits is a range containing 0-9 comma and decimal point.

The final example is from a Bill Jelen/Mike Girvin Dueling Excel Podcast, and its really tricky. Some example data looks like this:

Digits4The challenge is to extract the last 3 of the consecutive digits after the first POV_ and before any non-numeric character. As Bill Jelen points out, this is much easier to do with a VBA UDF than with formulas. But if you watch the podcast you can see how Mike Girvin develops some incredible formulas to do the job.

Bill’s VBA UDF looks like this:


Function Nums(MV)
x = Application.WorksheetFunction.Find("POV_", MV) + 4
Nums = ""
For i = x To Len(MV)
ThisChar = Mid(MV, i, 1)
Select Case ThisChar
Case ".", "_"
GoTo FoundIt
Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
Nums = Nums & ThisChar
End Select
Next i

FoundIt:
Nums = Right(Nums, 3)
End Function

The Missing Excel Function: GROUPS

Well, after watching Bill and Mike solving this really tricky problem I started wondering why does Excel not have a function to do this kind of stuff? And what would it look like if it did have one?

So (of course) I decided to build one! After a few iterations and the inevitable scope creep the requirements looked like this:

  • Extract groups of characters from a text-string
  • Allow the user to define what constitutes a group of characters
  • Extract the Nth group from the start, or the Nth group working backwards from the end
  • Option to specify the maximum number of characters to extract from the front or the back of the group
  • Option to give the start and/or end position within the string for the search for groups.

GROUPS(Text, GroupNumber, MaxChars, GroupType, StartPos, EndPos)

GroupNumber can be zero (all Groups) or a positive or negative number to get the Nth group from the start or end.

MaxChars can be zero (all characters) or a positive or negative number to restrict the number of characters from the start or end of the group.

GroupType can be either of

  • a Regex Pattern string, for example [0-9,.] would define a group as consecutive characters consisting of 0 to 9 comma and decimal point
  • A number from 0 to 4 for the most common group types (0-9 , a-z , not 0-9 , 0-9 and . , not 0-9 and .)

StartPos and EndPos default to the first character (1) and the last character (0)

GROUPS Examples

To get the result from Ashish’s data shown above use

=GROUPS(A2)

The defaults are: get the first group of numbers starting at the left.

For Chandoo’s example you need to define the group of digits as being 0-9 comma and decimal point, and you could append the other characters as groups of everything except 0-9 comma and decimal point and space. A bit of experimentation shows that using -1 as the group number gives better results for this data!

=GROUPS(B4,-1,,”[0-9,.]”) & ” ” & GROUPS(B4,-1,,”[^0-9,. ]”)

Digits5Which is pretty good apart from maybe from having to choose between the INR18lacs or USD$36000!

For Bill Jelen and Mike Girvin’s podcast problem there are a couple of approaches:
a slight cheat notices that the numbers we want are always the second group of numbers so this works: find the last 3 digits from the second group of digits.

=GROUPS(A2,2,-3)

But for the original problem as stated we need to look for POV_

=GROUPS(A2,1,-3,0,SEARCH(“Pov_”,A2)+4)

Find the last 3 digits from first group of digits found starting after POV_.
I am using SEARCH rather than FIND because SEARCH is not case-sensitive.

Conclusion

  • It would be much easier to solve these kind of problems if Excel had a function like GROUPS.
  • You can try out the GROUPS function for yourself by downloading the 15-day trial of FastExcel Version 3 from my website.
  • The GROUPS function is implemented as a multi-threaded XLL function, so performance is quite good.

If you have a real-life extraction problem that cannot be solved by the GROUPS function please let me know!

Posted in UDF, VBA, XLL | Tagged , | Leave a comment