None of the previous posts on writing efficient VBA UDfs (Part1,Part2,Part3,Part4,Part5) talked about handling strings in VBA.
This could be a major omission since string-handling is one of VBAs slowest “features”.
Suppose you want to find the position of the first capital letter in a string.
Array Formula
You could use an array formula like this:
{=MATCH(TRUE,ISERR(FIND(MID(A5,ROW($1:$255),1),
LOWER(A5))),0)}
My test data is 2000 rows, each containing 25 lower-case characters and one randomly placed upper-case character.
2000 calls to this array formula takes 250 milliseconds.
So lets try some VBA UDFs.
Using LIKE
One way is to use the VBA LIKE statement:
Function FirstCap2(Cell As Range) For FirstCap2 = 1 To Len(Cell.Value) If Mid(Cell.Value, FirstCap2, 1) Like "[A-Z]" Then Exit For End If Next FirstCap2 End Function
The code loops across the string using Mid to look at each character in turn, and then uses LIKE to see if the character is one of upper-case A to upper-case Z.
2000 calls to this UDF takes 50 milliseconds – a factor of 5 faster, but we can make it faster (of course).
Function FirstCap3(Rng As Range) As Long Dim theString As String theString = Rng.Value2 For FirstCap3 = 1 To Len(theString) If Mid$(theString, FirstCap3, 1) Like "[A-Z]" Then Exit For End If Next FirstCap3 End Function
I changed the code to only get the string out of the cell once, and to use Mid$ rather than Mid. All the VBA string handling functions have 2 versions: versions without the $ work with variant arguments, whereas versions with the $ suffix only work on string arguments, but are slightly faster.
2000 calls to this version of the UDF takes 17 milliseconds, nearly 3 times faster.
Using MID$
But maybe using LIKE is slow? Lets try comparing a lower-case version of the string and stopping when the characters don’t match:
Function FirstCap4(strInp As String) As Long Dim tmp As String Dim i As Long Dim pos As Long tmp = LCase$(strInp) pos = -1 For i = 1 To Len(tmp) If Mid$(tmp, i, 1) <> Mid$(strInp, i, 1) Then pos = i Exit For End If Next FirstCap4 = pos End Function
Well surprisingly this is slower than the optimised version using LIKE:
2000 calls to this version of the UDF takes 36 milliseconds.
Using Byte Arrays
Using Byte arrays with strings is one of VBAs less well known secrets, but its often an efficient way of handling strings when you need to inspect each character in turn.
Public Function FirstCap5(theRange As Range) As Long Dim aByte() As Byte Dim j As Long FirstCap5 = -1 aByte = theRange.Value2 For j = 0 To UBound(aByte, 1) Step 2 If aByte(j) < 91 Then If aByte(j) > 64 Then FirstCap5 = (j + 2) / 2 Exit For End If End If Next j End Function
This version of the UDF is slightly faster: 2000 calls takes 15 milliseconds.
So how does this work?
First create an undimensioned array of Bytes : Dim aByte() as Byte
Then assign a string to it: aByte=”abEfg”
You can use the Locals window to see what the resulting Byte array looks like:
Each character in the string has resulted in 2 bytes which are the Unicode code points for the character. Since I am working in a UK English Locale using the Windows Latin-1 codepage the first byte is the ANSI number for the character and the second byte is always zero.
Unaccented english upper-case characters are ANSI numbers 65 to 90, so I can loop down the byte array, looking at every other byte, and do a numeric test directly on the character to see if it is upper-case. You can see that only the third character is upper-case.
Another surprising feature of this kind of Byte array is that you can assign a byte array directly back to a string:
Dim str1 as string
str1=aByte
Str1 now contains “abEfg”
Array version of the Byte UDF
As discussed in Part 5 of writing efficient UDFs, Array Formulae go faster. So here is an array formula version of the Byte UDF.
Public Function AFirstCap(theRange As Range) As Variant Dim aByte() As Byte Dim j As Long Dim L As Long Dim vRange As Variant Dim jAnsa() As Long Dim NumCells As Long vRange = theRange.Value2 NumCells = UBound(vRange, 1) ReDim jAnsa(NumCells - 1, 0) For L = 0 To NumCells - 1 jAnsa(L, 0) = -1 aByte = vRange(L + 1, 1) For j = 0 To UBound(aByte, 1) Step 2 If aByte(j) < 91 Then If aByte(j) > 64 Then jAnsa(L, 0) = (j + 2) / 2 Exit For End If End If Next j Next L AFirstCap = jAnsa End Function
This version, entered into 2000 rows as an array formula using Control/Shift/Enter, takes just 4.8 milliseconds.
Conclusion
Here is a table comparing the speed of these different approaches.
Method |
Milliseconds |
Array Formula |
250 |
LIKE UDF |
50 |
Optimised LIKE UDF |
17 |
MID$ UDF |
36 |
Byte Array UDF |
15 |
Array Formula version of Byte Array UDF |
4.8 |
So the fastest VBA is just over 10 times faster than the slowest VBA solution, and a whopping 52 times faster than the array formula solution.
Using Byte arrays for strings can be a good solution for string handling where you need to inspect or manipulate many individual characters.
So what do you use Byte arrays for?
Interesting post!!
How fast is this array formula?
=MATCH(TRUE, CODE(MID(A1, ROW($1:$255), 1))<=90, 0)
Hi Oscar,
Takes 117 millisecs, so .. faster than the previous arry formula but slower than all the UDFs.
aByte < aBark?
Out of curiosity, does VBA work out what UBound(aByte, 1) is for each pass of the For j = 0 To UBound(aByte, 1) loop? If so, I wonder what kind of speed improvement you’d get from caching the value of UBound(aByte, 1)
None. Makes no difference whether you assign the UBOUND to a variable or not, over 1,048,576,000 loops
Thanks for all your post.
If you can not use “xlsm”, you may use:
——————————-
=AGGREGATE(15;6;FIND(
PROPER(MID($A5;
ROW($A$1:INDEX($A$1:$A$256;LEN($A5)));
1));$A5;
1);
1)
——————————
In you system maybe “;” -> “,”.
——————————-
=AGGREGATE(15,6,FIND(
PROPER(MID($A5,
ROW($A$1:INDEX($A$1:$A$256,LEN($A5)));
1)),$A5,
1),
1)
——————————
It is nearly 10 times faster than your first CSEformula and you can choose position of the 1st, 2…n capital letter 😉