## Extracting Digits from Text: Using Formulas and Designing a Missing Excel Function: GROUPS

An Excel problem that crops up quite often is how to extract digits (0-9) from text. The text might be part numbers, or web addresses, or currency values or …

Some cases are easy to handle with formulas:

• A fixed number of digits at the start or end of the text-string (Use LEFT or RIGHT)
• A fixed number of digits starting at a fixed point within the string (Use MID)
• Groups of n digits separated by a single separator character (Use MID)

But in real life things are often not so simple:

• No fixed position for the start
• A variable number of digits
• A variable number of separators
• Extract all the digits or only one group
• Need to locate a particular text string as separator
• Extract the nth group of digits
• Work from left to right or right to left
• Extract only the first or the last n digits from a group

Most of these more complex cases can still be solved using formulas, but the required formulas are often long, complicated, hard to understand and do not adapt well to changes in the data.

## Some Formula Examples

The first example is from an article by MVP Ashish Mathur

The Data and results look like this: Ashish’s formula, entered using Control/Shift/Enter as an array formula, looks like this (see his article for an explanation).{=MID(A2,MATCH(TRUE,ISNUMBER(1*MID(A2,ROW(\$1:\$9),1)),0), COUNT(1*MID(A2,ROW(\$1:\$9),1)))}

The next example is from Chandoo This one is probably impossible for all the entries just using a formula, but here is Chandoo’s best attempt.{=MID(B4,MIN(IFERROR(FIND(lstNumbers,B4),””)), SUMPRODUCT(COUNTIF(lstDigits,MID(B4,ROW(\$A\$1:\$A\$200),1))))+0}

Where lstNumbers is a range containing the digits 0-9 and listDigits is a range containing 0-9 comma and decimal point.

The final example is from a Bill Jelen/Mike Girvin Dueling Excel Podcast, and its really tricky. Some example data looks like this: The challenge is to extract the last 3 of the consecutive digits after the first POV_ and before any non-numeric character. As Bill Jelen points out, this is much easier to do with a VBA UDF than with formulas. But if you watch the podcast you can see how Mike Girvin develops some incredible formulas to do the job.

Bill’s VBA UDF looks like this:

```
Function Nums(MV)
x = Application.WorksheetFunction.Find("POV_", MV) + 4
Nums = ""
For i = x To Len(MV)
ThisChar = Mid(MV, i, 1)
Select Case ThisChar
Case ".", "_"
GoTo FoundIt
Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
Nums = Nums & ThisChar
End Select
Next i

FoundIt:
Nums = Right(Nums, 3)
End Function

```

## The Missing Excel Function: GROUPS

Well, after watching Bill and Mike solving this really tricky problem I started wondering why does Excel not have a function to do this kind of stuff? And what would it look like if it did have one?

So (of course) I decided to build one! After a few iterations and the inevitable scope creep the requirements looked like this:

• Extract groups of characters from a text-string
• Allow the user to define what constitutes a group of characters
• Extract the Nth group from the start, or the Nth group working backwards from the end
• Option to specify the maximum number of characters to extract from the front or the back of the group
• Option to give the start and/or end position within the string for the search for groups.

GROUPS(Text, GroupNumber, MaxChars, GroupType, StartPos, EndPos)

GroupNumber can be zero (all Groups) or a positive or negative number to get the Nth group from the start or end.

MaxChars can be zero (all characters) or a positive or negative number to restrict the number of characters from the start or end of the group.

GroupType can be either of

• a Regex Pattern string, for example [0-9,.] would define a group as consecutive characters consisting of 0 to 9 comma and decimal point
• A number from 0 to 4 for the most common group types (0-9 , a-z , not 0-9 , 0-9 and . , not 0-9 and .)

StartPos and EndPos default to the first character (1) and the last character (0)

## GROUPS Examples

To get the result from Ashish’s data shown above use

=GROUPS(A2)

The defaults are: get the first group of numbers starting at the left.

For Chandoo’s example you need to define the group of digits as being 0-9 comma and decimal point, and you could append the other characters as groups of everything except 0-9 comma and decimal point and space. A bit of experimentation shows that using -1 as the group number gives better results for this data!

=GROUPS(B4,-1,,”[0-9,.]”) & ” ” & GROUPS(B4,-1,,”[^0-9,. ]”) Which is pretty good apart from maybe from having to choose between the INR18lacs or USD\$36000!

For Bill Jelen and Mike Girvin’s podcast problem there are a couple of approaches:
a slight cheat notices that the numbers we want are always the second group of numbers so this works: find the last 3 digits from the second group of digits.

=GROUPS(A2,2,-3)

But for the original problem as stated we need to look for POV_

=GROUPS(A2,1,-3,0,SEARCH(“Pov_”,A2)+4)

Find the last 3 digits from first group of digits found starting after POV_.
I am using SEARCH rather than FIND because SEARCH is not case-sensitive.

## Conclusion

• It would be much easier to solve these kind of problems if Excel had a function like GROUPS.
• You can try out the GROUPS function for yourself by downloading the 15-day trial of FastExcel Version 3 from my website.
• The GROUPS function is implemented as a multi-threaded XLL function, so performance is quite good.

If you have a real-life extraction problem that cannot be solved by the GROUPS function please let me know!

This entry was posted in UDF, VBA, XLL and tagged , . Bookmark the permalink.

### 7 Responses to Extracting Digits from Text: Using Formulas and Designing a Missing Excel Function: GROUPS

1. fastexcel says:
2. Martin Dekar says:

Why not (for Bill Jelen’s example) use:
=IF(ISERROR(Int(MID(A4,1,1))),,MID(A4,1,1))&
IF(ISERROR(Int(MID(A4,2,1))),,MID(A4,2,1))&
IF(ISERROR(Int(MID(A4,3,1))),,MID(A4,3,1))&
IF(ISERROR(Int(MID(A4,4,1))),,MID(A4,4,1))&
IF(ISERROR(Int(MID(A4,5,1))),,MID(A4,5,1))&
IF(ISERROR(Int(MID(A4,6,1))),,MID(A4,6,1))&
IF(ISERROR(Int(MID(A4,7,1))),,MID(A4,7,1))&
IF(ISERROR(Int(MID(A4,8,1))),,MID(A4,8,1))&
IF(ISERROR(Int(MID(A4,9,1))),,MID(A4,9,1))&
IF(ISERROR(Int(MID(A4,10,1))),,MID(A4,10,1))
– Not a range formula
– Works in office older than 2010
– Allows rows to be added to lines 1:9 (which destroys Ashish’s formula)

You can make it work for arbitrary length of string, just by pasting more IF(ISERROR(Int(MID(A4,7,1))),,MID(A4, 7,1))

3. fastexcel says:

Hi Martin,
That could work for Ashis’s example, but the Bill/Mike example asks to get the last 3 of the consecutive digits after the first POV_ and before any non-numeric character, rather than just extract all the numeric digits.

4. Gene Kok says:

Can someone please help me. ive got a sting of digits : 9002245022089. i need a formula to extract the 7th, 8th, 9th, and 10th digit. eg(5022) if anyone can please assist me with this

• fastexcel says:

@Gene, you don’t need anything complex – just use MID
=MID(9002245022089,7,4)

5. Garth says:

Hi I have a cell that reads 214P0 23X26-1250 SEMI, I’d like to extract the numbers particularly those around the “x” into separate columns (i.e. 23 , 26), but I’d be ok pulling all the numbers into separate columns also.

• fastexcel says:

Assuming your text is in A4 then these GROUPS formulas give you the group of numbers before and after the X:
=GROUPS(\$A\$4,-1,,,,SEARCH(“X”,\$A\$4))
=GROUPS(\$A\$4,1,,,SEARCH(“X”,\$A\$4))
You could get the 3rd and 4th groups of numbers using
=GROUPS(\$A\$4,3)
=GROUPS(\$A\$4,4)