## Excel Data Types

Excel has only  4 or 5 native data types:

• Numbers (which can be formatted as Dates, Times, Currency, Integers, Doubles etc, but are all held internally as floating point doubles)
• Strings (Text including zero length strings like “”)
• Booleans (True or False)
• Errors (#N/A, #DIVO etc)
• Empty (which annoyingly is only partly supported –  for instance you can’t return it from a function or a formula)

You can format all these data types in lots of different ways so that they look different, but a Cell’s underlying value is always going to be one of these types.

And unlike most Database systems Excel allows the cells in a column to contain multiple data types.
This can lead to problems: the most frequent one being a column of numbers some of which have been entered as text strings and some as real numbers. Usually you can visually see them because the numbers that are text are left-aligned in the cell and the real numbers are right-aligned.

Numbers as text can arrive in Excel in various ways:

• Start by entering a ‘ followed by the number
• Format the cell as text
• Data imported from external sources

## Sorting Columns containing multiple data types

When Excel sorts data contining different data types it uses this relationship between types:

Numbers<Strings<Booleans<Errors

Empty cells are always sorted last, both in Ascending and Descending sorts!
When you sort data containing numbers stored as text strings Excel asks you if you want to sort Text numbers as text or as numbers. Usually its better to sort text numbers as text rather than risk confusing any subsequent operation that relies on things being properly sorted.

## Comparing Data Types

If you use a simple formula (=A6<A5)  to compare data types you get this:

You can see that XYZ is >= the empty cell above it, but ABCD is <XYZ.
Numeric 1234 in A9 is less than text string 1234 in A8
A12 entered as a ‘ is a zero-length string and is > the number in A11.
The errors in A16:A18 propagate so you can’t see how Excel compares them.

So if you get rid of the error cells, sort the data and change the formula so its looking for A6>A5 you get this:

So the formula comparison precedence rules are the same as the Excel sorting rules, except for empty cells!

## FILTER.IFS Data Type Comparison Operators

Its useful to be able to filter by data type (although the standard Excel Filter command does not have this option), so I added some type filtering operators:

• #ERR – filters all the error cells
• #TXT – filters all the string/text cells
• #N – filters all the number cells
• #BOOL – filters all the True/False cells
• #EMPTY – filters all the empty cells
• #ZLS – filters all the cells containing a zero-length string
• #TYPE – filters all the cells that have the same data type as the first cell in the filtered range
• #BLANK – filters all the cells that contain one or more blanks or spaces

You can prefix these operators with ¬ to filter out everything that does NOT match the data type.

And you can have a list of multiple filtering operators: {“¬”,”#EMPTY”,”#ZLS”,”#BLANK”,”#ERR”} would exclude empty cells, cells with zero length strings or blanks, and cells with errors.

Here is an example:

The FILTER.IFS formula is =FILTER.IFS(0,\$A\$6:\$B\$17,1,D\$5) entered as an array formula (Control-Shift-Enter) and copied across.

• The 0 says there are no sorted Criterion columns (because the type filters don’t care if the data is sorted or not).
• \$A\$6:\$B\$17 gives the range to be filtered
• 2 gives the column within the data range to be returned as the answer
• 1 gives the column within the data range to be filtered using the criterion
• D\$5 gives the cell containing the criterion itself

## Handling Data Types with the Relational Operators <,<=,>,>=,¬=

Suppose you create a FILTER.IFS formula  like this: =FILTER.IFS(1,\$A\$6:\$B\$17,2,1,”<1235″)

The criterion says less than 1235, but which 1235 – the numeric one or the string one or both?

I don’t think there is neccessarily a “correct” answer to this, so I invented a rule!

If the Criterion value can be converted into more than one data type (in this case a string and a number) choose the same data type as the first cell in the column.

In this case the first cell is a number, so FILTER.IFS chose to use numeric 1235, which results in a single result, the 1 from row 6.

Because the data is sorted the binary search routine has to use a single datatype, so looking for both the string 1235 and the numeric 1235 is not an option.

But if the data is NOT sorted a linear search can find both: so you can tell FILTER.IFS to compare using ALL the avialable datatypes by using an & prefix.

=FILTER.IFS(0,\$A\$6:\$B\$17,2,1,”&<1235″)

you get ALL the data which is less than numeric 1235 AND all the data that is less than string 1235.

Of course if you don’t use any of the Criterion operators and it finds only the matching data type (=FILTER.IFS(1,\$A\$6:\$B\$17,2,1,”1235″) or =FILTER.IFS(1,\$A\$6:\$B\$17,2,1,1235)

## Conclusion

Using mixed data types with relational operators can be tricky – sometimes its difficult to work out what Excel is doing.
A drawback of following the same kind of syntax as SUMIFS (a string containing both the relational operator and the value) is that there is no clear datatype choice.

But I was not sure that the previous FILTER.IFS design, which could give different results for sorted and unsorted data, made sense, so I changed it so that sorted and unsorted data gave the same results and added the & prefix :
What do you think?

This entry was posted in Calculation, UDF, VBA, XLL. Bookmark the permalink.

### 2 Responses to The SpeedTools FILTER.IFS Function Design Part 2: Excel Data Types – When is a Number a String?

1. Colin Legg says:

You’ll have put a lot more thought into this than me, Charles, but I think I would try to handle the comparison operators by mimicking the behaviour of Excel’s built-in functions such as SUMIF, COUNTIF etc.

2. fastexcel says:

Well, unfortunately it looks to me like SUMIF & COUNTIF do completely nonsensical things. I could emulate them I suppose, but I would rather try and do what (IMHO!!!) is the correct thing. Probably I should do another blog post comparing both approaches.