The SpeedTools FILTER.IFS Function Design Part 3: Excel Data Types – Strange COUNTIF behaviour

The previous post discussed how Excel’s data types, and how FILTER.IFS was designed to handle them.

Colin Legg suggested that a good starting point for the design choices could be what COUNTIF/SUMIF do. So here is an example of some of the problems with COUNTIF, and what the equivalent SpeedTools function ACOUNTIFS does. (ACOUNTIFS uses the same filtering engine as FILTER.IFS).

Using COUNTIF with Number Strings

Suppose you have a list of zero-prefixed numbers, headed DATA, and you want to count how many of each of the numbers there are:
Each zero-prefixed number is unique apart from 0012345 which appears twice in the second and third row (54 and 55).
So I created a COUNTIF formula to count the number of occurrences in the list for each number, using each of the different criteria operators.

If COUNTIF works correctly in this situation the answer should be {1;2;2;1;1}, but as you can see below it gets it wrong!

DataTypes6

Each row in the table tries to count how many of the corresponding cell can be found using the relational operator.

So COUNTIF always give 5 when using =, so I think it must be converting ALL the text in both the data and the criteria to numbers.
And < and > always give zero because COUNTIF thinks all the data cells contain the same thing (a numberĀ  12345).

But COUNTIF($A$50:$A$54,”<>” & $A50) also gives 5 ! This looks like a BUG to me.

Here is what the SpeedTools ACOUNTIFS function gives:

DataTypes7

ACOUNTIFS treats the text numbers as text numbers and so gives what looks to me like a more “correct” answer for all the relational operators.

Conclusion

Using COUNTIF/SUMIF/COUNTIFS/SUMIFS with mixed data types looks very unwise to me!

But maybe you can figure out a way to make them work sensibly?

Advertisements
This entry was posted in Calculation, Lookups, UDF, XLL. Bookmark the permalink.

5 Responses to The SpeedTools FILTER.IFS Function Design Part 3: Excel Data Types – Strange COUNTIF behaviour

  1. Colin Legg says:

    Nice investigation, Charles.

    I noticed that the cell references in D51 of your screenshot are 3 rows too high.

    The criteria formula looks like a bug to me too. If I change the value in A57 to 12345 (number, not text) the formula returns 4, so I think it is converting the criteria to a number, but not the data. It also does this with =. So it seems if the comparison operator is 2 characters then it only converts the criteria to a number. Not very consistent.

    So what to implement in your own function? Brainstorming:
    — Keep it as is?
    — Do it the way Excel’s functions do it but without the bugs?
    — Reserve these comparison operators for arithmetic comparisons and then introduce your own, special comparison operators for text types?
    — Other?

    Sorry if this isn’t helping.

  2. Colin Legg says:

    The text in my reply was parsed as HTML because of the characters I used.

    My second paragraph should have read:
    “The criteria formula looks like a bug to me too. If I change the value in A57 to 12345 (number, not text) the formula returns 4, so I think it is converting the criteria to a number, but not the data. It also does this with the ‘less than or equal to’ and ‘greater than or equal to’ operators. So it seems if the comparison operator is 2 characters then it only converts the criteria to a number. Not very consistent.”

  3. fastexcel says:

    Thanks Colin, I have fixed the post.
    I am definitely NOT going to emulate COUNTIF!
    With FILTER.IFS if there is no criteria operator then I do not do any type conversion (because there is no possible ambiguity)
    Otherwise I attempt to convert the Criteria Value to the same data type as the first data cell.
    and I have added a “&” prefix that uses all of the possible type conversions to try to find the best type match with each cell in the data.

  4. Another approach to this is to factor the problem into smaller pieces, as in http://xllrange.codeplex.com. Suppose Data is a two column array of stock names and stock rankings. To find the top three stocks you could use

    RANGE.TAKE(3, INDEX(Data, 1, RANGE.GRADE(Data, -3, 2)))

    The take and grade functions come from APL, J, A, A+, k, kdb, or whatever the latest incarnation is called.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s