## UNIQUES and DISTINCTS: exploring lists with LISTDISTINCTS

I just added some options to the SpeedTools LISTDISTINCTS functions that make them surprisingly powerful. You can now easily find the most frequently occurring item in a list, or find the item with the largest sum or average of a corresponding column.

But first since there is disagreement about the meaning of the terms UNIQUES and DISTINCTS I should explain what I mean:

• A unique item in a list is one that only occurs once
• Distinct items in a list can occur once or more than once

### Creating a list of distinct items

Suppose you have a list formatted as a table:

Then entering the formula =LISTDISTINCTS(Table1) as a multi-cell array formula (select 14 cells in a column, enter the formula in the formula bar and press Control/Shift/enter) gives you this:

Notice that the items appear in the sequence of their first occurrence in the list.

There are rather a lot of #N/As since Excel pads out the excess cells (the cells for which the array formula did not return anything) with #N/A.
But of course LISTDISTINCTS allows us to fix that using the PAD option in the formula =LISTDISTINCTS(Table1,,,,,1)
Pad can be 0 = pad with #N/A, 1 = pad with “”, 2 pad with zero.

That looks better, but OOPS the #N/A in the list has disappeared! Thats because the default option for LISTDISTINCTS is to ignore error values, blanks and empty cells, so we just need to change the Ignore option to 2.

And the result looks like this: (it shows #ERROR rather than #N/A so that you can distinguish it from the padding #N/A)

There some more options for LISTDISTINCTS

Case_Sense defaults to false, so the aa and AA in the list are treated as being the same.

If your list of items has more than one column you can either ask for a list of distinct rows (ByRows=True) or a list of all the distinct items across all the columns.

And you can sort the result list ascending (Sort=1), descending (Sort=2) or leave it unsorted (Sort=0).

Here is an example of LISTDISTINCTS sorted ascending, case-sensitive, pad with blanks, include errors, showing the difference with ByRows True and ByRows False.

### Counting Distinct Items

There are 2 variations of LISTDISTINCTS for counting the number of distinct items: COUNTDISTINCTS and LISTDISTINCTS.COUNT

COUNTDISTINCTS is not an array formula and just gives you the count.
But LISTDISTINCTS.COUNT adds an extra column that gives the count of occurrences of each of the distinct items.

### Finding the most frequently occurring item

You can also sort the output of LISTDISTINCTS.COUNT most frequent occurrences first (Sort=-2) or last (Sort=2).

So the formula =LISTDISTINCTS.COUNT(A21:A34,,,,-2) (not an array formula, entered in a single cell) returns AA which is the most frequently occurring item.

### Sums and Averages for distinct items.

As well as LISTDISTINCTS.COUNT there are LISTDISTINCTS.SUM and LISTDISTINCTS.AVG
These take an additional column argument showing what to sum or average for each distinct item.
And, just like LISTDISTINCTS.COUNT, you can sort the output either on the distinct items or on the resulting sums or averages.

## Summary

Of course you can achieve similar things with Pivot Tables and PowerPivot. But there are many occasions when I find that a simple formula that automatically refreshes whenever Excel recalculates is a better solution.

And adding the option to sort ascending or descending on either the item list or the count, sum or average adds a lot of pwer to the functions.

## So what do you use for this kind of thing: Formula, UDF, Pivot Table or PowerPivot?

This entry was posted in Calculation, UDF, XLL and tagged , , . Bookmark the permalink.

### 3 Responses to UNIQUES and DISTINCTS: exploring lists with LISTDISTINCTS

1. Eddy says:

Dear Charles,
I am using FastExcel for the past year and i can’t even imagine today how i could run my models without it. Indeed, i was able to develop VBA free stock market simulation models, which are processing a huge amount of data – Each model / Excel file is typically 3 Go (main database = 800 000 rows x 1000 columns) and require up to 30 Go of RAM when “calculating”..
Simulation runtimes are now down to just few minutes, ie 10 to 20 times faster than before…. and any kind of Excel memory related crashes has completely disappeared.
Anyhow, I want to further simplify the model by reducing the number of formulas,ie replacing SUMIFS type of statements by LISTDISTINCTS functions…
Typically, I need to perform some statistical calculations (standard deviation, Percentile, Rank…) using the data set from the second column of the array generated by a LISTDISTINCTS.SUM function, but without displaying the resulting array itself.
Right now, the only solution that i found is with the following syntax :
=STDEV.P(PAD.ARRAY(,REVERSE.ARRAY(LISTDISTINCTS.SUM(InputRange,SumColumn,,FALSE),FALSE,TRUE),,1))
ie i am first “reversing” the 2 columns of the output array before “cropping” it with PAD.ARRAY and applying the statistical function.
It works well, but I am sure there might be another more straightforward / faster solution that would require less array manipulations / CPU time in order to achieve the same result.
Any tip will be very much appreciated
Thanks a lot for the feedback
Eddy

• fastexcel says:

Hi Eddy,
You should be able to use INDEX with 0 as the row argument to return the second column INDEX(Array,0,2) – so something like this
=STDEV.P(INDEX(LISTDISTINCTS.SUM(InputRange,SumColumn,,FALSE),0,2))

2. Eddy says:

Hi Charles,
Thanks a lot for the prompt feedback.
This works perfectly fine. I am so used to recover single cell value from an INDEX search that I forgot that one could also get a range, a column or a row as an output….
Eddy