## The SpeedTools FILTERIFS function: Design and Implementation Part 1

Excel users have been using SUMPRODUCT and array formulas to create multiple-condition formulas for many years. This is a powerful technique, but can be painfully slow with large amounts of data. Pivot Tables and Excel 2013’s PowerPivot can provide good solutions in some instances, and the introduction of SUMIFS in Excel 2007 gave a fast alternative for some scenarios.

But there is still a need for a powerful, dynamic function that can perform better than SUMPRODUCT/ array formulas, so let me introduce my attempt at creating one : FILTERIFS.

### FILTERIFS Design Objectives

• Speed of calculation – multi-threaded, exploit sorted data and clustered data.
• Extended criterion types to include AND/OR, Lists, Wild Card Patterns, Regular Expressions, Calculated Columns and Arrays etc.
• Dynamic calculation in the same way as other Excel functions.
• Extend multiple condtions to many more functions by outputing an array to other functions such as SUM, MEDIAN, LISTDISTINCTS, VSORT etc, or directly as a multi-cell array formula.

The original implementation was done using a VB6 automation addin, but lack of multi-threading and 64-bit support in VB6 lead me to re-implement as a C++ XLL.

### So how do you make it fast?

The idea is to process each criterion in turn using only the rows that meet all the criteria processed so far, thus avoiding the SUMPRODUCT/array formula approach of evaluating all the criteria for all the rows.
Criteria operating on sorted columns are processed first using a fast High-Low binary search modified for relational operators.
Non-sorted columns and criterion types like Regex are then processed using linear search in a sequence designed to minimise data transfer/coercion time.
And using a C++ XLL allows multi-threading and fast execution.

### FILTERIFS Syntax

The syntax uses a similar approach to SUMIFS to pass the criterion as a string concatenation of a relational operator and a value. Because the value is passed as a string FILTERIFS has to do some datatype conversions of the value to match the datatype of the criterion column (and hopefully avoid some of the SUMIFS bugs in this area).

FILTERIFS( nSortedCols, InputRange, ReturnCol, CriteriaColumn1, Criteria1,
CriteriaColumn2, Criteria2, … , [“#OR#”, nsortedCols,] CriteriaColumnx, Criteriax, …)

nSortedCols gives the number of columns which are sorted in the InputRange

InputRange is a range reference to the data containing the sorted columns and return column.
The data can contain a header row of names for the columns.

ReturnCol is the header name or number of the column within InputRange to return results from.

Criteria Column gives either the name/number of a column in InputRange, or a range reference to an independent column, or an array or an expression returning a column of data to be used as the criterion column.

Criteria is the expression used to filter the criterion column.
This can be a relational operator (=, >=, <=, >, <, ¬=, ~, ¬~, ~~) ( ¬ means NOT, ~ means LIKE, and ~~ means Regex) and value.
It can also be a LIST of alternatives to look for, given either as an array ({“FL”,”NY”,TX”} or with a relational operator {“~ABC*”,”~DEF*}) or as a range reference.

#OR# allows you to have multiple alternative sets of criteria.

### FILTERIFS Components

To deliver this fairly complex set of capabilities the function is broken down into a number of component blocks. These are the major ones:

• Handle any header row column names & translate column names and numbers to column indexes.
• Parse and analyse the criterias, storing the result in an array of Criterion structures
• Data Type detection and type-casting of the criteria values
• Find the optimum sequence to process the criteria
• Row-Pairs class to store first-row last-row pairs for the rows that meet the criteria. Methods for this class include Append, Merge, Condense, CountPairs, CountRows etc.
• High-Low binary search for the sorted criteria
• Translate High-Low to rowpairs using the relational operators
• Determine optimum data-transfer/coercion strategy and sequencing for the non-sorted criteria
• Linear Search on row-pairs for non-sorted criteria
• Comparison functions for the various Criterion operators.
• Conversion of row-pairs to results

These components currently result in just under 5000 lines of code.

FILTERIFS Status

As at December 2012 the function is coded and the first phase of testing has been completed. It has taken considerably longer than planned, mainly because the VB6 version made extensive use of EVALUATE, which turned out not to be allowed to multi-thread in C++ and so I had to redesign most of the approach for non-sorted columns.

There is still some performance testing, refactoring and rework to be done but the target is to start Beta3 in early January 2013.

(Assuming that the Mayan calendar is wrong in predicting the end of the world today Friday 21 Decembery 2012).