New Cardinality Estimator Part 2 – Modulo Operator

Three months ago Transact SQL Guru Itzik Ben-Gan (blog | twitter) posted a small T-SQL puzzle with a question why should you use “greater than or equal to” operator instead of “equal to” when you compare the result of modulo 2 operation with the constant 1.

Figure1

The result of a Modulo 2 operation is reminder of the division with the number 2. So, the possible values for the result are 0 and 1. Therefore the result of the operation “= 1” is logically equivalent to “>= 1”. So, his question does not make much sense from logical point of view. But, as you guess, there is a reason behind this tricky question, which really makes sense. Let’s find it!

We’ll create a very simple table with two integer columns and 10M rows. Use the following code  to create and populate the test table (the help function dbo.GetNums is developed by Itzik Ban-Gan and you can download it here).

IF OBJECT_ID('dbo.TestModulo', N'U') IS NOT NULL DROP TABLE dbo.TestModulo;
CREATE TABLE dbo.TestModulo (
id INT IDENTITY(1,1) NOT NULL,
c1 INT NOT NULL,
CONSTRAINT PK_TestModulo PRIMARY KEY CLUSTERED (id ASC)
);
GO
INSERT INTO dbo.TestModulo(c1)
SELECT 1 + ABS(CHECKSUM(NEWID())) % 100000
FROM dbo.GetNums(1,10000000);
GO

Now when we populated the table let’s write a simple query which returns all rows from the table where value in the c1 column is an odd number:

SELECT * FROM dbo.TestModulo WHERE c1 % 2 = 1;

Let’s execute it and look at the execution plan. The query returns (on my machine, on your machine it could be slightly different) 5.000.557 rows. This result is expected since the way we populated the table assumes uniformly distribution of values in the c1 column. That’s what we expected. Surprising, the SQL Server query optimizer expects only 177.828 rows!

Figure2

Strange…What would be the estimated number of rows for a query which returns even numbers? As you guess the value is the same. When we combine both statements with the UNION operator we cover the whole range of values in the column c1.

SELECT * FROM dbo.TestModulo WHERE c1 % 2 = 1
UNION ALL
SELECT * FROM dbo.TestModulo WHERE c1 % 2 = 0;

As all rows will be returned as Estimated Number of Rows we expect a value equals or at least near to the total number of rows in the table. However, in the execution plan we can see that the SQL Server estimator missed 96.44% rows!
Figure3

The number of output rows is significantly underestimated. Now it’s clear that Itzik trick tries to improve the cardinality estimation. In his example he got sort warnings in the execution plan due to underestimation. With a greater or equal operator he tried to get a higher value for Estimated Number of Rows.

SELECT * FROM dbo.TestModulo WHERE c1 % 2 >= 1;

Figure4

In his case this assumption was good enough to eliminate sort warnings spills in tempdb.

But why is discrepancy between estimated and actual number of rows is more than order of magnitude? Maybe because our table has a lot of rows (10M)? What would be estimation for a smaller table? Again 1.78%? No. It would be a little bit better, but again far away from the actual number of rows. The cardinality estimation for predicates with modulo operator depends on table size. If a table contains 10 rows the estimated number of rows is 5.6 and that is OK. For a table with 10K rows, the estimation falls to 10%. In a million rows table SQL Server optimizer expects only 3.16% of rows as result of a modulo 2 operation. The following table shows how cardinality in percent of total rows depends on table size.

T1

And here is a graphical representation.

Figure5

And you know what? The estimated number of rows does not depend on the value of divisor in the modulo operation! Cardinality for the predicate with modulo 2 is the same as the cardinality with modulo 956! Yes, it is! The estimated number of rows depends on table size only! This is definitely not something we would expect, and differs significantly from real workloads.

New Cardinality Estimator in SQL Server 2014

Unlike SQL Server 2012 and previous versions new cardinality estimator in SQL Server 2014 CTP2 does not take into account the table size. The cardinality estimation for the modulo 2 operation is always 50%, for the modulo 3 33.33%, modulo 4 25% and so on. That’s the significant change in new CE. Let’s see the estimation for our initial query with new CE:

SELECT * FROM dbo.TestModulo WHERE c1 % 2 = 1;

Figure6

Now it looks great. In this particular case even perfect, but in any other case the estimated number of rows would be more realistic as prior SQL Server 2014. When we use modulo operator we usually expect uniformly distribution of values and new CE better implements this assumption.

So, Itzik does not need to change his code anymore. But wait! He already changed it. It seems that his query version with “greater than or equal” operator will be outperformed by the “equal” operator in new CE. Let’s check this quickly:

SELECT * FROM dbo.TestModulo WHERE c1 % 2 >= 1;

The execution plan looks exactly the same as the plan shown in previous Figure for the query with the equal operator. So, Itzik does not need to change the code, his workaround still works. It is also important that new feature or fix not only corrects an existing problem, but also does not break existing workarounds. And again, since these two queries are fully logically equivalent, the fact that SQL Server 2014 comes up with the same estimation for them, confirms that the way how new CE handles modulo operator is not only more accurate but logically consistent and stable.

Conclusion

Cardinality estimation for queries with modulo operator in new CE is more realistic, logically intuitive and stable. It does not depend on table size, only on divisor, which is for most workloads and distributions more realistic und would help generating more suitable execution plans and more appropriate memory grants.

Modulo operator is definitely not a frequently used operator, so this improvement will not have significant impact to overall performance of your workload, but if you need to use it, you will appreciate changes in new cardinality estimator.

Related Articles