New Cardinality Estimator Part 6 – Simple JOIN And Foreign Key

In this article we will check if PK-FK relationship is respected by new cardinality optimizer in case of a simple JOIN with tables in the mentioned relation. The old CE did not meet expectations, we hope that new CE will handle it better.

We will use again the AdventureWorks2012 database and the JOIN between Production.TransactionHistory and Production.Product tables. The transaction table has a key (ProductID) which refers the same key from the product table. So, each row in the transaction table has some value in the ProductID and this value has corresponding row in the product table.

First we’ll look how the old CE deals with it. Therefore we set trace flag 9481 at the statement level to instruct the optimizer to use old CE. Let’s assume the following query:

SELECT  t.TransactionID, t.TransactionDate, t.ActualCost, t.ProductID
FROM Production.TransactionHistory t
INNER JOIN Production.Product p ON t.ProductID = p.ProductID
OPTION (QUERYTRACEON 9481);

Let’s observe the execution plan:

Figure6_1

In the execution plan we can see the transaction table only. Since only columns from this table are returned, the product table is used as a filter. However, the optimizer is aware of the foreign key relationship and uses these metadata to ensure that it is not required to access the product table. This is expected behavior (and the same happens in new CE) and so far everything looks perfect.

However, in reports or lists IDs are not welcome; users would rather have some descriptions instead of IDs. Therefore we will update our query according to these expectations. We will look up the product name from the product table and show it instead of product id.

SELECT t.TransactionID, t.TransactionDate, t.ActualCost, p.Name
FROM Production.TransactionHistory t
INNER JOIN Production.Product p ON t.ProductID = p.ProductID
OPTION (QUERYTRACEON 9481);

Let’s observe the plan again:

Figure6_2

This time the product table appears in the plan and this is, of course, expected otherwise the query engine would not be possible to return the product name. However, the Estimated Number of Rows is underestimated. The fact that we should return the product name in addition to transaction table attributes does not change the number of returned rows. The product table should not change the cardinality of the join and the final cardinality is simple cardinality of the transaction table. Each row in the transaction table has exactly one corresponding row in the product table and this ensured by the PK-FK relationship. So, the optimizer does not respect the foreign key in this case. This is definitely a bug. It is reported in this connect item. An estimation improvement can be achieved by using additional trace flag 2301 to support advance optimization decisions.

SELECT t.TransactionID, t.TransactionDate, t.ActualCost, p.Name
FROM Production.TransactionHistory t
INNER JOIN Production.Product p ON t.ProductID = p.ProductID
OPTION (QUERYTRACEON 9481, QUERYTRACEON 2301);

Let’s check the plan:

Figure6_3

The execution plan is of course the same, but the estimation is closer to the actual number of rows. But it’s still not correct. And in this case (PK-FK relationship) it should be. We should not expect that implementation of cardinality estimation model always come up with accurate estimations, this is not possible, but in this particular case we expect this because we provide all necessary metadata (PK, FK) which can help the optimizer to get correct estimations. The final status of the connect item is “won’t fix” but I am curious to see the same query executing under the new CE. Let’s execute it with new CE (just remove the option a the end of the statement) and look at the execution plan and estimations:

Figure6_4

It seems that it’s fixed in new version! This is an expected result and it looks promising for this query pattern. In order to see if the same happens for other queries I have executed 86 such queries in the AdventureWorks2012 database (join of two tables with PK-FK relationship) with old and new CE and here are the results.

Figure6_5

With  old CE only for 18 queries (of 86) has an absolutely accurate estimation, while with new CE 78 queries have perfect estimation (more than 90%). This is a significant improvement, but as we can see the estimated number of rows does not match the actual one for all queries under new CE. In almost all cases (84 of 86) new CE comes up with better or the same estimation as the old CE, but in two queries the old CE did it better. If you are curious to see when the old CE handles better a join with foreign key relationship check the following queries:

SELECT sp.BusinessEntityID, p.LastName
FROM Sales.SalesPerson sp
INNER JOIN Person.Person p ON sp.BusinessEntityID = p.BusinessEntityID
OPTION (QUERYTRACEON 9481);
GO
SELECT sp.BusinessEntityID, p.LastName
FROM Sales.SalesPerson sp
INNER JOIN Person.Person p ON sp.BusinessEntityID = p.BusinessEntityID;

First query is executed under the old CE and the second with newest version available in SQL Server 2014 CTP2. Let’s compare the executions:

Figure6_6

In this case the old CE made a correct estimation and new CE did not handle it well. The query patter is the same, the tables are small. I am still investigating reasons why this query is so specific that new CE struggled with it and will post the results in one of future articles. I am playing with extended event sessions including the XE event.  In this Bob Beauchemin’s (blog | twitter) post you can find how to explore cardinality execution details under new CE via Extended Events.

Related Articles