
Miracle of OPTIMIZE_FOR_SEQUENTIAL_KEY

SQL Server 2019 provides another option to deal with the last-page insert contention – OPTIMIZE_FOR_SEQUENTIAL_KEY. It improves throughput for high-concurrency inserts into an ascending clustered index. Unlike the solutions dealing with hash keys, partitioning or reverse indexing, this one does not change the nature of the clustered key – it remains ascending – and thus it does not affect business logic at all.
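
For reference, OPTIMIZE_FOR_SEQUENTIAL_KEY is an index-level option. Here is a minimal sketch of how it can be enabled – the table and index names are just placeholders, not the ones used in the tests below:

ALTER INDEX PK_MyTable ON dbo.MyTable
SET (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);
GO
--or directly at creation time
CREATE TABLE dbo.MyOtherTable(
id INT IDENTITY(1,1) NOT NULL,
inserted_at DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
CONSTRAINT PK_MyOtherTable PRIMARY KEY CLUSTERED (id)
WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON)
);
GO
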
For exhaustive information about this feature (motivation and implementation details), I recommend the excellent article Behind the Scenes on OPTIMIZE_FOR_SEQUENTIAL_KEY, written by Pam Lahoud, a Program Manager on the SQL Server Engineering Team. In this post, I will share with you some very exciting (and surprising) test results.

I used the following code to create and populate the sample tables and to create the test procedures. To compare the execution times of the INSERT statements, I used Query Store with the default settings, except for the INTERVAL_LENGTH_MINUTES attribute – just to make it easy to compare the runtime stats.

ALTER DATABASE OptSeqKey SET QUERY_STORE =ON (INTERVAL_LENGTH_MINUTES = 5);
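
The table and procedure creation script itself is not reproduced here. Roughly, the setup could look like the following sketch – the table and column definitions are my assumption; only the procedure names P1, P2 and P3, the 50 inserts per call and the OPTIMIZE_FOR_SEQUENTIAL_KEY setting come from the tests described below:

--sketch only: a "normal" table and an optimized one (definitions assumed)
CREATE TABLE dbo.T1(
id INT IDENTITY(1,1) NOT NULL,
inserted_at DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (id)
);
CREATE TABLE dbo.T2(
id INT IDENTITY(1,1) NOT NULL,
inserted_at DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (id) WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON)
);
GO
CREATE OR ALTER PROCEDURE dbo.P1 --50 inserts into the normal table
AS
BEGIN
DECLARE @i INT = 1;
WHILE @i <= 50
BEGIN
INSERT INTO dbo.T1 DEFAULT VALUES;
SET @i += 1;
END
END
GO
--dbo.P2 does the same against dbo.T2; dbo.P3 performs 50 inserts into each of the two tables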

To produce a workload friendly to last-page contention, I used the ostress tool. I ran the following commands at roughly 5-minute intervals. By taking this approach, I can easily analyze the wait stats – all entries I find there belong to a specific test scenario. Here are my test scenarios:

  • First test: calling the P1 stored procedure, which performs 50 inserts into a “normal”, non-optimized disk table with an ascending clustered key, using 300 parallel sessions.

    .\ostress -E -dOptSeqKey -Q"EXEC dbo.P1" -SMySQL2019Server -r100 -n300 -q
  • Second test: calling the P2 stored procedure, which performs 50 inserts into a table optimized for sequential key, also with an ascending clustered key, using 300 parallel sessions.

    .\ostress -E -dOptSeqKey -Q"EXEC dbo.P2" -SMySQL2019Server -r100 -n300 -q
  • Third test: calling the P3 stored procedure, which performs 50 inserts into each of the two tables, using 300 parallel sessions.

    .\ostress -E -dOptSeqKey -Q"EXEC dbo.P3" -SMySQL2019Server -r100 -n300 -q

That means the first test inserts 1.5M rows into the normal table (300 sessions × 100 iterations × 50 inserts), the second inserts 1.5M rows into the optimized table, while the third inserts 3M rows (1.5M into each table). Now, the exciting part – the results.

In my Query Store, the plan with plan_id 1 inserted a row into the normal table, and the plan with plan_id 5 was used to insert into the optimized table. The following query returns the average execution time for the two INSERT statements:

SELECT plan_id, runtime_stats_interval_id, count_executions, CAST(avg_duration AS INT) avg_duration
FROM sys.query_store_runtime_stats WHERE plan_id IN (1, 5) ORDER BY runtime_stats_interval_id, plan_id;


BlogPost1

And here are the results in chart format:

BlogPost2

As you can see, the average execution time for a single INSERT (the first blue bar, 16.5 ms) is very high with 300 parallel sessions – I definitely hit the contention. When I ran the same workload against the table with OPTIMIZE_FOR_SEQUENTIAL_KEY turned ON, the throughput looked better – the average execution time (the first orange bar) was reduced by about 28%. This is what we expected (from the Microsoft documentation), and it shows that this feature helps with contention convoys.

I also tested it with different numbers of parallel sessions; here are the results of one of my tests in tabular and chart format:

BlogPost3


BlogPost4.png

The results show that the feature should be used for tables where latch convoys already happen or are about to happen. In not-so-heavy workloads, it introduces a small overhead. You can see that inserts with 100 parallel sessions are 16% slower against the table with the optimized sequential key. However, when you hit the convoy, the feature definitely helps.

Let’s now analyze the waits captured by Query Store. The following figure shows the waits for the first two tests, respectively:

BlogPost5.png

As we assumed, we hit the convoy; the wait stats clearly show that inserts into the normal table were waiting for resources from the Buffer Latch category. From the lower part of the figure, we can also see that the BTREE_INSERT_FLOW_CONTROL wait type seems to be categorized as Unknown. Nevertheless, the overall results are better.
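
For reference, the wait categories shown in the figure can be pulled directly from Query Store with a query along these lines (the plan_id values 1 and 5 are the ones from my environment):

SELECT ws.plan_id, ws.runtime_stats_interval_id, ws.wait_category_desc,
ws.total_query_wait_time_ms, ws.avg_query_wait_time_ms
FROM sys.query_store_wait_stats ws
WHERE ws.plan_id IN (1, 5)
ORDER BY ws.runtime_stats_interval_id, ws.plan_id, ws.total_query_wait_time_ms DESC;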

All the results and numbers mentioned so far are more or less expected. What I definitely did not expect were the significant improvements for inserts into the optimized table in my third test, where I executed both procedures. In this test, inserts into the normal table behaved similarly to the first test, where we inserted into the normal table only (16.5 ms vs. 17.1 ms). However, inserts into the optimized table were significantly faster than in the second test (2.7 ms vs. 11.9 ms) – the orange bar is about 6x smaller than the blue one. When we compare this average execution time with the initial execution time for the non-optimized table, we see an incredible throughput improvement of 84%!

When we look into Query Store, we can see that the BTREE_INSERT_FLOW_CONTROL waits are significantly reduced compared to the case where we ran only the optimized inserts:

BlogPost6.png

EDIT:

It seems I was too excited about the feature and wanted to believe that, in a mixed-contention workload, the code that deals with latch convoys in the table optimized for sequential key somehow also mitigates the plain latch contention in the non-optimized table. However, the numbers in the third test were too good to be fully true. The numbers are correct, but in the third test, where both inserts are invoked, the insert into the normal table introduces a delay before the second insert and therefore reduces the frequency of the incoming calls that insert into the optimized table (similar to what OPTIMIZE_FOR_SEQUENTIAL_KEY itself does for incoming threads). The execution time for the second INSERT dropped because the incoming calls are slowed down, and that waiting time is, of course, not included in these numbers.

However, all my tests with 200+ parallel sessions on an 8-core machine, against different tables, clearly show that inserts into a table with the optimized key are 20–30% faster.

Thanks for reading!

Dude, Where’s My Forced Plan?! – Part 2

In my previous post about Query Store, you saw that the persistence of forced plans does not imply that they are used. This post brings another example where forced plans are still in the system, with no failures, but they are not applied to the desired query.

For this demo, we need a database that is part of an availability group. Within the database we need only one sample table, one user-defined table type, and one stored procedure:

DROP TABLE IF EXISTS dbo.T;
CREATE TABLE dbo.T(
id INT IDENTITY(1,1) NOT NULL,
c1 INT NOT NULL,
c2 INT NULL,
CONSTRAINT PK_T PRIMARY KEY CLUSTERED (id)
);
GO
INSERT INTO dbo.T(c1)
SELECT 1 + ABS(CHECKSUM(NEWID())) % 10000
FROM dbo.GetNums(100000);
GO


CREATE TYPE dbo.Tvp AS TABLE(
Id INT NOT NULL PRIMARY KEY CLUSTERED
)
GO
CREATE OR ALTER PROCEDURE dbo.GetList
(@tvp AS dbo.Tvp READONLY)
AS
SELECT t.*
FROM dbo.T t
INNER JOIN @tvp tvp ON t.id=tvp.Id;
GO
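
The script above assumes a numbers function dbo.GetNums (Itzik Ben-Gan’s well-known helper). If you don’t already have it in your database, create something like the following minimal sketch before running the INSERT above:

CREATE OR ALTER FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
WITH
L0 AS (SELECT 1 AS c UNION ALL SELECT 1),
L1 AS (SELECT 1 AS c FROM L0 A CROSS JOIN L0 B),
L2 AS (SELECT 1 AS c FROM L1 A CROSS JOIN L1 B),
L3 AS (SELECT 1 AS c FROM L2 A CROSS JOIN L2 B),
L4 AS (SELECT 1 AS c FROM L3 A CROSS JOIN L3 B),
L5 AS (SELECT 1 AS c FROM L4 A CROSS JOIN L4 B),
Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
SELECT TOP (@n) n FROM Nums ORDER BY n;
GO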

Let’s ensure that Query Store is empty and turned on:

ALTER DATABASE A SET QUERY_STORE CLEAR;
GO
ALTER DATABASE A SET QUERY_STORE = ON (QUERY_CAPTURE_MODE = AUTO);
GO

Now, we can execute our sample query multiple times:

DECLARE @t AS dbo.Tvp;
INSERT @t SELECT TOP (10) id FROM dbo.T ORDER BY 1 DESC;
EXEC dbo.GetList @t;
GO 3

The execution is fast, with a Nested Loop Join based execution plan. Let’s suppose we want to make sure that this plan will be used for all further calls, and that we want to achieve this by forcing the plan. This is not always a good idea, but in this demo we simply want to have a forced plan. When we execute the query again, we should see something like this in Query Store:

post1
The plan forcing is respected, all fine.
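
For reference, the forcing itself can be done either through the Query Store reports in SSMS or with the built-in procedure; the plan_id below is a placeholder – take the actual query_id and plan_id values from your Query Store:

EXEC sys.sp_query_store_force_plan @query_id = 35, @plan_id = 1;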

One of the main advantages of Query Store is that its data belongs to the database. Thus, it should survive server crashes and failovers. In case of a failover, we should see a similar screen when we run the query on the other server, which takes over the role of the primary after the failover. After the failover, I executed the same query again (against the new primary):

DECLARE @t AS dbo.Tvp;
INSERT @t SELECT TOP (10) id FROM dbo.T ORDER BY 1 DESC;
EXEC dbo.GetList @t;
GO 3

When I looked at the Query Store report, I saw no new executions for this query:

post2.png

I then checked the sys.query_store_query catalog view and figured out why these executions were not captured:

SELECT * FROM sys.query_store_query WHERE query_hash = (SELECT query_hash FROM sys.query_store_query WHERE query_id = 35);



post3

Query Store has instantiated a new query_id (57) for my query, and the executions are associated with it:

post21

Let’s look more closely at the batch_sql_handle attribute. In the following figure, you can see the only difference:

post4.png

The highlighted parts are hexadecimal numbers with decimal values of 20 and 16, respectively. Where does this number come from? It is the database_id attribute. And yes, in this example, my databases have different database_ids:

post5
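
You can verify this on each replica with a simple check (the database is named A in this demo):

SELECT name, database_id FROM sys.databases WHERE name = N'A';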

Once again we have a situation where the query_id has changed for a query that we did not touch at all – this time because of different database_id values on the primary and the secondary.

Conclusion

When you force a plan for a query that uses table variables or table-valued parameters, you have to check frequently whether it is still in use, since the stability of query_id is very weak and it can change even if you did not change the code.
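
A periodic check along these lines can help you spot forced plans whose queries no longer receive executions – a sketch; the filters and thresholds are up to you:

SELECT q.query_id, p.plan_id, p.is_forced_plan,
p.last_force_failure_reason_desc, q.last_execution_time
FROM sys.query_store_plan AS p
INNER JOIN sys.query_store_query AS q ON q.query_id = p.query_id
WHERE p.is_forced_plan = 1
ORDER BY q.last_execution_time;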

Thanks for reading.

SQL Server 2019 CE, Local Variables and Nullable Columns

In this article, I will demonstrate a small change in the Cardinality Estimator component under the latest compatibility level – 150 (SQL Server 2019).

Let’s first create and populate a sample table:

DROP TABLE IF EXISTS dbo.T;
CREATE TABLE dbo.T(
id INT IDENTITY(1,1) NOT NULL,
c1 INT NOT NULL,
c2 INT NULL,
c3 CHAR(5) DEFAULT 'test',
CONSTRAINT PK_T PRIMARY KEY CLUSTERED (id)
);
GO
CREATE INDEX ix1 ON dbo.T(c1);
GO
CREATE INDEX ix2 ON dbo.T(c2);
GO
INSERT INTO dbo.T(c1)
SELECT 1 + ABS(CHECKSUM(NEWID())) % 10000
FROM dbo.GetNums(1000000);
GO

--set the c2 column to a non-NULL value for about 5% of the rows
UPDATE dbo.T SET c2 = c1 WHERE c1 >= 9500;
GO

Ensure that the database runs under the compatibility level 140 and consider the following simple query with a local variable:

ALTER DATABASE New SET COMPATIBILITY_LEVEL = 140;
GO
DECLARE @v INT = 600000;
SELECT * FROM dbo.T WHERE c1 = @v;
Here is its execution plan:

e00
This is a well-known situation – the estimation for an equality predicate with a local variable. The estimated number of rows is calculated as the All density attribute from the stats density vector multiplied by the Rows value from the stats header.

e02


SELECT 0.00010002*1000000;
--100.02000000

The c1 column does not allow NULLs. What would the estimation be for a nullable column? The c2 column in our table is nullable. Let’s check:

DECLARE @v INT = 600000;
SELECT * FROM dbo.T WHERE c2 = @v;


e01

The estimation is significantly higher and even leads to a plan with a Clustered Index Scan! However, it follows the same estimation formula. Let’s have a look at the statistics object:

e03
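
If you want to reproduce these figures yourself, the density vector and histogram can be displayed with DBCC SHOW_STATISTICS against the indexes created above:

DBCC SHOW_STATISTICS ('dbo.T', ix1) WITH DENSITY_VECTOR;
DBCC SHOW_STATISTICS ('dbo.T', ix2) WITH DENSITY_VECTOR;
DBCC SHOW_STATISTICS ('dbo.T', ix2) WITH HISTOGRAM;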

By using the same formula, the estimated number of rows is:

SELECT 0.001992032*1000000
--1992.032000000

This is exactly what we see in the execution plan. OK, that was CL 140, let’s see how SQL Server 2019 handles this simple case.

When we switch to CL 150, the plan and estimations for the c1 column (non-nullable) are the same. However, the estimation for the nullable column is changed!

ALTER DATABASE New SET COMPATIBILITY_LEVEL = 150;
GO
DECLARE @v INT = 600000;
SELECT * FROM dbo.T WHERE c2 = @v;
Here is the plan:

e04

This time, the estimation is not so high, and we get a more appropriate plan, based on the Nested Loop Join operator. But where does 104.397 come from? Let’s look at the statistics again:

e05

Since the column is nullable, the CE applies the All density to the non-NULL rows only. Thus, the formula seems to be All density x (Rows – EQ_ROWS for the RANGE_HI_KEY = NULL histogram step).

SELECT 0.001992032*(1000000-947697);
--104.189249696

Well, it is not exactly 104.397, but it is very, very close to it. I did not find official documentation for this, therefore I cannot confirm that the mentioned calculation is always correct. However, this new behavior makes more sense: the average value from the stats density vector refers to the non-NULL set of values, and it is wrong to multiply it by all rows.

Fun fact – the behavior is actually not new at all; that’s exactly how the old CE (under compatibility levels <= 110) handled this case.

I would like to thank my colleague Vladimir Đorđević for making me aware of this behavior.

Thanks for reading.

Dude, Where’s My Forced Plan?! – Part 1

Query Store is a great performance troubleshooting tool. I like it; in the past three years, it has saved me a lot of time and money. It allows you to quickly identify execution plan regressions. By forcing an old plan, you can almost instantly implement a workaround for the issue and buy the time necessary for the final fix. Forcing a plan acts (or should act) as a temporary solution for plan regression issues. But until you finally fix the underlying issue, it should be considered a solution: a temporary, yet stable and steady one. However, forcing plans does not always work. You can find interesting articles about plan forcing in the excellent series of Query Store articles written by Erin Stellato (blog | twitter). In this article, I won’t write about forced plan failures, where a plan fails because of a missing index or object, but will describe a situation where everything seems to be fine – no failure at all – and yet it does not work.

To demonstrate the issue, we need to create a simple user table type and use it in a stored procedure in the WideWorldImporters sample database.

CREATE TYPE dbo.IntList AS TABLE(
fValue INT NOT NULL PRIMARY KEY CLUSTERED
)
GO
CREATE OR ALTER PROCEDURE dbo.GetOrderList
(@tvp AS dbo.IntList READONLY)
AS
SELECT o.*
FROM Sales.Orders o
INNER JOIN @tvp t on o.OrderID=t.fValue;
GO

It’s very simple: we’ll invoke the stored procedure to get order details for a given list of order ids. Let’s ensure that Query Store is empty and turned on:

ALTER DATABASE WideWorldImporters SET QUERY_STORE CLEAR;
ALTER DATABASE WideWorldImporters SET QUERY_STORE = OFF;
GO
ALTER DATABASE WideWorldImporters
SET QUERY_STORE = ON (QUERY_CAPTURE_MODE = AUTO);
GO

Now, run the following code that calls the stored procedure with 10 order Ids, and after clearing the cache calls it again, but this time with 2000 order ids:

DECLARE @t AS dbo.IntList;
INSERT @t SELECT TOP (10) OrderId FROM Sales.Orders ORDER BY 1 DESC;
EXEC dbo.GetOrderList @t;
GO 3
DBCC FREEPROCCACHE;
GO
DECLARE @t AS dbo.IntList;
INSERT @t SELECT TOP (2000) OrderId FROM Sales.Orders ORDER BY 1 DESC;
EXEC dbo.GetOrderList @t;

We expect two plans here: a Nested Loop Join based one for the first call and a Merge or Hash Join plan for the less selective second call. Let’s look at Query Store to see how this stored procedure executes. Since Query Store works with queries rather than with procedures, we need to identify the query_id of our SELECT statement:

SELECT qs.query_id, q.query_sql_text, qs.query_hash, qs.batch_sql_handle
FROM sys.query_store_query AS qs
INNER JOIN sys.query_store_query_text AS q ON qs.query_text_id = q.query_text_id
WHERE object_id = OBJECT_ID('dbo.GetOrderList')
ORDER BY qs.last_execution_time DESC

I got the value 1 for the query_id (in your case it could be different, depending on your workload). Here is what we can see in the Tracked Queries report:

P1


P2

As we expected, we see two different plans: Nested Loop Join for 10 Ids and Merge Join for 2000 order ids.

Let’s assume now that we want to have the Nested Loop Join plan for our query, and that we have no time or no access to the code to achieve this other than by forcing the plan. The assumption is not realistic for this simple query, but it is reasonable for complex queries with complex plans on a production system. Let’s force the lower plan and run the following code:

DECLARE @t AS dbo.IntList;
INSERT @t SELECT TOP (10) OrderId FROM Sales.Orders ORDER BY 1 DESC;
EXEC dbo.GetOrderList @t;
GO
DECLARE @t AS dbo.IntList;
INSERT @t SELECT TOP (2000) OrderId FROM Sales.Orders ORDER BY 1 DESC;
EXEC dbo.GetOrderList @t;
GO

What we expect to see now is the same execution plan for both executions. Let’s check in the Query Store report:

P3

We see that SQL Server reused the same plan for both calls – exactly the one that we forced. All fine.

What would happen if we cleared the cache and called the stored procedure with the parameters in the opposite order, i.e.:

DBCC FREEPROCCACHE;
GO
DECLARE @t AS dbo.IntList;
INSERT @t SELECT TOP (2000) OrderId FROM Sales.Orders ORDER BY 1 DESC;
EXEC dbo.GetOrderList @t;
GO
DECLARE @t AS dbo.IntList;
INSERT @t SELECT TOP (10) OrderId FROM Sales.Orders ORDER BY 1 DESC;
EXEC dbo.GetOrderList @t;
GO

Let’s observe the Query Store report:

P4

What we see now is a third execution plan. The plan that we forced is not reused this time; SQL Server has created a new one. But this plan is also a Nested Loop Join plan; it reflects the shape defined by the forced plan. When you look at its XML representation, you will find the UsePlan attribute with the value True. Since there was no plan in the cache, a new one was created, with the same shape but different estimations. Although it is not exactly the same plan, it is still the plan we want to see – a Nested Loop Join based one. Still all fine – plan forcing works.
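
If you want to check this yourself, the plan XML can be pulled from Query Store and inspected for the UsePlan attribute – query_id 1 is the value from my environment:

SELECT p.plan_id, p.is_forced_plan, TRY_CAST(p.query_plan AS XML) AS query_plan_xml
FROM sys.query_store_plan AS p
WHERE p.query_id = 1;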

Let’s imagine now that we roll out a new version of the application. There were no changes in our procedure, but somebody included it in the installation script, and the following statement was executed:

CREATE OR ALTER PROCEDURE dbo.GetOrderList
(@tvp AS dbo.IntList READONLY)
AS
SELECT o.*
FROM Sales.Orders o
INNER JOIN @tvp t on o.OrderID=t.fValue;
GO

From a business logic point of view, there are no changes; altering a procedure with the same text should only mean that a new plan will be generated for it. And this should not be a problem, because by forcing the plan we ensured that the “good plan” will be used. Let’s confirm this by executing the following code:

DBCC FREEPROCCACHE;
GO
DECLARE @t AS dbo.IntList;
INSERT @t SELECT TOP (2000) OrderId FROM Sales.Orders ORDER BY 1 DESC;
EXEC dbo.GetOrderList @t;
GO 10
DECLARE @t AS dbo.IntList;
INSERT @t SELECT TOP (10) OrderId FROM Sales.Orders ORDER BY 1 DESC;
EXEC dbo.GetOrderList @t;
GO 10

Let’s have a look at the Query Store:

P5

When you observe this screenshot, you might think – hey dude, it seems you used the old screenshot, it does not show the new executions! No, I did not; I refreshed the report, and the report is correct. But where are the recent 20 executions? Well, let’s execute again the statement looking for the query_id:

P6

Oops… Our SELECT query got a new query_id! And since a forced plan is associated not with a particular query text, but with a given query_id, it is clear why it does not affect the execution. The plan forcing is correctly configured – there is no forced plan failure – it is just waiting for an execution of the query with query_id 1. If that happened, it would work correctly, but since a new query_id is now associated with our query, the forced plan has nothing to do with it. We can confirm this by observing the Query Store report:

P7

You can see that all 20 executions are done with the Merge Join based execution plan, although we have forced the plan and our query was not changed at all!

Forced plans stay in the system, but they remain effective only if the query_id does not change. And query_id depends on the following five parameters:

  • query_hash  – hash value of query text
  • object_id of the object (e.g. stored procedure) where the query has been executed (0 for standalone queries)
  • context_settings_id (e.g. ANSI standard, etc.)
  • query_parameterization_type – indicating how the query is parameterized
  • batch_sql_handle –  ID of the statement batch the query is part of. Populated only if the query references temporary tables or table variables

Since we call the stored procedure using a table-valued parameter, the batch_sql_handle attribute changed after we issued the ALTER PROCEDURE statement. Here is the result of a query that returns the two Query Store entries for the same query text:

P8

You can see that the values in these two rows are slightly different, and this small difference cancelled out all our plan-forcing efforts and brought the original issue back.
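
The query itself is not reproduced above; a sketch that returns both entries for the same statement text could look like this (query_id 1 as identified earlier):

SELECT qs.query_id, qs.object_id, qs.context_settings_id,
qs.query_parameterization_type, qs.batch_sql_handle, qs.query_hash
FROM sys.query_store_query AS qs
WHERE qs.query_hash = (SELECT query_hash FROM sys.query_store_query WHERE query_id = 1)
ORDER BY qs.query_id;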

Conclusion

Be very careful when forcing plans for queries referencing table variables and table-valued parameters. The query_id attribute for such queries is very unstable, and forced plans, even as a workaround, could have a very short lifetime. You should use a different approach to fix such issues.

Thanks for reading.

 

SQL Server 2019 Query Store Defaults

When you look at articles, posts and documents about new features and enhancements in SQL Server 2019 CTP2, you will find nothing about Query Store. However, there are some graphical enhancements in SQL Server Management Studio 18.0, and the default configuration of the Query Store attributes has changed too.

First, SSMS 18.0. Starting with this version, you can see another Query Store report – Query Wait Statistics. When you click on it, you can see the aggregated waits per category in a given time interval (the default is the last hour).

Bl1

By clicking on the wait category bar, a new screen shows up with ids of queries that contribute to this wait category.

Bl2

You can also use the tabular view for the initial report (aggregated time per wait category):

Bl3

The second change is related to the default configuration of Query Store. When you enable Query Store for a database without specifying attributes, by using the Properties tab or the command:

ALTER DATABASE WideWorldImporters SET QUERY_STORE = ON;

SQL Server applies default values for the Query Store parameters. In SQL Server 2019, Microsoft decided to change the default values for some of them. Here is a table with the attributes and their former and current default values:

Attribute                  SQL Server 2017   SQL Server 2019
max_storage_size_mb        100               500
interval_length_minutes    60                15
query_capture_mode         ALL               AUTO
flush_interval_seconds     900               3,000
max_plans_per_query        200               1,000
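
You can check which values your database actually got by querying the sys.database_query_store_options catalog view:

SELECT actual_state_desc, max_storage_size_mb, interval_length_minutes,
query_capture_mode_desc, flush_interval_seconds, max_plans_per_query
FROM sys.database_query_store_options;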

 

Thanks for reading.

Workshop: Performance and Query Tuning with SQL Server 2017 and 2019

As part of the SQL Saturday Linz conference, I will deliver a performance tuning workshop on 17th January at Tabakfabrik in Linz. The full workshop name is Performance and Query Tuning with SQL Server 2017 and 2019. As the name suggests, the workshop covers performance tuning techniques specific to the two most recent SQL Server versions. It is aimed at application and database developers who already work or plan to work with the latest versions of SQL Server – 2017 and 2019. The first part of the workshop focuses on execution plans, statistics and the query optimizer. It covers common mistakes made by developers that lead to poorly performing queries and offers tips on how to avoid them.

The second part of the workshop includes a detailed look at Query Store, a powerful troubleshooting tool introduced in SQL Server 2016 and improved in later versions. You will see how you can use it not only to troubleshoot, but also to fix regressed queries, and its power will be demonstrated in the context of upgrading to a new SQL Server version or changing the compatibility level. At the end, you can learn about the database engine improvements in SQL Server 2019 CTP2 under the feature with the most promising name – Intelligent Query Processing.

The workshop will cover the following topics:

  • How SQL Server uses statistics and what are common issues with them
  • Quick overview of index usage; how to create a useful index
  • Typical mistakes made by developers that lead to poor performing queries
  • Troubleshooting parameter sniffing issues in different SQL Server versions
  • Cardinality estimation issues and tuning queries with large tables
  • Tips for writing well-performing queries
  • Query Store as a game changer in SQL Server performance troubleshooting
  • Fixing execution plans for regressed queries by using Query Store
  • Improvements in SQL Server 2019 Intelligent Query Processing

Level: 300 – Advanced

Here is the link for registration: https://bit.ly/2rkj4wX.

Hope to see you in Linz!

Rgds,
M.

I am speaking at Microsoft Sinergija 18 Conference Belgrade

I’m very glad that I will present again at the Microsoft Sinergija 18 conference in Belgrade on October 25th, 2018.

Microsoft Sinergija 18 will be held from October 24 through 26 at the Crowne Plaza hotel in Belgrade under the slogan “Digital breakthrough”.

This time, my topic is related to cardinality estimation issues, and I will describe root causes, workarounds and solutions for execution issues with queries against large tables.

The session name is “Cardinality Estimation Issues with Large Tables” and is scheduled for the second conference day, on 25th Oct at 09.00.

See you in Belgrade!

Table Variable Deferred Compilation in SQL Server 2019 CTP2- Undocumented Options

At the time of writing and publishing the series of articles about the Table Variable Deferred Compilation feature in SQL Server 2019 CTP2, I did not know about the undocumented options and hints related to it, but in the meantime, thanks to Dmitry Pilugin (blog | twitter), I learned how you can play with the table variable behavior under CL 150.

You can use the undocumented hint DISABLE_DEFERRED_COMPILATION_TV to disable the feature at the statement level. Let’s execute the query from the previous article The Good, the Bad and the Ugly of Table Variable Deferred Compilation – Part 3, but with the mentioned hint:


DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC OPTION (USE HINT ('DISABLE_DEFERRED_COMPILATION_TV'));

The execution plan is almost the same as the one under CL 140! You can see that the Estimated Number of Rows for the table variable is 1 instead of 288 – just as it was under all previous compatibility levels.

TVDPart4_0

However, it’s not exactly the same plan; you can see this, when you compare the plan under CL 140 and the plan with CL 150 and the DISABLE_DEFERRED_COMPILATION_TV hint:

TVDPart4_1.png
All estimations and plan operators are the same; the only difference is the value of the Memory Grant. This is probably related to the memory grant mechanism in SQL Server 2019 CTP2, but the main thing here is that the hint works and that you can enforce the old table variable behavior under CL 150.

By using the hint, you can disable the feature at the statement level, but the opposite is also possible – disabling the feature at the database level and enabling it for individual statements only.

To disable the feature, you can use this undocumented(!) option:

ALTER DATABASE SCOPED CONFIGURATION SET DEFERRED_COMPILATION_TV = OFF;

With this option, your code would produce the same execution plans as under CL 140 (with a slightly larger memory grant, as mentioned previously) without any code changes! I think this is important, and I hope this option will be available – and documented, of course – in the SQL Server 2019 RTM.

To use the feature in a database where it is disabled by default, you can use the undocumented hint FORCE_DEFERRED_COMPILATION_TV:

ALTER DATABASE DB2019 SET COMPATIBILITY_LEVEL = 150;
GO
ALTER DATABASE SCOPED CONFIGURATION SET DEFERRED_COMPILATION_TV = OFF;
GO
DECLARE @t TABLE(id INT PRIMARY KEY);
INSERT INTO @t
SELECT DISTINCT message_id FROM sys.messages;
SELECT * FROM @t OPTION (USE HINT('FORCE_DEFERRED_COMPILATION_TV'));
GO

Here is the execution plan for the query:

TVDPart4_2

As you can see, this hint is valid (the query does not return an error), but unfortunately it does not work; it does not affect the cardinality of the table variable.

Again, things discussed here are undocumented, and are part of a community technology preview version, so we can expect many changes until the SQL Server 2019 RTM version.

I would like to thank Dmitry Pilugin for pointing me to these undocumented options, and thank you for reading.

The Good, the Bad and the Ugly of Table Variable Deferred Compilation – Part 3

In the final article in this series, you will see how the new behavior of table variables in SQL Server 2019 CTP2, called Table Variable Deferred Compilation, can break existing performance workarounds.

As mentioned in the earlier posts, table variables had a fixed cardinality of 1 for more than 18 years. This property caused some bad execution plans, but on the other hand, it was also used (or misused) to fix cardinality estimation issues in complex queries. In this article, I will demonstrate how table variables can be used to reduce an extreme overestimation in a simple two-join query.

The query uses two large sample tables in a parent-child relation: the parent table A has 100M rows, the child table B about 335M rows.

TVDPart3_1

On the id column of each table there is a clustered index, and on each pid column there is a nonclustered index.
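
The figure above presumably shows the exact definitions; roughly, and with column types assumed (the table variable used later in this article mirrors table A’s columns), the tables look like this sketch:

CREATE TABLE dbo.A(
id INT NOT NULL,
pid INT NOT NULL,
c1 CHAR(100),
CONSTRAINT PK_A PRIMARY KEY CLUSTERED (id)
);
CREATE INDEX ix_A_pid ON dbo.A(pid);
GO
CREATE TABLE dbo.B(
id INT NOT NULL,
pid INT NOT NULL,
c1 CHAR(100),
CONSTRAINT PK_B PRIMARY KEY CLUSTERED (id)
);
CREATE INDEX ix_B_pid ON dbo.B(pid);
GO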

Consider the following query that returns parent and child rows for a specific pid value in the parent table:

SELECT * FROM A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON B.pid = C.pid
WHERE A.pid = 413032
ORDER BY B.C1 DESC;

Note that I have instantiated the child table twice in order to create a more complex query and to produce a larger data volume (since I don’t have a third large table on my laptop).
Anyway, the above query returns only 3,222 rows. Therefore, we can say this is a very selective query, given that it deals with 100M+ row tables. However, although appropriate indexes exist and the query is highly selective, when you click on Display Estimated Execution Plan, SQL Server suggests a very expensive plan based on Hash Match Join and Merge Join operators, with extremely high estimations:

TVDPart3_2

When you click on the enlarged figure of the execution plan, you can see that SQL Server expects more than 51 billion rows and a data volume of 16 TB! The reason for this overestimation is out of the scope of this article and will be discussed in a future article; what is important here is that a workaround is required to overcome this huge overestimation and terrible execution plan.

Since I know, from a business point of view, that the query is selective and cannot return more than 10K rows, I can try to force an execution plan based on Nested Loop Joins.

SELECT * FROM A
INNER LOOP JOIN B ON A.id = B.pid
INNER LOOP JOIN B C ON B.pid = C.pid
WHERE A.pid = 413032
ORDER BY B.C1 DESC;

With this change, the query is executed very fast, with the appropriate execution plan:

SQL Server Execution Times:
CPU time = 31 ms,  elapsed time = 197 ms.

TVDPart3_3.pngHowever, the LOOP hint does not affect estimations and the optimizer decisions related to them; it just replaces join operators chosen by the optimizer by Nested Loop Joins specified in the hint. SQL Server still expects billions of rows, and therefore the query got more than 2 GB memory grant for sorting data, although only 3.222 rows need to be sorted. The hint helped optimizer to produce a good execution plan (which is great; otherwise this query would take very long and probably will not be finished at all), but high memory grant issue is not solved.

As you might guess, now it’s time for table variables. At the beginning of the Nested Loop based plan, you can see where the overestimation starts:

TVDPart3_4

In the initial Index Seek operator, you can see that there are only 288 qualifying rows in table A, but SQL Server expects more than 32,000! You can isolate this in a single query:


SELECT * FROM A
WHERE A.pid = 413032;

Here is the execution plan for this very simple query:

TVDPart3_5

This is a significant overestimation – more than 100 times more rows are expected, and that for a single predicate!

The plan of our initial query starts with this overestimation in the outer join branch; it is then multiplied by the inner side’s estimation and propagated to the left side of the plan. At the end, we have a huge overestimation. How can we reduce it? Well, we can decompose the big query: extract the above query into a table variable and then join it with the other tables. By taking this approach, we would start with an estimate of 1 instead of 32,078. Let’s ensure that the database is running under CL 140 and perform the change described above:


ALTER DATABASE DB2019 SET COMPATIBILITY_LEVEL = 140;
GO
DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC;

Let’s check the execution plan:

TVDPart3_6

It looks very good! Nested Loop Joins and a memory grant of only 2 MB! From 2 GB to 2 MB with a little help from a table variable! Note that we did not use LOOP hints any more; the table variable’s underestimation corrected the input estimations so that the optimizer decided to use Nested Loop Join operators on its own! The initial estimation in the outer join branch was very wrong, and the one in the inner branch too, but then we introduced an underestimation as a corrective factor. You can see that after the first join, the Actual and Estimated Number of Rows are very similar (856 vs. 1,269). The final estimation is still far away from the actual number of rows, but this maneuver was enough to enforce the optimal execution plan with an appropriate memory grant.

As you might guess, the new behavior of table variables in SQL Server 2019 will undo our cardinality correction efforts. Let’s change the compatibility level and run the query again:

ALTER DATABASE DB2019 SET COMPATIBILITY_LEVEL = 150;
GO
DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC;

I have executed this query with the Include Live Query Statistics option, and after two minutes, the execution looked like this:

TVDPart3_7

SQL Server estimates that 3% of the processing is done; at this pace, the query execution would finish in about 60 minutes. And it would still return only 3,222 rows.

So, my workaround does not work anymore, and I have to change the code again and reintroduce the LOOP hints in order to be able to run this query at all:

DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER LOOP JOIN B ON A.id = B.pid
INNER LOOP JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC;

Now, the plan is good again, but the estimations deviate significantly from the actual numbers (463M rows), and the memory grant is again greater than 2 GB!

TVDPart3_8

But wait, maybe we can use the hint OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140')) to enforce the old table variable behavior?! Yes, we can, but the optimizer comes up with the same execution plan. If you want the old behavior of table variables, the entire database must be under CL 140. What a bitter disappointment!

Thus, with CL 150, the old workaround is broken. You could reduce the estimation discrepancy by introducing another table variable to store the result of the first join and then joining it with the instance of the B table, but the fact is, your solution does not work anymore; you need to find another way to correct the cardinality.

In this example, a table variable does not have to be used to get an optimal plan. I used it to demonstrate how the new behavior of table variables can break your existing solutions. A better way to enforce an acceptable plan would be to use LOOP hints together with a hint limiting the maximum memory grant:


SELECT * FROM A
INNER LOOP JOIN B ON A.id = B.pid
INNER LOOP JOIN B C ON B.pid = C.pid
WHERE A.pid = 413032
ORDER BY B.C1 DESC
OPTION (MAX_GRANT_PERCENT = 0.001);

Under both CL 140 and CL 150, you’ll get the same Nested Loop Join based execution plan, with a limited but sufficient 13 MB memory grant. This is also a workaround; the real solution is to find the root cause of the cardinality estimation issues and to understand why SQL Server so dramatically overestimates. This will be discussed in one of the future articles.

Final Thoughts

Cardinality estimation for table variables has been corrected in SQL Server 2019 CTP2. The accurate number of rows, instead of 1 row regardless of the actual number of rows in a table variable, should improve future execution plans. However, for existing solutions you should not expect a lot of improvements. Table variables with an estimation of 1 have existed since SQL Server 2000, and most of the queries that had a cardinality issue due to table variables have already been fixed or rewritten.

Therefore, very few of them will be improved by Table Variable Deferred Compilation. On the other hand, some solutions that used (or misused) this 1-row estimation could be broken by the new, improved estimation. In addition, non-fixed estimations make queries with table variables more prone to parameter sniffing issues. That does not automatically mean significant performance regressions, but most probably you will need to change your code, find new workarounds or solutions for cardinality issues, and monitor execution plans previously considered stable.

Another important concern with Table Variable Deferred Compilation is the fact that the feature is available as part of the 150 compatibility level and cannot be turned off without switching the entire database to a compatibility level below 150. I would rather have this feature with an option that lets me enable or disable it selectively, maybe with a hint such as OPTION (USE HINT ('TABLE_VARIABLE_DEFERRED_COMPILATION')).

EDIT:
At the time of writing and publishing this article, I did not know about the undocumented options and hints related to Table Variable Deferred Compilation, but in the meantime, thanks to Dmitry Pilugin, I learned how you can play with the table variable behavior under CL 150. You can find details about the appropriate undocumented options and hints in this article.

Thanks for reading!

The Good, the Bad and the Ugly of Table Variable Deferred Compilation – Part 2

In the two previous articles, you saw how the new behavior of table variables in SQL Server 2019 can improve execution plans. This article demonstrates another aspect of this behavior, one that is by design neither good nor bad. Depending on the data and the execution context, it could end up as an improvement, but it can also degrade performance and introduce new problems.

Consider the following stored procedure, which deals with a query similar to the one in the previous article. It accepts a single input parameter for the product color and has a table variable that collects the product ids for that color, while the second statement returns the order details for those products:
CREATE OR ALTER PROCEDURE dbo.TVDeferred_Demo
@Color NVARCHAR(15)
AS
BEGIN
DECLARE @T AS TABLE(ProductID INT PRIMARY KEY);
INSERT INTO @T SELECT ProductID FROM Production.Product WHERE Color = @Color;
SELECT * FROM @T t
INNER JOIN Sales.SalesOrderDetail od on t.ProductID = od.ProductID
INNER JOIN Sales.SalesOrderHeader h on h.SalesOrderID = od.SalesOrderID
ORDER BY od.UnitPrice DESC;
END

Again, we will execute this procedure under both compatibility levels 140 and 150. Let’s ensure that the database is in the SQL Server 2017 mode (CL 140) and invoke the above created procedure for the first time with the parameter ‘White’:

ALTER DATABASE AdventureWorks2019 SET COMPATIBILITY_LEVEL = 140;
GO
EXEC dbo.TVDeferred_Demo N'White';

Here is the execution plan for the second query in the stored procedure, with some extracted execution parameters:

TVDPart2_1

As you can see, this is a Nested Loop Join based plan, which is usual when processing starts with a table variable, since its cardinality prior to SQL Server 2019 was fixed to 1. You can also see that all values are underestimated, which leads to a memory grant of only 1 MB.
When you call the same procedure with another parameter, it will reuse the plan from the cache. What would the plan be if we called the procedure with a less selective parameter? Let’s ensure that there is no plan in the cache and call the same procedure with the parameter ‘Black’:


DBCC FREEPROCCACHE;
GO
EXEC dbo.TVDeferred_Demo N'Black';

When you observe the execution plan carefully, you will see exactly the same plan. Although there are 93 black products, all estimations and plan decisions remain the same, because the cardinality of the input table variable did not and could not change – it was still 1.

TVDPart2_2

Since the actual number of rows is significantly greater than in the call with the parameter ‘White’, you can see sort warnings here, because this time the 1 MB memory grant was not sufficient for sorting. But the execution plan is exactly the same as for the first call.

Prior to SQL Server 2019, the execution plan for the second query in this stored procedure was always the same, regardless of the parameter used for the first invocation and thus for plan generation. Since the table variable has a cardinality of 1, all estimations and the execution plan will be the same. We can say that using a table variable in this stored procedure and passing it to the second query neutralizes the parameter sniffing effect. That does not mean this is good or bad for all executions (you saw sort warnings, and they are always bad), but the plan was stable; it did not change even after a failover or clearing the cache. If you usually call this stored procedure with highly selective parameters, you can consider this a good plan.

In SQL Server 2019, since the table variable can have a different cardinality, this stored procedure is prone to parameter sniffing, and depending on the parameter of the first execution, you can get different execution plans:


ALTER DATABASE AdventureWorks2019 SET COMPATIBILITY_LEVEL = 150;
GO
EXEC dbo.TVDeferred_Demo N'White';
GO
DBCC FREEPROCCACHE;
GO
EXEC dbo.TVDeferred_Demo N'Black';
GO

Here is the execution plan under CL 150 with a high selective parameter:

TVDPart2_3.png

And another one, when the first invocation is with a low selective parameter:

TVDPart2_4

You can see two completely different plans (different plan operators and execution parameters). The only difference between the two calls is the number of rows in the table variable; therefore, in SQL Server 2019 the cardinality of the table variable can shape the entire execution plan. Unlike in SQL Server 2017, the execution plan depends on the parameter used for the first execution. If you used a low-selectivity parameter for the first execution, but most of the calls use highly selective parameters, you will not be happy with the new table variable behavior. And even if the current plan is good for your workload, there is no guarantee that after it disappears from the cache the new one will be the same. With the old table variable behavior, the plan was stable.

Queries within stored procedures in SQL Server 2019 that use table variables will be more prone to parameter sniffing. Although the estimations will be more accurate than prior to SQL Server 2019, that does not mean the execution plan will be better. It could be better than your current plan, but, as you saw in this article, it could also introduce performance regressions and instability.

Why then am I classifying this as a “bad” thing? Simply because I do not have significantly bad plans on my production systems caused by the table variable’s cardinality. Yes, I had them initially, but I had to fix them; I could not wait until a better estimation came along. Therefore, I do not expect significant improvements in production workloads; the code can be aligned with this new behavior for future development, of course, but existing code is more or less fixed. However, some plans that were considered stable thanks to the fixed table variable cardinality prior to SQL Server 2019 can now be prone to parameter sniffing, and most probably we will need to fix them. Therefore, I am not so happy with this aspect of the new table variable behavior.

In the next article, the last one in this series, you’ll see another important side effect of this behavior, one that can break some of the performance workarounds implemented in previous SQL Server versions.

Thanks for reading.