
Scalar UDF Inlining in SQL Server 2019

Scalar UDFs in SQL Server have a very bad reputation. You can find a lot of articles explaining why they are bad and why you should avoid them. When you google them, here are the article captions you'll most probably get:

UDF0
It is always suggested to use an inline TVF instead of a scalar UDF. This is still valid in most cases.

SQL Server 2019 introduces the Scalar UDF Inlining feature. As you might guess, with this feature SQL Server inlines scalar UDFs for you. You can find more details about it in Books Online: Scalar UDF Inlining in SQL Server 2019. If you would like to know the details of the concept and implementation behind it, a great source is the Optimization of Imperative Programs in a Relational Database paper.

In this article, I am going to test this feature by using a sample function that calculates the distance between two points. The credit for the function and its invocation goes to Hugo Kornelis (blog | twitter) and his great series of articles about scalar UDFs: T-SQL User-Defined Functions: the good, the bad, and the ugly (part 1 to 3). I will use the Application.Cities table from the WideWorldImporters sample database, invoke the dbo.Distance function and compare the performance under different compatibility levels.

Let’s create and populate our sample table:

USE WideWorldImporters;
--create and populate a sample table
DROP TABLE IF EXISTS dbo.Place;
CREATE TABLE dbo.Place(Id INT PRIMARY KEY CLUSTERED, Lat FLOAT, Lon FLOAT, St CHAR(2));
GO
INSERT INTO dbo.Place
SELECT c.CityID, Location.Lat, Location.Long, sp.StateProvinceCode
FROM Application.Cities c
INNER JOIN Application.StateProvinces sp ON c.StateProvinceID = sp.StateProvinceID;
GO

After inserting rows, the table should have 37,490 entries for places in US states. The next piece of code shows the function definition. As mentioned, you can find the original function definition and considerations here:

--credit for function and comments to Hugo Kornelis
CREATE OR ALTER FUNCTION dbo.Distance (@Lat1 FLOAT, @Lon1 FLOAT, @Lat2 FLOAT, @Lon2 FLOAT)
RETURNS FLOAT
AS
BEGIN;
--Convert from degrees to radians
DECLARE @Lat1R FLOAT = RADIANS(@Lat1), @Lon1R FLOAT = RADIANS(@Lon1),
@Lat2R FLOAT = RADIANS(@Lat2), @Lon2R FLOAT = RADIANS(@Lon2),
@DistR FLOAT, @Dist FLOAT;
-- Calculate the distance (in radians)
SET @DistR = 2 * ASIN(SQRT(POWER(SIN((@Lat1R - @Lat2R) / 2), 2)
+ (COS(@Lat1R) * COS(@Lat2R) * POWER(SIN((@Lon1R - @Lon2R) / 2), 2))));
--Convert distance from radians to kilometers
-- Explanation: Distance in radians = distance in nautical miles * (pi / (180 * 60)), so
-- distance in nautical miles = distance in radians * 180 * 60 / pi
-- One nautical mile is 1.852 kilometers, so
-- distance in km = (distance in radians * 180 * 60 / pi) * 1.852
-- And since 180 * 60 * 1.852 = 20001.6, this can be simplified to
-- distance in km = distance in radians * 20001.6 / pi
SET @Dist = @DistR * 20001.6 / PI();
RETURN @Dist;
END;
GO
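
Before moving on, you can quickly verify the row counts mentioned in this article on your own copy of the database. This is just a small sanity check; the numbers can vary slightly between WideWorldImporters releases:

SELECT COUNT(*) AS TotalPlaces FROM dbo.Place;
--places per state; the state 'TX' is used in the test query below
SELECT St, COUNT(*) AS Places
FROM dbo.Place
GROUP BY St
ORDER BY Places DESC;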

Now, we need a sample query to invoke the function. The following query will return the distance between each pair of places in the state of Texas. There are 2,395 places in our sample table that meet this criterion; therefore, the function will be invoked 5.7 million times. This sounds like a good test. Before you execute the query, please ensure that the Discard results after execution option is turned ON. With this option, the returned times will represent query execution times, and the time required for painting 5.7 million rows in SSMS will not interfere with them. We will execute the query twice, under the 140 and 150 compatibility levels, which correspond to SQL Server 2017 and SQL Server 2019 respectively:

ALTER DATABASE WideWorldImporters SET COMPATIBILITY_LEVEL = 140;
GO
SET STATISTICS TIME ON;
--Ensure that the Discard results after execution option is turned ON
GO
SELECT dbo.Distance(a.Lat, a.Lon, b.Lat, b.Lon)
FROM dbo.Place AS a
CROSS JOIN dbo.Place AS b
WHERE a.St = 'TX' AND b.St = 'TX';
GO
ALTER DATABASE WideWorldImporters SET COMPATIBILITY_LEVEL = 150;
GO
SELECT dbo.Distance(a.Lat, a.Lon, b.Lat, b.Lon)
FROM dbo.Place AS a
CROSS JOIN dbo.Place AS b
WHERE a.St = 'TX' AND b.St = 'TX';
GO

Results are quite impressive:
--CL 140:
SQL Server Execution Times:
CPU time = 49594 ms, elapsed time = 63069 ms.
--CL 150:
SQL Server Execution Times:
CPU time = 15672 ms, elapsed time = 2911 ms.

You can see that the CPU time is three times lower under the 150 CL, and from the duration point of view, the query is more than 20x faster! When you look at the execution plans, you can see that in the first case the UDF invocation is hidden within the Compute Scalar operator, while in SQL Server 2019 the UDF definition is expanded and the plan is optimized.

UDF1

The inlined execution plan looks more complicated (SSMS even shows that the first plan is four times cheaper), but this is simply not true; it is caused by the completely wrong costing of UDF invocations in the execution plan.

The improvement looks really fabulous: the query is 20 times faster, and all you need is to run the database under the latest compatibility level!
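
If you want to confirm from the plan itself that inlining really happened, you can look for the ContainsInlineScalarTsqlUdfs attribute in the plan XML. The following is a quick sketch against the plan cache; I am assuming the attribute name used by the current showplan schema, and you should adjust the text filter to your own query:

SELECT st.text, qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE st.text LIKE N'%dbo.Distance(a.Lat%'
AND CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%ContainsInlineScalarTsqlUdfs="true"%';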

With such a big improvement, I was curious what the execution time would be if we wrote the expression in the query instead of calling the function. With this approach, SQL Server does not need to switch context between modules; it does not need to invoke the function at all. So, let's compare the following two queries under the 150 CL:

SELECT dbo.Distance(a.Lat, a.Lon, b.Lat, b.Lon)
FROM dbo.Place AS a
CROSS JOIN dbo.Place AS b
WHERE a.St = 'TX' AND b.St = 'TX';
GO
SELECT 2 * ASIN(SQRT(POWER(SIN((RADIANS(a.Lat) - RADIANS(b.Lat)) / 2), 2)
+ (COS(RADIANS(a.Lat)) * COS(RADIANS(b.Lat))
* POWER(SIN((RADIANS(a.Lon) - RADIANS(b.Lon)) / 2), 2)
))) * 20001.6 / PI()
FROM dbo.Place AS a
CROSS JOIN dbo.Place AS b
WHERE a.St = 'TX' AND b.St = 'TX';

Here are CPU and execution times:
--UDF:
SQL Server Execution Times:
CPU time = 16483 ms, elapsed time = 2952 ms
--Expression:
SQL Server Execution Times:
CPU time = 10219 ms, elapsed time = 5474 ms.

The query that invokes the UDF used 60% more CPU resources than the one with the expression, but at the same time it is almost two times faster! And this is not a coincidence; I have executed both queries 100+ times, and the first query was always about two times faster. Both execution plans are parallel; the one with the UDF is still more complicated, but it definitely makes better use of the available cores.

UDF2

You'll get the same results if you wrap the expression in an inline table-valued function:

CREATE OR ALTER FUNCTION dbo.Distance_Inline (@Lat1 FLOAT, @Lon1 FLOAT, @Lat2 FLOAT, @Lon2 FLOAT) RETURNS TABLE
AS RETURN
SELECT 2 * ASIN(SQRT(POWER(SIN((RADIANS(@Lat1) - RADIANS(@Lat2)) / 2), 2)
+ (COS(RADIANS(@Lat1)) * COS(RADIANS(@Lat2))
* POWER(SIN((RADIANS(@Lon1) - RADIANS(@Lon2)) / 2), 2)
))) * 20001.6 / PI() AS Dist;
GO
SELECT dbo.Distance(a.Lat, a.Lon, b.Lat, b.Lon)
FROM dbo.Place AS a
CROSS JOIN dbo.Place AS b
WHERE a.St = 'TX' AND b.St = 'TX';
SELECT (SELECT Dist FROM dbo.Distance_Inline(a.Lat, a.Lon, b.Lat, b.Lon)) Dist
FROM dbo.Place AS a
CROSS JOIN dbo.Place AS b
WHERE a.St = 'TX' AND b.St = 'TX';

Results:

--inlined scalar UDF
SQL Server Execution Times:
CPU time = 15158 ms, elapsed time = 3051 ms.
--inline TVF
SQL Server Execution Times:
CPU time = 9967 ms, elapsed time = 5553 ms.

When you look at this example, you can see that the advice to use inline table-valued functions instead of scalar UDFs does not always lead you to the best performance.

I have executed the query with the UDF under the 140 and 150 CLs for different states, and here are some results:

UDF3

For those who use scalar UDFs in SQL Server intensively, the new version looks like a gift from heaven. The improvement is very impressive. However, the list of limitations also looks very impressive (taken from the article Inlineable Scalar UDF Requirements); a query that checks whether a specific function qualifies follows the list:

  • The UDF does not invoke any intrinsic function that is either time-dependent (such as GETDATE()) or has side effects (such as NEWSEQUENTIALID())
  • The UDF uses the EXECUTE AS CALLER clause (the default behavior if the EXECUTE AS clause is not specified)
  • The UDF does not reference table variables or table-valued parameters
  • The query invoking a scalar UDF does not reference a scalar UDF call in its GROUP BY clause
  • The query invoking a scalar UDF in its select list with DISTINCT clause does not reference a scalar UDF call in its ORDER BY clause
  • The UDF is not natively compiled (interop is supported)
  • The UDF is not used in a computed column or a check constraint definition
  • The UDF does not reference user-defined types
  • There are no signatures added to the UDF
  • The UDF is not a partition function
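
You can check whether a particular function meets these requirements by querying sys.sql_modules, which exposes an is_inlineable column in SQL Server 2019. A minimal sketch (I have not verified in which CTP build the column first appeared):

SELECT OBJECT_NAME(object_id) AS FunctionName, is_inlineable
FROM sys.sql_modules
WHERE object_id = OBJECT_ID(N'dbo.Distance');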

Nevertheless, the SQL Server team did a great job with inlining of scalar user-defined functions, and since the RTM version is not done yet, we can expect the list of limitations to be shortened by a few items.
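
On the other hand, if you want to stay at CL 150 but opt out of inlining, there are a database scoped configuration and a query-level hint for that. The following is only a sketch; both options are documented for SQL Server 2019, but I have not checked in which CTP build they became available:

--disable scalar UDF inlining for the whole database
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF;
GO
--or disable it for a single statement only
SELECT dbo.Distance(a.Lat, a.Lon, b.Lat, b.Lon)
FROM dbo.Place AS a
CROSS JOIN dbo.Place AS b
WHERE a.St = 'TX' AND b.St = 'TX'
OPTION (USE HINT ('DISABLE_TSQL_SCALAR_UDF_INLINING'));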

Thanks for reading!

SQL Server 2019 Query Store Defaults

When you look at articles, posts and documents about new features and enhancements in SQL Server 2019 CTP2, you will find nothing about Query Store. However, there are some graphical enhancements in SQL Server Management Studio in version 18.0, and the default configuration of the Query Store attributes has changed too.

First, SSMS 18.0. From this version, you can see another Query Store report: Query Wait Statistics. When you click on it, you can see aggregated waits per category in a given time interval (the default is the last hour).

Bl1

By clicking on the wait category bar, a new screen shows up with ids of queries that contribute to this wait category.

Bl2

You can also use the tabular view for the initial report (aggregated time per wait category):

Bl3

The second change is related to the default configuration of Query Store. When you enable Query Store for a database without specifying attributes, by using the Properties tab or the command:

ALTER DATABASE WideWorldImporters SET QUERY_STORE = ON;

SQL Server adds default values for the Query Store parameters. In SQL Server 2019, Microsoft decided to change the default values for some of them. Here is a table with the attributes and their former and current default values, followed by a query you can use to check the values in effect:

Attribute                   SQL Server 2017   SQL Server 2019
max_storage_size_mb         100               500
interval_length_minutes     60                15
query_capture_mode          ALL               AUTO
flush_interval_seconds      900               3,000
max_plans_per_query         200               1,000
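
To see what is actually in effect for a given database, you can query the Query Store options catalog view (run this in the context of the database you are interested in):

SELECT actual_state_desc,
       max_storage_size_mb,
       interval_length_minutes,
       query_capture_mode_desc,
       flush_interval_seconds,
       max_plans_per_query
FROM sys.database_query_store_options;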

 

Thanks for reading.

Workshop: Performance and Query Tuning with SQL Server 2017 and 2019

As part of the SQL Saturday Linz conference, I will deliver a performance tuning workshop on 17th January at Tabakfabrik in Linz. The full workshop name is Performance and Query Tuning with SQL Server 2017 and 2019. As the name suggests, the workshop covers performance tuning techniques specific to the two most recent SQL Server versions. It is aimed at application and database developers who already work, or plan to work, with the latest versions of SQL Server, 2017 and 2019. The first part of the workshop is focused on execution plans, statistics and the query optimizer. It covers common mistakes made by developers that lead to poorly performing queries and brings tips on how to avoid them.

The second part of the workshop includes a detailed look at Query Store, a powerful troubleshooting tool introduced in SQL Server 2016 and improved in later versions. You will see how to use it not only to troubleshoot, but also to fix regressed queries, and its power will be demonstrated on an upgrade to a new SQL Server version or a change of the compatibility level. At the end, you will learn about database engine improvements in SQL Server 2019 CTP2 under the feature with the most promising name: Intelligent Query Processing.

The workshop will cover the following topics:

  • How SQL Server uses statistics and what the common issues with them are
  • A quick overview of index usage; how to create a useful index
  • Typical mistakes made by developers that lead to poorly performing queries
  • Troubleshooting parameter sniffing issues in different SQL Server versions
  • Cardinality estimation issues and tuning queries with large tables
  • Tips for writing well-performing queries
  • Query Store as a game changer in SQL Server performance troubleshooting
  • Fixing execution plans for regressed queries by using Query Store
  • Improvements in SQL Server 2019 Intelligent Query Processing

Level: 300 – Advanced

Here is the link for registration: https://bit.ly/2rkj4wX.

Hope to see you in Linz!

Rgds,
M.

Speaking at the BLBit user group and SQL Saturday conference in Banja Luka

Last week, we had a great time in Banja Luka. On 1st November, together with my friend and fellow Data Platform MVP and book author Dejan Sarka, I talked about common database design mistakes made by application and database developers. It was a real pleasure to deliver the session to that audience: more than 70 people attended, and we were very surprised and impressed!

Two days later, on 3rd November, at the SQL Saturday Banja Luka conference, I delivered two sessions: Common Developer Mistakes with SQL Server and Transact-SQL Performance Tips. I was very happy to see so many people in the room for my sessions and want to thank all the attendees. It was a real pleasure to be part of this successful conference.

Regards,
M.

I am speaking at Microsoft Sinergija 18 Conference Belgrade

I'm very glad that I will present again at the Microsoft Sinergija 18 conference in Belgrade on October 25th 2018.

Microsoft Sinergija 18 will be held from October 24 through 26 at the Crowne Plaza hotel in Belgrade under the slogan “Digital breakthrough”.

This time, my topic is related to cardinality estimation issues, and I will describe root causes, workarounds and solutions for execution issues with queries against large tables.

The session name is “Cardinality Estimation Issues with Large Tables” and is scheduled for the second conference day, on 25th Oct at 09.00.

See you in Belgrade!

Table Variable Deferred Compilation in SQL Server 2019 CTP2 - Undocumented Options

At the time of writing and publishing the series of articles about the Table Variable Deferred Compilation feature in SQL Server 2019 CTP2, I did not know about the undocumented options and hints related to it, but in the meantime, thanks to Dmitry Pilugin (blog | twitter), I learned how you can play with the table variable behavior under CL 150.

You can use the undocumented hint DISABLE_DEFERRED_COMPILATION_TV to disable the feature at the statement level. Let’s execute the query from the previous article The Good, the Bad and the Ugly of Table Variable Deferred Compilation – Part 3, but with the mentioned hint:


DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC OPTION (USE HINT ('DISABLE_DEFERRED_COMPILATION_TV'));

The execution plan is almost the same as the one under CL 140! You can see that the Estimated Number of Rows for the table variable is 1 again, as it was under all previous compatibility levels, instead of 288.

TVDPart4_0

However, it's not exactly the same plan; you can see this when you compare the plan under CL 140 and the plan under CL 150 with the DISABLE_DEFERRED_COMPILATION_TV hint:

TVDPart4_1.png
All estimations and plan operators are the same; the only difference is the value of the Memory Grant. This is probably related to the memory grant mechanism in SQL Server 2019 CTP2, but the main thing here is that the hint works and that you can enforce the old behavior of table variables under CL 150.

By using the hint, you can disable the feature at the statement level, but the opposite is also possible: disabling the feature at the database level and enabling it for individual statements only.

To disable the feature, you can use this undocumented(!) option:

ALTER DATABASE SCOPED CONFIGURATION SET DEFERRED_COMPILATION_TV = OFF;

With this option, your code would produce the same execution plan as under CL 140 (with a slightly larger memory grant, as mentioned previously) without any code changes! I think this is important, and I hope that this option will be available in the SQL Server 2019 RTM and documented, of course.
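
If you want to check whether this option is set for your database, it should show up in sys.database_scoped_configurations like any other scoped configuration. This is a sketch only; since the option is undocumented, I am assuming it is listed under the same name:

SELECT name, value, is_value_default
FROM sys.database_scoped_configurations
WHERE name = N'DEFERRED_COMPILATION_TV';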

To use the feature in a database where it is disabled by default, you can use the undocumented hint FORCE_DEFERRED_COMPILATION_TV:

ALTER DATABASE DB2019 SET COMPATIBILITY_LEVEL = 150;
GO
ALTER DATABASE SCOPED CONFIGURATION SET DEFERRED_COMPILATION_TV = OFF;
GO
DECLARE @t TABLE(id INT PRIMARY KEY);
INSERT INTO @t
SELECT DISTINCT message_id FROM sys.messages;
SELECT * FROM @t OPTION (USE HINT('FORCE_DEFERRED_COMPILATION_TV'));
GO

Here is the execution plan for the query:

TVDPart4_2

As you can see, this hint is valid (the query does not return an error), but unfortunately it does not work; it does not affect the cardinality of the table variable.

Again, the things discussed here are undocumented and are part of a community technology preview version, so we can expect many changes before the SQL Server 2019 RTM version.

I would like to thank Dmitry Pilugin for pointing me to these undocumented options, and thank you for reading.

The Good, the Bad and the Ugly of Table Variable Deferred Compilation – Part 3

In the final article in this series, you will see how the new behavior of table variables in SQL Server 2019 CTP2, called Table Variable Deferred Compilation, can break existing performance workarounds.

As mentioned in the earlier posts, table variables had a fixed cardinality of 1 for more than 18 years. This property caused some bad execution plans, but on the other hand, it was also used (or misused) for fixing cardinality estimation issues in complex queries. In this article, I will demonstrate how a table variable can be used to reduce an extreme overestimation in a simple query with two joins.

The query uses two large sample tables in a parent-child relation: the parent table A has 100M rows, the child table B has about 335M rows.

TVDPart3_1

On the id column of both tables there is a clustered index, and on both pid columns there is a nonclustered index.

Consider the following query that returns parent and child rows for a specific pid value in the parent table:

SELECT * FROM A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON B.pid = C.pid
WHERE A.pid = 413032
ORDER BY B.C1 DESC;

Note that I have instantiated the child table twice in order to create a more complex query and to produce more data volume (since I don't have a third large table on my laptop).
Anyway, the above query returns only 3,222 rows. Therefore, we can say this is a very selective query, since it deals with 100M+ row tables. However, although appropriate indexes exist and the query is highly selective, when you click on Display Estimated Execution Plan, SQL Server suggests a very expensive plan based on Hash Match Join and Merge Join operators, with extremely high estimations:

TVDPart3_2

When you click on the enlarged figure of the execution plan, you can see that SQL Server expects more than 51 billion rows with a data volume of 16 TB! The reason for this overestimation is out of the scope of this article and will be discussed in one of the future articles; what is important here is that a workaround is required to overcome this huge overestimation and terrible execution plan.

Since I know, from a business point of view, that the query is selective and cannot return more than 10K rows, I can try to force an execution plan based on Nested Loop Joins.

SELECT * FROM A
INNER LOOP JOIN B ON A.id = B.pid
INNER LOOP JOIN B C ON B.pid = C.pid
WHERE A.pid = 413032
ORDER BY B.C1 DESC;

With this change, the query is executed very fast, with the appropriate execution plan:

SQL Server Execution Times:
CPU time = 31 ms,  elapsed time = 197 ms.

TVDPart3_3.png

However, the LOOP hint does not affect the estimations and the optimizer decisions related to them; it just replaces the join operators chosen by the optimizer with the Nested Loop Joins specified in the hint. SQL Server still expects billions of rows, and therefore the query got a memory grant of more than 2 GB for sorting data, although only 3,222 rows need to be sorted. The hint helped the optimizer to produce a good execution plan (which is great; otherwise this query would run very long and probably never finish), but the high memory grant issue is not solved.

As you might guess, now it's time for table variables. At the beginning of the Nested Loop based plan, you can see where the overestimation starts:

TVDPart3_4

In the initial Index Seek operator, you can see that there are only 288 rows in the table A, but SQL Server expects more than 32,000! You can extract this into a single query:


SELECT * FROM A
WHERE A.pid = 413032;

Here is the execution plan for this very simple query:

TVDPart3_5

This is a significant overestimation: more than 100 times more rows are expected, and this for a single predicate!

The plan of our initial query starts with this overestimation in the outer join branch, and thus it is multiplied with the inner side's estimation and propagated to the left side of the plan. At the end, we have a huge overestimation. How can we reduce it? Well, we can decompose the big query, extract the above query into a table variable, and then join it with the other tables. By taking this approach, we would start with an estimate of 1 instead of 32,078. Let's ensure that the database is running under CL 140 and perform the described change:


ALTER DATABASE DB2019 SET COMPATIBILITY_LEVEL = 140;
GO
DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC;

Let’s check the execution plan:

TVDPart3_6

It looks very good! Nested Loop Joins and only 2 MB of memory grant! From 2 GB to 2 MB with a little help from a table variable! Note that we did not use LOOP hints anymore; the table variable's underestimation corrected the input estimations so that the optimizer decided to use Nested Loop Join operators on its own! The initial estimation in the outer join branch was very wrong, the one in the inner branch too, but then we introduced an underestimation as a corrective factor. You can see that after the first join the Actual and Estimated Number of Rows are very similar (856 vs. 1,269). The final estimation is still far away from the actual number of rows, but this maneuver was enough to enforce the optimal execution plan with an appropriate memory grant.

As you might guess, the new behavior of table variables in SQL Server 2019 will undo our cardinality correction efforts. Let's change the compatibility level and run the query again:

ALTER DATABASE DB2019 SET COMPATIBILITY_LEVEL = 150;
GO
DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC;

I have executed this query with the Include Live Query Statistics option, and after two minutes, the execution looked like this:

TVDPart3_7

SQL Server estimates that 3% of the processing is done; at this pace, the query execution will finish in about 60 minutes. And it will still return only 3,222 rows.

So, my workaround does not work anymore, and I have to change the code again and reintroduce the LOOP hints in order to be able to run this query at all:

DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER LOOP JOIN B ON A.id = B.pid
INNER LOOP JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC;

Now, the plan is good again, but the estimations still deviate significantly from the actual numbers (463M rows), and the memory grant is again greater than 2 GB!

TVDPart3_8

But wait, maybe we can use the hint OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140')) to enforce the old table variable behavior?! Yes, we can, but the optimizer comes up with the same execution plan. If you want the old behavior of table variables, the entire database must be at CL 140. What a bitter disappointment!
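
For reference, this is how that attempt looks; the hint goes on the final SELECT and, as described above, the resulting plan stays the same:

DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC
OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));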

Thus, with CL 150, the old workaround is broken. You could reduce the estimation discrepancy by introducing another table variable to store the result of the first join and then join it with the instance of the B table, but the fact is that your solution does not work anymore; you need to find another way to correct the cardinality.

In this example, a table variable does not need to be used to get an optimal plan. I have used it to demonstrate how the new behavior of table variables can break your existing solutions. A better way to enforce an acceptable plan would be to use the LOOP hints together with a hint that limits the maximum memory grant:


SELECT * FROM A
INNER LOOP JOIN B ON A.id = B.pid
INNER LOOP JOIN B C ON B.pid = C.pid
WHERE A.pid = 413032
ORDER BY B.C1 DESC
OPTION (MAX_GRANT_PERCENT = 0.001);

Under both CL 140 and CL 150, you'll get the same Nested Loop Join based execution plan, with a limited but sufficient 13 MB memory grant. This is also a workaround; the real solution is to find the root cause of the cardinality estimation issues and to understand why SQL Server dramatically overestimates. This will be discussed in one of the future articles.
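
As a side note, if you want to watch the memory grant of one of these queries while it is running, you can check the memory grants DMV from another session. A quick sketch:

SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       max_used_memory_kb
FROM sys.dm_exec_query_memory_grants;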

Final Thoughts

The cardinality estimation for table variables has been fixed in SQL Server 2019 CTP2. An accurate number of rows, instead of 1 row regardless of the number of rows in a table variable, should improve future execution plans. However, for existing solutions, you should not expect a lot of improvements. Table variables with an estimation of 1 have existed since SQL Server 2000, and most of the queries that had a cardinality issue due to table variables have already been fixed or rewritten.

Therefore, very few of them will be improved by Table Variable Deferred Compilation. On the other hand, some solutions that used (or misused) this 1-row estimation could be broken by the new, improved estimation. In addition, non-fixed estimations make queries with table variables more prone to parameter sniffing issues. That does not automatically mean significant performance regressions, but most probably you will need to change your code, find new workarounds or solutions for cardinality issues, and monitor execution plans previously considered stable.

Another important thing about Table Variable Deferred Compilation is the fact that the feature is available as part of the 150 compatibility level and cannot be turned off without moving the entire database to a compatibility level below 150. I would rather have this feature with an option that lets me enable or disable it selectively, maybe with a hint such as OPTION (USE HINT ('TABLE_VARIABLE_DEFERRED_COMPILATION')).

EDIT:
At the time of writing and publishing this article, I did not know about the undocumented options and hints related to Table Variable Deferred Compilation, but in the meantime, thanks to Dmitry Pilugin, I learned how you can play with the table variable behavior under CL 150. You can find details about the appropriate undocumented options and hints in this article.

Thanks for reading!

The Good, the Bad and the Ugly of Table Variable Deferred Compilation – Part 2

In the two previous articles, you saw how the new behavior of table variables in SQL Server 2019 can improve execution plans. This article demonstrates another aspect of this behavior, which is by design neither good nor bad. Depending on the data and the execution context, it could end up as an improvement, but it can also degrade performance and introduce new problems.

Consider the following stored procedure that deals with a query similar to the one in the previous article. It accepts a single input parameter for the product color and has a table variable that collects product IDs for that color, while the second statement returns order details for those products:
CREATE OR ALTER PROCEDURE dbo.TVDeferred_Demo
@Color NVARCHAR(15)
AS
BEGIN
DECLARE @T AS TABLE(ProductID INT PRIMARY KEY);
INSERT INTO @T SELECT ProductID FROM Production.Product WHERE Color = @Color;
SELECT * FROM @T t
INNER JOIN Sales.SalesOrderDetail od on t.ProductID = od.ProductID
INNER JOIN Sales.SalesOrderHeader h on h.SalesOrderID = od.SalesOrderID
ORDER BY od.UnitPrice DESC;
END

Again, we will execute this procedure under both compatibility levels 140 and 150. Let's ensure that the database is in the SQL Server 2017 mode and invoke the above created procedure for the first time with the parameter 'White':

ALTER DATABASE AdventureWorks2019 SET COMPATIBILITY_LEVEL = 140;
GO
EXEC dbo.TVDeferred_Demo N'White';

Here is the execution plan for the second query in the stored procedure, with some extracted execution parameters:

TVDPart2_1

As you can see, this is a Nested Loop Join based plan, which is usual when processing starts with a table variable, since its cardinality prior to SQL Server 2019 is fixed to 1. You can also see that all values are underestimated, which leads to a memory grant of only 1 MB.
When you call the same procedure with another parameter, it will reuse the plan from the cache. What would the plan be if we called the procedure with a parameter that is not so highly selective? Let's ensure that there is no plan in the cache and call the same procedure with the parameter 'Black':


DBCC FREEPROCCACHE;
GO
EXEC dbo.TVDeferred_Demo N'Black';

When you observe the execution plan carefully, you will see exactly the same plan. Although there are 93 products with the black color, all estimations and plan decisions remain the same, because the cardinality of the input table variable did not and could not change; it was still 1.

TVDPart2_2

Since the actual number of rows is significantly greater than in the call with the parameter 'White', you can see sort warnings here, because this time the 1 MB memory grant was not sufficient for sorting. But the execution plan is exactly the same as for the first call.

Prior to SQL Server 2019, the execution plan for the second query in this stored procedure was always the same, regardless of the parameter used for the first invocation and thus for plan generation. Since the table variable has a cardinality of 1, all estimations and the execution plan will be the same. We can say that using a table variable in this stored procedure and passing it to the second query neutralizes the parameter sniffing effect. That does not mean this is good or bad for all executions (you saw sort warnings, and they are always bad), but the plan was stable; it did not change even after a failover or clearing the cache. If you usually call this stored procedure with highly selective parameters, you can consider this plan a good plan.

In SQL Server 2019, since the table variable can have a different cardinality, this stored procedure is prone to parameter sniffing, and depending on the first execution parameter, you can get different execution plans:


ALTER DATABASE AdventureWorks2019 SET COMPATIBILITY_LEVEL = 150;
GO
EXEC dbo.TVDeferred_Demo N'White';
GO
DBCC FREEPROCCACHE;
GO
EXEC dbo.TVDeferred_Demo N'Black';
GO

Here is the execution plan under CL 150 with a highly selective parameter:

TVDPart2_3.png

And another one, when the first invocation is with a less selective parameter:

TVDPart2_4

You can see two completely different plans (different plan operators and execution parameters). The only difference between the two calls is the number of rows in the table variable; therefore, in SQL Server 2019 the cardinality of the table variable can drive the entire execution plan. Unlike in SQL Server 2017, the execution plan depends on the parameter used for the first execution. If you used a less selective parameter for the first execution, but most of the calls use highly selective parameters, you will not be happy with the new table variable behavior. And even if the current plan is good for your workload, there is no guarantee that after it disappears from the cache the new one will be the same. With the old table variable behavior, the plan was stable.
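
If you want to check which of the two plans is currently cached for the procedure, a quick look at the procedure stats DMV is enough (a sketch; the query_plan column opens as a graphical plan in SSMS):

SELECT ps.execution_count, qp.query_plan
FROM sys.dm_exec_procedure_stats ps
CROSS APPLY sys.dm_exec_query_plan(ps.plan_handle) qp
WHERE ps.object_id = OBJECT_ID(N'dbo.TVDeferred_Demo')
AND ps.database_id = DB_ID();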

Queries within stored procedures in SQL Server 2019 that use table variables will be more prone to parameter sniffing. Although the estimations will be more accurate than the ones prior to SQL Server 2019, that does not mean the execution plan will be better. It could be better than your current plan, but, as you saw in this article, it could also introduce performance regressions and instability.

Why then am I classifying this as a "bad" thing? Simply because I do not have significantly bad plans on the production system caused by the table variable's cardinality. Yes, I had them initially, but I had to fix them; I could not wait until a better estimation came along. Therefore, I do not expect significant improvements in production workloads. The code can be aligned with this new behavior for future development, of course, but existing code is more or less fixed. However, some plans that were considered stable due to the fixed table variable cardinality prior to SQL Server 2019 can now be prone to parameter sniffing, and most probably we will need to fix them. Therefore, I am not so happy with this aspect of the new table variable behavior.

In the next article, the last one in this series, you'll see another important side effect of this behavior that can break some of the performance workarounds implemented in previous SQL Server versions.

Thanks for reading.

The Good, the Bad and the Ugly of Table Variable Deferred Compilation – Part 1

As mentioned in the previous article, SQL Server 2019 cardinality estimations for a table variable are based on actual table variable row counts. Therefore, in SQL Server 2019, we should expect better estimations and better plans for queries that use table variables.
Which queries will benefit from this improvement? Generally, queries that use table variables with a lot of rows in them and that have not been tuned yet. For table variables with only a few rows, there will not be significant changes, and you should expect the same execution plan and almost the same execution parameters.

Queries whose execution was slow due to the underestimation of table variables usually implement logical joins by using the Nested Loop Join physical operator where Hash or Merge Join operators would be more appropriate. In addition, the underestimation of table variables participating in multiple joins can lead to insufficient memory grants, and thus to data spilling to tempdb.

For instance, the following query stores ProductIDs in a table variable, and in the next statement returns all order details related to them. We'll execute it under compatibility levels 140 and 150 to see the difference in execution between the usual and the deferred table variable compilation. Let's first execute the query under CL 140:

ALTER DATABASE AdventureWorks2019 SET COMPATIBILITY_LEVEL = 140;
GO
DECLARE @T AS TABLE (ProductID INT);
INSERT INTO @T SELECT ProductID FROM Production.Product WHERE ProductLine IS NOT NULL;


SELECT * FROM @T t
INNER JOIN Sales.SalesOrderDetail od on t.ProductID = od.ProductID
INNER JOIN Sales.SalesOrderHeader h on h.SalesOrderID = od.SalesOrderID
ORDER BY od.UnitPrice DESC;

As expected, the execution plan for the second query uses the Nested Loop Join operator for all logical joins, since the cardinality of the table variable is 1, which makes Nested Loop the appropriate choice for the join operations.

TV1

By observing the execution plan, you can also see that the number of rows is highly underestimated. That does not only lead to Nested Loop operators; the Sort operator in the plan also got a memory grant based on the estimate. The amount of memory would be enough for the estimated, but not for the actual, data volume.

TV2

This leads to tempdb spilling, since 1 MB is not enough for sorting 120K rows. You can see the spills in a SQL Server Profiler or Extended Events tracing session (yes, you can still use both of them: Extended Events on production servers and Profiler for demo and testing purposes), as I did during the execution of the previous query:

TV3
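
If you prefer Extended Events, a minimal session that captures sort warnings could look like this (a sketch; the session name and the file target name are arbitrary):

CREATE EVENT SESSION SortWarnings ON SERVER
ADD EVENT sqlserver.sort_warning(
    ACTION(sqlserver.session_id, sqlserver.sql_text))
ADD TARGET package0.event_file(SET filename = N'SortWarnings.xel');
GO
ALTER EVENT SESSION SortWarnings ON SERVER STATE = START;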

OK, that was CL 140; let's switch to 150 to see the performance improvements:
ALTER DATABASE AdventureWorks2019 SET COMPATIBILITY_LEVEL = 150;
GO
DECLARE @T AS TABLE (ProductID INT);
INSERT INTO @T SELECT ProductID FROM Production.Product WHERE ProductLine IS NOT NULL;


SELECT * FROM @T t
INNER JOIN Sales.SalesOrderDetail od on t.ProductID = od.ProductID
INNER JOIN Sales.SalesOrderHeader h on h.SalesOrderID = od.SalesOrderID
ORDER BY od.UnitPrice DESC;

The plan has changed; instead of Nested Loop Join operators, you can see Hash Match operators, the estimations are significantly better (126K vs. 119K), and the memory grant is appropriate (140 MB).

TV4

Since the memory grant is sufficient, no Sort Warnings are detected; thus, the overall execution looks significantly better under CL 150.

What about the execution time? To compare the execution times, we will use Adam Machanic's SQLQueryStress tool. It allows us to run queries multiple times without painting the results, so that we can compare the time needed for the query execution only. And here are the results:

TV5a

The execution parameters for CL 150 definitely look better: the execution is more than 50% faster, there is no tempdb spilling, and we can conclude that this query is happy with the new behavior of table variables in SQL Server 2019.

You'll see which queries are not so happy in the second part of this article.

Thanks for reading.