Category - SQL Server 2014

All About SQL Server 2014

SQL Server 2019 CE, Local Variables and Nullable Columns

In this article, I will demonstrate a small change in the Cardinality Estimator component under the latest compatibility level – 150 (SQL Server 2019).

Let’s first create and populate a sample table:

DROP TABLE IF EXISTS dbo.T;
CREATE TABLE dbo.T(
id INT IDENTITY(1,1) NOT NULL,
c1 INT NOT NULL,
c2 INT NULL,
c3 CHAR(5) DEFAULT 'test',
CONSTRAINT PK_T PRIMARY KEY CLUSTERED (id)
);
GO
CREATE INDEX ix1 ON dbo.T(c1);
GO
CREATE INDEX ix2 ON dbo.T(c2);
GO
INSERT INTO dbo.T(c1)
SELECT 1 + ABS(CHECKSUM(NEWID())) % 10000
FROM dbo.GetNums(1000000);
GO

--set c2 to a non-NULL value for about 5% of the rows
UPDATE dbo.T SET c2 = c1 WHERE c1 >= 9500;
GO

Ensure that the database runs under the compatibility level 140 and consider the following simple query with a local variable:

ALTER DATABASE New SET COMPATIBILITY_LEVEL = 140;
GO
DECLARE @v INT = 600000;
SELECT * FROM dbo.T WHERE c1 = @v;
Here is its execution plan:

[Figure: execution plan for the query on c1 under CL 140]
This is a well-known situation – an estimation for an equality predicate with a local variable. The estimated number of rows is calculated as the All density attribute from the stats density vector multiplied by Rows from the stats header.

[Figure: statistics header and density vector for ix1]


SELECT 0.00010002*1000000;
--100.02000000
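
Both inputs of that formula can be checked directly; here is a quick sketch against the ix1 statistics object created together with the index above:

DBCC SHOW_STATISTICS ('dbo.T', 'ix1') WITH STAT_HEADER;     -- Rows used in the calculation
DBCC SHOW_STATISTICS ('dbo.T', 'ix1') WITH DENSITY_VECTOR;  -- All density used in the calculation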

The c1 column does not allow NULLs. What would the estimation be for a nullable column? The c2 column in our table is nullable. Let's check:

DECLARE @v INT = 600000;
SELECT * FROM dbo.T WHERE c2 = @v;


[Figure: execution plan for the query on c2 under CL 140 (Clustered Index Scan)]

The estimation is significantly higher and even leads to a plan with a Clustered Index Scan! However, it follows the same estimation formula. Let's have a look at the statistics object:

[Figure: statistics object for ix2]

By using the same formula, the estimated number of rows is:

SELECT 0.001992032*1000000
--1992.032000000

This is exactly what we see in the execution plan. OK, that was CL 140; let's see how SQL Server 2019 handles this simple case.

When we switch to CL 150, the plan and estimations for the c1 column (non-nullable) are the same. However, the estimation for the nullable column is changed!

ALTER DATABASE New SET COMPATIBILITY_LEVEL = 150;
GO
DECLARE @v INT = 600000;
SELECT * FROM dbo.T WHERE c2 = @v;
Here is the plan:

[Figure: execution plan for the query on c2 under CL 150 (Nested Loops)]

This time, the estimation is not so high, and we get a more appropriate plan, based on the Nested Loops join operator. But where does 104.397 come from? Let's look at the statistics again:

[Figure: statistics histogram for ix2]

Since the column is nullable, the CE applies the All density to the non-NULL rows only. Thus, the formula seems to be All density x (Rows - EQ_ROWS for RANGE_HI_KEY = NULL).

SELECT 0.001992032*(1000000-947697);
--104.189249696

Well, it is not exactly 104.397, but it is very, very close. I did not find official documentation for this, so I cannot confirm that the mentioned calculation is always correct. However, this new behavior makes more sense: the average value from the stats density vector refers to the non-NULL set of values, and it is wrong to multiply it by all rows.
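
If you want to rebuild that calculation directly from the metadata, here is a hedged sketch; it assumes SQL Server 2016 SP1 CU2 or later (where sys.dm_db_stats_histogram is available) and reuses the 0.001992032 density value shown above:

DBCC SHOW_STATISTICS ('dbo.T', 'ix2') WITH DENSITY_VECTOR;  -- All density for c2

SELECT 0.001992032 * (sp.[rows] - sh.equal_rows) AS estimated_rows
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS sh
WHERE s.object_id = OBJECT_ID('dbo.T')
  AND s.name = 'ix2'
  AND sh.range_high_key IS NULL;  -- the histogram step that holds the NULL rows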

Fun fact – the behavior is actually not new at all; that's exactly how the old CE (under compatibility levels <= 110) handles this case.

I would like to thank my colleague Vladimir Đorđević for making me aware of this behavior.

Thanks for reading.

Quo vadis, or: to hell with the feature that does everything instead of me while I have to do nothing

This post contains too much swearing to have been written in English, but I figure it is no small thing that it will at least reach those who speak Serbo-Croatian.

It has become almost unbearable how representatives of many a software vendor (not only from the marketing department) sell the latest features, tools and services using the phrase "it all works for you under the hood, you don't have to lift a finger, you don't have to know anything about it, you don't have to program, everything is transparent, two or three clicks, a bit of drag and drop, and the feature will do everything for you." More and more speakers and so-called experts join in, putting front and center the claim that some feature is awesome because its user barely has to know anything and no longer has to worry about the system. And indeed he doesn't have to worry, right up until the system smacks him across the forehead.

There are so many new things you don't know that you simply don't know where and how to start learning them, and there is never enough time. And then someone tells you: you don't need to know this at all, it just works by itself – and of course you then devote that bit of time to something else, relying completely on that mighty feature. As long as the thing does what the user expects, and does it fast enough, there is no problem at all. But if unwanted side effects show up, or the thing is too slow, the company expects you to do something about it. And if you don't know how it works, if you don't understand the concept, if you have no workaround and you haven't prepared for that situation, then you're screwed.

New software versions and new tools are only instruments that can help you reach a production goal: faster development, a scalable system, better performance. But they are just damn tools. They don't work for you or instead of you; you use them, and you are the one who has to work with them. You need to know them well, to know what they can do and how much, and, perhaps more importantly, what they cannot do. And it is worth peeking under the hood to see how it all works, so you know whether it is the right thing for you.

When you publicly say "this feature is awesome, your query will be up to 100 times faster", and you don't say when it won't be (and for most people it damn well won't), or that under some conditions it can even be slower, you have lulled the attendee of your session into a false sense of security and potentially led him to a bad business decision. He usually has no time to dig deeper into the topic after the conference – he was there, he heard it from an expert, and he will take it as the truth. And when he runs into difficulties with that feature, the vendor or the speaker will promise him a newer, faster, stronger, better one.

That is why people should be reminded that it is very nice to have great features and tools, and that they are undoubtedly worth investing in, but that knowledge is still damn well the most important thing; that they should learn and know as much as possible and approach everything critically, instead of falling for things like starstruck teenagers and then airing their disappointment in public blog posts. And that they should invest in knowledge.

 

Altering an Indexed Text Column

In this post, I am going to demonstrate how increasing the size of a text column that has an index on it can lead to serious trouble without any warning. I tested it on two instances of SQL Server, SQL Server 2016 and 2017, but the same behavior should be expected with older versions; only the maximum allowed index key size would be different (900 instead of 1,700 bytes).

 

Use the following code to create a sample table and populate it with 1M rows:

DROP TABLE IF EXISTS dbo.AlterColumnTest;
CREATE TABLE dbo.AlterColumnTest(
id INT NOT NULL,
descr NVARCHAR(500) NOT NULL DEFAULT 'test',
CONSTRAINT PK_AlterColumnTest PRIMARY KEY CLUSTERED (id ASC)
);
GO
INSERT INTO dbo.AlterColumnTest(id)
SELECT n
FROM dbo.GetNums(1000000)
GO
CREATE INDEX ix1 ON dbo.AlterColumnTest(descr)
GO

As you can see, there is a nonclustered index on the descr column. Starting with SQL Server 2016, the maximum index key size for nonclustered indexes has been increased to 1,700 bytes. Since descr is an NVARCHAR column, that means its size cannot exceed 850 characters. However, when you try to increase it to a value greater than 850, the change is accepted. The following command will be executed successfully:

ALTER TABLE dbo.AlterColumnTest ALTER COLUMN descr NVARCHAR(1000) NOT NULL;
GO 
Commands completed successfully.

 

The command executes instantly since it updates only metadata; existing rows are not touched at all. And there is no warning that the allowed index key size can now be exceeded!
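
A quick way to see that only the metadata changed is to look at sys.columns (a sketch; max_length is reported in bytes, so NVARCHAR(1000) shows up as 2000):

SELECT c.name, c.max_length            -- 2000 bytes = 1,000 Unicode characters
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.AlterColumnTest')
  AND c.name = 'descr';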

When you update a row so that it contains an 850-character text value, the update works:

UPDATE dbo.AlterColumnTest SET descr = REPLICATE(N'a', 850) WHERE id = 1;

(1 row affected)


However, if the text is longer than that, you will get an error:

UPDATE dbo.AlterColumnTest SET descr = REPLICATE(N'a', 851) WHERE id = 1;

Msg 1946, Level 16, State 3, Line 34
Operation failed. The index entry of length 1702 bytes for the index 'ix1' exceeds the maximum length of 1700 bytes for nonclustered indexes.

OK, you got an exception, but only after trying to store data longer than the limit, and now you have to alter the text column back to the allowed size (850):

ALTER TABLE dbo.AlterColumnTest ALTER COLUMN descr NVARCHAR(850) NOT NULL;

Instead of the column size being decreased in an instant, metadata-only operation, you will get the following error:

Msg 5074, Level 16, State 1, Line 42
The index 'ix1' is dependent on column 'descr'.
Msg 4922, Level 16, State 9, Line 42
ALTER TABLE ALTER COLUMN descr failed because one or more objects access this column.

To proceed with this action, you need to drop the underlying index, alter the column size, and then recreate the index!

DROP INDEX IX1 ON dbo.AlterColumnTest;
GO
ALTER TABLE dbo.AlterColumnTest ALTER COLUMN descr NVARCHAR(850) NOT NULL WITH (ONLINE = ON);
GO
CREATE INDEX IX1 ON dbo.AlterColumnTest(descr) WITH (ONLINE = ON);
GO

For large tables, this action can take some time, and during it you will have no index on that column. The ALTER TABLE statement this time is not a metadata-only operation; since the column size is decreased, the action touches all rows of the table, even though there is not a single row with a value longer than the new length! For the ALTER statement, you can choose between an offline and an online operation. In the first case it will be faster, but the table will be locked and unavailable for querying and modifications; an online operation keeps the table available for these actions, but it takes more time and resources (a new table is built in the background and then replaces the original one).
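
Since not a single existing value is actually longer than the new limit, it may be worth confirming that before paying for the rebuild; a minimal sketch:

SELECT COUNT(*) AS rows_over_limit     -- expected to be 0 before shrinking back to 850
FROM dbo.AlterColumnTest
WHERE LEN(descr) > 850;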

This is definitely not something you would expect after a successful initial command that increased the column size to 1,000 characters. At the very least, a warning should be shown in this case.

Thanks for reading.

M.

SQL PASS Austria Meeting April: SQL Server Extended Events & Simple Rules for Clear and Attractive Visuals

The upcoming SQL Server Community Meeting on Wednesday, 15 April 2015 is about SQL Server Extended Events and data visualization. I am very happy that this time the meeting takes place at my company, bwin. We will start at 17:30; the address is Marxergasse 1B, 1030 Vienna.

 

Making the Leap from Profiler to Extended Events

If you’re like me, you’ve been using SQL Trace and Profiler since you started using SQL Server. They’re like old friends, tried and true. You know how they work, you know what you can capture, you know how to find the data you need. But now you’ve been told that your good friends have been deprecated in SQL Server 2012, and you need to become best friends with Extended Events. What is THAT all about? It’s ok. Really. In this session we’ll look at Profiler and Extended Events side by side to see how easy it is to transition to Extended Events. We’ll step through real world examples using Extended Events to do what you’re used to doing with Trace, and even more. You can find the time to learn Extended Events…today.

presented by
Erin Stellato (blog | twitter), Principal Consultant @ SQLskills.com


My Favourite Pie (Chart): Simple Rules for Clear and Attractive Visuals

Do you want to deliver information in an effective and efficient way? Even when the attractiveness of a report is important, beauty is in the eye of the beholder. Join this session where Markus will show you some simple rules for helping end-users to understand the story their data is trying to tell.

You will see how you can implement those rules with different tools from Microsoft’s BI stack – resulting in clear and concise information delivered through beautiful dashboards. You will also learn how to identify sub-optimal dashboards and what you can do to improve them.

presented by
Markus Ehrenmüller-Jensen (blog | twitter), Business Intelligence Architect @ runtastic

 

Registration and further details: https://passaustriameeting.eventbrite.de

Beyond Statistics Histogram Part 4 – Significant Performance Regression in SQL Server 2014

In the previous articles we’ve covered differences in estimations between the old and the new CE for queries that use predicates with literals whose values are beyond the statistics histogram. We saw some issues with key ascending columns in SQL Server 2012 which can be solved with the trace flags 2389, 2390 and 4139. And we saw that the same queries perform well in SQL Server 2014 without the flags. It seems that this is the reason why these flags are not supported in the newest SQL Server version.

You can find several articles about the new CE, and all of them write about improved estimations for key ascending column queries in SQL Server 2014. You might think that there is no problem with key ascending columns anymore. This is only partially true. In this article I will show you another query pattern with a significant performance regression in SQL Server 2014.

Key Ascending Problem Solved?

In one of my previous articles, Beyond Statistics Histogram – Part 1, we saw that the same query performs significantly better in SQL Server 2014 (from more than 4M to 455 logical reads) and that the new CE, unlike the old CE, does not estimate 1 row whenever a value beyond the stats histogram is used in a predicate. It estimates significantly more – enough to steer the Query Optimizer to the correct index. The problem described in the mentioned article (the most common ascending key problem) is caused by the estimation of 1 for the orderdate predicate. In this concrete case it would be enough to estimate more than 110 rows (which is the estimate for the custid predicate) and the Query Optimizer would use the index on custid, as we initially expected. So, the new CE did it – it estimated 90,000, significantly more than needed, and the plan was OK – problem solved.

But where did this estimation come from? That’s an interesting question, maybe irrelevant in this particular example, but it definitely deserves an answer.

Let’s create a new sample table, populate it initially with 1M rows and then add an additional 100K rows by using this code. As we saw in the previous posts, this is not enough to trigger a statistics update, and the statistics object still sees only orders up to 31 December 2013. Now that we have a sample table and sample data, let’s implement our very simple requirement: return all orders from yesterday and today, sorted by amount descending. Here is the solution:
SELECT * FROM dbo.Orders WHERE orderdate >= DATEADD(DAY,-1,GETDATE()) ORDER BY amount DESC;

A very basic query with an expected Index Seek on the orderdate index followed by a Key Lookup, since we don’t expect more than 500-600 rows. And this is exactly what we get. In SQL Server 2012.

[Figure: SQL Server 2012 execution plan (Index Seek + Key Lookup)]

What about SQL Server 2014? Well, “a little bit different”…

[Figure: SQL Server 2014 execution plan (Clustered Index Scan, 900 MB memory grant)]

A Clustered Index Scan and a memory grant of 900 MB to return and sort only 60 rows! And yes, this is again a “key ascending problem”. And it is not solved; it is actually introduced by the new CE! There is no such problem under the old CE. OK, the old estimation of 1 was not correct, but it is definitely better than the 30,000 given by the new CE. You can even fix the estimation of 1 under the old CE by using TF 2390, which was the solution for the key ascending problem before SQL Server 2014. Let’s execute the same query under the old CE (TF 9481) together with TF 2390:
SELECT * FROM dbo.Orders WHERE orderdate >= DATEADD(DAY,-1,GETDATE()) ORDER BY amount DESC OPTION (QUERYTRACEON 9481, QUERYTRACEON 2390);

And now observe the execution plan:

[Figure: execution plan with TF 9481 and TF 2390]

You can see an excellent estimation and the expected execution plan with an appropriate memory grant. So, with the old CE we have one acceptable and one excellent execution plan; with the new CE, a suboptimal one with a significantly overestimated memory grant.

The new CE simply ignores literals if they are beyond the statistics histogram. It estimates 30% of the total number of modifications of the leading statistics column since the last time statistics were updated. In our case this number is 100,000 and the estimation is therefore 30,000. The new CE produces the same estimation (30,000) for both of the following queries:
SELECT * FROM dbo.Orders WHERE orderdate >= '20150314' OPTION (RECOMPILE);
SELECT * FROM dbo.Orders WHERE orderdate >= '34450101' OPTION (RECOMPILE);

In the case of the BETWEEN operator the estimation is 9% of the modification_counter value, but again the literals are completely ignored. I am not happy with this at all. The estimation done by the old CE was hardcoded to 1; the new CE hardcodes it too, not to a constant but to an expression (i.e. modification_counter * 0.3 for > or <, modification_counter * 0.09 for BETWEEN, etc.). To make the problem even worse, the trace flags 2389, 2390 and 4139 are not supported in SQL Server 2014, and you cannot force the optimizer to recalculate the statistics object on the fly. This functionality would help here, but it has been removed. I created the following Connect item for this issue. If you think it could be useful, please vote for it.
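
The modification counter that feeds these hardcoded expressions can be inspected with sys.dm_db_stats_properties. A hedged sketch (it assumes the statistics on the orderdate column are named IX_Date, as in the Part 2 sample of this series):

SELECT sp.[rows],
       sp.modification_counter,
       sp.modification_counter * 0.30 AS est_for_gt_lt,    -- > or < beyond the histogram
       sp.modification_counter * 0.09 AS est_for_between   -- BETWEEN beyond the histogram
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Orders')
  AND s.name = 'IX_Date';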

Conclusion

The key ascending problem, caused by constant underestimation in SQL Server 2012, can lead to significant performance problems with the default QO behavior. However, with TF 2389, 2390 and 4139 the problem can be solved at an acceptable cost. With the new Cardinality Estimator in SQL Server 2014 the problem disappears for common query patterns, but some other queries, which perform well in older SQL Server versions, show significant performance regressions. The new CE significantly overestimates the number of rows for predicates using literals beyond the statistics histogram, which usually induces an execution plan that scans large tables. Unfortunately, the trace flags which are helpful in previous versions are not supported anymore. We hope, however, that this functionality can still come back, at least while the status of the corresponding Connect item is active.

Thanks for reading.

Speaking in Lisbon at SQLSaturday Portugal

I’m really excited to be back as a speaker at my favorite SQL Saturday event – SQL Saturday Portugal in Lisbon. SQLSat 369 will be my fourth event in one of the most beautiful cities in the world. I am happy to see all my Portuguese friends, as well as worldwide friends from the #sqlfamily.

My session will be on Saturday morning and I will speak about the problem with ascending key columns in large tables. I will compare the problems and solutions in SQL Server 2014 and in the previous versions. For my favorite SQL Saturday, Lisbon, which is definitely “the special one”, I will prepare a special presentation, full of surprises…

Thank you for reading, and see you in Lisbon!

SQL Server Konferenz 2015 – Slide Decks and Sample Code

I had a great time in Darmstadt at the largest SQL Server conference in the German-speaking area, “SQL Konferenz 2015”. I was very happy to see a lot of people in the room for my session SQL Server 2014 and Ascending Key Problem in Large Tables and want to thank all the attendees. It was a real pleasure to deliver the session to that audience. Here you can download the slide deck and sample code from today’s session. Feel free to drop me an email, comment or message if you have any questions. Thank you!

 

Beyond Statistics Histogram – Part 3 (TF 2389 feat. TF 4139)

In the previous article we’ve shown how we can use TF 2390 to handle queries with literals beyond the statistics histogram in SQL Server 2012. We saw that SQL Server creates some statistics on the fly for all columns used in filter expressions. If we enabled the trace flag at the instance level, then whenever SQL Server created an execution plan it would touch the indexes on all filtered columns in all queries. And not all columns have problems with ascending statistics. It would make more sense to apply this action only to columns branded by SQL Server as “Ascending” columns. This is exactly the use case for trace flag 2389. It performs the same action and ends up with the same result as TF 2390, but it is applied only to columns that SQL Server knows to be ascending.

SQL Server marks a column as ascending in a somewhat esoteric way. The statistics object on the column has to be updated three times in a row with the FULLSCAN option, and for each update more than 90% of the newly added rows must have a value in that column greater than the maximum value in the current statistics histogram. If this is true, the column is branded as ascending; if not, the column is marked as stationary.
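
Here is a self-contained sketch of that branding cycle; the table, index and date values are purely illustrative (this is not the downloadable script), and dbo.GetNums is the numbers helper used throughout this series:

CREATE TABLE dbo.BrandDemo (dt DATE NOT NULL);
CREATE INDEX ix_dt ON dbo.BrandDemo (dt);
GO
INSERT INTO dbo.BrandDemo (dt)                -- every new row lands above the old maximum
SELECT DATEADD(DAY, 1 + n % 30, ISNULL((SELECT MAX(dt) FROM dbo.BrandDemo), '20140101'))
FROM dbo.GetNums(100000);
UPDATE STATISTICS dbo.BrandDemo ix_dt WITH FULLSCAN;
GO 4                                          -- repeat the insert + FULLSCAN update several times in a row

DBCC TRACEON(2388);
DBCC SHOW_STATISTICS ('dbo.BrandDemo', 'ix_dt');   -- Leading column Type should now show Ascending
DBCC TRACEOFF(2388);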

To demonstrate this, let’s create a table and populate it with 10M rows following the procedure described above. Click here to download the code. After we have updated the statistics three times, we can see that SQL Server has marked the statistics object on the orderdate column as ascending.

DBCC TRACEON(2388);
DBCC SHOW_STATISTICS ('OrdersAsc','ix_orderdate');
DBCC TRACEOFF(2388);

Here is the output that confirms that the leading column Type is Ascending.

[Figure: DBCC SHOW_STATISTICS output showing Leading column Type = Ascending]

Now we add an additional 1M rows to the table. Again, this is not enough for an automatic statistics update, and the statistics object remains unchanged.

Let’s recall our query from the previous post: get all orders for the customer with ID 160 in 2014. We will create two queries, one that uses TF 2390 and the other with TF 2389, and an Extended Events session to check which on-the-fly calculations are performed during the query execution in both cases. Here is the code for the trace session:

CREATE EVENT SESSION TF2389 ON SERVER
ADD EVENT sqlserver.sp_statement_completed(SET collect_statement=(1)
WHERE (sqlserver.database_id=10))
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY=4096 KB,MAX_DISPATCH_LATENCY=1 SECONDS)
GO
ALTER EVENT SESSION TF2389 ON SERVER STATE = START;
GO

Let’s execute both queries:

SELECT * FROM dbo.OrdersAsc
WHERE custid = 160 AND orderdate >= '20140101' AND orderdate < '20150101' OPTION(QUERYTRACEON 2390, RECOMPILE);
SELECT * FROM dbo.OrdersAsc
WHERE custid = 160 AND orderdate >= '20140101' AND orderdate < '20150101' OPTION(QUERYTRACEON 2389, RECOMPILE);

As mentioned above, we got exactly the same execution plan and execution details for both queries. The only difference can be seen in the trace output:

[Figure: Extended Events output for the TF 2389 and TF 2390 queries]
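
If you prefer to read the captured statements with T-SQL rather than the live data viewer, something like the following sketch works against the TF2389 session defined above:

SELECT
    e.value('(@timestamp)[1]', 'datetime2')                         AS event_time,
    e.value('(data[@name="statement"]/value)[1]', 'nvarchar(max)')  AS captured_statement
FROM (
    SELECT CAST(t.target_data AS XML) AS target_data
    FROM sys.dm_xe_sessions AS s
    INNER JOIN sys.dm_xe_session_targets AS t
        ON t.event_session_address = s.address
    WHERE s.name = 'TF2389' AND t.target_name = 'ring_buffer'
) AS rb
CROSS APPLY rb.target_data.nodes('RingBufferTarget/event') AS q(e);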

TF 2389 instructs SQL Server to check the value in the orderdate column only, while TF 2390 checks the custid column, too. So, if you want to apply this workaround only against genuinely ascending columns, use TF 2389. However, TF 2389 works only for columns branded as Ascending; if the column status changes back to Unknown, it does not work anymore. TF 2390 works in both cases. So, if you are sure that the column will always be ascending, use TF 2389.

And what happens if the column is marked as Stationary? Well, neither TF 2390 nor TF 2389 can help in this case! The plan is suboptimal, the same as without trace flags, with a lot of logical reads and an unacceptably long execution time. Until July 2014, SQL Server 2012 didn’t have a workaround for dealing with stationary columns. Since then, you can use another trace flag (TF 4139) to get the same behavior for stationary columns. TF 4139 is available from SQL Server 2012 SP1 CU10 and SQL Server 2012 SP2 CU1. You can find more information about it in this KB article.

Use this code to populate a table identical to the previous one, but where the orderdate column is marked as Stationary.

Now we’ll execute the same query, this time in three different versions:

SELECT * FROM dbo.OrdersStat
WHERE custid = 160 AND orderdate >= '20140101' AND orderdate < '20150101' OPTION(QUERYTRACEON 2390, RECOMPILE);
SELECT * FROM dbo.OrdersStat
WHERE custid = 160 AND orderdate >= '20140101' AND orderdate < '20150101' OPTION(QUERYTRACEON 2389, RECOMPILE);
SELECT * FROM dbo.OrdersStat
WHERE custid = 160 AND orderdate >= '20140101' AND orderdate < '20150101' OPTION(QUERYTRACEON 4139, RECOMPILE);

Let’s observe the execution plans:

[Figure: execution plans with TF 2390, TF 2389 and TF 4139]

As we can see, trace flags 2389 and 2390 are helpless here, and only TF 4139 ensures the correct execution plan.

Conclusion

Let’s summarize dealing with ascending columns with the help of trace flags so far:

TF 2389 helps the Query Optimizer to come up with better estimations and correct execution plans only for columns branded as Ascending.

TF 2390 instructs SQL Server to perform on-the-fly calculation for statistics objects for columns marked as Ascending or Unknown.

TF 4139 works with all three Leading column Types: Ascending, Unknown and Stationary. However, it is available only from SQL Server 2012 SP1 CU10 or SP2 CU1.
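
A quick way to check which of these flags are active on an instance (sketch):

DBCC TRACESTATUS (2389, 2390, 4139, -1);   -- -1 reports the global (instance-level) status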

And one note regarding the status of these flags in the latest SQL Server version: SQL Server 2014 does not support them. It ignores all of them, since the behavior described in this and the previous post is implemented in the new database engine, and SQL Server 2014 without trace flags comes up with similar execution details as SQL Server 2012 with the flags. It seems that we don’t need these flags in SQL Server 2014. It seems. Until the next post.

Thanks for reading.

Speaking at Deutsche SQL Server Konferenz 2015 in Darmstadt

I am honored to have been selected to speak at the largest SQL Server conference in the German-speaking area, “SQL Konferenz 2015”. The two-day conference will be held from 4 to 5 February 2015, with a pre-con day on 3 February.

I’ll be presenting a session about key ascending problems with large tables in SQL Server 2014 and in the previous versions. I will explain and demonstrate what the problem is, how we can solve it in SQL Server 2012, and how it is (partially) solved in the latest SQL Server version.

There are 40 sessions across the two conference days; you can see the conference agenda here.

You can use this link to register for the conference. An early-bird offer with a €150 discount on the standard registration price is available until 24 December 2014.

See you in Darmstadt!

Beyond Statistics Histogram – Part 2 (TF 2390)

In the previous article of this series we introduced the problem with queries using literals which are beyond the statistics histogram but exist in the table. We saw that this can be a serious problem for large tables in SQL Server 2012. We also saw that SQL Server 2014 handles it better.

In this article we’ll see how to solve this problem in previous versions of SQL Server.

Whenever performance problems are caused by stale statistics, one of the possible solutions is to update statistics. However, this process is resource- and IO-intensive, and as soon as the update is finished new rows come into the table and the statistics are again out of date. Of course, the amount of newly added or updated data is significantly reduced with regular updates.

It is recommended to schedule statistics updates on the most frequently used key ascending columns in volatile tables to reduce the discrepancy between the actual data distribution and the one in the statistics histogram. Even if you schedule statistics updates for a large table, the most recent entries are again beyond the statistics histogram. With statistics updates we can reduce the impact of rows missing from the statistics histogram, but what can we do if the update is too expensive? How can we help the optimizer when statistics are stale on key ascending columns? The answer: by using trace flags.
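
For completeness, such a scheduled refresh can target just the ascending-key statistics; a minimal sketch (IX_Date is the statistics object used in the DBCC SHOW_STATISTICS example later in this post):

UPDATE STATISTICS dbo.Orders IX_Date WITH FULLSCAN;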

Let’s show again the query and the execution plan from the previous post:

SELECT * FROM dbo.Orders
WHERE custid = 160 AND orderdate >= '20140101' AND orderdate < '20150101';

[Figure: execution plan without the trace flag]

[Figure: execution details without the trace flag]

If we execute the same query with trace flag 2390, the estimations and execution details look significantly better:

SELECT * FROM dbo.Orders
WHERE custid = 160 AND orderdate >= '20140101' AND orderdate < '20150101' OPTION(QUERYTRACEON 2390);

[Figure: execution plan with TF 2390]

[Figure: execution details with TF 2390]

Using TF 2390 helped the optimizer choose the proper execution plan with an Index Seek on the custid column, and the execution details look as we expected.

What is the estimated number of rows for the orderdate predicate alone with TF 2390? Let’s check the estimated execution plan:

SELECT * FROM dbo.Orders WHERE orderdate >= '20140101' AND orderdate < '20150101' OPTION(QUERYTRACEON 2390);

[Figure: estimated execution plan with TF 2390]

The old CE in SQL Server 2012 with TF 2390 estimates about 666K rows. Just to remind you, the new CE comes up with 90K. This difference is important and we’ll discuss it in the next post. For this query it’s clear that the other predicate (custid) is the winner when selectivity is the criterion, and there is no doubt for the Query Optimizer regarding which index to use.

How does it arrive at this devilish number of Ks for the Estimated Number of Rows? What’s the magic behind TF 2390?

To check this, we’ll create an Extended Events session and add the sp_statement_completed event. It is recommended to limit the session to events from the specific database only. Here is what we collected when we executed the above query:

[Figure: statements captured by the Extended Events session]

We see two queries doing some statistics calculations using the maximum value from the actual table for both columns involved in the query: custid and orderdate. The estimated number of rows of 666K strongly correlates to the expression (last order date from the actual table - max RANGE_HI_KEY from the stats histogram) * average number of orders per day. It assumes a uniform distribution of data beyond the stats histogram, which in the case of an ordering system is a reasonable assumption. The estimation of 666K compared to 1M actual rows is a good estimation (especially compared to the default estimation of 1 without the TF, or the 90K from SQL Server 2014). Our statistics object was created automatically when we created the index on the orderdate column (see the appropriate code in the previous post). If we had only a statistics object (auto- or manually created) but no index, TF 2390 would not improve performance. It requires an index on the column; otherwise it is not able to find the maximum value efficiently. The statistics are created on the fly and do not persist in the system. More details can be found in Dmitry Pilugin’s (blog | twitter) excellent article series.
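
Purely as an illustration of the shape of that calculation (the histogram maximum and the average rows per day below are placeholder assumptions, not values read from the statistics object):

DECLARE @histMax   DATE          = '20131231';   -- max RANGE_HI_KEY from the stats histogram (assumed)
DECLARE @avgPerDay DECIMAL(10,2) = 2740;         -- average number of orders per day (assumed)
DECLARE @tableMax  DATE          = (SELECT MAX(orderdate) FROM dbo.Orders);

SELECT DATEDIFF(DAY, @histMax, @tableMax) * @avgPerDay AS on_the_fly_estimate;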

Now we have a good estimation for the orderdate predicate; for custid we had it anyway. The literal used for custid (160) isn’t beyond the histogram. Why then did SQL Server calculate statistics on the fly for it too? TF 2390 instructs SQL Server to perform this calculation against all columns which participate in the filter criteria, regardless of the literals used. For instance, even if we were looking for orders from 2013, which are in the histogram, the on-the-fly stats calculation would be performed. Actually, not for all columns; it works for all columns whose Leading column Type is Ascending or Unknown. For columns with the leading type Stationary no stats are calculated on the fly, and therefore these columns cannot benefit from TF 2390. I will cover all three leading types and how to handle the differences in the next post.

How do you find out the column’s leading type? This is described in detail in the excellent article Statistics on Ascending Columns written by Fabiano Amorim (blog | twitter). Again we need a trace flag, this time TF 2388.

DBCC TRACEON(2388);
DBCC SHOW_STATISTICS(Orders, IX_Date);
DBCC TRACEOFF(2388);

When TF 2388 is switched on, DBCC SHOW_STATISTICS shows additional information about statistics objects. One of the additional attributes is the leading column type.

[Figure: DBCC SHOW_STATISTICS output with TF 2388 (Leading column Type = Unknown)]

We can see that SQL Server does not know the leading column type for the orderdate column; it is marked as Unknown. Therefore TF 2390 performs the on-the-fly statistics calculation when this column participates in filter expressions. The default behavior of the query optimizer is to use metadata and statistics information during plan generation; the table and index structures are not touched at all. With TF 2390 this is no longer true: SQL Server touches the appropriate index in order to calculate statistics on the fly as part of finding a good enough plan.

Conclusion

The problem with queries using literals which are beyond the statistics histogram can be solved in SQL Server 2012 by using TF 2390. Instead of estimating 1 row, the QO estimates via a non-persistent statistics calculation performed on the fly. This brings a small overhead, since the on-the-fly calculation is performed against all columns participating in filter expressions (not only those where it is required), but it is still better than the default CE behavior prior to SQL Server 2014. The solutions provided by TF 2390 with the old CE and by the new CE in SQL Server 2014 look similar for this query pattern, but some companies don’t allow the usage of trace flags in the production system. For them, the recommended solution is therefore to use the SQL Server 2014 CE. But again, for this query pattern, not for all of them!

Thanks for reading.