
The New Cardinality Estimator

With SQL Server 2014 there’s a new sheriff in town in the form of a brand-new cardinality estimator. I’m going to document my experiences with it as we upgraded our databases from SQL Server 2008 and 2008 R2 to 2014.

All of the technical details on this topic can be found here:

Microsoft’s article on the cardinality estimator

or here:

The SQL 2014 Cardinality Estimator Eats Bad TSQL for Breakfast

What I am going to do here is document what you won’t find in those articles.

First, the good news: the new cardinality estimator does appear to do a better job of estimating, so (in general) the same or better plans are produced for most of your queries.

Now the bad news: the estimator was completely rewritten, so there are some situations it doesn’t handle very well, and reverting to the old estimator is a great idea in those cases. With that in mind, when we find a query that performed well in the pre-2014 days but is now dogging it, the first thing I do is isolate the long-running query (easy if there is only one query, but when a stored procedure contains many, you need to figure out which one is the issue). Once found, use this clause to force the optimizer to use the old estimator:

OPTION (QUERYTRACEON 9481)
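
For example, the hint goes at the end of the offending statement. Here is a minimal sketch (the table and column names are hypothetical):

-- Hypothetical query: the hint on the last line forces this statement
-- to use the legacy (pre-2014) cardinality estimator.
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.Customers AS c
    ON c.CustomerID = o.CustomerID
WHERE o.StatusDate >= '20140101'
OPTION (QUERYTRACEON 9481);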

If your results return much, much faster, this is your fix. Stop here unless you are curious and want to know more.

Also please note that the QUERYTRACEON hint can only be added to a query by someone with membership in the sysadmin fixed server role.

We’ve really only changed a handful of stored procedures using this method, but it is a bit frustrating to have a query that used to run quite fast and now (under 2014) either runs very slowly or times out. So what seems to be causing this issue? Please read the first link above (warning: 43 pages of reading!) and digest it. I noticed that we seem to have a problem when 1) there are several large tables in the SELECT, and 2) at least one of the JOINs references columns with two different data types. This situation produces an optimizer warning, which you can see visually in the estimated or actual plan, or you can use this query to show plans with warnings (thanks, Brent Ozar!):

SELECT st.text, qp.query_plan
FROM ( SELECT TOP 50 *
       FROM sys.dm_exec_query_stats
       ORDER BY total_worker_time DESC
     ) AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";count(//p:Warnings)', 'int') > 0;

Please note that there will be many plans with warnings, but a warning doesn’t necessarily mean there is an issue causing a performance problem. That said, I have often found optimizer warnings that point directly to the performance issue. I frequently see two main types of warnings, similar to these:

Type conversion in expression (CONVERT(varchar(24),[MY_STATUS_DATE],127)) may affect "CardinalityEstimate" in query plan choice

Type conversion in expression (CONVERT(varchar(24),[MY_STATUS_DATE],127)) may affect "CardinalityEstimate" in seek plan choice

My experience has been that the first message might be an issue (but most are not!), while the second message often is a performance issue. Note that the second message says “…in seek plan choice”. I have found that when this warning is generated, the optimizer often chooses a really poorly performing plan for the query, and if I change to the old cardinality estimator this poor choice is corrected.
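
If you want to hunt for that specific flavor of warning, a variant of the earlier plan-cache query can filter for it. This is a sketch that assumes the conversion warnings show up in the plan XML as PlanAffectingConvert elements whose ConvertIssue attribute is "Seek Plan":

SELECT st.text, qp.query_plan
FROM ( SELECT TOP 50 *
       FROM sys.dm_exec_query_stats
       ORDER BY total_worker_time DESC
     ) AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
-- Keep only plans that contain a "seek plan" conversion warning
WHERE qp.query_plan.exist('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";//p:PlanAffectingConvert[@ConvertIssue="Seek Plan"]') = 1;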

Next question: how can I tell if the query is “better” with the old estimator? The first, and most obvious, way is to time it. The query will usually run much slower using the new estimator, and the time difference is usually quite significant. However, there have been a few times where time wasn’t a huge factor (on our test servers), so I had to look deeper at the plans. I started by comparing the estimated and actual plans under both the new and old estimators. However, with complex queries it was pretty hard to find the plan changes, and the estimated costs often aren’t very accurate (at least for this exercise). So I went to plan B: I created two versions of my procedure and ran both in the same query window, showing the plan for each in the same window. If both the old and new estimators create the same plan, each procedure will show “50%” of the total batch. If one is better than the other, the percentage of the total will be skewed towards the worse-performing version. (I often use the estimated plans here because the “new estimator” version of my procedure usually runs…and runs…and runs…and never returns a result.)
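
In practice the side-by-side test looks something like this (a sketch; the procedure names and parameter are hypothetical, and the “old CE” version is identical except that every statement ends with the trace-flag hint):

-- Run both versions in one batch with "Include Actual Execution Plan"
-- (or the estimated plan) turned on, then compare the
-- "Query cost (relative to the batch)" percentage for each call.
EXEC dbo.usp_MyReport_NewCE @StatusDate = '20140101';  -- default 2014 estimator
EXEC dbo.usp_MyReport_OldCE @StatusDate = '20140101';  -- OPTION (QUERYTRACEON 9481) inside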

For those executions where the time difference wasn’t obvious on our test server, I looked at the resources used by each plan, specifically the memory grant and parallelism properties. Sometimes the plan under the new estimator required a large memory grant, or went parallel, when the old-estimator plan did not. My advice here is to use the old estimator; I believe that the larger resource requirements can cause the query to wait for resources (on our production server) during peak usage times, causing the query to run slowly at certain times of the day but not others.
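
One way to see those grants while the two versions are running is to query the memory grant DMV from another session (requires the VIEW SERVER STATE permission):

-- Sessions currently holding, or waiting on, a query memory grant
SELECT session_id,
       dop,                  -- degree of parallelism for the running query
       requested_memory_kb,
       granted_memory_kb,
       wait_time_ms          -- how long the query waited for its grant
FROM sys.dm_exec_query_memory_grants;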

I have also found that the new cardinality estimator is “exposing” a few bad design decisions in the database. These mostly involve data type choices between tables: table one might have a “code” defined as a character type, while table two has the same “code” as a numeric type (the actual code values are numbers in this case). We only see performance issues when one or more of the tables are large and the query is complex enough to “confuse” the optimizer. Since the database is in production, and changing the design would bring pain and anguish to the developers, our decision at this time is to leave the design alone but to try not to make the same design mistakes in future changes. Of course, our management wants to know how long we will rely on the “band-aid” of the old cardinality estimator; that answer depends on when those design fixes can be incorporated into the application upgrade schedule (which could be a long time!).
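
To make the design problem concrete, here is a minimal sketch (the tables are made up). The join forces an implicit conversion because the two “code” columns have different types, which is exactly the situation that produces the conversion warnings shown earlier:

-- Two tables storing the "same" code with different data types
CREATE TABLE dbo.TableOne (Code varchar(10) NOT NULL);  -- code as character data
CREATE TABLE dbo.TableTwo (Code int NOT NULL);          -- code as a number

-- Because int has higher data type precedence than varchar, t1.Code is
-- implicitly converted to int, which can skew the cardinality estimate
-- and rule out an index seek on TableOne.
SELECT *
FROM dbo.TableOne AS t1
JOIN dbo.TableTwo AS t2
    ON t1.Code = t2.Code;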
