Generally speaking, most T-SQL queries contain WHERE clauses to filter out data and WHERE clauses almost always contain a reference to a column…after all, that’s why the WHERE clause exists (to filter out the data). However, occasionally there is a need to use a condition in a WHERE clause that has no reference to any column. Usually it takes a form like this: “WHERE @IsManager = 0” or some parameter equals some constant. This is a valid use of the WHERE clause however it does have some performance implications. Let’s take a closer look.
I was recently given a T-SQL performance mystery to solve. A SELECT statement was running a bit on the slow side and the developer wanted to know why. He knew that the coded order of his JOIN statements was not supposed to matter, however, when he changed the order of the JOINS he created a much faster query.
Why? It turns out a part of the WHERE clause did not contain a column reference; that was the difference maker and the performance killer.
Each piece of every WHERE clause needs to have an “anchor”; i.e. a table on which it can filter the data. (Yes, “anchor” is a highly technical term that I just made up.) For most conditions there is at least one column reference, so the table to which that column belongs becomes the “anchor”. When there is a condition without a column reference where does SQL Server place the “anchor” so that it can filter the data? The answer is the first table (the FROM table) in the query.
In the mystery query above the first table in the query had grown over time (to about 1.1 million rows) until it became the slowest portion of the execution plan. Having the large table coded first in the query (the FROM table) forced the optimizer to create an execution plan that grabbed all 1.1 million rows, then applied the filter, then continued with the rest of the join logic. Changing the order of JOINS in the query placed a much smaller table first, causing a much better filter to be executed first, which produced an execution plan to only read a few rows from our large table.
What follows is a simple example that I created using the AdventureWorks database. If you try these queries you may get a different set of plans; that’s okay…it’s just a simple example to show where the filter for the WHERE clause is placed in the order of operations. In fact, for this small example the execution time doesn’t change much, but it does illustrate the point. The two queries below are the same except for the order in which the tables are coded. If you comment out “AND @IsManager = 0” from both queries you will get the same execution plan (illustrating that the optimizer determines the best data access path, not you!). However, since there is a WHERE clause that does not contain a column reference (@IsManager = 0), the optimizer uses the first table to “anchor” this WHERE clause, causing more i/o to occur and a slower execution. Let’s start with my sample queries:
declare @IsManager bit = 1; -- Query #1 - larger table is first select h.* from Sales.SalesOrderDetail d join Sales.SalesOrderHeader h on d.SalesOrderID = h.SalesOrderID join Sales.SalesPerson p on h.SalesPersonID = p.BusinessEntityID where p.BusinessEntityID = 277 -- below is my WHERE clause without a column reference and @IsManager = 0; -- Query #2 – filtered, smaller table first select h.* from Sales.SalesPerson p join Sales.SalesOrderHeader h on h.SalesPersonID = p.BusinessEntityID join Sales.SalesOrderDetail d on d.SalesOrderID = h.SalesOrderID where p.BusinessEntityID = 277 -- below is my WHERE clause without a column reference and @IsManager = 0;
Look at the execution plan for each query. Note where the “non-column” filter (@IsManager = 0) is located (circled on the plan below in red). For query #1 the filter is executed on the results of the SalesOrderDetail table, which contains 121,000+ rows. This causes all of those rows to be returned, and then filtered out. The optimizer also determined that an extra sort and a merge join needs to be performed and adds those steps to the plan. Viewing the properties of the filter will confirm that this is the step of the plan that performs @IsManager = 0.
Execution Plan for Query #1
The plan for query #2 is a bit different. Note that the filter placement has changed. For query #2 the filter is executed on the results of the SalesPerson table, which, in my example, returns one row. Because of this the optimizer knows (based on statistics) that there is a much smaller set of data that could be returned from the SalesOrderDetail table and changes the plan accordingly (no sort and no merge join!).
Execution Plan for Query #2
After reviewing this scenario and searching for more information about it, this appears to be a variation of a performance issue that can occur using dynamic search conditions (i.e. WHERE MyColumn = @MyParameter or @MyParameter is null), where the optimizer is forced to grab all of the records in the table(s) and work from there. There are several good articles on this topic which you may find informative; just google “SQL Server dynamic search conditions” and you will find several. My experience shows that using “WHERE MyColumn = @MyParameter or @MyParameter is null” works pretty well until one or more of the tables involved becomes large. At that point a query re-write lands on my plate…