T-SQL coding

Should I use SELECT * in my T-SQL code?

SELECT *… I see this all the time when reviewing T-SQL coding for applications. This is another item that I include on coding checklists; you shouldn’t use SELECT * in your T-SQL code. It’s that simple. Explicitly code only the columns that you need to return.

Sounds simple enough, you say, but why not?

Let’s start with what you are asking the database engine to do. When you code SELECT * what you are saying is, “I’m too lazy to type a column list, and I know that the database engine will look-up all of the columns for all of the tables in my query and create a column list for me”. It saves me coding time!

Fine…it saves you coding time, and maybe prevents a few typos. But that’s not the goal here. And besides, with SSMS you can drag the column names. The point is that, when you use a wildcard here you are telling the database engine to figure out the column list EVERY TIME THE CODE IS EXECUTED. I am simplifying it here, but that could be hundreds of times per day, depending on your application. The database optimizer might need to re-compile your code each time it is executed…hmmm…now it doesn’t sound so efficient, does it?

My next point isn’t about performance, but about application maintenance. Joe Developer is given an enhancement, and he needs to change column MyColumnCode, changing the name of the column and the datatype. Being the smart developer that he is, Joe scans all of the stored procedures used in the application for MyColumnCode to get a list of the affected procedures. You guessed it, if SELECT * is used anywhere Joe will miss that procedure; his enhancement will probably fail sometime in the testing phase (we hope it doesn’t make it to production!), making him look not-so-smart.

So what happens when my DBA adds a column to my table? If I use SELECT * then the new column will automatically be picked up! Yes that is true, but I bet your application code is already written to accept the original column list, and if another one is added…BOOM! Your application may just blow up right before your eyes. If you explicitly code the column list, and a new column is added, your application will still run. You can change your application code and T-SQL code and promote them at the same time, preventing this (embarrassing!) issue.

Let’s talk about the work performed by the database engine for a moment. When you ask for data (i.e. any DML statement), the database engine tries to be as efficient as possible, only returning the columns used in the query. This saves disk I/O, memory, CPU and creates a more efficient plan to access the data, not to mention less data travelling across your network. This will make your users happy. 🙂  If you ask for all of the columns you are pretty much telling the engine to do a table scan, a clustered index scan/seek, or a non-clustered index scan/seek with a corresponding table look-up for every table in your query. If you don’t believe me, write yourself a simple query against your favorite table, like this (examples from AdventureWorks2008R2):

SELECT *

FROM Sales.SalesOrderHeader h

JOIN Sales.SalesOrderDetail d on h.SalesOrderID = d.SalesOrderID

WHERE SalesOrderNumber = ‘SO57221’;

 

 

Try to use a WHERE condition that you know would reference an index on the table. Now take a look at the estimated plan in SSMS and you will see what I mean. Depending on the WHERE conditions you will see that the engine has to access the entire table using one of the methods I listed above. Now, change your simple query, removing the * and replacing it with one or more columns (not all of the columns!).

SELECT CarrierTrackingNumber, OrderQty

FROM Sales.SalesOrderHeader h

JOIN Sales.SalesOrderDetail d on h.SalesOrderID = d.SalesOrderID

WHERE SalesOrderNumber = ‘SO57221’;

 

 

Based on the list, the optimizer will choose the most efficient method that it can to return your data. If your new column list and WHERE conditions are “covered” by an index you will see that the table pages are never accessed directly, only the indexes are referenced, saving you (and the database engine) time and resources. Here are the 2 estimated plans from my samples above…you can see the difference even in this simple example:

SELECT Plan Example

 

On a side note, SQL Server is continually getting better at figuring out what you really want, so Microsoft introduced a concept called “column optimization” in SQL 2005. This concept is mostly used in CTEs and sub-queries; the optimizer figures out that, even though you might have said “SELECT *” in your sub-query, that you really only need 3 columns, so it only returns the columns you need. This is no excuse, however, for you NOT to follow this recommendation!

So, never, never use SELECT * in your T-SQL coding for your application.

Okay, almost never.

I can think of 2 exceptions to this rule. First, if you make use of the IF EXISTS (SELECT * FROM …) or IF NOT EXISTS, then I usually let that slide. The optimizer is smart enough to know that you don’t really want all of the columns in this case and it only checks for the existence of the row. I know people that prefer to see “SELECT 1” in this case, but I don’t have a preference here. The second exception is that you can use SELECT COUNT(*) to count the number of rows in a table. Syntactically this is a different statement, and again, the optimizer is smart enough to know that you only want to count the rows here.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s