When many DBAs talk about index maintenance they are referring to the process of rebuilding or reorganizing indexes to reduce fragmentation. Many use a script much like I do to determine the set of indexes to defragment based on the size of the table and the percent of fragmentation, etc.
I don’t want to talk about that. I believe that index maintenance is more than correcting fragmentation issues; it’s also about maintaining a set of “usable” indexes for your applications. This post will document the thought process that I go through to proactively maintain a “usable” set of indexes. Why proactive? I call it that because I try to regularly check my index set and add/remove indexes based on the steps below BEFORE my users complain about the response time for their application/query.
To maintain my “usable” set of indexes, I look at missing indexes that could be added for better performance. I also review current indexes that are not used or are under-used and I remove them. The databases that I manage have hundreds of tables, so I can’t manage the entire set at one time; I usually carve out some time during each application development release cycle and try to look at one or two tables and the index usage for those tables in production. I try to look at all of the indexes for a single table as a single unit. What follows are the steps that I use (including the T-SQL scripts I use to gather information).
Step One: Check for Missing Indexes
I use the query below (actually I currently have it in an SSRS report) to show me missing indexes that SQL Server has identified. This query also “scores” each hit; I use that score to determine the missing indexes that may be of the most use to my application. I found this query in the MCTS exam 70-432 training kit/self-study guide. The scores are based on the usage counts and the percentage of improvement that could be obtained. A score of 50,000 or higher means it’s a really useful index to add; 10,000 or greater means it’s worth looking at. The query also provides everything you need to create the index (see the comments in the script for details). Also note that I select the start time of the server, since the information returned by the DMVs in the query is only valid since that last start.
```sql
-- list out last re-start date/time
SELECT sqlserver_start_time FROM sys.dm_os_sys_info;

-- use this to determine the overall usefulness of adding a particular index...
-- from the 70-432 self-study guide, p. 389
-- book says that a score of > 50,000 is really useful
--                          > 10,000 is probably useful
-- query is done by instance, so you see all databases unless you use the where clause
-- Use the following guidelines for ordering columns in the CREATE INDEX statements you write
-- from the missing indexes feature component output:
--   List the equality columns first (leftmost in the column list).
--   List the inequality columns after the equality columns (to the right of equality columns listed).
--   List the include columns in the INCLUDE clause of the CREATE INDEX statement.
-- To determine an effective order for the equality columns, order them based on their selectivity;
-- that is, list the most selective columns first.
SELECT *
FROM ( SELECT user_seeks * avg_total_user_cost * (avg_user_impact * 0.01) AS Index_Advantage,
              migs.*
       FROM sys.dm_db_missing_index_group_stats migs ) AS migs_adv
INNER JOIN sys.dm_db_missing_index_groups AS mig
        ON migs_adv.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid
        ON mig.index_handle = mid.index_handle
INNER JOIN sys.databases d
        ON mid.database_id = d.database_id
WHERE d.name = 'your_database_name'
ORDER BY migs_adv.Index_Advantage DESC;
```
Here are some items to note about the missing indexes query:
- The DMVs are updated constantly, so the results will change each time you run it.
- If you change the indexes for a table SQL Server will remove the missing indexes entries and start over, so they may “disappear” from the output of this query!
- I use this as a starting point for my review, selecting one of the indexes near or at the top of the list (assuming that the score is high enough to matter).
- Some DBAs also generate a CREATE INDEX statement right in the query output; cool idea. The reason I haven’t done that can be found below…
- In my SSRS report I have removed many of the columns from the SELECT. Here is a short sample from my report showing the columns I use the most:
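The column-ordering guidelines in the script comments can be illustrated with a hypothetical example. Suppose the missing-index output lists equality columns `WidgetType` and `Status`, inequality column `CreatedDate`, and included column `WidgetName` (all of these table and column names are invented for illustration):

```sql
-- Hypothetical example only: table and column names are invented.
-- Equality columns first (most selective first), then inequality columns,
-- then included columns in the INCLUDE clause.
CREATE NONCLUSTERED INDEX IX_T_WIDGET_Type_Status_Created
ON dbo.T_WIDGET (WidgetType, Status, CreatedDate)
INCLUDE (WidgetName);
```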
Step Two: Select a candidate
Using the output from my missing indexes query I select a candidate index to add. Usually it’s the top one on the list since that’s the one that will provide the most bang for the buck. I then compare this candidate index to the others found on the list for the same table. Quite frequently there are others that are similar; these “overlapping” candidates can sometimes be combined together to create one useful index for multiple queries. This does take a bit of sleuthing since the missing index entries came from different queries. Sometimes you can tell (like the example above) that the equality columns and the include columns are almost identical and creating a single index combining the columns from both will serve both queries. Other times it takes a bit more brain power. At this point I also like to know the query logic that generated the missing index candidates. Here is a query that I use to grab the procedures or queries that caused the missing index candidate to be generated:
```sql
-- missing indexes - query 2
-- from the Redgate ebook Accidental DBA - pages 185-187
-- Use this in coordination with the first missing indexes query because
-- this shows not only the proc but the statement that created the missing index entry!
;
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT MissingIndexNode.value('(MissingIndexGroup/@Impact)', 'float') AS impact
     , OBJECT_NAME(sub.objectid, sub.dbid) AS calling_object_name
     , MissingIndexNode.value('(MissingIndexGroup/MissingIndex/@Database)', 'VARCHAR(128)')
       + '.' + MissingIndexNode.value('(MissingIndexGroup/MissingIndex/@Schema)', 'VARCHAR(128)')
       + '.' + MissingIndexNode.value('(MissingIndexGroup/MissingIndex/@Table)', 'VARCHAR(128)') AS table_name
     , STUFF(( SELECT ',' + c.value('(@Name)', 'VARCHAR(128)')
               FROM MissingIndexNode.nodes('MissingIndexGroup/MissingIndex/ColumnGroup[@Usage="EQUALITY"]/Column') AS t ( c )
               FOR XML PATH('')), 1, 1, '') AS equality_columns
     , STUFF(( SELECT ',' + c.value('(@Name)', 'VARCHAR(128)')
               FROM MissingIndexNode.nodes('MissingIndexGroup/MissingIndex/ColumnGroup[@Usage="INEQUALITY"]/Column') AS t ( c )
               FOR XML PATH('')), 1, 1, '') AS inequality_columns
     , STUFF(( SELECT ',' + c.value('(@Name)', 'VARCHAR(128)')
               FROM MissingIndexNode.nodes('MissingIndexGroup/MissingIndex/ColumnGroup[@Usage="INCLUDE"]/Column') AS t ( c )
               FOR XML PATH('')), 1, 1, '') AS include_columns
     , sub.usecounts AS qp_usecounts
     , sub.refcounts AS qp_refcounts
     , qs.execution_count AS qs_execution_count
     , qs.last_execution_time AS qs_last_exec_time
     , qs.total_logical_reads AS qs_total_logical_reads
     , qs.total_elapsed_time AS qs_total_elapsed_time
     , qs.total_physical_reads AS qs_total_physical_reads
     , qs.total_worker_time AS qs_total_worker_time
     , StmtPlanStub.value('(StmtSimple/@StatementText)', 'varchar(8000)') AS statement_text
FROM ( SELECT ROW_NUMBER() OVER ( PARTITION BY qs.plan_handle
                                  ORDER BY qs.statement_start_offset ) AS StatementID
            , qs.*
       FROM sys.dm_exec_query_stats qs ) AS qs
JOIN ( SELECT x.query('../../..') AS StmtPlanStub
            , x.query('.') AS MissingIndexNode
            , x.value('(../../../@StatementId)', 'int') AS StatementID
            , cp.*
            , qp.*
       FROM sys.dm_exec_cached_plans AS cp
       CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
       CROSS APPLY qp.query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup') mi ( x )
     ) AS sub
  ON qs.plan_handle = sub.plan_handle
 AND qs.StatementID = sub.StatementID;
```
Here are some notes about this query:
- Lots of XML parsing in this query, so it takes some time to execute.
- Since this information comes from different DMVs, I haven’t found a way to automatically match this output up with my first missing indexes query. I usually try to match up the impact, table name, and the equality, inequality, and include columns to find my match.
- I don’t use this query all the time but it does provide some additional insight into the code that generated the missing index candidates.
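As a hypothetical sketch of combining “overlapping” candidates (all index, table, and column names here are invented): suppose one candidate asks for equality column `CustomerID` with include column `OrderDate`, and another asks for equality columns `CustomerID, Status` with include column `TotalDue`. A single index can often serve both queries:

```sql
-- Hypothetical: two overlapping missing-index candidates combined into one index.
--   Candidate 1: equality (CustomerID), include (OrderDate)
--   Candidate 2: equality (CustomerID, Status), include (TotalDue)
CREATE NONCLUSTERED INDEX IX_T_ORDER_Customer_Status
ON dbo.T_ORDER (CustomerID, Status)
INCLUDE (OrderDate, TotalDue);
```

Queries that filter on `CustomerID` alone can still seek on the leading column of the combined index, which is why this kind of consolidation usually works.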
Step Three: Review all Indexes for the Table
I could stop here and decide that I have a good candidate index to add and be done. Experience has shown that there is more work to do, however. At this point I have a candidate index, but next I review ALL of the indexes for the table. Before I add a new index I want to answer these questions:
- Are there already indexes on the table that I can combine with my candidate? If there is an existing index with similar columns and includes I can probably “combine” the two indexes and create one new one to cover both usages.
- Are there indexes on the table that are not used or under-used? If so I can remove those at this time.
- Is there a clustered index on the table? My rule is that there should always be one.
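To check the clustered index rule across a database, here is a minimal sketch that lists heaps (user tables with no clustered index) in the current database:

```sql
-- Tables without a clustered index appear in sys.indexes as type HEAP (index_id 0).
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName
     , OBJECT_NAME(i.object_id) AS TableName
FROM sys.indexes i
INNER JOIN sys.tables t
        ON i.object_id = t.object_id
WHERE i.type_desc = 'HEAP'
ORDER BY SchemaName, TableName;
```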
To perform this review I use SSMS to generate all of the indexes for the table so that I can review the columns for each. Then I run the following query to review the usage details for the existing indexes.
```sql
-- Note: this should be run in the context of your database
SELECT OBJECT_SCHEMA_NAME(i.object_id, ius.database_id) AS SchemaName
     , OBJECT_NAME(i.object_id) AS TableName
     , i.name AS IndexName
     , i.index_id
     , i.type_desc
     , p.rows
     , user_seeks, user_scans, user_lookups, user_updates
     , last_user_seek, last_user_scan, last_user_lookup
FROM sys.indexes i
INNER JOIN sys.objects o
        ON i.object_id = o.object_id
INNER JOIN sys.partitions p
        ON i.object_id = p.object_id
       AND i.index_id = p.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius
        ON i.object_id = ius.object_id
       AND i.index_id = ius.index_id
WHERE OBJECT_NAME(i.object_id) = 'T_WIDGET' -- put your table name here
ORDER BY i.name;
```
This query provides valuable information on the use of each index in the database. I use the User Seeks, User Scans, and User Lookups columns to determine how much the index is used to read the table. If I find a zero in all of these columns, I know that this index is worthless for querying and should be removed. The User Updates column tells me how often the index has to be maintained by SQL Server due to inserts, updates, and deletes. If this number is high and the “read” numbers are really low, then I might consider dropping the index as well. Below is a short sample of the output (not all columns are shown):
| Table Name | Index Name | Index Type | User seeks | User scans | User lookups | User updates | Last user seek | Last user scan | Last user lookup |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| T_WIDGET | IX01_WIDGET | NONCLUSTERED | 262393 | 45646 | 0 | 55790 | 6/20/16 2:58 PM | 6/20/16 2:58 PM | NULL |
| T_WIDGET | IX02_WIDGET | NONCLUSTERED | 120 | 0 | 0 | 147972 | 6/20/16 1:09 PM | NULL | NULL |
| T_WIDGET | IX03_WIDGET | NONCLUSTERED | 2697453 | 11 | 0 | 50561 | 6/20/16 2:58 PM | 6/20/16 1:03 AM | NULL |
| T_WIDGET | PK_T_WIDGET | CLUSTERED | 25728173 | 13366 | 3146359 | 175293 | 6/20/16 2:58 PM | 6/20/16 2:55 PM | 6/20/16 2:58 PM |
From the sample above I would definitely drop IX04 since it hasn’t been used to read data from the table since the server was started (which has been about 3 weeks in this example). I would also seriously consider removing IX02, but I would do more research to find out what T-SQL statements were executed using the IX02 seeks. Dropping it may cause those statements to perform poorly.
Step Four: Modify the Indexes for the Table
Now that my review is complete I am ready to modify the indexes for the table. My review may have determined that my candidate index should be combined with another candidate, or combined with an existing index (or both), so I drop and create indexes as applicable for the candidate. I also will drop any unused or under-used indexes at this time.
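The work in this step is plain DDL. Here is a hypothetical sketch (all index, table, and column names are invented for illustration):

```sql
-- Hypothetical: remove an unused index, then replace an overlapping index
-- with one combined index that also covers the new candidate.
DROP INDEX IX04_WIDGET ON dbo.T_WIDGET;  -- never used for reads since restart

DROP INDEX IX02_WIDGET ON dbo.T_WIDGET;  -- overlaps the new candidate
CREATE NONCLUSTERED INDEX IX05_WIDGET
ON dbo.T_WIDGET (WidgetType, Status)
INCLUDE (WidgetName, CreatedDate);
```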
Step Five: Check for Duplicate Indexes
I forgot about this when I wrote the original post, but it’s important…sometimes there can be duplicate indexes on a table. I suggest that, as part of your review, you check all of the indexes to see if there are any duplicates and remove them. I’d share a script to do that, but there are many variations of this on the internet (you can grab your favorite) based on what your definition of “duplicate” happens to be. My definition is that the index columns are all the same and in the same order, and that the include columns are also the same (but might be in a different order).
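For what it’s worth, here is a minimal sketch of one such check under my definition (same key columns in the same order; include columns compared as an unordered set). It builds a text “signature” per index from the system catalog and flags tables where a signature repeats:

```sql
-- Minimal sketch: build a signature per index (ordered key columns plus
-- sorted include columns) and report pairs of indexes that share one.
;WITH IndexSignatures AS (
    SELECT i.object_id
         , i.name AS IndexName
         , ( SELECT CAST(ic.column_id AS VARCHAR(10))
                    + CASE WHEN ic.is_descending_key = 1 THEN 'D' ELSE 'A' END + ','
             FROM sys.index_columns ic
             WHERE ic.object_id = i.object_id
               AND ic.index_id = i.index_id
               AND ic.is_included_column = 0
             ORDER BY ic.key_ordinal
             FOR XML PATH('') ) AS key_signature
         , ( SELECT CAST(ic.column_id AS VARCHAR(10)) + ','
             FROM sys.index_columns ic
             WHERE ic.object_id = i.object_id
               AND ic.index_id = i.index_id
               AND ic.is_included_column = 1
             ORDER BY ic.column_id   -- sorted so include order does not matter
             FOR XML PATH('') ) AS include_signature
    FROM sys.indexes i
    WHERE i.type_desc IN ('CLUSTERED', 'NONCLUSTERED')
)
SELECT OBJECT_NAME(s1.object_id) AS TableName
     , s1.IndexName
     , s2.IndexName AS DuplicateOf
FROM IndexSignatures s1
JOIN IndexSignatures s2
  ON s1.object_id = s2.object_id
 AND s1.IndexName > s2.IndexName     -- report each pair only once
 AND s1.key_signature = s2.key_signature
 AND ISNULL(s1.include_signature, '') = ISNULL(s2.include_signature, '');
```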