T-SQL coding

Notes about JOIN

The JOIN clause seems to frequently be a source of mystery/frustration/wonderment for developers, so today I decided provide some information that may demystify it for you.

First, a quick definition:  a JOIN allows you to retrieve data from 2 (or more) tables based on logical relationships between the tables.

  1. A JOIN allows you to retrieve data from 2 (or more) tables. Since every result set in a relational database is, in fact, a table, your choices for joins are numerous.  You can JOIN tables, views, derived tables (i.e. sub queries), common table expressions (CTE), and even table valued functions (using the APPLY clause).  Note that joins to table valued functions are not the most efficient, but it is an option.

 

  1. The “new” style join syntax (i.e. INNER, OUTER, etc.) was introduced into SQL Server 2000 and is the current ANSI standard. The “old” style join syntax (i.e. table1, table2 WHERE…) was deprecated by Microsoft more than ten years ago, and they are starting to remove it from the codebase, so I suggest that you do not use it anymore.  For example, starting with SQL 2012 you cannot code an outer join using this syntax.

 

  1. There are 3 different types of JOINS. Use the type that matches your needs:

INNER JOIN – a join that displays only rows that have a match in both tables.  This is the default type, so if you only say “JOIN” this is what you get.  Note that nulls are not equal, so any rows that have null in the join column(s) will NOT be in the result set.

OUTER JOIN – a join that includes rows even if they don’t have a match in both tables.  There are 3 variations, LEFT, RIGHT, and FULL.  These variations determine which unmatched rows will be in the result set.  For example, if you use a LEFT OUTER JOIN, then all rows (matched or unmatched) in the “left” table will be included.

CROSS JOIN – this is a Cartesian join (remember that from match class?).  A cross join does not have an ON clause since all rows from table a are matched to all rows from table b.

For a more visual explanation you can go to this article:

https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/

 

  1. A JOIN retrieves data from 2 tables based on a logical relationship. This logical relationship can be between ANY set of columns that have the same datatype.  Some developers think that the foreign key relationships define how you must join tables.  This is a myth!  However, the flip side of this is that any foreign key constraints define a formal relationship between two sets of data (and enforces this for better data integrity!), so each foreign key is often the most common path to join two tables.

 

  1. SQL Server determines the order in which JOINS are executed, not you. The order in which you code your JOIN clauses may not be the order in which they are actually executed.  The optimizer decides that for you (assuming that you aren’t using any query hints to force it).  So code them in whatever order makes sense to you and let the optimizer do the work.

 

  1. SQL Server has a set order of operations when retrieving data, and that order is:
    1. FROM
    2. ON
    3. JOIN
    4. WHERE
    5. GROUP BY
    6. CUBE | ROLLUP
    7. HAVING
    8. SELECT
    9. DISTINCT
    10. ORDER BY
    11. TOP

With this in mind, the placement of filters can have an effect on your result set when using the OUTER JOIN clause, especially when multiple JOINS are involved.  You can certainly place your filtering statements in either the ON or WHERE clause, but they are executed at different times.  Placing filters in the ON clause can cause records to be filtered at the time of the outer join, causing later join statements to behave differently than if the filter is placed in the WHERE clause.

Here’s a simple example to show this.  These two queries produce different results all because of the order of execution listed above. Try this in the AdventureWorks2014 database:

-- placing the filter in the ON clause doesn't filter properly for outer joins
select p.*
from Person.Person p
left outer join Person.BusinessEntityAddress bea on p.BusinessEntityID = bea.BusinessEntityID
left outer join Person.Address a on bea.AddressID = a.AddressID
     and a.City = 'Cliffside';

-- placing the filter in the WHERE clause is the way to go...
select p.*
from Person.Person p
left outer join Person.BusinessEntityAddress bea on p.BusinessEntityID = bea.BusinessEntityID
left outer join Person.Address a on bea.AddressID = a.AddressID
where a.City = 'Cliffside';

 

  1. Many times a JOIN and a subquery do the same thing, so a JOIN can be re-written as a sub-query, and vice versa. My experience shows that a JOIN is usually more efficient, so if you are using a subquery are experiencing performance problems, try re-writing it using a JOIN instead.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s