There is a plethora of new T-SQL features and enhancements in SQL Server 2005, and you could probably write a small book talking about all of them. However, this article will focus on one of them — specifically, Common Table Expressions (CTE), because personally they have come in handy on many occasions.
I have used a few of the other new features and enhancements, and if you have not had a chance to delve into any of them, I would highly recommend familiarizing yourself with some of them, such as the new xml data type, improved error handling, query and event notifications, new DDL (Data Definition Language) triggers, ranking functions, TOP operator enhancements, the PIVOT and UNPIVOT operators, and the new APPLY operator. There are a few others, and they all add tremendous value to T-SQL developers.
This article will introduce common table expressions and go as in-depth as space will allow, showing how CTEs can be used to improve T-SQL’s capabilities and how they can help better your T-SQL.
Common Table Expressions
A common table expression is expressed as a temporary table or view defined within an executing statement, such as SELECT INSERT, UPDATE, or DELETE. Think of a CTE as somewhat like a derived table in that a CTE only lasts throughout the duration of the executing statement. When that execution of the statement finishes, the CTE is gone. However, there are a few differences between a derived table and a CTE:
A CTE can reference itself. A derived table cannot.
A CTE can be referenced multiple times in the same query.
So what is the difference between a CTE and a true temporary table? The quick and easy answer is the overhead of having to create a temporary table and the performance consequence of using a temporary table. Think about what happens when you use a true temporary table. First, you have to define and create the table then populate it, and then run queries against it. When using CTEs you can forgo the first step and the associated overhead and use the built-in CTE, which provides all of that functionality for you.
CTE Structure
The structure of CTE is fairly simple. It basically contains two main parts, the first part being the CTE followed by the second part, which is the normal execution statement. The basic syntax for a CTE is as follows:
WITH common_table_expression_name [column(s)] AS ( query_definition ) EXECUTION STATEMENT
As defined in the syntax above, a common table expression contains several arguments that are defined as follows:
common_table_expression_name: The identifier for the common table expression. This name must be different than any other table or view used within the CTE. It can, however, be named the same as a base table or view, but when the common table expression is referenced in the normal execution statement, the common table expression name must be used, rather than the base name.
column(s): The column names in the common table expression. These column names must be unique (meaning, no duplicate names allowed) and must match the same number of columns returned by the query_definition.
query_definition: The SELECT statement in which the results are used to populate the common table expression. This statement follows the same requirements that the normal execution statement follows.
A CTE expression cannot define another CTE expression.
The following example uses the syntax outlined above to create a very simple common table expression query. Open a query window in SQL Server Management Studio and execute the following query against the AdventureWorks database.
WITH CTE_ProdMod (ProductModelID, ProductModelCount) AS ( SELECT ProductModelID, COUNT(*) FROM Production.Product WHERE ProductModelID IS NOT NULL GROUP BY ProductModelID ) SELECT ProductModelID, ProductModelCount FROM CTE_ProdMod ORDER BY ProductModelID
The example above uses a common table expression to count the number of products for each product model in the Production.Product table. A portion of the results of this query when executed are shown below.
ProductModelID ProductModelCount -------------- ----------------- 1 3 2 1 3 3 4 3 5 10 6 11 7 8 8 10 9 12 10 10
A simple example, yet not that impressive because it doesn’t quite showcase the true power and flexibility of what a common table expression can really do. So, the following example uses a common table expression to return the count of products for each product model. The count is returned in the CTE, then the CTE is used to filter the results even further based on what was returned in the CTE.
WITH CTE_TranHist (ProductID, OrderQuantity) AS ( SELECT ProductID, COUNT(*) FROM Production.TransactionHistory WHERE TransactionDate > '05/01/2004' GROUP BY ProductID ) SELECT cte.ProductID, cte.OrderQuantity, pp.productmodelid, ProductSubcategoryID FROM Production.Product pp INNER JOIN CTE_TranHist AS cte ON pp.productID = cte.productid WHERE cte.productid BETWEEN 500 AND 750 AND ProductModelID IS NOT NULL ORDER BY cte.ProductID
The example above uses a common table expression to count the number of products for each product model in the Production.Product table. A portion of the results of this query when executed are shown below.
ProductID OrderQuantity ProductModelID ProductSubcategoryID ----------- ------------- -------------- -------------------- 680 21 6 14 706 15 6 14 707 567 33 31 708 526 33 31 711 578 33 31 712 567 2 19 713 118 11 21 714 150 11 21 715 138 11 21 716 99 11 21 717 10 6 14 718 12 6 14 722 57 9 14 725 1 9 14 726 1 9 14 729 1 9 14 730 2 9 14 736 54 9 14 737 45 9 14 738 71 9 14 739 63 5 12 742 67 5 12 743 78 5 12 746 47 5 12 747 62 5 12 748 80 5 12
Imagine now that syntax you would have had to use to get the same results prior to SQL Server 2005. It would have looked something like this:
SELECT TranHist.ProductID, TranHist.OrderQuantity, pp.ProductModelID, pp.ProductSubcategoryID FROM Production.Product pp INNER JOIN (SELECT ProductID, COUNT(*) FROM Production.TransactionHistory WHERE TransactionDate > '05/01/2004' GROUP BY ProductID) AS TranHist (ProductID, OrderQuantity) ON pp.productID = TranHist.productid WHERE TranHist.productid BETWEEN 500 AND 750 AND ProductModelID IS NOT NULL
Now, which would you rather read, let alone write? I thought so. Common table expressions have a number of advantages, including better readability to make it much easier to maintain more complex queries.
Now that you have a general understanding of CTEs, the true value of CTEs appear when you have the need for recursive queries. This topic is discussed next.