Friday, February 19, 2010

SQL Server Recursive Common Table Expressions…And Other Ways to Generate a Stack Overflow in SQL Server

OK, so won’t actually generate a stack overflow error in SQL Server, but it makes for a good title. Some time ago I was asked to write a custom deferred revenue integration into GP. There were some quirks to their recognition schedule that prevented them from using the standard deferred revenue module.

For those of you not familiar with deferred revenue, here’s a quick primer. If you pay $24 for a 12 month magazine subscription, the publisher is not allowed to recognize and report to their stockholders the entire $24 when they receive it. They can only recognize revenue as they’ve delivered the service, in this case each magazine, to you. As such, they will recognize and report $2 in revenue each month.

OK, so how do you write a SQL statement that takes a set of rows and creates multiple rows from each individual row.  Let’s say that we have a sales table, tblSales, that has an amount to be recognized, a starting month and a number of months to be recognized over as follows:

OrderNumber Amount StartingMonth RecognitionMonths
1

$120.00

3/2010

3

2

$257.00

2/2010

7

To make this easy, we’ll recognize the revenue equally over the number of months. Often revenue needs to be recognized according to the number of days in the month and take into account starting and ending dates that are mid-month. In this case we’ll use a simple model and simply need to return a dataset of 10 rows as follows:

OrderNumber AmountRecognized MonthRecognized
1

$40.00

3/2010

1

$40.00

4/2010

1

$40.00

5/2010

2

$36.71

2/2010

2

$36.71

3/2010

2

$36.71

4/2010

2

$36.71

5/2010

2

$36.71

6/2010

2

$36.71

7/2010

2

$36.74

8/2010

So how, in a SQL statement, can we get from 2 rows to 10? As with any problem, I’m sure we could come up with any number of solutions using WHILE loops and table variables, but I’m going to use the opportunity to introduce recursion in SQL.

Recursion is a feature of the T-SQL Common Table Expression. It is essentially a common table expression (CTE) that references itself. Let’s look at the following statement:

WITH DeferredRevenueEntries(
OrderNumber
,AmountRecognized
,MonthRecognized
,RecognitionMonths
,CurrentMonthIndex) AS
(
-- Anchor member
-- The anchor member creates the entry for the first month.
SELECT
OrderNumber
,AmountRecognized = Amount / RecognitionMonths
,MonthRecognized = StartingMonth
,RecognitionMonths
,1 AS CurrentMonthIndex
FROM tblSales
UNION ALL
-- Recursive member
-- Recurse one time for each additional month to be recognized
SELECT
OrderNumber
,AmountRecognized
,MonthRecognized = DATEADD(month, 1, MonthRecognized)
,RecognitionMonths
,CurrentMonthIndex = CurrentMonthIndex + 1
FROM DeferredRevenueEntries
WHERE CurrentMonthIndex < RecognitionMonths
)

SELECT
OrderNumber
,AmountRecognized
,MonthRecognized
FROM DeferredRevenueEntries
ORDER BY
OrderNumber
,MonthRecognized

Here we’re defining a CTE, DeferredRevenueEntries. This is a UNION query. the first part of the union is called the anchor member. This is a SELECT statement against our source table, tblSales. This part of will return the first month rows for each of the sales records, with the amount to be recognized being the total amount divided by the number of months.

The second part is the recursive member. Note that the FROM clause is referencing the CTE itself. This will take the initial two records from the first part of the union, and keep returning them, adding one month each time. To make sure that we only return the number of records matching the number of months to be recognized, we add the field CurrentMonthIndex, increment it each time we recurse and make sure we only return records where the CurrentMonthIndex is less than the number of months to be recognized.

OK, so what happens if we forget to put in the WHERE clause in the recursive member? Will we in fact cause the aforementioned stack overflow? The answer is no. SQL is too smart for that. Instead you will recieve the following error:

Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

By default, SQL Server limits you to a maximum recursion level of 100. It is possible to override that limit by using the MAXRECURSION option in the final SELECT statement such that it looks as follows:

SELECT
OrderNumber
,AmountRecognized
,MonthRecognized
FROM DeferredRevenueEntries
ORDER BY
OrderNumber
,MonthRecognized
OPTION (MAXRECURSION 1000)

Now you might say, well what if I put in a MAXRECURSION of 10000000000? Is there some way I can create a stack overflow? Again, SQL will outwit you. The maximum MAXRECURSION limit is 32767.

OK, so there’s one final issue. You’ll note in my results example above that the recognition amount for order 2 was $36.71 except for the last entry where it’s $36.74. This is because never met an accountant who could deal with as much as a single penny off no matter how high the total. To fix that I create the following modified version of the statement:

WITH DeferredRevenueEntries(
OrderNumber
,AmountRecognized
,MonthRecognized
,RecognitionMonths
,CurrentMonthIndex) AS
(
-- Anchor member
-- The anchor member creates the entry for the first month.
SELECT
OrderNumber
,AmountRecognized = ROUND((Amount / RecognitionMonths), 2)
,MonthRecognized = StartingMonth
,RecognitionMonths
,1 AS CurrentMonthIndex
FROM tblSales
UNION ALL
-- Recursive member
-- Recurse one time for each additional month to be recognized
SELECT
OrderNumber
,AmountRecognized
,MonthRecognized = DATEADD(month, 1, MonthRecognized)
,RecognitionMonths
,CurrentMonthIndex = CurrentMonthIndex + 1
FROM DeferredRevenueEntries
WHERE CurrentMonthIndex < (RecognitionMonths - 1)
)

SELECT
OrderNumber
,AmountRecognized
,MonthRecognized
FROM DeferredRevenueEntries

UNION ALL

-- Create last entry manually to adjust for rounding errors.

SELECT
tblSales.OrderNumber
,AmountRecognized = ROUND(tblSales.Amount - DeferredRevenueEntryTotals.TotalAmountRecognized, 2)
,MonthRecognized = DATEADD(month, 1, DeferredRevenueEntryTotals.LastMonthRecognized)
FROM tblSales
INNER JOIN (
SELECT
OrderNumber
,TotalAmountRecognized = SUM(AmountRecognized)
,LastMonthRecognized = MAX(MonthRecognized)
FROM DeferredRevenueEntries
GROUP BY OrderNumber) AS DeferredRevenueEntryTotals
ON tblSales.OrderNumber = DeferredRevenueEntryTotals.OrderNumber

ORDER BY
OrderNumber
,MonthRecognized

You’ll notice a few changes here. The first is that I round the AmountRecognized value to 2 digits. The second is that the CTE is now only going to recurse through the number of months – 1. This is because we’re create the final entry in the last select in the UNION query by taking the total sales amount from tblSales and then subtracting the total recognized prior to the final month, giving us an amount that is adjusted for any rounding issues.

I hope that this has given you a brief look at the power of the recursive CTE and that you are reassured that you will not break SQL Server by using it. There are certainly many other ways to use the, such as traversing hierarchical lists. Go ahead, recurse away!

Wednesday, February 3, 2010

Wither the SQL Developer? Is T-SQL the Next Assembly Language?

So today I’m going to veer completely away from GP and talk about SQL in general. The reality is that I’m a data and database geek first and a Dynamics GP geek second; and my (current) database language of choice is T-SQL.

But what is the future of that technology?  Is T-SQL a data dinosaur?  Years ago I actually programmed in assembly language, the language of the CPU itself (ok so technically that would be machine code but let’s not split hairs).  Today, processing power and memory have gotten to the point that you can run a language like PHP or C# so many layers above the CPU as to make it invisible.  You have an JIT compiled language running on a CLR on top of an operating system on a virtual processor on a hypervisor on a physical processor.  Only the most determined driver developer would even consider assembly.

So what of T-SQL?  With Visual Studio 2010, Microsoft will be releasing their latest iteration of the ADO.NET Entity Framework, the latest in a long line of tools to abstract away the data layer and hide the dirty underwear of SQL from the developer.  The Entity Framework essentially allows you to design, build and implement your database without ever having to get your hands dirty in SQL Management Studio.

Tools like the Entity Framework, LINQ and even the old standby, ADO.NET datasets allow you to do some pretty cool things like being able to join data and enforce referential integrity across disparate data sources and in-memory data structures.  They make it easier to make your applications database agnostic.

To be fair, there has always been significant proportion of the developer community who have lived by the philosophy, for some very valid reasons, that the database should be simply a dumb data store.  Even GP was built with that philosophy when it supported multiple back end technologies.

I have tended towards embedding as much logic as possible in the database itself; to use the power of functions, views, referential integrity and constraints; to make make stored procedures the middle tier.  I have also not found anything quite like T-SQL for querying and analyzing data.  The ability to twist and munge and aggregate your data in a single statement of sub-queries (or better yet, Common Table Expressions)

So one day soon will T-SQL be standing among the assemblers and punch card readers in the dusty closet of computer history? Or will the it continue to have a place in our data toolbox?