Sql Code Re-Use. A worked example.

One of the most popular good-programming practices is that code should not be repeated. You may have heard this referred to as DRY (Don’t Repeat Yourself).

The idea behind this principle is that in the event of future revisions, such as: bugs, rethinks or shifting requirements, changes need only be made in one place. This prevents possible inconsistency between different parts of the application which implement similar logic. The codebase is kept as small as possible, making it easier to maintain.

That’s always a great idea in a modern, structured language. In T-SQL however, we have to be very careful, as code re-use is notoriously difficult to achieve without falling into the pitfalls that can significantly adversely affect performance.

This article attempts to address some of those pitfalls, providing a working solution. This is done in a SQL Server 2008 environment using table valued functions.

Assumed knowledge: Scalar functions are bad. Multi-statement table functions are bad. Nesting stored-procedures doesn’t work. Outputting stored-procedure results into temp tables can be unnecessary overhead. Ability to read basic execution plans.

The example uses the data structures needed to track a football league (no not american football). Hopefully, a familiar topic will  mean that everyone to know what’s going on without any introduction.

If you’re familiar with my earlier articles, then the same database is used here as well. If not, then you can use this script to create the database and this one to populate it with data. It is not necessary to have the database to follow the article.

The players in a team varies over time, so the following query returns player information for a given team at a given time. Let’s call this the Squad query.

DECLARE @teamId INT = 12
DECLARE @date DATE = ‘20020407’

SELECT   p.nameFirst + ‘ ‘ + p.nameSecond AS playerFullName
       , pt.name AS positionName
FROM     teams_player tp
JOIN     player p ON  tp.playerId = p.playerId
JOIN     positions pt ON  p.positionId = pt.positionId
WHERE    tp.teamId = @teamId
     AND tp.dateFrom <= @date
     AND tp.dateUntil > @date
ORDER BY p.positionId

Fig1Results

The next query returns the goals and players who scored them on a given date, for the same team on the same date. Let’s call this the Score query.

SELECT   p.nameFirst + ‘ ‘ + p.nameSecond AS playerFullName
       , g.time
FROM     teams_player tp
JOIN     player p ON tp.playerId = p.playerId
JOIN     goal g ON p.playerId = g.playerId
WHERE    tp.teamId = @teamId
     AND tp.dateFrom <= @date
     AND tp.dateUntil > @date
     AND g.date = @date

ORDER BY g.time

Fig2Results

Notice that both queries contain the same complex WHERE clause on the teams_player table. According to the DRY principle we should try to factor out this logic so that it is only in one place. This can be done using the table-valued function shown below.

CREATE FUNCTION sideSnapshot (@date DATE, @teamId INT)
RETURNS TABLE
AS
RETURN
(
SELECT   tp.teamId

       , tp.playerId
FROM     teams_player tp
WHERE    tp.dateFrom <= @date
     AND tp.dateUntil > @date
     AND tp.teamId = @teamId
)

Squad query:

SELECT   p.nameFirst + ‘ ‘ + p.nameSecond AS playerFullName
       , pt.name AS positionName
FROM     sideSnapshot(@date, @teamId) ss
JOIN     player ON ss.playerId = p.playerId
JOIN     positions pt ON p.positionId = pt.positionId
ORDER BY p.positionId

Score query:

SELECT   p.nameFirst + ‘ ‘ + p.nameSecond AS playerFullName
       , g.time
FROM     sideSnapshot(@date, @teamId) ss
JOIN     player ON ss.playerId = p.playerId
JOIN     goal ON p.playerId = g.playerId
WHERE    g.date = @date
ORDER BY g.time

I have not included the results sets or execution plans but I can assure you they are identical. Now, any future changes to how the teams_players table is queried is can be made in one place only; the DRY principle has been applied and our codebase is healthier for it.

Why should we stop there though? We could create a more comprehensive function which will perform all of our joins and do the name manipulation logic. According to books online:

“…the query optimizer expands the function as if it was a macro, and generates the plan as if you had provided the expanded query.”

So even if the function returns more information than is needed, the optimiser will optimise the query at execution time and not perform any additional work? No.

The following function calls our previous function but now also joins to player and positions. As you can see it greatly reduces the complexity of our two queries and certainly provides a better implementation of the DRY principle

CREATE FUNCTION sideSnapshotDetailed (@date DATE, @teamId INT)
RETURNS TABLE
AS
RETURN
(
SELECT   s.playerId
       , p.nameFirst + ‘ ‘ + p.nameSecond AS playerFullName
       , pt.positionId
       , pt.name AS positionName
FROM     sideSnapshot(@date, @teamId) s
JOIN     player ON s.playerId = p.playerId
JOIN     positions pt ON p.positionId = pt.positionId
)

Squad query:

SELECT   s.playerFullName
       , s.positionName
FROM     sideSnapshotDetailed(@date, @teamId) s
ORDER BY s.positionId

Score query:

SELECT   s.playerFullName
       , g.time
FROM     sideSnapshotDetailed(@date, @teamId) s
JOIN     goal ON s.playerId = g.playerId
WHERE    g.date = @date
ORDER BY g.time

Once again the result sets are identical, as is the execution plan for the Squad query. However, upon inspecting the execution plan for the Score query you can see that a join is being made to the positions table even though no information from this table is used. There is a foreign key constraint on the positionId column as well so joining will not restrict the result set.

DetailedExecutionPlan

This is one of the pitfalls that we must look out for while implementing code re-use in Sql Server. Although table-valued functions allow us to easily re-use our code, we must be aware that the optimiser is not fool-proof, complicated queries can result in inefficiencies.

Code re-use needs to be kept simple in order not to confuse the optimiser. We can implement the DRY principle safely in this case, but we need to split the function up into smaller pieces. Below we create a new function which only applies the name manipulation logic and cross apply to it.


CREATE FUNCTION nameConverter (@nameFirst VARCHAR(100), @nameSecond VARCHAR(100))
RETURNS TABLE
AS
RETURN
(
SELECT   @nameFirst + ‘ ‘ + @nameSecond AS fullName
       , ‘Mr ‘ + @nameSecond AS titledSecondName
)

Squad query:

SELECT      nc.fullName AS playerFullName
          , pt.name AS positionName
FROM        sideSnapshot(@date, @teamId) ss
JOIN        player ON ss.playerId = p.playerId
CROSS APPLY nameConverter(p.nameFirst, p.nameSecond) nc
JOIN        positions pt ON p.positionId = pt.positionId
ORDER BY    p.positionId

Score query:

SELECT      nc.fullName AS playerFullName
          , g.time
FROM        sideSnapshot(@date, @teamId) ss
JOIN        player ON ss.playerId = p.playerId
CROSS APPLY nameConverter(p.nameFirst, p.nameSecond) nc
JOIN        goal ON ss.playerId = g.playerId
WHERE       g.date = @date
ORDER BY    g.time

We have not reduced the number of lines in the query, but that was not the object of the DRY principle, the important point is that all the complex logic is now done in one place.

The DRY principle can be safely implemented in Sql Server T-SQL, with the help of table valued functions, but in order to do so you must be careful and follow some rules. Although this article only covers one specific example, the following are general rules that should be adherred to.

  • Don’t attempt to use table valued functions to provide denormalised views of your data.
  • Each function should perform a single task (single responsibility principle).
  • Only use functions to re-use complex logic. Using the function should be less complicated than the logic it contains!
  • Avoid nesting.

Personally, I would only use a table valued function to perform a join where the join logic is complex or non-intuitive. I use them more frequently in examples like nameConverter, where business logic needs to be applied in multiple places across the database.

Download the database script DbCreateScript.sql
or download the solution FootballDatabase.zip

Download the data script DataGeneration.sql

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s