Sql Code Re-Use. A worked example.

One of the most popular good-programming practices is that code should not be repeated. You may have heard this referred to as DRY (Don’t Repeat Yourself).

The idea behind this principle is that in the event of future revisions (bug fixes, rethinks or shifting requirements), changes need only be made in one place. This prevents possible inconsistency between different parts of the application which implement similar logic, and the codebase is kept as small as possible, making it easier to maintain.

That's always a great idea in a modern, structured language. In T-SQL, however, we have to be very careful: code re-use is notoriously difficult to achieve without falling into pitfalls that can seriously hurt performance.

This article attempts to address some of those pitfalls, providing a working solution. This is done in a SQL Server 2008 environment using table valued functions.

Assumed knowledge: Scalar functions are bad. Multi-statement table functions are bad. Nesting stored-procedures doesn’t work. Outputting stored-procedure results into temp tables can be unnecessary overhead. Ability to read basic execution plans.

The example uses the data structures needed to track a football league (no, not American football). Hopefully a familiar topic means that everyone knows what's going on without any introduction.

If you’re familiar with my earlier articles, then the same database is used here as well. If not, then you can use this script to create the database and this one to populate it with data. It is not necessary to have the database to follow the article.

The players in a team vary over time, so the following query returns player information for a given team at a given time. Let's call this the Squad query.

DECLARE @teamId INT = 12
DECLARE @date DATE = '20020407'

SELECT   p.nameFirst + ' ' + p.nameSecond AS playerFullName
       , pt.name AS positionName
FROM     teams_player tp
JOIN     player p ON  tp.playerId = p.playerId
JOIN     positions pt ON  p.positionId = pt.positionId
WHERE    tp.teamId = @teamId
     AND tp.dateFrom <= @date
     AND tp.dateUntil > @date
ORDER BY p.positionId

[Fig 1: Squad query results]

The next query returns the goals scored by the same team on the same date, and the players who scored them. Let's call this the Score query.

SELECT   p.nameFirst + ' ' + p.nameSecond AS playerFullName
       , g.time
FROM     teams_player tp
JOIN     player p ON tp.playerId = p.playerId
JOIN     goal g ON p.playerId = g.playerId
WHERE    tp.teamId = @teamId
     AND tp.dateFrom <= @date
     AND tp.dateUntil > @date
     AND g.date = @date
ORDER BY g.time

[Fig 2: Score query results]

Notice that both queries contain the same complex WHERE clause on the teams_player table. According to the DRY principle we should try to factor out this logic so that it is only in one place. This can be done using the table-valued function shown below.

CREATE FUNCTION sideSnapshot (@date DATE, @teamId INT)
RETURNS TABLE
AS
RETURN
(
SELECT   tp.teamId
       , tp.playerId
FROM     teams_player tp
WHERE    tp.dateFrom <= @date
     AND tp.dateUntil > @date
     AND tp.teamId = @teamId
)

Squad query:

SELECT   p.nameFirst + ' ' + p.nameSecond AS playerFullName
       , pt.name AS positionName
FROM     sideSnapshot(@date, @teamId) ss
JOIN     player p ON ss.playerId = p.playerId
JOIN     positions pt ON p.positionId = pt.positionId
ORDER BY p.positionId

Score query:

SELECT   p.nameFirst + ' ' + p.nameSecond AS playerFullName
       , g.time
FROM     sideSnapshot(@date, @teamId) ss
JOIN     player p ON ss.playerId = p.playerId
JOIN     goal g ON p.playerId = g.playerId
WHERE    g.date = @date
ORDER BY g.time

I have not included the result sets or execution plans, but I can assure you they are identical. Now any future changes to how the teams_player table is queried can be made in one place only; the DRY principle has been applied and our codebase is healthier for it.
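
To see why the plans match, here is roughly what the optimiser works with once sideSnapshot has been expanded inline (a sketch of the expansion, not the optimiser's literal output):

-- The Squad query with sideSnapshot expanded inline as a derived table.
SELECT   p.nameFirst + ' ' + p.nameSecond AS playerFullName
       , pt.name AS positionName
FROM     (SELECT tp.teamId
                , tp.playerId
          FROM   teams_player tp
          WHERE  tp.dateFrom <= @date
             AND tp.dateUntil > @date
             AND tp.teamId = @teamId) ss
JOIN     player p ON ss.playerId = p.playerId
JOIN     positions pt ON p.positionId = pt.positionId
ORDER BY p.positionId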

Why should we stop there though? We could create a more comprehensive function which performs all of our joins and the name manipulation logic. According to Books Online:

“…the query optimizer expands the function as if it was a macro, and generates the plan as if you had provided the expanded query.”

So even if the function returns more information than is needed, the optimiser will expand it, optimise the query and avoid performing any additional work? No.

The following function calls our previous function but now also joins to player and positions. As you can see, it greatly reduces the complexity of our two queries and certainly provides a better implementation of the DRY principle.

CREATE FUNCTION sideSnapshotDetailed (@date DATE, @teamId INT)
RETURNS TABLE
AS
RETURN
(
SELECT   s.playerId
       , p.nameFirst + ' ' + p.nameSecond AS playerFullName
       , pt.positionId
       , pt.name AS positionName
FROM     sideSnapshot(@date, @teamId) s
JOIN     player p ON s.playerId = p.playerId
JOIN     positions pt ON p.positionId = pt.positionId
)

Squad query:

SELECT   s.playerFullName
       , s.positionName
FROM     sideSnapshotDetailed(@date, @teamId) s
ORDER BY s.positionId

Score query:

SELECT   s.playerFullName
       , g.time
FROM     sideSnapshotDetailed(@date, @teamId) s
JOIN     goal g ON s.playerId = g.playerId
WHERE    g.date = @date
ORDER BY g.time

Once again the result sets are identical, as is the execution plan for the Squad query. However, inspecting the execution plan for the Score query shows that a join is still being made to the positions table even though no information from that table is used. There is a foreign key constraint on the positionId column as well, so the join cannot restrict the result set; it is pure overhead.

[Execution plan for the Score query, showing the redundant join to positions]
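
For comparison, this is roughly the query the optimiser ends up with once sideSnapshotDetailed has been expanded; the join to positions is still present even though none of its columns reach the final SELECT, and that is the extra work showing up in the plan (again a sketch, not the optimiser's literal output):

SELECT   p.nameFirst + ' ' + p.nameSecond AS playerFullName
       , g.time
FROM     teams_player tp
JOIN     player p ON tp.playerId = p.playerId
JOIN     positions pt ON p.positionId = pt.positionId -- joined but never used
JOIN     goal g ON p.playerId = g.playerId
WHERE    tp.teamId = @teamId
     AND tp.dateFrom <= @date
     AND tp.dateUntil > @date
     AND g.date = @date
ORDER BY g.time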

This is one of the pitfalls that we must look out for while implementing code re-use in Sql Server. Although table-valued functions allow us to easily re-use our code, we must be aware that the optimiser is not fool-proof; complicated queries can result in inefficiencies.

Code re-use needs to be kept simple in order not to confuse the optimiser. We can still implement the DRY principle safely in this case, but we need to split the function up into smaller pieces. Below we create a new function which applies only the name manipulation logic, and CROSS APPLY to it.


CREATE FUNCTION nameConverter (@nameFirst VARCHAR(100), @nameSecond VARCHAR(100))
RETURNS TABLE
AS
RETURN
(
SELECT   @nameFirst + ' ' + @nameSecond AS fullName
       , 'Mr ' + @nameSecond AS titledSecondName
)

Squad query:

SELECT      nc.fullName AS playerFullName
          , pt.name AS positionName
FROM        sideSnapshot(@date, @teamId) ss
JOIN        player p ON ss.playerId = p.playerId
CROSS APPLY nameConverter(p.nameFirst, p.nameSecond) nc
JOIN        positions pt ON p.positionId = pt.positionId
ORDER BY    p.positionId

Score query:

SELECT      nc.fullName AS playerFullName
          , g.time
FROM        sideSnapshot(@date, @teamId) ss
JOIN        player p ON ss.playerId = p.playerId
CROSS APPLY nameConverter(p.nameFirst, p.nameSecond) nc
JOIN        goal g ON ss.playerId = g.playerId
WHERE       g.date = @date
ORDER BY    g.time

We have not reduced the number of lines in the query, but that was never the object of the DRY principle; the important point is that all the complex logic is now done in one place.

The DRY principle can be safely implemented in Sql Server T-SQL with the help of table valued functions, but in order to do so you must be careful and follow some rules. Although this article only covers one specific example, the following are general rules that should be adhered to.

  • Don’t attempt to use table valued functions to provide denormalised views of your data.
  • Each function should perform a single task (single responsibility principle).
  • Only use functions to re-use complex logic. Using the function should be less complicated than the logic it contains!
  • Avoid nesting.

Personally, I would only use a table valued function to perform a join where the join logic is complex or non-intuitive. I use them more frequently in examples like nameConverter, where business logic needs to be applied in multiple places across the database.
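
As a hypothetical illustration of that second case, imagine a manager table with the same pair of name columns (it does not exist in the sample database); the same formatting logic is re-used without being rewritten:

SELECT      nc.titledSecondName AS managerDisplayName
FROM        manager m -- hypothetical table, for illustration only
CROSS APPLY nameConverter(m.nameFirst, m.nameSecond) nc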

Download the database script DbCreateScript.sql
or download the solution FootballDatabase.zip

Download the data script DataGeneration.sql

Database Deployment. Hacking the VS database project. Part 2.

In the last article we looked at how, with the aid of pre-build  tasks, we could combine hand-written update scripts with the Visual Studio database project. This concluding article will go through the steps needed to set up an automated build task in Team Foundation Server, based upon the solution we created earlier.

We are going to set up a continuous integration style build task, though for brevity I'm simply calling it 'Test'. This is an automated build that runs every time a new check-in is made. The benefit is that if developers make breaking changes to the codebase, it is picked up almost immediately after check-in. Developers can work on just a subset of the projects in a solution but still be confident that the whole solution builds, as it will be built by the continuous integration build task. This is also an ideal place to put unit tests; these can be set up so that they raise build errors if they fail, but unit testing is outside the scope of this article.

Setting up a build

I'm assuming that you already have a build controller and a build agent set up. If you don't, then don't worry; it's very straightforward. You can muddle through with the built-in wizard or read about it here http://msdn.microsoft.com/en-us/library/bb399135.aspx.

Go to Team Explorer, Home, Builds and select "New Build Definition".

Under General give your new build a descriptive name.

[Screenshot: New Build Definition, General tab]

Under Trigger, set the trigger type to "Continuous Integration – Build each check-in". Have a read of the other types as well; they are fairly self-explanatory.

[Screenshot: New Build Definition, Trigger tab]

Under Workspace, make sure that every folder your solution uses is in this list, and that the folder structure is reflected in the same way in the Build Agent Folders. Essentially, Team Build is going to copy all of the source files onto the build agent before building the solution; this screen defines the folder mapping.

[Screenshot: New Build Definition, Workspace tab]

Process is the tab where we have to make the majority of changes:

  1. Under "Items to Build" -> "Configurations to Build", add a configuration called "Test" with the platform "Any CPU". This will be important later.
  2. Under Advanced, set the following to False: "Associate Changesets and Work Items", "Create Work Item on Failure" & "Label Sources". These settings are very specific to bug tracking and labelling; I don't know why they are not False by default.
  3. Also under Advanced, add the following for "MSBuild Arguments". This tells MSBuild that you want to build, but also that you want to publish changes to a database.

 /t:Build /t:Publish /p:SqlPublishProfilePath=FootballTest.publish.xml

[Screenshot: New Build Definition, Process tab]

The publish settings

Follow the steps from the previous article as if you were going to publish the solution: set the connection string to that of your test environment, then click "Save Profile As..." and save this profile in the root of your solution, so that the name matches the one in the MSBuild arguments above.
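
For reference, the saved profile is just a small MSBuild file. A minimal sketch is shown below; the exact properties Visual Studio writes may differ, and the server and database names here are placeholders, not values from the sample solution.

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <!-- Placeholder connection string; point this at your Test environment -->
    <TargetConnectionString>Data Source=TestSqlServer;Integrated Security=True</TargetConnectionString>
    <TargetDatabaseName>Football</TargetDatabaseName>
    <!-- The two advanced options discussed in the previous article -->
    <IncludeCompositeObjects>False</IncludeCompositeObjects>
    <TreatVerificationErrorsAsWarnings>True</TreatVerificationErrorsAsWarnings>
  </PropertyGroup>
</Project>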

Editing the project file

Open Configuration Manager from the Build menu. Add a new solution configuration called Test, and create new project configurations as well. I like to delete the default Debug and Release ones, but you can do as you wish so long as there is one that matches the configuration in the build we just defined.

This is where things get interesting. We do not need (or want) all of the Pre-Build tasks to run on the build agent: as long as the solution built when it was checked in, the dacpac file will already be up to date, so there is no need to regenerate it. Running all of these Pre-Build tasks on the build server would also require an instance of Sql Server there. Unfortunately, Visual Studio does not allow you to specify different Pre-Build tasks for different build configurations straight off the bat, but we can achieve this by editing the project file manually.

In the Solution Explorer, right click the project and select "Unload Project". Right click the project again and you will get the option Edit ProjectName.sqlproj. You'll notice that a number of the PropertyGroup elements have Condition attributes; the PropertyGroup is only active when the condition is met. Handily, some already exist where the Configuration is the condition being tested. We need to find the element that contains the Pre-Build events, give it a condition so that it is only active for the Dev configuration, then copy it and enter a different set of Pre-Build tasks for the Test configuration.

<PropertyGroup Condition=" '$(Configuration)' == 'Dev' ">
<PreBuildEvent>CD "$(SolutionDir)"
CD ..\FootballDatabase\Build
CALL 01RestoreStructure.bat "$(SolutionDir)"
CALL 02WriteStructureScript.bat "$(SolutionDir)"
CALL 03UpdateStructure.bat "$(SolutionDir)"
CALL 04ImportDatabaseObjects.bat "$(SolutionDir)" "$(VS110COMNTOOLS)"
CALL 05WriteDataScript.bat "$(SolutionDir)"
</PreBuildEvent>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)' == 'Test' ">
<PreBuildEvent>CD "$(SolutionDir)"
CD ..\FootballDatabase\Build
CALL 02WriteStructureScript.bat "$(SolutionDir)"
CALL 05WriteDataScript.bat "$(SolutionDir)"
</PreBuildEvent>
</PropertyGroup>

The bat files that write the scripts should be left in as the generated scripts are not under source control.

That's it. Reload your project file and check in the changes, which should trigger a new build. You can see the status of your builds via Team Explorer -> Builds, or you can use the web interface provided with TFS. If your build is failing then you'll need to check that the permissions are set up correctly; your build agent will need permission to alter the database in your Test environment, for instance. In the build definition there was an option to copy build output to a drop folder. You may have been tempted to turn this off (especially as only UNC paths are supported), but this isn't a good idea as it greatly reduces the amount of build output you can access and makes debugging tricky.

Another great tool that comes as standard in the Visual Studio tools is Build Notifications. This adds a notification icon to the taskbar which will let you know if your continuous integration build has failed.

There is no limit to the number of builds you can set up against a single solution. You may decide that you want a separate environment for load testing, or a UAT environment that is only triggered manually; if you have internal live environments you could use TFS to perform upgrades. I would suggest you add security to your build task if you are going to set it up with live connection strings; this is supported as standard.

I would strongly advise some sort of automated deployment. Upgrading your software is always going to be a risk, and this is one way you can mitigate that risk, by ensuring that your deployment process has as little manual intervention as possible. This is the best solution that I can come up with; the drawbacks are that it is quite a serious hack and there are a lot of steps where things could potentially go wrong. Please comment if you can see any improvements that could be made, or any other 3rd party tools that can be employed to aid deployment.

Download the solution FootballDatabase.zip

Download the data script DataGeneration.sql

Database Deployment. Hacking the VS database project. Part 1.

When upgrading an application, it’s fairly easy to deploy your changes; delete all the existing dlls and replace them with a new set. The problem with databases is that they are often infested with pesky data. You can’t just drop the database and copy in a new one. Database changes need to be incremental, updating data where necessary and issuing ALTER statements to tables.

Broadly speaking, there are two ways you can go about database deployment: high tech and low tech. Low tech is keeping an update script with DDL statements making all the changes between the last release and the next one. High tech is using a third-party comparison tool like Red Gate's SQL Compare, or Visual Studio's database projects; these can generate a change script by comparing the schemas of two databases.

Going the low tech route can quickly get out of hand, especially with multiple developers working on a project or if you have many dependencies between your objects. The high tech path also has serious drawbacks, as the auto-generated change scripts are limited in the kind of operations they can perform. For instance, you'd never be able to get a comparison tool to update all NULL values of an INT column to zero and then make the column NOT NULL.
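
That kind of change is easy to express by hand, though. A sketch of such a script (the column here is invented purely for illustration, it is not part of the sample database):

-- Backfill the NULLs, then tighten the column. A comparison tool cannot
-- generate the UPDATE for you because it only sees the two schemas.
UPDATE goal SET extraTime = 0 WHERE extraTime IS NULL

ALTER TABLE goal ALTER COLUMN extraTime INT NOT NULL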

This article describes a workable solution which leverages the power of VS database projects but at the same time allows hand-written scripts to be incorporated for the more involved schema changes.

The problem with database projects and the workaround

The idea behind database projects is that all the objects in the database are stored as CREATE scripts; from these CREATE scripts a schema is inferred, which is then compared against an existing database to generate the update script. If you rename a column or change its datatype, the generated script will attempt to drop the table and recreate it (there are plenty of warnings to stop this happening by mistake, though).

What database projects are very good at is managing the dependencies between stored procedures, functions and views (the Programmability). We are going to set up a project which only contains Programmability, and leave the changes to tables (the Structure) to our hand-written scripts. The problem is that now the project no longer builds, as the Programmability has invalid references to the Structure.

Database projects support the idea of database references. Similar to how dll references work in code, you can reference objects in a *.dacpac file and it is equivalent to those objects being present as CREATE scripts in your project.

All of our Structure objects will be contained in a dacpac file, so the project will build and we can use the generated script to manage our Programmability. The Structure changes just need to be applied beforehand.

Why bother?

Releases will be more robust and less labour intensive. This method will also allow you to integrate seamlessly with Team Foundation Server Team Build. Team Build is a very powerful tool which you can use to set up multiple builds, all triggered automatically by check-ins or on a schedule. Details of how to go about this are in the second half of this article.

Prerequisites

  • Visual Studio 2012 with Sql Server Data Tools
  • Team Foundation Server
  • Sql Server (developer machine)
  • SqlCmd

Get started straight away by downloading this sample solution I’ve prepared, or follow the steps below.

The database project

To create a database project you need to have the Visual Studio Database Tools installed; then just use the wizard.

This is how I have laid out my project. The layout doesn't really matter; Visual Studio doesn't care where you keep your CREATE scripts, so arrange them as you see fit. I've opted to split mine up by schema and then by object type.

My folders are:

    • Build: Logic required to generate database reference, not part of the database itself.
    • Data: Static data creation scripts.
    • Schemas: Programmability as defined above.
    • Structure: Structure as defined above.

[Screenshot: project folder layout]

Structure database reference

Create an empty database in Sql Server, then use the below command to create a dacpac file. SqlPackage.exe is a tool that comes with Sql Server Data Tools (SSDT).

SqlPackage.exe /a:Extract /ssn:127.0.0.1 /sdn:DatabaseName /tf:DatabaseName.dacpac

Save the dacpac file in your Structure folder. Adding a reference to it is then very similar to adding a dll reference:

[Screenshot: Add Database Reference dialog]

It might be a good idea to include a test table or two in your database before creating the dacpac file, and then add some test stored procedures/functions to check that the project builds successfully.
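
Something as simple as the following would do; the names are throwaway and only there to prove that the database reference resolves at build time:

-- A test table to create in the database before extracting the dacpac...
CREATE TABLE dbo.buildCheck (buildCheckId INT NOT NULL)

-- ...and a test procedure for the project that references it.
CREATE PROCEDURE dbo.getBuildCheck
AS
BEGIN
    SELECT buildCheckId FROM dbo.buildCheck
END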

Structure update scripts

There is no built-in functionality that will apply update scripts for us; we have to do this ourselves. I have chosen to split the update scripts into folders for each major release. Each folder has a series of scripts that need to be run in alphabetical order. Each script contains a chunk of work: a group of related changes for a single enhancement or new requirement. The scripts are named so that they start with a number which determines the order they run in.

As these scripts will be run against the development database multiple times, they need to be written so that they don’t error if the change has already been applied.

IF OBJECT_ID('mySchema.myTable') IS NULL
BEGIN
  CREATE TABLE mySchema.myTable (myColumn INT)
END
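
The same guard-first style works for column changes too, for example (with an illustrative column name):

IF COL_LENGTH('mySchema.myTable', 'myNewColumn') IS NULL
BEGIN
  ALTER TABLE mySchema.myTable ADD myNewColumn INT NULL
END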

There is one file in the root of this directory which is marked in the project with the PreDeployment build type. When Visual Studio generates the database update script, this file is tagged on at the beginning. It runs, in turn, each of the individual update scripts.
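
Pre-deployment scripts support SQLCMD syntax, so the root file can simply :r each update script in order. A sketch is below; the file names are invented for illustration, the real ones live in the sample solution.

:r .\Release1\01CreatePositionsTable.sql
:r .\Release1\02AddDatesToTeamsPlayer.sql
:r .\Release2\01CreateGoalTable.sql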

Data update scripts

You can skip this step if you’re just looking to get a quick introduction.

The data scripts contain all the static data that applications need in order to function. In my football-orientated example there is a script to create rows in the positions table for all of the allowed positions a footballer can play in, e.g. Forward, Midfield, Goalie and so on.

Similar to the Structure scripts, there is one script that references all of the others, but this time it is marked in the project with the PostDeployment build type. This will run after all the schema changes have been applied. Again, it's important that these can be run over and over again, so write them as MERGE statements, not INSERTs.
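
A sketch of such a script for the positions table (the rows and the assumption that positionId is not an IDENTITY column are mine, taken from the queries earlier in this series):

-- Idempotent static data: safe to run repeatedly against any environment.
MERGE positions AS target
USING (VALUES (1, 'Goalkeeper')
            , (2, 'Defender')
            , (3, 'Midfield')
            , (4, 'Forward')) AS source (positionId, name)
ON    target.positionId = source.positionId
WHEN MATCHED THEN
    UPDATE SET name = source.name
WHEN NOT MATCHED THEN
    INSERT (positionId, name) VALUES (source.positionId, source.name);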

Tricking the build

[Diagram: the build process, standard steps in green with the additional pre-build steps in blue]

In order for the project to build we need to convert our update scripts into a dacpac file. This is done with the help of Pre-Build tasks. The diagram shows the stages in the standard process, in green, with the additional steps in blue.

First of all, the various scripts are combined into a single script; then SqlCmd.exe is used to apply this script to a database on a Sql Server instance on the development machine; finally, SqlPackage.exe is used to extract the schema into the dacpac file which is already referenced in our project. This ensures that all the schema objects are updated before the build is attempted.
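
Stripped of the project-specific details, the heart of those bat files is roughly the following; the paths, database name and file names here are illustrative, not the exact scripts shipped in the sample solution.

REM 1. Combine the individual update scripts into one file.
TYPE Structure\Updates\*.sql > CombinedStructure.sql

REM 2. Apply the combined script to the local development copy of the structure.
SqlCmd.exe -S 127.0.0.1 -d FootballStructure -E -i CombinedStructure.sql

REM 3. Extract the resulting schema into the dacpac referenced by the project.
SqlPackage.exe /a:Extract /ssn:127.0.0.1 /sdn:FootballStructure /tf:Structure\FootballStructure.dacpac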

I've included SqlPackage.exe in the solution so that the build agent doesn't need to have SSDT installed.

The bat files used to do this are included in the sample solution. They are of course dependent upon the hierarchy of my project; however, I have tried to write them to be as flexible as possible. Go to the Properties dialog of the project file -> Build Events and add the calls to these bat files.

[Screenshot: Pre-Build Events in the project properties]

Publishing

To publish via Visual Studio, go to the Build menu, then Publish... You'll see the screen below. Fill in the Target Database Settings, then click on Advanced. There are two checkboxes you'll need to change.

  1. Ensure that “Include composite objects” is unchecked.
  2. Ensure that “Treat verification errors as warnings” is checked.

[Screenshot: Publish Database dialog]

By unchecking "Include composite objects" you are telling the comparison tool to ignore everything in the referenced database project; we want this because we are making our own changes in the PreDeployment scripts. Checking "Treat verification errors as warnings" means that although the verification step thinks the publish will fail, it will only raise warnings and the publish will continue.

So that's it: you should now be able to manage all of your changes within the database project and easily deploy to your development environment. If you really want to reap the benefits of database projects though, you need to hook up to Team Foundation Server and take advantage of Team Build.

Download the solution FootballDatabase.zip

Download the data script DataGeneration.sql