Version control is an important aspect of writing code. From the initial check-in to the final build, it is crucial to keep track of every change that is made to the code. Like stacking blocks of legos, you will find what started with a single brick evolving into a city. Databases, however are often ignored from version control. If you are lucky you use some form of ORM to connect to your database like Active Record. This will handle database migrations for you and provide database version control to some degree. However for legacy projects you are out of luck. More often than not I have seen the code managed very carefully with awesome branch and merge strategies implemented. Whereas all the database scripts simply dumped in a directly called SQL. Or if you are lucky there will be sub directories with stored procedures, tables and functions in their own directories.
Here’s how it goes on a typical release day. The developer hands you a shiny release package with a nice bow on top. “Nice !” You say. Suddenly he reaches in his trench coat and hands you a soggy bundle wrapped hastily in brown paper. “What …is that ?” you ask. “SQL scripts” comes the simple reply. “Well…what about rollback ?” you ask helplessly. “Restore from backup !” the developer smiles. “But…the database is 3 TerraBytes!” you yell. The developer leaves.
Just kidding. That has never happened to me. And here is how you can avoid this from happening to you. You see there is a difference between versioning your database scripts and actually versioning the database. By versioning the database you get the ability to apply and rollback changes or “deltas” to your database. The simplest way to do this is , as mentioned above, is to use an ORM. However if you have an already established database and code and switching to ORM is not an option then there are other ways in which you can do this. There are products available on market that can help you version you database. For SQL Server Red Gate offers SQL Source Control that integrates with leading VCS. For MySQL there is dbv.php. Then there is software like Liquibase which is feature rich, free and supports a lot of databases.
And when all else fails grab a scripting language of your choice and automate it yourself. With a mix of automation and policies you can bring any database under version control. Before you start you will need baseline of the database. A baseline will help you create the initial empty copy of the database on top of which you will apply all your changes. There are two way you can do this. Extract the schema of the database or else maintain a skeleton copy of the database i.e. truncate all the data in all the tables. Personally for established projects I like to create skeleton copies of the databases rather than extracting schemas because sometimes extracting schemas can be quite hairy. And secondly even if you manage to extract schema there is no guarantee you will be able to recreate the whole database from schema. So take it as a pro tip, a zipped up skeleton copy will make your life easier. After than implement some simple policies. All scripts must be checked into scripts directory. Every scripts must have a corresponding valid rollback script checked into rollback directory. Every script must start with a integer which must be the next available integer. So if the last script was called 04_create_blah.sql then the next one should be 05_delete_foo.sql. Now create a table in your database and called it VersionControl which will contain version control meta data. This will include last version applies, the date and time, user name who applied it. Then write your automation script to read from the directory and sort by name, read from the table and figure out the last version and figure out which scripts need to be applies. You can also have your script take parameters for rollback to apply the scripts in reverse. And your have poor man’s DB version control. Remember this solution works if your database is not shared by multiple applications. You can always expand on this solution and make a more complex solution to suit your needs. Another problem you will face is very soon your scripts will reach in hundreds at which point you will find it is taking too long for you to apply all the scripts on top of the baseline. At some point you will have to take the baseline again. Just baseline your database, archive your scripts and rollback scripts. This is assuming your data loading scripts were handled else where and this mechanism was handling only DDL scripts. And you are all set again.
What has been your experience with database version control ? Let me know in comments below.
Note: Originally published on http://www.nimkar.net/index.php/9-release-management/3-database-versioning-the-ignored-aspect-of-version-control