February 2, 2008
A little over a year ago, I wrote about the importance of version control for databases.
When I ask development teams whether their database is under version control, I usually get blank stares.
The database is a critical part of your application. If you deploy version 2.0 of your application against version 1.0 of your database, what do you get? A broken application, that's what. That's why your database should always be under source control, right next to your application code. You deploy the app, and you deploy the database. Like peanut butter and chocolate, they are two great tastes that taste great together.
When it comes to version control, the database is often a second or even third-class citizen. From what I've seen, teams that would never think of writing code without version control in a million years-- and rightly so-- can somehow be completely oblivious to the need for version control around the critical databases their applications rely on. I don't know how you can call yourself a software engineer and maintain a straight face when your database isn't under exactly the same rigorous level of source control as the rest of your code. Don't let this happen to you. Get your database under version control.
I was thinking about this again because my friend and co-author K. Scott Allen just wrote a brilliant five-part series on the philosophy and practice of database version control:
- Three rules for database work
- The Baseline
- Change Scripts
- Views, Stored Procedures and the Like
- Branching and Merging
K is one of the smartest software developers I know. Read it all; even if you currently have your database under version control (and bully for you if you do), there is much food for peanut buttery chocolatey thought here. It doesn't matter what tools you use-- per the agile manifesto, individuals and interactions are more important than processes and tools. Just get your database under version control already.
Posted by Jeff Atwood
Thanks!! Putting the database under version control is a great idea.
I just think the best strategy is to make your application create and update the database itself. So all the DDL code is a PART of your application and therefore it is automatically version controlled.
Once you have used ActiveRecord Migrations it is very hard to go back!
I absolutely agree. I see three things you can do to avoid any problems with database inconsistencies:
1. Keep your SQL scripts as part of the source code under the version control
2. Use those scripts to automatically recreate the database as part of the build (see the sketch after this list) in order to...
3. Have integration tests that test the DAL code and run on the actual database.
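For step 2, a rough sketch of what that build step might look like, assuming SQL Server, the sqlcmd utility, and made-up script names:

-- drop and recreate the integration-test database from scratch
IF DB_ID('MyApp_Test') IS NOT NULL
    DROP DATABASE MyApp_Test;
CREATE DATABASE MyApp_Test;
-- then the build applies the versioned scripts in order, e.g.:
-- sqlcmd -d MyApp_Test -i 001_baseline_schema.sql
-- sqlcmd -d MyApp_Test -i 002_seed_lookup_data.sql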
Do you suggest putting the data under version control, or the schema only?
I'm just printing your posts and casually leaving them on my boss's table. Keeping my fingers crossed :)
Once you have used ActiveRecord Migrations it is very hard to go back!
AR's Migrations have a problem scaling with the number of devs though, since the intrinsic linear versioning model of migrations is, well, braindead (it's a centralized model for a distributed reality, basically); as soon as you have 2 or 3 devs creating migrations it breaks down *very* quickly.
That's, I think, the worst problem with migrations: they're really cool, as long as you have only one or two people creating them, or you ensure that no two people will create a migration at the same time.
His approach to change scripts is naive. We have a lot of problems in this area. We have nearly 60 developers working on an application. With a lot of schema churn. Big problems.
Quote: "You deploy the app, and you deploy the database. Like peanut butter and chocolate, they are two great tastes that taste great together."
I'm allergic to peanuts.
Do you suggest putting the data under version control, or the schema only?
The part of your data that represents constant or configuration data should definitely be under version control as well. Every time you deploy the DB, anything that must be there for the app to run out of the box is a candidate for scripting and versioning. As for the *variable* part of your data, ideally have a synthetic data generation plan for testing.
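For instance, the constant part can be captured as an idempotent seed script that is safe to re-run on every deployment; a small sketch with a made-up lookup table:

-- seed script for constant data: only inserts what's missing, so deploys are repeatable
IF NOT EXISTS (SELECT 1 FROM dbo.OrderStatus WHERE StatusCode = 'NEW')
    INSERT INTO dbo.OrderStatus (StatusCode, Description)
    VALUES ('NEW', 'Order received');
IF NOT EXISTS (SELECT 1 FROM dbo.OrderStatus WHERE StatusCode = 'SHIPPED')
    INSERT INTO dbo.OrderStatus (StatusCode, Description)
    VALUES ('SHIPPED', 'Order shipped');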
I'm allergic to peanuts.
I invoke poetic license!
like duh ....
I mean, how else would you upgrade from one version to the next? SQL is completely scriptable, so there's no excuse.
@anonymous - No, just schema changes, stored procedures, views, etc. The data doesn't belong to you. You would also script data changes and new data if applicable (i.e. new reference tables, data migration, etc.).
We also script all stored procedures and views, and they're recreated as part of a deployment script. Just like code, if it's not under source control, it's not guaranteed to work. Let the DBAs and project owners know this; that way they don't add or change things without your knowledge.
Another good trick is getting a daily restore of production to dev, and as part of your daily build, running the upgrade script against it. If it fails, so does the build.
Ditch the silly UIs. I mean seriously, it's pretty easy, and it will save your bacon.
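The proc and view scripts are simple drop-and-recreate affairs, so re-running a deployment is harmless. A sketch, with an invented procedure:

-- kept in source control; recreated on every deployment
IF OBJECT_ID('dbo.GetActiveUsers', 'P') IS NOT NULL
    DROP PROCEDURE dbo.GetActiveUsers;
GO
CREATE PROCEDURE dbo.GetActiveUsers
AS
    SELECT UserId, UserName
    FROM dbo.Users
    WHERE IsActive = 1;
GO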
When I ask development teams whether their database is under version control, I usually get blank stares.
As already answered by you above in response to anonymous:
As someone who used to develop his share of DB-based apps, I can understand this stare ;-) For DBAs and DB developers, the database is the whole big thing; ideally packed into one file on the disk, but it can also be a whole dir, a collection of dirs, or scattered over several machines. How do you do VC for that, and does it make sense?
But since I started using VC (not thaaat long ago...) I've made a habit of checking in my DDL scripts, along with all the SQL scripts that fill in initial (constant) data, test data, and migration scripts. That's the stuff that profits from VC (think of diffing your DDL to find the schema changes you made and forgot to document). So you should be more precise with that question, like "is your schema under version control"...
Thanks for the links! I have some create scripts in my solution, but the versioning really needs a way to make changes, account for loss of data, handle deployment on a database with existing data, etc. It's tough and I don't know how to do that; I wish I had Ruby on Rails' migrations.
We do something similar to Ruby migrations, basically creating a separate update script for each individual change to the DB. The database has a table which keeps a record of what update scripts it has run. Then we run a simple console program that updates the db to the latest version.
This is run in a post-compilation phase on the developer machines so that each developer's database is up-to-date with the latest committed changes. This is also run during continuous integration to set up the database for testing.
Works quite well for us. We are missing rollback scripts but that hasn't been a problem for us, and it would be easy to start doing them if we ever needed them.
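To give a flavor of it, here's a stripped-down sketch of the tracking table plus one self-guarding change script (all names invented):

-- created once, in the baseline script
CREATE TABLE dbo.SchemaChangeLog (
    ScriptName NVARCHAR(255) NOT NULL PRIMARY KEY,
    AppliedOn  DATETIME NOT NULL DEFAULT GETDATE()
);

-- each change script checks and records itself, so the console runner
-- can apply scripts in order and skip the ones already run
IF NOT EXISTS (SELECT 1 FROM dbo.SchemaChangeLog
               WHERE ScriptName = '0042_add_customer_email.sql')
BEGIN
    ALTER TABLE dbo.Customer ADD Email NVARCHAR(320) NULL;
    INSERT INTO dbo.SchemaChangeLog (ScriptName)
    VALUES ('0042_add_customer_email.sql');
END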
We make a change to the db, then script it and include it in a script file like 4.0.8DB, 4.0.9DB. If you want to upgrade an existing database you run the script. These scripts are run on production databases so we need to be careful... no dropping tables and recreating them. We use Red Gate in the pre-packaging phase to make sure the script doesn't introduce discrepancies.
We keep backups of all previous versions for the occasions we need to patch prior versions.
The db data is also backed up in the test environment so developers can quickly do a restore to replicate the screen on which any error occurred.
Yes, we also use CruiseControl.
We version control our database schema. We do this as follows:
1) We use Hibernate, so we can generate a full database schema from our Hibernate mappings.
2) Whenever we make a change to our Hibernate mappings, we regenerate the full database schema.
3) We then diff it against the previous version of the generated schema, and we use the diff to create a schema upgrade script (see the sketch after this list).
4) We then check in the new generated schema and the upgrade script in a version-numbered directory (0059, 0060, 0061, etc.).
5) These directories have to be in a linear order, because there has to be a well-defined sequence in which they are applied. Sometimes we do get overlapping schema changes (e.g. someone on a release branch has to make a schema change for a high-priority bug), in which case we use a decimal point (e.g. 0059.1) for that schema change.
6) We can then take production database snapshots, download them to our QA environment (data and all), and upgrade the QA database with the schema upgrade scripts. This is how we QA the schema upgrades before we do the production release.
7) We only version-control the schema, not the data.
8) We do need to have occasional downtimes, but we're working on reducing the length of those. We've mostly been able to avoid large, expensive, touch-millions-of-rows table updates.
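A sketch of how one of those version-numbered directories might look, with the upgrade script derived from the diff (directory, table, and column names are invented):

-- 0060/upgrade.sql: produced by diffing 0059/schema.sql against 0060/schema.sql
ALTER TABLE Account ADD COLUMN LastLoginAt TIMESTAMP NULL;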
The problem I've always had with db schema source control, though, is that application code can generally be rolled back to a prior version without issue, but db schema changes have to be _undone_, and I'm not aware of any automated way of generating the script to undo an arbitrary set of schema changes.
For example: how would you roll back a data normalization update without a manual script, or without losing the data changes made since the update?
Damn... I mean... I was doing that, but I didn't even realize it; it just felt like the natural way to do it. How could you not be doing it?
But then again, if I got to work in an existing environment that doesn't do it, I'm not sure I would notice it's missing (well, I'll now pay a little more attention to this).
What I'd like to know is, how do you best manage a production site where running the upgrades could take tens of minutes, but you don't want the site offline for that long? How do large-scale sites manage their upgrades to avoid down time? Or do they simply accept that there will be 2am-4am windows where the site is planned to be down?
Hi Ned. Here's an idea: upgrade a copy of the site/server, and re-point to the new one once it's ready. Of course, from the point of migration, any live data would also need to be "upgraded", which may or may not be possible.
I work on a database that is embedded in a product, so it is probably smaller than most of your examples. The database was not baselined when I started. Someone would take an existing database, delete the records, make the changes, and check the DB files into source control. There was no way to see what changed.
We are using Sybase SQL Server. It has a DB unload command that writes out a SQL script. I used that tool to create the baseline and put the script into source control. I found that each time you run this command the output is in a different order. The process I follow today is to prototype the changes in an instance of the DB and make the changes manually in the script. I test, and when it is ready I check in the changed script. I can then pull up a version tree and do diffs of versions to see what changed.
We are working on a new version of the database code plus schema. In this new version the SQL script is the only thing checked in. When the product starts, if there are no database files it creates them with dbinit and then runs the script to create the database. We also have upgrade scripts so that we can update existing instances at customer sites.
As was suggested in some of the earlier posts, one of the first things I did after getting the new database running was to create a tool that generates test data. This is used by my NUnit tests, which test the application and the database schema. The unit tests round-trip the data and compare what goes in with what comes out. There are also tests for the database triggers. This gives me the confidence to make any changes in the code that we want, to refactor, and to take patches from the database vendor.
Come on, Jeff! Version control systems *cannot* be used to manage database schemas without some serious fudging, because they are short one dimension. As James points out above, you can roll back to previous versions of code, but how do you roll back to previous states of database schema and data?
The easiest solution I know of is to maintain a separate folder with a set of pairs of database scripts whose names correspond to the version number in your SCM system. As part of the build process, every script between the current version and the target build gets executed: forward scripts for new code, and reverse scripts for rolling back to previous versions.
Yes, this is more work: you have to create a reversed version of every database patch you make that unALTERs TABLEs, but at least you can roll back to previous system states easily. And yes, you will lose data if you roll back to a version with a poorer data model, but what would you expect to happen?
Lucky for me, the only version I care about is the most recent. All I need to keep are my create scripts, which I back up at the end of every day.
What's all the fuss about? I use SQLAlchemy models and SQLAlchemy Migrate to manage my database and write migrators; it's all under source control...
As James points out above, you can roll back to previous versions of code, but how do you roll back to previous states of database schema and data?
Rails does that by integrating upgrade and rollback in each migration.
Basically, a migration has a method "up" and a method "down"; when you migrate forward, "up" is invoked and alters the schema in a given way (adding columns, normalizing, filling values, ...), and "down" is supposed to do the opposite and downgrade the database.
It doesn't work if the migration creator doesn't fill in the down method, of course, but as long as you do, it should be possible to roll the db forward and backward.
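For the non-Rails folks, the same idea expressed as a pair of plain SQL scripts (hypothetical names; Rails generates the SQL from Ruby for you):

-- 0005_up.sql: migrate the schema forward
ALTER TABLE Person ADD MiddleName VARCHAR(100) NULL;

-- 0005_down.sql: undo exactly what "up" did
ALTER TABLE Person DROP COLUMN MiddleName;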
This might work for a small or medium size database. My database is well over a terabyte, how would you keep that data under version control? Data gets modified every day. Some of the data is encrypted. Some of the data gets inserted into audit tables. Data gets replicated to other servers/databases. There are jobs that pull in data from real-time systems every second. I do have different versions of DBs on staging and QA servers, but only one on the production server. Changes have to go through change management; you have to open a ticket to make a change. This is not something you can do just one-two-three.
Jeff, I saw your first post on DB version control. My problem is that I haven't the first clue HOW to do this properly. I work with a dev version of my DB when I'm developing, so the data itself isn't accurate. Plus, how can you version control the data: SQL dumps? Those would be BIG files, cumbersome, etc.
I could export the schema as SQL commands, but then how do I import this schema to my live DB, with existing data? I'd have to rewrite the schema output to update the existing tables. That's even more work than simply applying my changes when it comes time to go live.
Due to the hurdles above, I find it perfectly acceptable to make constant backups, and carefully apply DB changes when it comes time to go live. If you have some suggestions, I'm all ears.
Funny, years ago I wrote a set of ant tasks that would bring the database up to the current version from ANY previous version. If multiple developers each developed on their local installation, after an update they would get all the changes, including migration of data, creation of backups of the tables, adding columns, migrating columns, etc. This worked perfectly for years. These same builds were used to upgrade the production databases too.
But at that moment I really didn't think anything of it; it was just common sense. Great was my surprise when I met 1) people (developers) who thought this was TOTALLY awesome, and 2) people (developers) from whom I would just get blank stares and who couldn't for the love of god figure out WHY I would go through all this trouble.
PS Before you ask, I cannot share the tasks since they depend in large part on the metadata of our proprietary application. OTOH, it's really not rocket science.
Hey Now Jeff,
From now on when we eat peanut butter cups I'm going to think of DBs.
Coding Horror Fan,
What is version control? Seriously though, I still haven't gotten the hang of using Subversion, and we don't use any version control at work. SQL Server does not offer any built-in support for version control. I just generate data dictionaries and SQL create scripts every once in a while.
For anyone in Java land, Liquibase is a really great database migration tool. We've been using it to help manage our releases and it's very flexible.
Good post and good links
Thanks a lot
Practically all the applications I work with are enterprise applications where the database is never redeployed; it gets upgraded in place. Tools like Red Gate's SQL Compare really come in handy for generating update scripts for our next roll-out.
Hmmmm, Peanutbutter Chocolate!
The database should be kept under source control, but if we are working in an environment where frequent changes and fixes are in progress, it's too painful to maintain this many versions. And maintaining data along with the whole schema is tedious. An automated toolset is needed that can create a version automatically at a predefined interval.
WTF does the picture have to do with the article? You're constantly breaking your own blogging rules/recommendations. Trash.
My DBAs see no value in source-controlling the db schema. And their manager outranks me. We just do compares and generate modify scripts.
There are plugins to get around the AR Migrations problem when you have multiple developers.
@Denis, others - we're not talking about version controlling the data.
@Robby .. Why in Ford's name would you create scripts to 'uninstall' your database to a previous version? I just can't for the life of me figure out why you'd do such a thing! I don't know where you come from, but NOT losing data is the whole point of having a db in the first place. If you can afford to lose data, it arguably shouldn't be there in the first place (some exceptions apply). If you need an old version around to test, then save a backup, mate!
(just because it's possible, doesn't mean you should do it. Ugh.....)
For the data portion, you back up the database. Cluster it. Ship logs. Offsite storage. Whatever. This is the DBAs' job, and they usually have pretty good tools for it.
What we're talking about here is x developers working on a new release. You NEED to create a predictable and repeatable process in which to upgrade a 'production' copy of the database with all of the changes the dev guys have done so that the new code works as expected.
PS Don't give me crap about 'with 1 guy it's too much effort' .. I mean what's harder than writing a few lines of text in a text file? Now, just version control the upgrade file. Done. Easy.
Bonus: you can blame people, because you can see their mistakes by doing diffs on the upgrade script.
-- adding newCol for xyz, requirement 1.2.33a
ALTER TABLE dbo.MyTable ADD newCol INT NOT NULL REFERENCES ... etc.
Finally, I think it's pretty safe to ignore baselining; I think it is only applicable to some developers based on the type of product, i.e. people who sell shrinkwrap software. For your in-house app or large site, everything is just a change script. There's usually only ever 1 copy of the app in production, and only 1 database. Don't waste your time until somebody really needs a pristine database.
The main structure of the post goes like this: "Get your (ambiguously defined) database under version control or I'll drop ad hominems on you; oh, and for supporting evidence leave this blog". Not Jeff's best work.
You're so spot on, except for the peanut butter/chocolate combination. But I guess that one is personal ;-)
I agree with others this post was trash.
To support simple schema changes I started an Open Source project called Crank http://codeplex.com/Crank
It's still in its infancy, but it's a sort of declarative shorthand for Transact-SQL.
Any feedback is welcome
@Anon: the reason I bring up the concept of a different script for patching backwards vs. forwards, and its interaction with the *data*, is not because the data is important per se, but because in my experience there are times when the test database doesn't react the same way from a performance perspective as the production database, so you may find yourself needing to roll back a schema update as part of the re-engineering. Also, when I work with a test database, I'll typically work with a subset of live data, because a developer's test data is rarely a match for samples of real data when it comes to looking for problems in your code; in my most recent project that sample ran around 600-800 MB. It's difficult to keep a backup of that set every time I make a schema change, and very slow to perform a full restore every time I need to roll back a schema change. You could argue that it would be easier to keep transaction log backups and just do a point-in-time restore, but that would pull *all* data modifications into the rollback, just like the full-restore approach.
The Rails migration method referenced by Masklinn sounds like a great solution; it's still manual, but it sounds like it automates about as well as it needs to. The goal, as I understand it, is to be able to revert and roll forward through versions of your code *without needing to think about the details*, and be sure that all code associated with that revision number is internally self-consistent. The bonus is to have that rolling forward and backward be data-agnostic, i.e. it'll work around whatever's there. That way, it should also be safe to run on a production server without fear of data loss. FYI, the production database that shaped most of my opinions here was 85-100 GB. Backups and restores of any sort were out of the question, and schema modifications had to touch as little as possible.
@Don - "If you revert the database scheme, of course you loose some data, because the new data is not compatible with the old scheme." -- i don't know that i agree with that. if your schema change for a given revision requires new data, then yes, rolling back that change should lose data, and attempting to subsequently roll that forward again would leave some holes. but many changes are for reorganization or optimization purposes, and those (IMO) could/should be treated like any other refactoring - if your tools are up to the task, you can make some pretty sweeping changes with confidence. heck, if i could just hit CTRL-Z or CTRL-Y to undo/redo updates across my entire application (even if it wouldn't let me scroll through changes that required data modification), i'd be Speedy Gonzalez the Developer.
@Don - "You could rollback database by installing major release and patching all the change scripts to the minor release you were in." -- This would be a pretty cool idea, especially if the 'rollback' script was just the "roll back to the most recent update that would cause data loss", and then roll forward from there. it might be easier to automate an undo script against a fixed point of reference (from a schema perspective).
@Anon - "Why would you need to roll back? Just rebuild from scratch." -- this works great for a development database, ignoring any of the quality-of-test-data concerns from above. i've run into downtime issues with large production database modifications - doing *anything* to a high-availability db over maybe 30gb is tricky, and i have a hunch that most databases that size tend to be high-availability, simply because that concentration of data implies more frequent usage of that data.
Sir, I'm a PHP programmer; I recently joined a PHP **company as a fresher. I need some database help: how to connect, how to select, how to write PHP code with links, and how to retrieve data from a database dynamically. Please send me a message, sir. My e-mail: raki_ki20002yahoo.com
Anybody annoyed by the weird threading limitations of this (and basically all) forums? Anyway...
@AnotherAnonDev: It's weird, but the truth is that if you want to be able to roll back the code to any previous version, you have to be able to roll back to the corresponding database schema. The schema and the code are joined at each version. The best way to do this is to have scripts that unALTER tables.
@Masklinn: Yup, Rails is neato.
The problem is that version control only has one dimension: time. (Okay, there's branching, but really you just go forward and back). You can pluck out any version of your source code from any point in history, and it will work as well now as it did then.
But, that version of the source code won't necessarily work with the current version of the database. And if you want to preserve the data in the database when rolling back, you have to go through and carefully reverse all of the changes between the versions. Most SCM systems don't support this natively.
Jeff, I think you ought to go into a little more detail about putting the database in source control. It's not as easy as putting the source code in source control. It's not a simple controversy like putting the documentation or the images in source control. **It's an entirely different level of problem.**
Let me chime in with support for everyone who has pointed out that DB rollbacks to a previous version are an absolute must for large enterprise systems. There's no way someone with a 500 GB+ database is just going to accept "wipe it and reload" as a fallback option if an enterprise application upgrade goes pear-shaped. I made the mistake of suggesting that to an ops team handling a multimillion-dollar enterprise app when I was a young DBA, still wet behind the ears. Suffice to say that I had that idea beaten out of me posthaste.
Speaking from hard experience, a version controlled DB schema must consist of the following at minimum:
1. A script immediately available from source control to build a blank database for any given major/minor release. Either the DDL script itself or an equally available sister script must also load the basic fact data required for the app to run. The theory here is that anyone can arbitrarily pick a version of the DB schema that matches up with the version of the app they need to bugfix/test/demo and quickly install it to a working state.
2. A script immediately available from source control to upgrade an existing database at version n to version n+1. If version n+1 introduces new columns that didn't exist in version n, reasonable defaults must be populated if the app can't cope with NULLs. Such scripts should be able to be chained together also, so a user at version n who wishes to go to n+3 can run three upgrade scripts in a row and arrive at n+3 in a functional state.
3. A script immediately available from source control to downgrade an existing database at version n to version n-1. If version n-1 removes columns or whole tables that version n introduced, data must be preserved in other schemas/tables that won't affect app n-1, and then the new data should be removed or adjusted to be what app n-1 needs to function (see the sketch after this comment).
The problem with all this is that I've yet to encounter a tool that really does all three of these things with any degree of reliability. Some DB-specific tools are floating around out there, but for someone like myself who supports large applications that can be backed by one of many possible db architectures (Oracle/DB2/etc.) this is all a tedious, manual, error-prone PITA.
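To make points 2 and 3 concrete, a hand-rolled sketch of one upgrade/downgrade pair (names invented, SQL Server flavored):

-- upgrade n -> n+1: new NOT NULL column, with a reasonable default for existing rows
ALTER TABLE dbo.Orders ADD Priority INT NOT NULL
    CONSTRAINT DF_Orders_Priority DEFAULT 0;

-- downgrade n+1 -> n: park the data where app version n won't see it,
-- then remove the column so the schema matches version n again
SELECT OrderId, Priority
INTO dbo.Orders_Priority_Backup
FROM dbo.Orders;
ALTER TABLE dbo.Orders DROP CONSTRAINT DF_Orders_Priority;
ALTER TABLE dbo.Orders DROP COLUMN Priority;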
Our team has the database in version control, and it has given us a huge leap in confidence when delivering a release. A branch of the database exists for each branch of the code.
We've been using DBGhost from Innovartis. It lets us make a database package that will upgrade the database from one version to another. Static data, mostly lookup tables, and pre- and post-migration scripts can be included.
The automation toolkit for DBGhost allows us to include the database in our continuous integration builds. Every time a database change is checked in, a test database is created, and all database code is verified.
We've run into the occasional automation toolkit bug, all promptly fixed, but I can't recommend this tool enough. We've come a long way from all the developers working with a shared, non-source-controlled development database.
hahahaha, we used to constantly have problems with programs and database structure, everyone would always disavow knowledge about why stuff changed. Now we have version control, and whenever anything gets messed up, the same people just blame the version control software. Never underestimate the ability of people to adapt their incompetence to any situation.
Thanks. Good post and good links. I keep all my work under version control, and only need to back up the version control database every day.
Funny to read this. I have been doing version control for all applications for at least 4 years now. And you present this as the big new thing...
One of the problems with "getting your database under version control" is trying to make an analogy between the database and the directory tree that most source code projects use.
A couple of years ago I worked at a client where the vast majority of the development effort was in the database, either as schema or metadata. All the "software" bits were an afterthought. We developed a CM technique I have (arrogantly, self-interestedly) named Longacre Deployment Management.
The technique acknowledges that you just can't "roll back" a big database. Instead, you roll forward -- if you have to "fix" a problem, you fix it by delivering more changes. Sometimes those changes revert the database to a state very close to something that existed in the past.
The technique is documented at CM Crossroads, in an article from April of last year: "Technical Introduction to Longacre Deployment Management." The url is http://www.cmcrossroads.com/content/view/7910/135/
An integration solution for SQL version control: script database objects into SourceSafe.
I feel that the database server vendors should provide a -DEV switch for their servers. This would handle version control inside the database server for large multi-user development teams. TFS is way too complicated for the average team, and there are too many entry points to change a database object.
Very interesting blog and comments.
Currently we use Red Gate to carry forward schema changes.
I have a question for you who currently keep your db versioned:
Take this scenario:
A team has started working on a project with some new functionality that will go live in 8 weeks. In the first week a dev makes a change to some table schema, saves the .sql file (am I right here?) and checks this in to source control.
Now, a week later, an urgent production fix needs to go out that will involve changing the same tables.
Surely when the schema in source control goes live, there are chances of overlapping changes, meaning the versioned schema scripts could fail when they are run in the live environment, even though they ran fine in test the first time?
I guess my misunderstanding here comes from the fact that with code you can merge your changes no matter where in the software cycle you are working, whereas here it seems the schema would 'track' the code, meaning it is effective for that branch in SCM only, with no relationship to or effect on any other ongoing build?
You get the idea:
Isn't this just something a version control tool like www.accurev.com or www.perforce.com or www.SVN.com would integrate with, rather than doing it on their own?
Even if, per the agile manifesto, the tool does not matter, let me point to deltasql.
deltasql is able to tag sql scripts with a version number, and produce the SQL needed to upgrade a database from one version to another, which is very useful for upgrade scripts.
You can test deltasql here: http://www.gpu-grid.net/deltasql (if you login with user admin and password testdbsync)
deltasql can be downloaded at http://sourceforge.net/projects/deltasql
dbMaestro (www.dbmaestro.com) makes a product called Teamwork which enables you to solve many of the issues mentioned in this article.
It provides database-level locking of schema objects and enables check-in/check-out type operations. It also manages versioning of schema objects and deployment of objects.
I think I understand your last question. How we handle releases is that we have a release branch, the head of this release branch will mirror the production application and can be used to patch the current production release without introducing anything that is in development. Now this release branch would also contain any scripts needed for creating the database and change scripts. The urgent production fix (or change to the schema) would be done on this branch. Later on you would want to merge this fix back into the trunk to ensure that the next major release of your application also contains this fix.
Ok, here is a tricky question for you, wizards!
Suppose you have a normal software development setup with one main (dev) branch and release branches for each version of your product, with numbered changes made to the DB schema in each branch. In the branch for version R1 you have changes 1 and 2, in the Head you have 1, 3 and 5, and in version R2 you have changes 1, 3 and 4. Keeping your db scripts under version control, you may go forward (and optionally backward) through the sequences (1,2), (1,3,5) and (1,3,4). But the most trivial use case is to upgrade a client from version R1 to version R2. Then how do you transfer a database across branches from build#2 to build#4?
One way is to support rollback scripts: from 2 go back to 1, then forward to 3, then to 4. But rollback doesn't always work. What if the change introduced in #2 drops a table? The client has build#2; without that particular table, the data is gone and you cannot restore it just by using SQL scripts. A typical situation could be: you ship your clients R1, initially with just change #1 in it. Then your clients complain that your application stores too much data and the database is growing too fast. You make a quick fix, just dropping a table; this is change #2. After longer research, in your head version you decide not to drop the whole table but just to drop some columns from it; this is change #3. In change #4 you use the data from this hypothetical table for some calculations. You end up at release R2 and your client asks you for the latest version: you should transfer R1 build#2 to R2 build#4. You cannot, because in order to execute changes #3 and #4 you need the table that your clients no longer have.
A workaround is never to drop tables or columns. But this is against the main goal of change #2 - to reduce DB size.
Another workaround is to make full backups of clients' DBs before transferring them to newer versions. This wouldn't save space either.
You have discussed the simple part, which is putting the DB schema and/or scripts under source control. That seems obvious to me. No big problem to solve here.
The harder question, one actually more deserving of a blog, is the situation where the data itself is the product you are selling. Consider a situation in which the schema stays relatively stable or even constant for years and what you sell to your customers, and need to control, is a constantly changing proprietary knowledge base.
Some customers will still be on old versions, others willing to upgrade to the newest. Sure, anytime your stubborn customers have a problem you can force them to upgrade, but that isn't practical. You need to correct data "bugs" in older versions of the data and re-release it. Further, when a customer does upgrade, if there are customer-created tables/DBs referencing your changed data, you may need to provide them a means to automatically upgrade the "relationships" between their tables and your data (e.g., if key fields such as IDs have changed).
After many years of development, I also still get blank stares from developers.
To this end, I've released an open-source tool for SQL Server databases:
Let me know what you think.
What are those delicious looking sweets in the photo though? I bet you can't get them in the UK, mmm peanut butter and chocolate....
This post was very lacking; I believe the main point was to put up some links and declare your love for chocolate peanut butter cups. I have to click links now to get the info? Don't make me think! Summarize summarize summarize.
Delicious looking chocolates!.... They look grim!
As others have pointed out, the analogy of "source control" to databases breaks down in several instances: branches, rollbacks, test cases, patching, and so on. Code is stateless and data is, well, state; because data is intermingled with schema, you sometimes have to treat the schema as immutable.
That's not to say that effective database versioning isn't a positive goal, but it's a lofty one and it's important not to fool ourselves into believing that we've attained it. What you get with existing tools is a very limited kind of version control: the ability to create a new database from scratch that's compatible with any given version. That's all. It's great if you do a lot of new rollouts, but only marginally useful if you work on a long-term production system.
As long as it's understood that having your database scripts in SCM isn't going to afford the same protections as having your code in there, and that you should still be very conservative with schema changes, I think it's probably a good thing to do if you have more than 3 or 4 developers. If you're a pair or a one-man show, it's probably not worth the effort, because changes should be rare.
'It doesn't matter what tools you use' What utter nonsense.
'-- per the agile manifesto, individuals and interactions are more important than processes and tools.'
From the agile manifesto:
"... we have come to value:
Individuals and interactions over processes and tools"
Putting a value on some entity is not the same as ascribing importance to it.
A litre of water costs 1, a kilogram of food costs 2. Which one is more important?
I think it's a good idea to version control the database schema and basic lookup data. That way the database stays under control. Of course, the version control process then needs to be very smooth, so that you don't end up cursing the effort.
@James: You could roll back the database by installing the major release and applying all the change scripts up to the minor release you were on.
@Robby Slaughter: If you revert the database schema, of course you lose some data, because the new data is not compatible with the old schema. But lookup data and such is part of the database version, so you don't lose everything. And if you still have the reverted patch script, you can try to extract as much of the reverted data as possible from it.
If you revert some random patch from the version control stack, the database might end up messy. But program code could end up messy too, if you don't know what you are doing.
If you use test data for the database application, you have the test files version controlled too. You should not release a new version of the application before all the tests have been upgraded to support the new version and are showing green lights in the unit tests. Each test case has its own test data. Before each test case, the test script wipes out all the data and inserts new test data into the database.
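As a tiny sketch of that wipe-and-reseed step (hypothetical tables; children are deleted before parents to satisfy the foreign keys):

-- run before each test case
DELETE FROM dbo.OrderLine;
DELETE FROM dbo.Orders;
DELETE FROM dbo.Customer;
-- then insert this test case's own fixture data
INSERT INTO dbo.Customer (CustomerId, Name) VALUES (1, 'Test Customer');
INSERT INTO dbo.Orders (OrderId, CustomerId) VALUES (10, 1);
INSERT INTO dbo.OrderLine (OrderId, LineNumber, Sku) VALUES (10, 1, 'ABC-123');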
"His approach to change scripts is naive. We have a lot of problems in this area. We have nearly 60 developers working on an application. With a lot of schema churn. Big problems."
Are they not making source code changes? Have you not figured out how to handle this? They are the same thing.
"As James points out above, you can roll back to previous versions of code, but how do you roll back to previous states of database scheme and data?"
Why would you need to roll back? Just rebuild from scratch.
"This might work for a small or medium size database. My database is well over a terabyte, how would you keep that data under version control? Data gets modified every day."
Nobody is talking about version controlling *data*...
Have you looked at dzo (http://sourceforge.net/projects/dzo)? It has solutions for some of the problems addressed here. It compares a file of native SQL code (create table ..., create view ..., grant ...) for all database objects in the schema with the actual database schema, and shows (or executes) the SQL needed to transform the database schema to match the SQL in the file. It has support for lookup data (called reference data in dzo), i.e. data that the application just reads. Dzo supports MySQL, Oracle and SQL Server at the moment.
I've been working on a database schema management system for the past six months. All approaches I've investigated fell short in one aspect or another. Support for branching is a big concern. Also, none of the solutions I've investigated made any distinction between tables and programmables. Concurrent changes are also important.
I've written a 2-tier Java Swing app that manages a repository of database artifacts by organizing them into projects. The design is inspired by industry-standard versioning techniques and is artifact driven.
I've made a lot of progress; I am putting the finishing touches on the Oracle schema import feature. It will import and provide a baseline for tables, constraints, indexes, views, functions/sps, packages, and sequences... More work remains, certainly.
I agree with many of the others that commented here. It is amazing that issues with controlling and versioning database schemas still pervade the IT landscape.
There is an interesting new product called DBVS that addresses these issues in a unique way by translating database objects into XML and versioning those files rather than SQL scripts. This methodology also allows DBVS to deploy and rollback changes automatically.
Their website is http://www.dbvsys.com.
If you want to go one step further with version control of your database schemas, branching/merging your developments, and automating incremental delivery generation from version information, you can take a look at neXtep designer:
It is a free GPL product which currently supports Oracle, MySql and PostgreSql (DB2 is on the way).
It is a complete database IDE based on Eclipse RCP. In the environment, you actually work on an "offline" version control repository which you synchronize with your development database. Once a development is made you can commit it and generate a delivery from version information. A standalone installer will then be able to deploy your delivery and will check your target database structure to validate the deployment.
Among other things, the IDE offers data model diagrams, dependency management, version control, reverse synchronization, modular data models, integrated SQL client, SQL editors, auto-completion, etc.
The product has already been adopted by some leading e-business companies in France. Any feedback would be highly appreciated.
Complete documentation, tutorials, demos, and concept overviews can also be found in the wiki:
I’m looking for 3 simple things... well, I thought of them as simple, but I’m realizing they’re not to be taken for granted!
1) a reverse engineering tool that I can point to an Oracle schema and get a “baseline” script to re-create that schema from scratch, with decently formatted DDL files (1 per object) neatly organized in a directory tree (by object type) and called in the correct order. Icing on the cake would be an option to pass the tool a list of tables containing static data and get DMLs to populate (insert) those tables as part of the script.
2) a diff tool that I can point to a pair of Oracle instances (source and target) containing a given schema and get a “delta” script to alter the target schema to become identical to the source schema. If data loss would occur on the target instance (e.g. a dropped column) I would like to find a warning comment inserted in the script (e.g. “-- Attention: data migration DML needed here?”). Icing on the cake would be an option to pass the tool a list of tables containing static data and get DMLs to update (delete, update and insert) the data in the destination tables to become identical to the contents in the source tables *without* deleting and re-inserting all rows (or dropping and repopulating the table).
3) I would like the above two tools (that, as you will have recognized, are basic to putting your database design under version control) to be open-source, with a command-line interface and a vibrant community backing them.
I must be one out of a couple million people asking for the same things over and over again: I’ve seen the questions all over the internet but I could find no straight answer. Please help!
Thanks and take care.
Check out the tool Lure at http://earthly-software.com. The demos on the website will give you a quick introduction of how this tool works.
I believe it meets all your stated requirements except that it is new and commercial.