I consider this the golden rule of source control:
Check in early, check in often.
Developers who work for long periods -- and by long I mean more than a day -- without checking anything into source control are setting themselves up for some serious integration headaches down the line. Damon Poole concurs:
Developers often put off checking in. They put it off because they don't want to affect other people too early and they don't want to get blamed for breaking the build. But this leads to other problems such as losing work or not being able to go back to previous versions. My rule of thumb is "check-in early and often", but with the caveat that you have access to private versioning. If a check-in is immediately visible to other users, then you run the risk of introducing immature changes and/or breaking the build.
I'd much rather have small fragments checked in periodically than to go long periods with no idea whatsoever what my coworkers are writing. As far as I'm concerned, if the code isn't checked into source control, it doesn't exist. I suppose this is yet another form of Don't Go Dark; the code is invisible until it exists in the repository in some form.
I'm not proposing developers check in broken code -- but I also argue that there's a big difference between broken code and incomplete code. Isn't it possible, perhaps even desirable, to write your code and structure your source control tree in such a way that you can check your code in periodically as you're building it? I'd much rather have empty stubs and basic API skeletons in place than nothing at all. I can integrate my code against stubs. I can do code review on stubs. I can even help you build out the stubs!
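To make the broken-versus-incomplete distinction concrete, here is a minimal sketch in Python of what a checked-in skeleton might look like. Everything in it (`Invoice`, `InvoiceStore`, `InMemoryInvoiceStore`, `total_owed`) is a hypothetical name invented for illustration, not code from any real project. The skeleton imports and runs, so it is incomplete rather than broken, and a coworker can already write and test calling code against it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Invoice:
    customer: str
    amount_cents: int


class InvoiceStore:
    """Hypothetical API skeleton: the signatures are settled, the bodies are not."""

    def save(self, invoice: Invoice) -> int:
        raise NotImplementedError("persistence not written yet")

    def find_by_customer(self, customer: str) -> list[Invoice]:
        raise NotImplementedError("lookup not written yet")


class InMemoryInvoiceStore(InvoiceStore):
    """A teammate's stand-in, written against the checked-in skeleton."""

    def __init__(self) -> None:
        self._rows: list[Invoice] = []

    def save(self, invoice: Invoice) -> int:
        self._rows.append(invoice)
        return len(self._rows)  # 1-based id of the stored invoice

    def find_by_customer(self, customer: str) -> list[Invoice]:
        return [row for row in self._rows if row.customer == customer]


def total_owed(store: InvoiceStore, customer: str) -> int:
    """Calling code that integrates against the interface, not the implementation."""
    return sum(inv.amount_cents for inv in store.find_by_customer(customer))
```

The point of the sketch is only that the stub gives reviewers and integrators something real to work with while the actual implementation is still being written.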
But when there's nothing in source control for days or weeks, and then a giant dollop of code is suddenly dropped on the team's doorstep -- none of that is possible.
Developers who wouldn't even consider adopting the old-school waterfall method of software development somehow have no problem adopting essentially the very same model when it comes to their source control habits.
Perhaps what we need is a model of software accretion. Start with a tiny fragment of code that does almost nothing. Look on the bright side -- code that does nothing can't have many bugs! Test it, and check it in. Add one more small feature. Test that feature, and check it in. Add another small feature. Test that, and check it in. Daily. Hourly, even. You always have functional software. It may not do much, but it runs. And with every checkin it becomes infinitesimally more functional.
If you learn to check in early and check in often, you'll have ample time for feedback, integration, and review along the way. And who knows -- you might even manage to accrete that pearl of final code that you were looking for, too.
Posted by Jeff Atwood
Reminiscent of atomic coding: http://www.bitscribe.net/screencast.php?cast=atomic
Jonathan on August 21, 2008 06:07 AM
A nice article, but it applies mostly to centralized version control systems. With DVCSs like hg and git you can pretty much have your cake and eat it too. You can check in broken code as much as you want so that the history is recorded and you can revert when you want; then, when you feel the code is good and won't break the builds, you just push it upstream.
Alisic on August 21, 2008 06:08 AM
> You can check in broken code as much as you want so that the history is recorded and you can revert when you want and then when you feel the code is good and won't break the builds, you just push it upstream.
I agree that "painless" merging is absolutely a strength of DVCS but this is also possible via branching and merging in centralized systems:
http://www.codinghorror.com/blog/archives/000968.html
The flip-side of people who won't check in are people who won't "get latest", and I think one tends to lead to the other. Small frequent check-ins and getting latest help build a rhythm of steady incremental change (dare I say it...improvement). A dozen or so people who don't check in for weeks on end leads to integration pain and broken builds, which leads to a fear of "getting latest" and bizarro bugs you SWORE you fixed weeks ago lingering on other people's systems. This in turn makes things harder to test and verify - Joe just updated XYZ and we're seeing that old bug 9876 again on his machine. Hmmm - is it a regression? Was 9876 never fixed "properly" in the first place? Or is Joe just running a cobbled-together Frankenstein build of code that is hopelessly out of date?
JosephCooney on August 21, 2008 06:11 AM
This is once again an argument for distributed source control (git, mercurial...). You can check in as often as you want (to your local repository) and gain the full versioning over everything you do. You can share it with your coworkers (push or pull the changes).
And once you have running code, you push it into the repository (or branch) that the continuous integration server uses.
The more I use (and get used to) distributed version control, the more I like it! I don't see any reason to continue using svn/cvs based versioning and I can only encourage everyone to try it out! There is a learning curve (more advanced concepts give more opportunity for failure), but it is worth it.
Mo on August 21, 2008 06:13 AM
When I'm developing a device driver, I often go days without a compiling build. You want me to check that in and break everyone? That's insane.
The only good reason I can see for checking stuff in before it works is so that I personally can go back a revision or two if I delete something I shouldn't during development. However, you don't really need revision control for that. A good editor will keep around old versions for you too.
T.E.D. on August 21, 2008 06:16 AM
Oh. I am in agreement with the "Git fixes this" comments too. Unless you are now saying that with Git, I should *merge* my changes every day. Then you are back into crazy talk.
T.E.D. on August 21, 2008 06:19 AM
> I'd much rather have small fragments checked in periodically than to go long periods with no idea whatsoever what my coworkers are writing.
Really? Shocker... !! What gems can you pass onto us next?
@T.E.D. :
If you use something like SVN, you should have a *branch* for your code in progress. Commits to that will not break the main build, which is in the *trunk*. That way you're able to do lots of little incremental commits without affecting anyone else. It's great to be able to "revert" back to your last checkpoint, if you ever do something to screw things up in a major way (global search & replace, etc.). When your code is finally finished, you then merge your branch into a local copy of the trunk. If it compiles, runs, passes tests, etc., then you commit the new trunk (and create a new branch for your next project).
Yup - I'm working on a project just now where one of the developers had source code checked out for more than a week and then, instead of checking in, pulled an old copy down from the server and that was all the work gone!!! Which meant we missed a test cycle and had to wait for the next one. This was also on a laptop, so if they had lost that then, again, it would all be gone. You can never check code in too often, in my opinion.
I go on the approach of "Always check in the code, even if it doesn't work or isn't finished, you'll still be able to change it later!". Then there's no chance of losing the code. When it's all done and everyone's happy with their piece of code and each piece has passed unit testing, then we create a version and release it for business test. Then you only ever roll back to working versions - by working versions, I mean versions that actually compile and can be installed - obviously they may have bugs, but they'll be fixed the next time we do a version.
Charlie McMahon on August 21, 2008 06:36 AM
"Perhaps what we need is a model of software accretion. Start with a tiny fragment of code that does almost nothing. Look on the bright side -- code that does nothing can't have many bugs! Test it, and check it in. Add one more small feature. Test that feature, and check it in. Add another small feature. Test that, and check it in. Daily. Hourly, even. You always have functional software."
Test Driven Development
Nikos on August 21, 2008 06:37 AM
Where have we heard this story before? Yeah, I remember, test-driven development and eXtreme Programming. Good to see those ideas spreading amongst .NET practitioners too.
Ivan Milosavljevic on August 21, 2008 06:40 AM
sounds like Test Driven Development
http://en.wikipedia.org/wiki/Test-driven_development
or maybe the test-first concept of Extreme Programming
http://en.wikipedia.org/wiki/Extreme_Programming
> I agree that "painless" merging is absolutely a strength of DVCS
> but this is also possible via branching and merging in centralized
> systems
Yeah, people talk about how that is a big feature of distributed systems, when it is actually just a necessary component of them. There's nothing about distributing a revision control system that magically makes it merge better. But if distributed systems didn't merge well, they'd be a real PITA to use.
What's really different about distributed systems is that you have much better control over who sees what changes and when. That makes it much more difficult for a single sloppy developer to break everyone's build. It also means you can set up complex repository relationships that better match your internal working relationships.
Also (with Git at least) it's much easier to work from remote locations with slow or bad network connections. That's because it only needs to communicate when doing merges.
"Perhaps what we need is a model of software accretion. Start with a tiny fragment of code that does almost nothing...Add one more small feature...Add another small feature..."
In Ye Olde Days we called that UNIX.
... small pieces, loosely joined ...
Paul Souders on August 21, 2008 06:44 AM
What you are really talking about is continuous integration.
The problem with checking in "too often" is that your development team ends up consuming a fair amount of resources on these auxiliary things. While they are a necessary aspect, we still want to focus our attention on the problem at hand.
Branching in current state-of-the-art centralized SCM is cumbersome and impractical, so it is not really an option; distributed SCM solves this problem, as many pointed out.
Casper Bang on August 21, 2008 06:50 AM
As long as you have a mechanism in place to enable *cumulative* code reviews (i.e., a feature spanning many check-ins), then I think there's nothing wrong with lots of little commits. But problems arise if you have lots of uncontrolled check-ins and no way to see the "big picture" for review once things are done.
We're using Crucible for this right now, which is working out great.
Vance Vagell on August 21, 2008 06:54 AM
> If you use something like SVN, you should have a *branch* for your
> code in progress. Commits to that will not break the main build,
Yes, I know the theory. However, that's not the way one needs to work, it's just one way to do it.
Let's take a step back here. The chief purposes of revision control are:
1) Coordinate changes between multiple developers.
2) Keep track of old working versions of the program so we can rebuild them or fix bugs in them if need be.
Two auxiliary things people often end up using revision control for are:
o Keeping track of old non-working versions of source files, just in case you need to start over from that point for some reason.
o A file backup system (assuming your repository is backed up, and easy to recover when corrupted).
If you want to use your revision control system that way, that's fine. It creates a lot of junk branches and revisions, but that's not really a huge problem for most people.
However, there are other ways to do those things. A good editor (eg: emacs) will keep old versions of files around for you. I have mine set up to keep the last 10 versions. VMS used to do that for you too.
Working out of a directory that is periodically backed up is all you really need to do for backups. If you can't do that, then you'll have to periodically copy your work into a place that is backed up just to be safe. Leveraging the revision control system for this does seem sensible. However, if your dev machine *is* being backed up, you really don't need to use your revision control system for backups.
T.E.D. on August 21, 2008 07:08 AM
Sounds a lot like the Open Source Mantra - Release Early, Release Often. As always, great article (I'm always looking forward to the next Coding Horror post in my Google Reader).
http://www.samalamadingdong.com
Sam on August 21, 2008 07:11 AM
I pretty much check in whenever I'm at a solid state... not necessarily done, but not breaking things either. I like having a point I can go back to if I really mess my code up, and I get a warm and fuzzy feeling knowing that even if my computer goes toes up, my code will still be there. Don't check in code that is going to break any functionality of the app.
Also, somewhat related, I try to check in as often as I can because the merge tool in Visual Studio 2005 seriously sucks. There are times when I need to merge things and it won't allow me to choose the code from which file to take, and other times it lumps a block of code together when I really just want to keep one line out of it, but I'm forced to take either the whole block from one file or the whole block from another. Checking in often, you avoid most of these headaches. And if it doesn't merge right, you don't have that much to do to fix things.
Kris on August 21, 2008 07:15 AM
I became convinced of the value of short check-in cycles over a decade ago after watching Jim McCarthy's presentation "How to ship great software on time." But I've worked with a number of people who just don't think that way and no amount of persuasion or policy will change their minds.
John on August 21, 2008 07:19 AM
You know it's funny that you and Damon point out that developers don't check in because they are afraid of breaking the build. I had an experience quite the opposite. We had a developer at my work who wasn't checking his code in, even though he made sure it would never break a build, yet was releasing it to the Dev and Integration servers. When I got the go-ahead to release my code I got latest and pushed out all files (which is standard practice where I'm at), which subsequently broke the build since he hadn't checked in anything he was working on. Thankfully he still had it locally and was able to integrate. He was of the mind that someone might find bugs and he would still have to work on it. We had a nice chat on why he needs to check-in code before he releases to any server.
Arcond on August 21, 2008 07:19 AM
Not being able to check something in makes me feel unsafe :p
Whenever it compiles and is filled with // TODO: .... comments inside stubs to describe what I'm going to place there, it's checked in. Makes you go home with a good feeling that it's being backed up, and visible in case you suddenly fall sick.
David Cumps on August 21, 2008 07:20 AM
@T.E.D: "Working out of a directory that is periodicly backed-up is all you really need to do for backups."
Wow! What a great idea!
I think I can even improve on that, though. You could use a batch file (or shell script) to compress the files and copy them into that directory!
Oh, wait... I've done that before. Back in the '80s.
KenW on August 21, 2008 07:45 AM
This blog really needs a section called "Bleeding obvious"
BugFree on August 21, 2008 07:54 AM
> this is also possible via branching and merging in centralized systems
It would be if merging weren't such an ugly pain, a history loser (at least under SVN) and so badly supported.
In other words, it isn't.
> I go on the approach of "Always check in the code, even if it doesn't work or isn't finished, you'll still be able to change it later!".
Code that isn't finished is ok, but code that doesn't work is only ok for small enough values of "doesn't work": if your changes make it impossible for others to work (because you broke compilation, or features, or...) then you don't under any circumstance commit it to the central/team repository.
> Yeah, people talk about how that is a big feature of distributed systems, when it is actually just a nessecary component of them.
Erm... it's both? Just because it's necessary for DVCS to work correctly doesn't mean it's not a feature.
> yet was releasing it to the Dev and Integration servers
How come he was able/allowed to release code that hadn't been committed in an SCM?
> Oh, wait... I've done that before. Back in the '80s.
Actually, Linux development worked that way until they started using Bitkeeper, and in his Git presentation (@google) Linus stated that timestamped tarballs were -- in his opinion -- vastly superior to CVS (or SVN)
Masklinn on August 21, 2008 08:04 AM
"Software accretion" is how I've always done all my development; that's how it always used to be done back in the day. Checking in at least a couple times a day and making sure my changes are stable (nothing to do with complete) at that time.
Brian Knoblauch on August 21, 2008 08:05 AM
@BugFree: Unfortunately it seems the 'bleeding obvious' is not always so obvious to some people.
@Paul Souders: The philosophy in the UNIX world may well be small modules loosely coupled, but a pearl is small modules extremely tightly bound up. A string of pearls, though...
A pearl is also pretty much an oyster's allergic reaction to a piece of grit, so your analogy's mileage may vary :)
John Ferguson on August 21, 2008 08:27 AM
Hmmm...
- Mandatory Code Review
- Check In Often
Choose one.
I would be curious if someone knows a way of doing both without crippling team productivity. Thanks.
I blame this (also) on piece-of-shittake source control systems (like ClearCase). Guess what, if committing is a pain-in-the-behind, people will put it off as much as they can.
James on August 21, 2008 08:51 AM
> I think I can even improve on that, though. You could use a batch
> file (or shell script) to compress the files and copy them into
> that directory!
Well, yeah. If it's that much work, it's probably easier to just check your stuff into a revision control system (that is already being backed up).
However, here the IT folks periodically back up our user profiles. So as long as I work under there I'm getting backed up. What's better, I don't have to *remember* to do it manually.
Ditto with the file versions from Emacs. I don't have to stop and think which versions to save. I don't have to run a special VC command, I don't have to do anything. Thus nothing to mess up.
Plus at the end of it all I don't have a jillion junk versions of files being saved in our production revision control system forever like they are important pieces of code.
T.E.D. on August 21, 2008 09:01 AM
Another post reminiscent of "Code Complete"; BTW, that's a good thing!
Mark Kohalmy on August 21, 2008 09:26 AM
What DVCS does is make explicit important concepts, like patch review and branching. Branching and merging can be done with CVS and SVN. The trick is, SVN fixed the wrong half of CVS. CVS required a long time to branch or merge; what SVN advertised is really fast branching.
But merging is where the pain is. You want to be able to correctly resolve as many conflicts automatically as possible. This is something that DVCS has to think about and do, because it's the intended use case. Whereas many groups refuse to use branching on SVN or CVS because merge is too painful.
jldugger on August 21, 2008 09:27 AM
I hear ya Jeff, but unfortunately where I work, if it's checked in, it's supposed to be through the SOX controls and ready for the next release.
Crazy system I know.
Matt Ridley on August 21, 2008 09:39 AM
At my previous work I used to just 'check in at the end of the day' and considered the CVS repository more of a backup system than a version control system. I didn't have to work on code together with someone else a lot, so it wasn't much of a problem.
However when we changed to svn I read the svn book front to back and got more insight into version control. And coincidentally at college I had to use svn for a project shortly after that and we got a class about it. Because I had to work together with other people on the same code I quite quickly picked up good practices and checked in after every feature, bug fix, or whatever small change.
Now at my new work even though I work alone on the code most of the time I still employ this work flow and it works really well. Every little fix or change is a revision so it makes restoring and reviewing easier too.
Check in early and often is all very exciting if you're working on new code or working alone, but if you are working on shared code as part of a team, checking in bad stuff will just hurt everyone. I agree hoarding a bunch of changes is bad too, but you have to do enough unit testing to be sure you won't pollute the pond before checking in code if it will affect your colleagues. The balance between getting something working independently first and sharing the burden of getting it working is a tough one.
It helps to design systems such that biggish chunks are independent from one another to minimize the crosstalk - good design anyway - but there will always be core code that's shared, and where you have to be careful before checking in changes...
Ole Eichhorn on August 21, 2008 10:02 AM
Ever heard of tools like Perforce? You work in a branch, and integrate from the main trunk into yours, so your code is always up to date with main. Then when you're finally ready, you reverse integrate into the main trunk.
Check in as often as you like, you won't break the main build. Integrating with main doesn't affect the rest of the team. And the tools do most of the merging for you, except when there are actual conflicts. Works great.
One of the concepts that I really like to try for is, if you can't check in a working, tested system at night, roll everything back and start fresh the next morning.
It seems frightening, but here are the possibilities:
1) You were nearly done but just couldn't get it to work.
--The next day you will have all that work done in 2 hours, and it will be much better code. Chances are you were confused by the end of the day anyway, and without doubt that showed in your code.
2) You were nowhere near done.
--Well, obviously you took too big a chunk. Break your task apart better the next day and take on a smaller chunk. You will understand it better and be able to implement it more quickly.
The alternative: At the end of every subsequent day, you end up saying "Damn, now I REALLY can't afford to throw away all this work", and raise the chances of having a difficult merge.
Don't underestimate the power of rewriting your code either. Take the chance every opportunity you get! This is how you learn, grow and also saves HUGE amounts of time the next time someone has to touch the code (which, more often than not, will be before your current deliverable).
Coding quickly, fewer keystrokes, expressiveness, elegance etc. are all useless crap.
The only things that matter are that you have readable, understandable and DRY code, and this usually only comes through spending some time with your code and rewriting it a couple times.
Bill on August 21, 2008 10:58 AM
> Ever heard of tools like Perforce? You work in a branch, and
> integrate from the main trunk into yours, so your code is always up
> to date with main. Then when you're finally ready, you reverse
> integrate into the main trunk.
What he's talking about is making sure your changes don't suddenly appear to everyone else fully formed, like Athena jumping out of Zeus' head grown up and in full armor.
For you Perforce, Bitkeeper, and Git users, the way to translate this post is that he is suggesting you make sure to do that reverse integration into the main trunk every day.
Right, because every service and application can easily be broken down into a set of tiny features that don't interact and are all individually simple to test.
I often wonder what world some developers live in, to be unable to imagine a large module that's impractical to partition and even more impractical to unit-test. It's just not that easy... some things aren't functional at all until they're mostly done.
Aaron G on August 21, 2008 11:09 AM
> I agree that "painless" merging is absolutely a strength of DVCS but this is also possible via branching and merging in centralized systems.
@Jeff: Yes, but IMHO DVCS make it easier, especially those which include some kind of history rewriting tools, like amending last commit, or rebasing (in Git), transplanting (in Mercurial) or grafting (in Bazaar) branch to a new base.
And with centralized VCS you have to allow creation (and deletion, and renaming) of branches, and develop some convention (or some tools) to avoid conflict in naming branches, e.g. branch namespaces like <login>/<branchname> or <initials>/<branchname> or <login>@<branchname>. All of this assumes that the centralized VCS has good support for *merges*; all distributed VCS have it because they have to, while Subversion up to version 1.5 (not to mention CVS) had very poor support for easily merging branches.
But I agree completely with the *software accretion* model. If beside "check in early, check in often" (or "commit early, commit often") you also follow "one feature per commit", then when there are bugs in the code it will be easy to find them by history bisecting, aka diff debugging. See:
http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
http://kerneltrap.org/node/11753
http://boinkor.net/archives/2006/11/using_git_bisect_to_locate_bug_1.html
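The idea behind history bisecting can be sketched in a few lines of Python. This is not how git-bisect is implemented, only an illustration of the binary search it is based on; `first_bad_commit` and the `is_bad` callback are hypothetical names, and the sketch assumes a linear history in which every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is known
    good, the last one is known bad, and the bug never disappears again
    once it has been introduced.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1  # indices of a known-good and a known-bad commit
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # e.g. check out the commit and run the failing test
            bad = mid
        else:
            good = mid
    return commits[bad]
```

Across 100 commits this needs at most 7 test runs, and with one feature per commit the commit it lands on is small enough that the bug is usually easy to spot.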
@J.Peterson:
You beat me to it! I learned that pattern when I was a heavy Perforce user. I branched every time I was heading into unfamiliar territory, even if only for a "simple" change. I've long done this as a solo developer, and when client needs would force me to change a "parent" or "sibling" branch before I was finished with my digression, I'd merge back (up and/or) down to bring my exploratory branch up to date.
To me, this working style is essential to any XP-ish development style, since it's a powerful safety net, allowing complete freedom to "refactor mercilessly." I've never worried that a series of refactorings would fundamentally break anything, or that I'd work up a series of refactorings, only to end up "stuck," somehow, and lose the entire series because I wasn't able to roll back.
I'm a Subversion user now (not because I prefer it over Perforce, but, you know, "because.") With svnmerge (http://www.orcaware.com/svn/wiki/Svnmerge.py), it still isn't "p4 integrate", but it's good enough to keep me using the patterns.
@T.E.D.:
> However, there are other ways to do those things. A good editor (eg:
> emacs) will keep old versions of files around for you. I have mine
> set up to keep the last 10 versions. VMS used to do that for you too.
>
> Working out of a directory that is periodicly backed-up is all you
> really need to do for backups. If you can't do that, then you'll
> have to periodically copy your work into a place that is backed up
> just to be safe. Leveraging the revision control system for this
> does seem sensible. However, if your dev machine *is* being backed
> up, you really don't need to use your revision control system for backups.
Both of these approaches lack some fundamental development benefits I fear you've overlooked.
- These aren't backups. They are commits. They may be commits of works in progress, skeletons, etc., but they are you -committing- and annotating artifacts of your process. These commits offer visibility to other members of your team, a record of your thinking, and, as I mentioned above, a safety net allowing you to roll back to a state you previously explicitly declared "useful."
- When you have a well-appointed development infrastructure, these interim commits offer a basis for automated testing tied to checkins.
No, you might not want to know about every regression you've introduced in the "interim" steps, but if you check in immediately after a successful compile while you continue your thought, you'll be able to check the results of a background test suite. When you have this policy tunable on a per-branch basis, you have a continuous dashboard view into the progress of your work. This is especially important when you're doing a "resync" merge of the mainline or parent branch down into your dev branch.
- Whether or not you run a full unit and regression test suite, the success of a triggered automatic build on checkin after a successful local build at -least- gives you confidence that all of the necessary artifacts for a build are in the repository, and a clean checkout into a pristine build environment produces results equivalent to those in your local sandbox.
- As a corollary to my "commits" point, above, periodic backups capture a state in time, presumably scheduled, whereas -commits- capture an explicit, meaningful state in your process. Do you care about restoring what you were working on an hour ago, or what you were working on before you broke what you broke?
- Periodic backups, depending on how you implement them, and editor backups, almost universally, still leave the history on your local machine. This means you don't get any of the durability advantages offered by your SCM hosting platform. I.e., if you're on an enterprise SAN, or even a "beefy server," you may get local and remote replication, automatic "copy on write" snapshots of the repository, etc.
- Apropos finding the "broke what you broke" point, adopting this "continuous commit" practice makes "Binary Search Debugging" (http://www.joelonsoftware.com/news/20030128.html) not only easier, but, fundamentally, -possible-.
These benefits do not solely accrue to those using SCM on teams. My rigorous SCM discipline has paid off throughout my (considerable) history as a "lone gunman." We can sometimes be our own worst teammates, and these practices help protect us from ourselves.
> A nice article, but it applies mostly to centralized version control
> systems. With DVCSs like hg and git you can pretty much have your
> cake and eat it too. You can check in broken code as much as you want
> so that the history is recorded and you can revert when you want and
> then when you feel the code is good and won't break the builds, you
> just push it upstream.
That's actually the big *disadvantage* of distributed version control systems -- it is so easy to leave the pack and go off into your own world.
I had been a ClearCase administrator for almost 15 years. In ClearCase, branching is extremely easy, and merges are quite simple. The standard way you use ClearCase is to give each developer their own branch, have the developer do their work, and then merge it back into the "integration" branch.
With ClearCase, developers could base their code on another developer's branch, and deliver their changes to that branch. Or they could checkout their code from the main branch, make a dozen sub branches testing a wide variety of stuff, and then deliver whatever changes they want to the main branch. In other words, you used ClearCase as you would a distributed version control system.
A typical ClearCase shop would have two dozen or so developers. Each one would check out code from the main line onto their branch, and then merge their changes back onto the main line once they were satisfied with their changes. Developers could create as many development branches as they pleased, have branches coming off of branches, and even share branches.
My job as the CM was to keep the developers in line. I ran reports looking for developers who hadn't checked their changes back into the main branch in a few days (we actually had a case where one developer didn't check any code into the main branch for 11 months!). I also watched what the developers were working on. Were they making changes in class interfaces? Did everyone else know about that? Were two developers working on incompatible changes?
The worst part was release deadlines. Suddenly, developers would merge all of their changes needed for a release and nothing would work. QA was furious because getting everything to work again went against their testing schedule. At one job, the entire QA team quit en masse in protest.
After 15 years of working with "the most sophisticated version control system ever invented", I went to work for a place that was using old-fashioned, out-of-date, feature-free CVS. It was another shop with about two dozen developers. How, I asked, did the developers create their own branches and do the merging, since CVS makes creating branches difficult and merging is poor? The answer: We all work off of the trunk.
I couldn't believe it. What about problems with developers creating incompatible code? What about changes that might break the build? How can two dozen people share the same branch? Even worse, it's the branch that the code is actually released from.
To my absolute surprise, it worked. It worked better than I've ever seen development work. Builds were done whenever a change was made. If the build failed, you had to undo your change and try again later. Developers made small changes and coordinated their work. No one tried rewriting a complete section of classes. QA got builds throughout the entire life cycle, and the software always worked.
By using a centralized source repository, the developers were forced to work together. They talked to each other, and no one tried to bite off more than they could chew. Everyone checked in their code almost every day.
There was a recent study on traffic lights vs. traffic circles. On the face of it, traffic circles would seem more dangerous because of all the merging. But there are actually fewer accidents in a traffic circle than at a comparable traffic light. Drivers in traffic circles have to be careful. They watch what is going on around them and are more observant. Traffic lights -- besides causing frustration -- make people more careless because they simply assume that they have the right of way.
The same is true of centralized version control systems. Developers are forced to take small steps and work closely with their fellow developers. You can't go off in a corner and make a masterpiece of programming art. You have to work with everyone. Everyone sees everything you're doing. You can't hide.
So, why does Linus Torvalds prefer using something like git -- a decentralized version control system? Because Linus never made a promise to a customer about what the Linux kernel will contain. If you make a large set of massive changes in your own private repository, good luck getting that incorporated into Linux.
Distributed version control systems offer great power and flexibility, but can cause a breakdown in the discipline needed to produce large commercial projects with hard release requirements and specific deadlines.
David W. on August 21, 2008 11:40 AM
"I'd much rather have empty stubs and basic API skeletons in place than nothing at all. I can integrate my code against stubs. I can do code review on stubs. I can even help you build out the stubs!"
I tend to work by TDD, which sounds like the opposite of what you propose. The first thing I want to write is a test that fails, not an API skeleton (which is probably going to be wrong, anyway). Integrating against a skeleton API is useless, because any skeleton API today is going to be wrong tomorrow when the code is actually written. It wastes your time today (working with stubs that will go away), and wastes my time tomorrow (when I have to refactor your integration points to make any changes to my code).
Put another way, adding mass to a system does not make it faster.
"Developers that wouldn't even consider adopting the old-school waterfall method of software development somehow have no problem adopting essentially the very same model when it comes to their source control habits."
It sounds like you're proposing the more waterfall-like system: type in the interface, and then fill it out. Besides, real waterfall has cycles (I've been there!) on the order of months. Waiting 2 days for my interface to settle down is a couple orders of magnitude faster than that.
Also, the people you quote don't mention "a day" -- they simply say "often". It feels as if you're putting words in their mouths.
Do you have actual problems with people waiting "more than a day" to check in? I'd be curious to hear about your team: how many people total, how many working on a feature, etc. I simply can't imagine how checking in code every 2 days gives you "serious integration headaches".
tc on August 21, 2008 11:42 AM
I'd check in hourly, but the problem is that our trunk is built by an automatic build server, and many of my check-ins would lead to either build failures or builds that are horribly broken when used. And there is no way to tell whether a build is a good one or a bad one.
There is no risk of losing data if you make sure your code always has an external backup. Besides, the chance of data loss is very small nowadays.
Okay, maybe you should not forgo a check-in for very long. If you add new features, as you said, you can already put in some stubs. But if you touch previously working code and temporarily bring it into a non-working state (and that pretty much describes my daily work), checking in too often will not only cause build regressions for several other developers. As our tool is very low level, it might cause everything from app crashes to system freezes if they work with my not-ready-to-use-yet code.
Mecki on August 21, 2008 11:54 AM
BTW, I agree that you should check in "often", and most people (including me, probably) don't check in often enough. I simply think your idea of "more than once a day, regardless of whether you have a useful complete tested feature-unit of code" is too much.
When I need to communicate with somebody else about the interface a feature should have, we've got whiteboards and email and lunchtime for that. Checking in empty stubs just clutters up the repository history.
tc on August 21, 2008 11:57 AM
I've seen classes in libraries that were versioned by name, i.e. class Foo is a typedef or other facade for Foo_1, Foo_2, or Foo_OtherImplementation. This means that you can completely redo the class without old code breaking. This same strategy can replace VCS branching if you don't like your VCS's branching.
(But really, I've never found dealing with branches to be as hard as some people seem to find them... as long as people are conservative and sane about editing the source code files and don't do dumb things like redo all the indentation or whatever.)
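To make the versioned-by-name idea concrete, here is a minimal Python sketch. The names Foo_1 and Foo_2 come from the comment; the `describe` method and the plain alias standing in for a typedef are assumptions made purely for illustration.

```python
# Hypothetical illustration of "versioning by name": each implementation
# keeps its own numbered class, and a single alias picks the current one.


class Foo_1:
    """Original implementation; stays available to old callers."""

    def describe(self) -> str:
        return "Foo, version 1"


class Foo_2:
    """Complete rewrite that lives alongside Foo_1 instead of replacing it."""

    def describe(self) -> str:
        return "Foo, version 2"


# The facade: new code refers to Foo and gets whichever version is current.
# A plain alias plays the role the comment gives to a typedef.
Foo = Foo_2

if __name__ == "__main__":
    print(Foo().describe())    # new code picks up the rewrite
    print(Foo_1().describe())  # old code pinned to Foo_1 keeps working
```

The trade-off is that every old version stays in the source tree, so it is the codebase rather than the version control system that carries the history.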
reed on August 21, 2008 12:03 PM
I like to follow this rule, have always tried to do so, and encourage the same from others. It doesn't always go without conflict from other practices, of course.
With a new team I've been working with for a little under two weeks now, I branched for my first task (combining three search paths into one) and got to work. I found my way around a new code base and got my work done in good time. I thought I was in a good place when I handed it over.
Turns out one of the other developers was meanwhile working on a set of incompatible changes to the original search path. When he finishes, now I have to try to integrate what he did into the new path I created.
We both checked in constantly, and followed all the good rules, and neither of us was the wiser.
Calvin Spealman on August 21, 2008 12:24 PM
"The same is with centralized version control systems. Developers are forced to take small steps and work closely with their fellow developers. You can't go off in a corner and make a masterpiece of programming art. You have to work with everyone. Everyone sees everything you're doing. You can't hide."
I agree with David, centralized source control works great if you have good communication with your mates. We almost follow the practice exactly as he has mentioned.
Azlam on August 21, 2008 12:40 PM
As I like to say: It’s better to have a broken build in your working repository than a working build on your broken hard drive.
People who don't commit often--or don't understand why they would need to--usually have a less than ideal approach to writing code in my opinion. If you naturally split your work into smaller parts, committing your changes after each part is the logical thing to do. However, if you do a million things at once and code away for three weeks before you even have anything that compiles, there is something fundamentally wrong with the way you are working. Simply telling those people to commit often won't solve the underlying problem. They need to change the way they work in a more fundamental way. In my experience, this is usually caused by a lack of conceptual understanding of version control systems and the reasons they are being used in the first place. If someone doesn't see the point in keeping the version control system up-to-date, they simply won't do it.
Personally, I prefer to follow these basic guidelines when working with version control systems:
1. Put everything under version control.
2. Create sandbox home folders.
3. Use a common project structure and naming convention.
4. Commit often and in logical chunks.
5. Write meaningful commit messages.
6. Do all file operations in the version control system.
7. Set up change notifications.
I wrote down my thoughts on these guidelines in my blog a few weeks ago:
http://blog.looplabel.net/2008/07/28/best-practices-for-version-control/
When it comes to the issue of breaking other people's build by committing "too often" or "too early", this can easily be solved by using branches. There is no reason why every developer can't have their own branch if they want to or need to.
If you DO have a team environment, and ARE supposed to be using a repository, I wholeheartedly agree... anyway, I skimmed your article.
I hope to get Subversion set up ASAP in my small shop. Life has been sucking lately, without it.
JasonMichael on August 21, 2008 01:12 PM
Jesus Christ I just pooped my pants in your source control.
Ibod Catooga on August 21, 2008 01:38 PM
I support the idea of "check in early somewhere" with both my hands and both my feet and my dangling monkey tail. But I have reservations about "merge early". Why? Because it limits testing. If everyone checks in whatever they feel like in the trunk, you can only test the code after the code freeze. Before the code freeze, there is no "good state" of the code, and you're not only testing your changes, you're also testing everyone else's changes. (I know that your unit tests will test your code, but they will not test how it interacts with everyone else's code.)
katastrofa on August 21, 2008 02:10 PM
This sounds like a nightmare in the environment I work in. We're using SVN and I'm pretty happy with it. The problem is we often have to merge a new feature or change to another branch, or to multiple branches. The way I do things now, I make one big atomic commit once the code is perfect, then when I have to merge it into two other places, it only takes me a couple of minutes thanks to atomic commits. One new feature equals one commit. Now imagine if I had to merge 30 tiny changes to two branches? 60 extra merge & commits. Sounds like a late and miserable night. I'll sometimes get three requests to merge code to other branches in one day. Now imagine if each of these features/fixes constituted 10 to 50 individual commits. If they're not sequential (and they won't be) I'm looking at an average of 120 individual dry run, merge, and commits. Often across multiple projects and branches.
NO THANK YOU.
T on August 21, 2008 02:22 PM
I once had a disagreement with The Jerk, some newbie wannabe manager with a still dripping diploma from somewhere. The Jerk was trying to get all pompous and tell some younger guys that they must check in everything each day. "It was policy. As long as it compiled we're all good."
Not sure where The Jerk pulled that rabbit out of, because it couldn't be farther from the truth. I think he then tried to misquote McConnell, but I own Code Complete too.
My response, tempered by some years of experience, was:
- break jobs into as small a chunk as possible that still makes sense and produces useable code.
- if it doesn't work, don't check it in.
- if it's not cleaned up, don't check it in. (commented correctly, good names, etc)
- merge from the main trunk at least 3 times a day, if not more.
- merge from the trunk before you check in
- communicate a lot, especially if you're editing the same classes as someone else.
- if you've been off the radar for a week, you're taking far too long.
The Jerk talked about getting hit by a bus, etc. I said who cares about code if a guy's in hospital or there's a war, we've got better things to worry about than a few days of lost code.
I feel the pain. I do. I understand. I get scared. I want to check in early and often, but source control isn't your personal unlimited undo. When I look at history I don't want to see rubbish. I want to see snapshots of working code.
Don't hold back, and don't go dark, but use your f'ing head. It's far better to locally integrate multiple times per day and merge back after a few days because guess what, no surprises.
Some things just take time, and my driving rationale was to be able to ship anytime. If you always try to keep the code in source control golden, it staves off the effect of code rot. If you check in daily because The Jerk tells you to, then you start taking shortcuts, and you forget things. You start to not edit other people's code because maybe they're still working on it.
FYI We did TDD and used Cruise for continuous build, and tried to stick to an agile-like process, and typically it would be multiple features per day checked in per person. You sometimes need to change the process of feature granularity to solve forest/tree problems elsewhere.
someguy on August 21, 2008 02:34 PM
Even though this seems like common sense, I unfortunately run into developers all the time that don't check in at least once before going home at the end of the day. So although many of us take this stuff for granted, and may even want to give Jeff a hard time about this blog post, there are many people reading this that probably think it is okay to go days without checking their code in.
Jason Bunting on August 21, 2008 03:05 PM
Whether you use DVCS or a centralised system with branching, you can check in code that doesn't even compile, let alone run or even pass its tests, and not break anything because no one is forced to get a copy of that code.
But if you pick either option, then people *can* choose to take a look any time they choose to do so, and you can put your code into the "main" branch any time it is compiling and working.
If you can't do distributed and you can't do branching, well, ok. Sometimes it's important to remember that a "rule" is actually just a guideline and carries the implicit assumption that you won't be a deliberate moron just so things will go wrong and you can blame the guy who wrote the rule.
And your repository history is a pretty poor system to use for tracking your list of new features for the marketing department. Sure, you're "embarrassed" that the record of the two days you spent not seeing that semicolon is recorded for posterity, but if people are going to use your repository history as a weapon, you've got much more important issues to worry about - such as updating your resume.
Here's my little bit of code to check in, and then add to:
C (this is a comment in FORTRAN, but only if the 'C' is in col 1)
Recently I got into the habit of typing "hg init" as the first thing when starting a project, even when the "project" is a small script I'll probably spend an hour coding and use a dozen times before throwing away.
(that's the Mercurial command for "create empty repository here")
Nicolas on August 21, 2008 03:27 PM
T, what the hell are you talking about? In SVN you merge revision *ranges*, not individual changes.
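The difference between merging individual changes and merging ranges is easy to see with a little arithmetic. The following Python sketch is only an illustration: the revision numbers are made up, and `collapse_revisions` is a hypothetical helper, not part of any version control tool. It groups a feature's revisions into runs of consecutive numbers, so the number of merges to perform is the number of runs rather than the number of commits.

```python
from typing import List, Tuple


def collapse_revisions(revisions: List[int]) -> List[Tuple[int, int]]:
    """Group revision numbers into inclusive (first, last) runs of consecutive values."""
    ranges: List[Tuple[int, int]] = []
    for rev in sorted(set(revisions)):
        if ranges and rev == ranges[-1][1] + 1:
            # Extends the current run by one revision.
            ranges[-1] = (ranges[-1][0], rev)
        else:
            # Gap in the numbering: start a new run.
            ranges.append((rev, rev))
    return ranges


if __name__ == "__main__":
    # Made-up revision numbers for one feature spread over six commits.
    feature_revs = [101, 102, 103, 110, 111, 120]
    runs = collapse_revisions(feature_revs)
    print(f"{len(feature_revs)} commits collapse into {len(runs)} ranges: {runs}")
```

How much this helps depends on how interleaved the commits are with other people's work: if every revision of the feature is separated from the next by someone else's commit, each run has length one and nothing is saved, which is the situation T describes.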
Nicolas on August 21, 2008 03:46 PM
Who said anything about the repository being something for marketing?
It's death by 1000 cuts if you have to look at source control changes and determine if the code is meant to be done, or it's just essentially a checkpoint mid-edit. I'd just rather know that if it's checked in, it's supposed to work, and is supposed to be done. Sure it's not always the case, but it's better than nothing if that's the goal.
someguy on August 21, 2008 03:54 PM
Commenters are mixing concepts up. Don't check in broken code. Don't wait until you're "done." Just check it in when it's close to how you want other coders to see it. When your class is stubbed out. When your functions are declared. When you've implemented a sweet ass method... This way, your coworkers can see your thought process and the direction you're heading.
Someone brought up branching. I haven't worked on anything big enough to need to branch. Do other contributors actually look at 'your' branch? If not, then this doesn't help, because when they check out the latest version, your changes aren't there.
I've actually worked with people in class who have checked in their code after not updating for a week- then run off to class. *breaks keyboard*
Jordan on August 21, 2008 05:54 PM
A great reason to turn off check-in policies in TFS. It's an automated roadblock on the last mile of your journey. Why should I care if your Code Analysis rules don't pass? Let me check in!
Michael L Perry on August 21, 2008 08:45 PM
Branching, shelvesets, checking in often, Get Latest often...
... none of it matters. No, truly.
What's important is *communication*.
The problems with source control happen when developers don't communicate between each other, as peers and professionals, about what they're doing.
Whether the checkins happen hourly, daily or weekly is irrelevant. What's important is that developers talk to each other.
Jeremy on August 21, 2008 10:34 PM
I don't usually check in stuff that is not completed.
I just shelve it, which means that it is stored in version control but isn't included in the version.
At least Team Foundation has this feature, and I'm pretty sure that other version control systems also have it.
Jeff, I couldn't agree with you more. I also agree with most points in Anders Sandvig's post. Most developers tend to treat source control only like a backup system, which is certainly not the real purpose of it.
- Ritesh.
Ritesh Rajani on August 21, 2008 10:52 PM
Early and often can be subjective. Does often mean 5 or 50 times a day? Check-in frequency should be decided case by case. In a team, if one developer is working on something that the other team members depend on, then the code should be committed once it is ready for use. If there isn't any dependency, then the frequency can be cut down to twice a day.
Imagine working in a team of 10 developers where everyone checks in code hourly. Does that also mean I have to update my code every hour too? Then most of my time will probably be spent on merging and integrating code. This can also break the momentum of actual development work.
Lastly, on a big project that takes 5-10 minutes to build, most developers will end up waiting for the build to complete, to be sure everything is working, before they integrate and commit their code. So will everyone be contending to commit their code after each successful build?
http://techrockguy.blogspot.com/2008/05/source-control-with-continuous.html
TechRockGuy on August 21, 2008 10:53 PM
Your basic premise is hard to argue. However, in reality checking in as often as you advocate just isn't a reality for the software projects that I've worked on. Sure, you can check in code that compiles, but if it doesn't work, it could prevent someone else from testing their feature. Checking in in the way you advocate requires a lot more communication as team size grows, which in my opinion isn't a great use of time. Plus, IDEs such as IntelliJ have a local repository that is automatically built, so you do not need to check in until your feature is complete. I'm a fan of your blog, and I agree with most of your posts, but I just can't get behind you on this one.
John Grimes on August 21, 2008 11:03 PM
Whatever value lies in this approach does not lie in particular tooling. Git certainly does not "solve" the problem; as long as the changes aren't merged, you can still get conflicts or, worse yet, difficult-to-track-down semantic errors. If you can use this way of working (and sometimes it's just not possible), then you have to actually share the changes with the other people working on the same codebase - nothing less actually achieves much. It's not so much your committing as it is their checking out of your changes (or a tester's checking out of your changes) that provides the safety net.
Eamon Nerbonne on August 22, 2008 01:06 AM
The funniest thing is that we lose so many things only because we don't check in often.
Doppstadt on August 22, 2008 03:37 AM
I agree I agree. I usually set up a personal Subversion repository to cover my butt, and commit my source to the "enterprise" version control system when ready to integrate.
Regards!
JMinadeo on August 22, 2008 05:28 AM
You really love Code Complete, don't you? I love it too >.<
Pace on August 22, 2008 05:52 AM
> These aren't backups. They are commits. They may be commits of
> works in progress, skeletons, etc., but they are you -committing-
> and annotating artifacts of your process.
I agree. That's what I'm saying I see no value in.
> These commits offer visibility to other members of your team, a
> record of your thinking,
Perhaps you'd care to elaborate on why this is helpful? Perhaps a situation where you've seen it come in handy? Personally, I find my fellow developers' *working* code to be hard enough to read. I can't imagine ever wanting to troll through their non-working code.
> and, as I mentioned above, a safety net
> allowing you to roll back to a state you previously explicitly
> declared "useful."
Yes, but the way I do it is a much *better* safety net. I get *every* old version of my file saved automatically, whether I thought it was important at the time or not. This is better because we aren't always the best judge of when we need to save off a fallback position. When you need to go back, you had to have guessed right (and thought to check in at the right time). I have it no matter what.
> When you have a well-appointed development infrastructure, these interim commits offer a basis for automated testing tied to checkins.
That might be nice. However, my environment cross-compiles, and I'm typically working on device drivers. There's no way to create useful automated tests.
I'm still not convinced it would be that useful either though. If I thought my code was good enough to pass all the automated tests, I'd probably be checking it in anyway. Alternatively, if I know it's not going to pass, wouldn't it just cause problems to check it in and break the tests?
The issue I don't see anybody addressing is what to do about those times I work days, or even (occasionally) weeks on a load that doesn't compile. Perhaps I'm just slower than y'all, so this only comes up for me?
T.E.D. on August 22, 2008 07:05 AM
I do recommend use of developer branches if you're using a centralised version control system. (We use Perforce.) That way it's always safe to check code into your branch, because you know it won't cause problems for anyone else. It also makes it easier to control how others' changes are introduced to your code. If you have a bunch of files open for edit in the trunk and somebody drops a huge set of changes on trunk, it can be very difficult to sync and resolve those changes with your changelist, and your changelist becomes quite complicated. However, if you're working on your own branch, you can check in your changes when you're finished, then handle the integration from trunk into your branch cleanly and separately, before finally (and trivially) propagating back to the trunk.
Weeble on August 22, 2008 10:36 AM
@T.E.D.
"The issue I don't see anybody addressing is what to do about those times I work days, or even (occasionally) weeks on a load that doesn't compile. Perhaps I'm just slower than y'all, so this only comes up for me?"
The idea is that you are not supposed to work for weeks without being able to compile.
My solution would be to make the code compile more often, i.e. by splitting the work into smaller parts. Note that a successful compile does not mean that it has to do anything meaningful, you could implement stubs and empty interfaces for the pieces that don't have any useful code yet.
When working with auto-compiling environments, such as Eclipse, this is not really an issue most of the time anyway.
Anders Sandvig on August 22, 2008 11:20 AM
I just use a nightly backup script that copies any files marked "modified compared to the repository" to a shared network location that is in turn backed up nightly to permanent media.
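A backup script of that kind could look roughly like the Python sketch below. This is not the commenter's script: the function name, the directory layout, and the idea that the list of modified files is handed in ready-made (for example from the version control system's status output) are all assumptions for illustration.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def back_up_modified(modified: Iterable[str], work_root: str, backup_root: str) -> List[Path]:
    """Copy each modified file into backup_root, keeping its path relative to work_root."""
    copied: List[Path] = []
    for rel in modified:
        src = Path(work_root) / rel
        dst = Path(backup_root) / rel
        # Recreate the directory structure under the backup location.
        dst.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the platform allows it.
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

Note that a copy like this keeps only the latest state of each file; unlike a check-in, it records no history and no explanation of the change.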
Personally, I find the annoyance of having to fight against the constant stream of implementation bugs of half finished code to be a perpetual headache.
I have no problem with people checking into a repository branch, but checking tiny little changes into the trunk causes everyone else to have to constantly check out, resolve conflicts, and rebuild their code. If the change is within the section of code that everything else depends on, this can cause a huge amount of time to be spent in almost perpetual rebuilding of the entire source tree. Not very pleasant.
Just saw The Shining again last night. There wouldn't have been any problems if Jack Nicholson had checked in his writing every day; then Shelley Duvall would have noticed that it was all "All work and no play makes Jack a dull boy." before they were snowed in.
N. Velope on August 22, 2008 11:39 AM
People should check out Accurev (the company that Damon Poole is CTO of). They do a pretty cool source control system that I've been using for about 6 years. They do an awesome job of making merging and parallel development for many users on the same code base dead simple. I've used a lot of ClearCase and Perforce in the past, and I've become a big Accurev fan.
Chris Boran on August 22, 2008 12:07 PM
The Accretion metaphor is McConnell's idea (page 15, Code Complete, Second edition), Jeff forgot to link/mention it this time.
Locivars on August 22, 2008 12:12 PM
> Note that a successful compile does not mean that it has to do
> anything meaningful, you could implement stubs and empty interfaces
> for the pieces that don't have any useful code yet.
I suppose I could see that for new work. The only real problem I have is that when I'm working on a big new bit of code, the classes (and thus file names) are in a rather constant state of flux. Files are guaranteed to get renamed or merged, and sometimes a file with the same name might even reappear later with different contents and function. With Git that might not be too big of a deal, but that would give our current revision control system fits.
But that's for new development. The other common issue I have with things being broken for a long time is when I'm refactoring really messy code. Typical activity is to split a humongous file into two or more smaller files, and then spend the next several days trying to move the various other little bits around so that the result will compile.
I really like the concept, and found that this short blog post makes it very easy to picture the difference between trying to merge often with branch-based tools and with a tool like Accurev:
http://daveonscm.blogspot.com/2007/09/agile-branches-vs-streams.html
Next little gem to check out is multi-stage continuous integration.
Mica
Mica on August 22, 2008 01:20 PM
> Files are guaranteed to get renamed or merged, and sometimes a file
> with the same name might even reappear later with different contents
> and function. With Git that might not be too big of a deal, but that
> would give our current revision control system fits.
> But that's for new development. The other common issue I have with
> things being broken for a long time is when I'm refactoring really
> messy code. Typical activity is to split a humongous file into two or
> more smaller files, and then spend the next several days trying to move
> the various other little bits around so that the result will compile.
Then you should take a really hard look at changing your version control system!
Nowadays it's easier than ever...
Locivars on August 22, 2008 01:20 PM
I don't think that DVCS is the entire answer. Part of the point of the post is that the changes are made available to other developers. With DVCS you will still need to check in often to the main repository. Obviously if you are working on major build breaking changes then the disconnect from the main repository is welcome.
BTW, That is where DVCS shines in my opinion. I haven't quite gotten a handle on why we aren't all using DVCS: it offers all the benefits of traditional VCS plus some of the coolest features to ever come to version control.
In either case, new development or build breaking changes, a *TEAM* will cope because they are a team.
Michael C on August 22, 2008 01:56 PM
Jeff,
Thanks for the mention. Looks like you really stirred up a hornet's nest! You may be interested in reading some of my latest thinking on integration patterns in a post on Multi-Stage Continuous Integration: http://damonpoole.blogspot.com/2007/12/multi-stage-continuous-integration.html .
Cheers!
Damon
Damon Poole on August 22, 2008 02:08 PM
Both myself and my team are fans of the daily checkin. Saves us headache, and we all know what everyone is working on. It's not a big hassle, and I definitely think it's worth it.
Joe on August 22, 2008 02:58 PM
As with anything, check-in early and often is just a best practice but isn't applicable to all situations. The example of the device driver is a good one. Likely you'll be the only developer working in this particular piece of code, therefore early and often merges of your code with the main branch aren't that useful. The real benefit is a situation where multiple developers may be working on the same piece of code. In this situation it's much less painful to merge your LAG code with the main branch in small doses than to do it all in one big lump. It's a lot easier to manually merge a few small conflicts than an entire file of conflicts.
The other key thing here is having a continuous build that you can execute post-get but pre-check-in. In our current project our check-in dance goes like this:
1. Do some work.
2. Get latest
3. Execute a local one click build which compiles, configures, runs tests, etc.
4. On a successful build, check in.
5. Any check in initiates another automated build in the background and tests against an integrated environment.
6. Handle broken builds IMMEDIATELY!!
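The local half of that dance (steps 2 to 4) can be sketched in Python as below. The three callables are hypothetical stand-ins for whatever the team's tools actually do; nothing here is tied to a particular version control or build system, and the server-side steps 5 and 6 are deliberately left out.

```python
from typing import Callable, List


def check_in_dance(
    get_latest: Callable[[], None],
    local_build: Callable[[], bool],
    check_in: Callable[[], None],
) -> List[str]:
    """Run the get / build / check-in sequence, stopping if the local build fails."""
    log: List[str] = []

    # Step 2: pull everyone else's changes before testing anything.
    get_latest()
    log.append("get latest")

    # Step 3: the one-click local build (compile, configure, run tests).
    if not local_build():
        log.append("local build failed; nothing checked in")
        return log
    log.append("local build passed")

    # Step 4: only a successful build earns a check-in.
    check_in()
    log.append("checked in")
    return log


if __name__ == "__main__":
    # Stand-in callables; a real setup would shell out to the actual tools.
    print(check_in_dance(lambda: None, lambda: True, lambda: None))
```

The point of the ordering is that the build runs against your changes combined with everyone else's latest code, so an integration problem shows up on your machine before it reaches the trunk.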
With an SCM that makes branching very easy (e.g. git, though this is orthogonal to distributed vs centralized), it's even possible to commit broken code, just in another branch.
In several projects I've worked on, experimental side-branches frequently *didn't* build for a long time, but it was possible to check them out and see what people were working on and thinking about.
Peter Burns on August 22, 2008 04:41 PM
Gee. We used to call that stepwise refinement.
lee doolan on August 22, 2008 06:40 PM
Hi Jeff,
I'm guessing that you didn't mention branches because YOU are in charge of the release cycle, and therefore it does not matter what you want to check in to the trunk (because you can assume that you will make time to fill-things-out before release).
Your advice can only hold for larger teams, or teams where there are multiple parallel dev streams (e.g. feature A releases half way through feature B development) if branches are in use (or DVCS).
You have said before that many developers do not understand simple VC systems, let alone Distributed ones -
@MattRidley, I suggest that you need to do your check-ins to branches, prepare a merge / integration test environment for compliance purposes, and then check that in to the trunk once all the tests have passed.
Nij
Nij on August 23, 2008 02:44 AM
Look on the bright side, the development "lead" on an adjacent team to us has decided that the right Clearcase strategy is to have everyone develop directly on main, doing unreserved checkouts and merging manually when we're finished.
Mike G on August 23, 2008 08:01 AM
Jeff,
Thanks for another posting on version control. I enjoy reading about your experiences with it, and because so many folks post back on the topic, I often learn something from that as well.
For my 2 cents on the matter, yeah I agree that check-in early and often is a *sound* policy. I also believe all work should be done in branches with some designated "coordinator" being the guy/gal who has final say-so and does the merging from the branches into master. I've seen lots of folks get burned big time on a project because some code cowboy, late on a Friday afternoon, thought up some new break-the-api-but-it's-so-cool-dude!! feature, checked it into the trunk, left for the day, and hosed everybody.
I started out with subversion about 3 years ago and liked it. It was easy to learn, it came with any current linux distro, it worked pretty fast. It was easy to make branches. The only problem I could see at the time was the merging of branches. That could be technically hairy. Hairy enough that my implementation of merges was:
a) Remove old trunk
b) Rename working branch as trunk
That "works" sorta if you're the only guy on the project, but clearly it's not the way to do branches. I eventually figured out how to determine the start and end of the branch that I wanted to merge with trunk, but still it was awkward and it was brittle. Any history changes in the branch involving files/directories that didn't exist in the trunk could bork the commit. I would end up having to merge the branch in small chunks. This clearly was not the way to do merges.
A buddy whom I had introduced to git (he was new to Debian), and who fell in love with it and used it in all his home projects, got on my case to try it out, which I did in January this year. Coming from subversion, getting used to git was a learning curve. There are *so* many commands with git (~180 vs. ~40 for subversion) that I couldn't grok it all. Finally what rescued me was realizing that I only use a small subset of the commands anyway to do work (init, clone, diff, log, status, commit, checkout, branch, stash, pull, remote, add, rm) and that I only needed to focus on the fundamentals. Once I got those down pat life was good.
Git comes with a graphical visualizer called gitk that makes it easy to see what's going on with the branches and who branched from whom, and who merged with whom, and what the current status is, etc.
So why is git better than subversion for me:
1) It's fast. Turbo fast. Ridiculous fast. Subversion isn't noticeably slow, but git is practically instantaneous in comparison to subversion. A check in with git takes on the order of 1 millisecond. With subversion, 1 or 2 seconds if it's a local filesystem, longer if you're dealing with a network based repository.
2) The decentralization. Holy parallelization Batman!!!! No more stomping on each other's toes. No more code cowboy committing crap into master branch and borking everybody. The only copy of the repository that is canonical is the coordinator's. He/She is the person who *pulls* from the developers, sees what they like, and if it's good it goes into his/her master branch. If it needs touch up, the coordinator does it (or better yet tells the offending developer to fix it or it doesn't go in.) The developers then pull from the coordinator's master branch and merge with their own. Then they merge their local master with their development branch.
On a large project you can't do that with a centralized repository, for several reasons. Bandwidth gets sucked up. If the centralized repository goes down, there are no checkins, merges, checkouts, nothing: you can still work on your local copy, but you can't commit the history of what you're doing. And there is always the risk of some code cowboy committing into *your* branch something that could bork your build, or worse, code cowboy commits into master, you pull from master and *everybody* is hosed.
Torvalds in his Google talk video (http://www.youtube.com/watch?v=4XpnKHJAok8) claimed you should never underestimate the effect that being able to commit, check out, merge, etc. quickly has on development. To be honest I was skeptical of the claim. But after spending 8 months with git I see his point. Damn, git is fast!!!
Jeff, I urge you to give git a try. I think you'll be surprised how good it is. Even if you believe that there's not much need on a project with just a few folks, I'm telling you that git really is that good.
I wanted to give you some data points to give you an idea why decentralization is *better* than centralization for development purposes. Here are the data points to look at:
Microsoft starts to use scrum to help development (late 2005):
http://www.eweek.com/c/a/IT-Management/Microsoft-Lauds-Scrum-Method-for-Software-Projects/
I don't know if it was applied to Vista or not. But here is a Microsoft developer's perspective on why Vista was delayed and why it didn't have the features folks wanted:
http://blogs.msdn.com/philipsu/archive/2006/06/14/631438.aspx
Unfortunately he doesn't mention which version control software they used. Several years ago someone at Microsoft had posted the workflow for how Windows NT (2000? XP?) kernel developers worked with version control. I tried to remember the url but unfortunately I can't and I haven't been able to find it on Google either. I do remember the version control was something Microsoft had developed for their internal use and it was definitely centralized version control. Some of the highlights were things like how developers only had limited access to the source code, commits had to go through a gauntlet of management, and builds were stunning -- instead of just recompiling the parts the developer had written and unit testing that part, the *whole* source code for the Windows kernel *and* the Microsoft products had to be recompiled and all tested. They had like 4 build servers and turnaround time between submission and a response was ridiculously long -- something like 2 weeks I think.
Here is how development on Windows 7 is being done:
http://blogs.msdn.com/e7/archive/2008/08/18/windows_5F00_7_5F00_team.aspx
Lastly, as a comparison, here's a presentation by the USB subsystem maintainer for the Linux kernel, Greg Kroah-Hartman:
http://ols.108.redhat.com/2007/Reprints/kroah-hartman-Reprint.pdf
The pdf paper includes all sorts of stats on the development process, but the thing that gets to me as a Linux user is how much code gets checked in/modified/dropped, etc. and it *still* works with releases every 3 months or so. They've got something like 1100 developers and the only way they manage to keep the pace they're going at is a) decentralization b) pyramid structure of trusted developers who do the pulling and merging.
Johnny on August 23, 2008 10:30 AM
Shouldn't check-in be an automatic text/code editor function in an IDE?
Writing code that always compiles when you hit "save" shouldn't be that difficult for professionals.
Lepto Spirosis on August 24, 2008 03:59 PM
Jeff, your rss feed hasn't been updated since July 29th. Don't know why, it just hasn't.
Mike Brown on August 25, 2008 06:46 AM
A properly configured SCM environment allows for users to have personal branches so CI early and often is a very good thing. When you want to have your code mix it up with other developers' code, merge it... There are very good tools and methods that support merging. Merging is your Friend!
Patrick on August 26, 2008 08:34 AM
I think many people use "continuous integration" in the wrong direction and that is where the faulty "checkin often" idea is born.
Imagine I'm working on something big, e.g. a new file system.
Continuous integration doesn't mean I add my changes to the build every day (so I disturb everybody's work for months).
Continuous integration means I have to download all changes made by others into my private build on a daily basis and I have to build the whole shebang myself to make sure my new module integrates perfectly.
Then, someday, when my module is mature, I can checkin without breaking everything.
I think many people get it wrong because they have (only) one big build environment where the whole build is compiled, which is wrong. Truth is, every project needs its own big build environment where they can build the big thing (fast enough).
Can't agree more, that's my rule of thumb too. Besides, if you don't do that you almost automatically forget to check in something.
Ion Todirel on August 29, 2008 12:58 AM
I've mainly worked with cvs and then svn but have satisfied my urge to checkin as I think, in tiny increments, by having a SEPARATE store.
For years I used VOODOO Personal on the Mac, now that it has failed to make the jump to OS/X I am using Mercurial. That allows me to have a "personal" repository which can be unbuildable and still checkin to the client repository when I have a contribution which is sufficiently complete to fit in with the rest of the team. I have suffered working on a site which had a philosophy of allowing the trunk to be broken (strong personalities) and I would never wish that on anyone. Using multiple repository technologies satisfies things all around.
Andy Dent on August 29, 2008 02:39 AM
Nicolas,
You can only merge ranges in SVN if they're sequential. If someone checks in between two of your commits, you have to merge each individually.
Hence holding back until I'm reasonably sure that the sum total of my changes constitutes the entirety of the feature or fix. So that when I have to merge it later, it's just one merge instead of, say, 17.
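A rough sketch of that arithmetic, taking the claim above at face value: if only consecutive revision numbers can be merged as one range, the number of merges equals the number of consecutive runs among your own revisions. The function name `merge_ranges` and the revision numbers are made up for illustration; nothing here calls SVN itself.

```python
def merge_ranges(my_revisions):
    """Group revision numbers into (first, last) runs of consecutive numbers."""
    ranges = []
    for rev in sorted(my_revisions):
        if ranges and rev == ranges[-1][1] + 1:
            # Extends the current run, so it can ride along in the same merge.
            ranges[-1] = (ranges[-1][0], rev)
        else:
            # Someone else's revision sits in between: start a new range.
            ranges.append((rev, rev))
    return ranges


# Hypothetical revisions of mine; 103 and 106-108 belong to other people.
mine = [101, 102, 104, 105, 109]
print(merge_ranges(mine))  # [(101, 102), (104, 105), (109, 109)]
```

Five small commits here turn into three separate merges, which is the cost being weighed against checking in often.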
T on September 8, 2008 11:05 AM
I think the best thing is to use something like git-svn, you get git for your local system, and svn to share code with others.
Swaroop on September 9, 2008 07:10 PM
I absolutely agree that developers should check-in often. It is crucial to integration. I think Erik's post about continuous integration going in the wrong direction is an interesting point and worth more consideration. In addition, I haven't used git, but svn has worked really well for the teams I've been on.
Coder Blues on September 23, 2008 06:57 PM
One voice here in favor of infrequent check ins to the trunk, and major use of private branches.
If you use CVS or other unsophisticated version control, you check in often because the pain of branch/merge is larger. But there is a huge penalty for this: no large comprehensive refactoring is ever really possible. Something that requires someone to rip into the whole design, reorganize and re-gel the whole thing isn't feasible unless you can branch, do that work, merge, fail because of unexpected effects, fix up, merge again, etc. A good example of unexpected results would be that all tests pass, but backward compatibility with prior releases isn't perfect. (Generally, you've discovered that there are missing tests... gee, does that ever happen?)
To avoid disrupting everyone else, the penalty for someone merging in faulty code must be fairly low. I.e., the trunk is broken because I tried to merge in my new refactored stuff, which is tested, but something unexpected is going on..... gotta go develop some more tests. In the meantime the rest of the team can simply branch around me, or it's painless to back out the check-in on the trunk.
These kinds of big refactorings are not the everyday stuff of software design, more like monthly stuff, but it's like unit testing, if you make it even slightly difficult, it will just never get done, and if you introduce fear of doing it, by way of rules like "never ever break the trunk", you will simply never ever make major changes to a software design.
Mike Beckerle on September 24, 2008 05:12 AM
| Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved. |