Wednesday, April 13, 2005

Why a free BitKeeper clone was a bad idea (whose time had come)

I've been reading all the comments about the end-of-life of BitKeeper, the source control management system used by the Linux kernel developers. After plodding through hundreds of comments I began to understand how little the general public knows about the operation of BitKeeper and why having a free (as in speech) BitKeeper clone from anyone besides BitMover was a bad idea. Let me explain.

First, let's establish some context. According to this article, Larry McVoy had two reasons for dropping the free (as in beer) BitKeeper program: (1) corruption, and (2) IP loss. He goes on to explain how repository corruption would have caused major problems and how his IP would have been compromised by the free clone. Not many people bought it. To be honest, I didn't buy it at first either, but after thinking about BitKeeper's mode of operation, I decided that Larry McVoy is completely right. Let me explain.

Corruption

Here is Larry McVoy's explanation of corruption:
BK is a complicated system, there are >10,000 replicas of the BK database holding Linux floating around. If a problem starts moving through those there is no way to fix them all by hand. This happened once before, a user tweaked the ChangeSet file, and it costs $35,000 plus a custom release to fix it.
Many people dismissed this problem as a simple bug in the BitKeeper server. Others saw it as bad design. Few people understood that it was neither, and that the corruption would have been inevitable.

Working with distributed systems is different from working in a client-server environment.

Client-server environments have a well-defined protocol (or an API) used to pass data between the client and the server. There is a layer of abstraction between the data files the server uses to represent the data and the data exchange that takes place. It should not be possible to corrupt the server's data by using (or abusing) the protocol. Most of us feel really comfortable with this model. Most of the software in the world behaves this way.

Distributed systems (and peer-to-peer systems) behave slightly differently. There is no server, and there is no client. Instead, a collection of peers interact by passing data between themselves. There is usually a protocol for transferring the data, and that protocol should have the same properties as a client-server protocol, but there is a slight difference. Each peer has a copy of the data files that in the client-server model belong only to the server. This is an important difference.
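To make the difference concrete, here is a minimal sketch in Python. The class names (Server, Peer) and the validation rule are hypothetical and purely illustrative; nothing here reflects how BitKeeper or any real system is implemented. It only shows that a server can guard its files behind a protocol, while a peer's files are open to whatever local tool touches them.

```python
# Hypothetical illustration; none of these names come from a real system.

class Server:
    """Client-server: clients only ever see the protocol, never the data files."""

    def __init__(self):
        self._files = {}  # private storage, reachable only via commit()/read()

    def commit(self, path, content):
        # The protocol layer can reject bad input before it reaches storage.
        if not isinstance(content, str):
            raise ValueError("rejected by protocol")
        self._files[path] = content

    def read(self, path):
        return self._files[path]


class Peer:
    """Peer-to-peer: every peer owns a full copy of the data files."""

    def __init__(self, files=None):
        self.files = dict(files or {})  # local copy, editable by any local tool

    def clone(self):
        # A clone copies the raw files wholesale, valid or not.
        return Peer(self.files)


server = Server()
server.commit("main.c", "int main(void) { return 0; }")
try:
    server.commit("main.c", None)  # abuse of the protocol is refused
except ValueError:
    pass

peer = Peer({"main.c": "int main(void) { return 0; }"})
peer.files["main.c"] = None        # a buggy local tool damages the file
copy = peer.clone()                # the damage travels with the clone
```

In the first half the bad write never reaches the server's storage. In the second half nothing stands between the damaged file and the next peer.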

BitKeeper belongs to the second class of systems. A BitKeeper repository consists of not only the data (source code), but also all the metadata needed for revision control. Each user has a full copy of the repository, including all the history. There are two ways in which users can exchange data in BitKeeper: (1) clones, which amount to no more than creating a tarball of the entire repository and transferring it to the requesting peer (using, say, HTTP), and (2) push/pull, which use a proprietary protocol for transferring ChangeSets between the peers. Note that these repositories contain all the internal data files that BitKeeper uses to manage the changes.
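The two kinds of exchange can be sketched roughly as follows. This is a toy model in Python: the repository is just a list of changeset names, and clone_repo and pull_changes are hypothetical helpers of my own, not BitKeeper commands or its actual protocol.

```python
def clone_repo(source):
    """Copy the whole history, like unpacking a tarball of the repository."""
    return list(source)


def pull_changes(dest, source):
    """Append only the changesets that dest does not have yet."""
    known = set(dest)
    missing = [cs for cs in source if cs not in known]
    dest.extend(missing)
    return missing


upstream = ["cs1", "cs2"]
mine = clone_repo(upstream)          # full copy of everything upstream has
upstream.append("cs3")               # upstream moves on
sent = pull_changes(mine, upstream)  # only the new changeset is transferred
```

The point of the sketch is that a clone moves everything as-is, internal files included, while a pull moves only what the receiver lacks.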

Now imagine a free BitKeeper program, let's call it FreeBK (FBK). It is reasonable to assume that FBK will corrupt repositories. After all, it's dealing with proprietary data file formats, and its authors are still learning, by reverse-engineering the data files, which invariants BitKeeper maintains.

Now imagine a developer with access to both BK and FBK. This developer can clone repositories using either tool. BK will notice corrupt repositories and warn him. FBK, however, still lacks the ability to detect certain kinds of corruption. If he chooses to clone a repository with FBK, the corruption has already spread. Working on top of this bad repository is fraught with danger. Work might be lost, bad ChangeSets could be spread (using either BK or FBK, as the corruption is now hidden under new ChangeSets), and information obtained from this repository (diffs and such) can no longer be trusted.
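Here is a rough sketch of that scenario in Python. Everything in it is an assumption made for illustration: the changeset layout, the SHA-256 checksum, and the two checks (strict_check standing in for a tool that knows every invariant, lax_check for a clone that has only learned about parent links). It is not BitKeeper's file format or its real consistency checks.

```python
import hashlib


def checksum(parent, payload):
    return hashlib.sha256(f"{parent}:{payload}".encode()).hexdigest()


def make_changeset(parent, payload):
    return {"parent": parent, "payload": payload, "sum": checksum(parent, payload)}


def strict_check(repo):
    """Stand-in for a tool that verifies parent links AND checksums."""
    prev = None
    for cs in repo:
        if cs["parent"] != prev or cs["sum"] != checksum(cs["parent"], cs["payload"]):
            return False
        prev = cs["sum"]
    return True


def lax_check(repo):
    """Stand-in for a tool that has only learned about parent links."""
    prev = None
    for cs in repo:
        if cs["parent"] != prev:
            return False
        prev = cs["sum"]
    return True


def clone_with(repo, check):
    """Copy the repository if the given check accepts it."""
    if not check(repo):
        raise ValueError("corrupt repository")
    return [dict(cs) for cs in repo]


repo = [make_changeset(None, "initial import")]
repo.append(make_changeset(repo[-1]["sum"], "fix bug"))
repo[0]["payload"] = "hand-edited"    # someone tweaks a file behind the tool's back

copy = clone_with(repo, lax_check)    # accepted: the corruption has spread
copy.append(make_changeset(copy[-1]["sum"], "new work"))  # and is now buried
```

In this toy model the strict check rejects both the original and the copy, while the lax check accepts both, so new work keeps piling up on top of the damaged history.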

If said developer has a commercial BitKeeper license, he will certainly call BitMover support and expect them to fix his corrupt repository and help him recover his work. In order to be able to do this, BitMover would have to track FreeBK and be aware of the ways in which it can cause corruption. This is clearly a very big support load for BitMover.

Loss of Intellectual Property

The other reason Larry McVoy has for not wanting a Free BitKeeper program is loss of IP. The best way I can explain how this is a real problem is by comparing it to the PalmPilot.

When USRobotics came up with the PalmPilot, they had a very simple, but very important, insight: computers sucked at recognizing handwriting, but people are very good at learning how to draw new symbols. The Apple Newton had been a big failure, mainly because it never recognized what users wrote. The PalmPilot required users to learn a new simple alphabet, and that simplified the problem of handwriting recognition to the point where the PalmPilot got it correct 99.9% of the time. This idea was very hard to come up with, and very expensive to develop, but once you have seen it done, it is obvious and very easily copied.

How is this related to BitKeeper? Well, anyone who has used BitKeeper knows that there are some ideas in it that are as simple as the handwriting recognition in the PalmPilot. Once you have seen them, they become obvious. Now, BitMover has already paid a big price for coming up with these ideas, and another big sum for developing them, but they are trivial to copy. You can see how it would not be in BitMover's best interest to have developers of competing source control management systems using BitKeeper.

A BitKeeper-Free Linux

Overall, I think having BitKeeper was really good for both the Linux kernel and BitMover. It gave BitMover much needed visibility when it was a small company, and it gave the Linux kernel developers a much needed tool when they needed it the most.

As in every relationship, both sides have to give about equally for it to continue. In this case, I think the breakup was inevitable, as BitMover was giving more to Linux than it was receiving from it. I mostly feel sorry for all the small Open Source projects that used BitKeeper for managing their source, and for those of us who used it for keeping personal files in single-user mode. We have lost a very powerful free (gratis) tool.

On the other hand, I feel projects like monotone, darcs, arch, etc. will get a much needed push by the community to produce better tools. It will probably take them a couple of years to be as good as the BitKeeper of today, and by then BitMover might have come up with a new and improved BitKeeper. But by then the open source community will have very good tools and people with money will be able to buy state-of-the-art tools. Isn't that the way the world works?