BitKeeper after the storm - Part 1
Joe Barr
NewsForge
May 11, 2004
It has been a couple of years since the Linux kernel mailing list debated Linus Torvalds' ability to scale and the use of a proprietary source management tool called BitKeeper to handle kernel patches. Now that the dust has settled, and intrigued by a press release [ http://www.bitkeeper.com/press/2004-03-17.html ] from BitKeeper author Larry McVoy claiming impressive productivity gains for Linus Torvalds and other kernel hackers using BitKeeper, NewsForge decided it was time to talk with McVoy about the current state of affairs between the free software hackers and his proprietary code. This is part one of that interview; part two will appear tomorrow.
NF: It has been about a month since the press release with the startling news that BitKeeper more than doubled Linus Torvalds' productivity. What was the reaction to the news by other kernel hackers?
McVoy: It wasn't really news to the senior developers. They already knew.
Here's how that announcement came about. I asked someone we were considering hiring why he wanted to come work for us. His response was, "I hang out on the kernel list and it is obvious that Linus is ten times more effective since he switched to BitKeeper." That sounded pretty nice, but I didn't believe it. I knew things were better, but ten times better? That sounded a little too good to be true.
I know some of the senior kernel people personally, so I started asking around. I spoke with Dave Miller, Jeff Garzik, Greg Kroah-Hartman, Andrew Morton, and Linus about this. Dave was the first person I spoke with, and he said he thought that 10x wasn't at all unlikely, and that it was certainly 8x. Interesting. So I talked to Jeff, and his comment was, "Oh, man, it's so much better, it has to be 10x." Greg had a fairly similar reaction. I was having lunch with Linus, Andrew, and Ted Ts'o to talk about digital signatures for the kernel (those are implemented now, by the way) and I brought this up as a question. Andrew thought that anything would have been an improvement over what Linus was doing before, and he agreed that BitKeeper was a lot better than CVS. But his take was that just a move to CVS would have been an improvement. Linus disagreed. Linus was adamant that if he had moved to CVS it would have slowed him down. So in Linus's mind, whatever improvement had happened was due to BitKeeper.
Greg has written a paper about the rate of change since the switch to BitKeeper. He has a lot to say about how BitKeeper has helped -- you might ping him for details. Here are some of the things I remember.
The senior developers were well aware that things are better. The 2x announcement wasn't news at all. From their point of view, the 2x claim is an understatement because for them the improvement is bigger than that. But any claim is likely to be challenged, so what we did to arrive at that number was simply to measure the amount of change over the two-year period in BitKeeper and contrast it with the two-year period before BitKeeper. It worked out to about 2.5x more change. The metric I'd love to have is the number of patches integrated; we're all in agreement that it is far more than 2.5x what it was before. Linus is processing around 50 patches a day, 365 days a year. That's an amazingly high number. Nobody in the software industry has ever processed that much change to my knowledge, and I have worked at SCO, Sun, SGI, and Google as well as a few smaller companies.
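As a rough sketch of the kind of measurement McVoy describes, the figures below are entirely made up for illustration; the idea is simply to total the lines added and removed in each two-year period (for example, from diffstat output over every merged patch) and take the ratio.

```python
# Hypothetical (lines added, lines removed) totals for each period,
# e.g. summed from diffstat output over every merged patch.
before_bk = (900_000, 400_000)    # two years before BitKeeper (made-up figures)
with_bk = (2_300_000, 950_000)    # two years with BitKeeper (made-up figures)

def total_change(period):
    added, removed = period
    return added + removed

ratio = total_change(with_bk) / total_change(before_bk)
print(f"change ratio: {ratio:.1f}x")  # ~2.5x with these illustrative numbers
```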
Before we made the productivity announcement we talked it over with Linus. It was Greg who suggested the idea of measuring the diffs as a way of getting a quantitative handle on the problem. I showed the numbers to Linus and asked him if he agreed and he did. So that's how it happened.
NF: What is the level of acceptance for BitKeeper now as opposed to when Torvalds first announced he was going to use it?
McVoy: Most people use it, so the acceptance level seems high. There was concern at the beginning that maybe we were trying to exploit the kernel team somehow. Our position has always been that we were, and are, sincere in our desire to help the community. Nobody believes in a free lunch, so many people try to figure out "what's the catch?" The more vocal we were about our sincerity, the more suspicious people became. That's human nature, and there isn't much we can do about it other than continue to demonstrate that we will do the right thing. I used to work at Google, and their "do no evil" motto is something I took away from them. It's a good way to run a business, but it makes people wonder a bit. People expect corporations to be "evil," but not all corporations are evil. Google is a very visible company trying to do the right thing; we're a far less visible company, but we are also trying to do the right thing. It is possible to do the right thing and make money, and maybe Google's example will inspire other companies to follow suit.
I believe that a lot of the concerns have faded away because it is years later and we are still here and still supporting the free use of BitKeeper. Linus has used BK for more than two years, but the PowerPC folks have been in BK since 1999, so we have been supporting kernel people in BK for at least four years. The MySQL folks have been using BK for about the same amount of time, so it ought to be clear that we are committed to helping the free software community.
It's worth pointing out that we are profitable and have no outside investors. That means that we, the employees and owners of BitKeeper, decide if it is a good idea to support the kernel and the other free users. We, not some outside money-focused investors, decide if what we are doing is a good thing. And we like the free software community.
There are some people who will always be worked up about any infrastructure that isn't GPLed. We understand their concerns, and that's why we built the BK2CVS gateway. That way people know that, no matter what, they have the history in a GPLed tool. We do the export nightly and mirror the CVS root to master.kernel.org, so there is simply no question that the data is available in a free form.
Along with BitKeeper itself, we provide bkbits.net, a free hosting service for BK repositories (Linux is there, MySQL is there, so are lots of other projects), and we provide a free public server (kernel.bkbits.net) that anyone can use if they are working on the kernel, BK user or not. The amount of service that we provide for free should, in theory, help convince people that our intentions are good and we are really trying to help the community of free software developers. People didn't trust that initially, but the longer we keep helping the more people tend to trust us.
NF: Is your pro bono work for Linux kernel development paying off in sales of your proprietary product?
McVoy: Absolutely. People look at how the kernel is being managed in BK and they believe that if BK can do that then it can handle their problems as well. A big marketing win for us is bkbits.net, our free hosting service. Managers look at that and at the sheer volume of the data (6 million files in 55GB of data) and when they learn that we spend less than a man-week a year on supporting bkbits.net, they are sold. That's a good thing for everyone; we're providing a useful service and we get some marketing value from it.
We derive benefit from the pro bono work in other ways as well. When we are testing out a new release we can put it on bkbits.net and we know in seconds if we have broken something important; people use old versions of BK to talk to bkbits.net every few seconds.
We are strongly committed to helping the Linux kernel community and other open source projects. Not everyone may believe this, but we'd be doing it even if there was no benefit to us. It is our way of giving back some value for all the great free software we use every day. We run our business on free software, we develop our product with free software, the free software community has been great for our business. All companies who benefit from free software ought to find a way to help the people who are producing that software.
I'm aware that some in the community would prefer that we gave back by adding to the pool of free software, but our product space doesn't seem to work well in that model. So we give back in other ways. The majority of the people in the community have come to trust that we will continue to do so.
BitKeeper after the storm - Part 2
Joe Barr
NewsForge
May 12, 2004
In Part 1 of this interview, we learned just how much Linus Torvalds and others have increased their productivity through the use of BitKeeper to handle kernel patches. In this conclusion to the interview, we examine the consequences of that increase. Is it good or bad for the Linux kernel that more patches than ever are being applied? Both Larry McVoy, author of BitKeeper, and Linus Torvalds, creator of Linux, offer their opinions.
NF: Thinking back to the chant "Linus doesn't scale," and now that he has clearly demonstrated that with the right tools he can scale, is there any concern on your part that the accelerated pace of Linux development we're seeing today might be taking too great a toll on Linus, or that the quality of Linux might suffer?
McVoy: Good questions. I'm going to answer in opposite order because the first one is a longer answer.
I don't think the quality is suffering; we run our company on Linux and we see Linux steadily improving. There are definitely things going into the kernel that I don't agree with (mostly fake realtime stuff or fine-grained threading that less than .001% of the machines in the world will ever use), but I'm not the guy who gets to choose. So if I leave my personal views aside and try to be objective about it, it certainly seems to me that the kernel keeps getting better. 2.6 looks pretty good, and the rate of change is dramatically higher. If the faster pace was going to cause problems, I suspect it would have done so by now.
The first question is more involved. The short answer is that I think that rather than taking a toll, Linus is more relaxed and able to spend more time doing what he should do, educating people, teaching them good taste, acting as a filter, etc. He and I talk periodically and he certainly seems more relaxed to me. I've seen him take interest in people issues that he would have let slide when he was under more pressure.
The longer answer, which addresses why the increased pace is not taking a toll on Linus, requires some background. If you look at software development, there are two common models, each optimizing one thing at the expense of the other. I call the two models "maintainer" and "commercial."
Development models
The maintainer model is one where all the code goes through one person who acts as a filter. This model is used by many open source projects where there is an acknowledged leader who asserts control over the source base. The advantage of this model is that the source base doesn't turn into a mess. The bad changes are filtered out. The disadvantage is that it is slow; you are going only as fast as the maintainer can filter.
The commercial model is one where changes are pushed into the tree as fast as possible. This could be called the "time to market model." Many commercial efforts start out in maintainer mode but then switch to commercial mode because in the commercial world, time to market is critical. The advantage of the commercial model is speed (gets to market first) and the disadvantage is a loss of quality control.
Scaling development
Everyone knows that small team development works well but problems emerge as the team grows. With a team of five or six people, filtering all changes works fine -- one person can handle the load.
What happens when you try to grow the team? Commercial and open source efforts diverge at this point, but both have growing pains.
The commercial approach is to abandon the filtering process and move quickly to get something out the door. It's simply not effective to try to filter the work of a few hundred developers through one person; nobody can keep up with that load. The commercial world has tried many different ways to have its cake and eat it too. Management would love to have speed and quality, but the reality is that if they get speed, they sacrifice quality.
The maintainer-model process has scaling problems as well. It works as long as the maintainer can keep up and then it starts to fall apart. For a lot of open source projects, it works really well because the projects never get above five or six people. That may seem small, but the reality is that most good things have come from small teams. But some projects are bigger than that: the Linux kernel, X11, KDE, Gnome, etc. Some projects are much larger -- the 2.5/2.6 branch of the Linux kernel shows more than 1,400 different people who have committed using BitKeeper.
It is obvious that trying to keep up with the efforts of more than 1,000 people is impossible for one person, so how do maintainer-model projects scale? They divide and conquer. Imagine a basic building block consisting of a set of workers and a maintainer. I think of these as triangles with the maintainer at the top and the workers along the bottom. You can start out with a maintainer and a couple of workers and keep adding until you can't fit any more in the triangle. When the triangle is full you create another layer of maintainers. The top triangle holds the ultimate maintainer, who then delegates to sub-maintainers; those who were workers become the first line of maintainers. Each of those sub-maintainers leads a second-level triangle, and there are several of those below the top triangle. All I'm describing is a log(N) fan-in model where the same process of filtering is applied in layers.
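To make the arithmetic behind that picture concrete, here is a minimal sketch of the log(N) fan-in idea. It is purely illustrative: the fan-out of six and the function name are assumptions, not figures from the interview, apart from the 1,400-committer count quoted above.

```python
def layers_needed(workers: int, fan_out: int = 6) -> int:
    """How many layers of maintainers are needed so that nobody has to
    filter more than `fan_out` people directly (illustrative assumption:
    every maintainer, including the one at the top, reviews at most
    `fan_out` direct contributors)."""
    layers, capacity = 1, fan_out
    while capacity < workers:
        layers += 1
        capacity *= fan_out   # each layer multiplies coverage: log(N) growth
    return layers

# With ~1,400 committers (the figure quoted for the 2.5/2.6 tree) and a
# fan-out of 6, only a handful of filtering layers are needed.
for n in (6, 36, 216, 1400):
    print(f"{n:>5} workers -> {layers_needed(n)} layer(s) of maintainers")
```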
The Linux kernel team had moved to this model before it started using BitKeeper, and it was troublesome. What is not explicitly stated in the layered maintainer model is that as you add these layers, the workers are farther away from the authoritative version of the tree, and all versions of the tree are changing. The farther away from the tree, the more merging is required to put all the versions together. The sub-maintainers of Linux, who are the usual suspects like Dave Miller, Greg KH, Jeff Garzik, etc., were in "merge hell" every time Linus did a new release. Maintainer mode worked quite well for small teams, but as it scaled up, the divide-and-conquer solution forced the sub-maintainers to pay the price in repeated and difficult merging.
Scaling maintainer mode with BitKeeper
BitKeeper was designed with the maintainer model in mind, to enable that model (among others) by removing some of the repeated work such as merging. We knew that the maintainer model would be dominated by trees with various differences being merged and remerged constantly, so good merging had to be a key BitKeeper feature. BitKeeper is enough better at merging that it allows the model described above to work and to scale into the hundreds or thousands of developers. The fact that BitKeeper works well in this model is a big part of why the sub-maintainers all thought things were ten times better. They had been bearing the brunt of the merge work, and BitKeeper made most of that work go away, so for them the improvement was dramatic.
The fan-in/fan-out variation of the maintainer model is the way that Linus reduces his load. A sub-maintainer emerges as someone who can be trusted, a sub-section of the kernel is spun off as a somewhat autonomous sub-project, Linus works with that person to make sure that the filtering is done well, and the development scales a little further.
The point of this long-winded response to your question is to explain why the increased rate of change hasn't taken a toll on Linus. If a tool can support the maintainer plus multiple sub-maintainers (and even sub-sub-maintainers and so on), then the top-level maintainer can learn over time which of his sub-maintainers can be trusted to do a good job of filtering. There are some people from whom Linus pulls changes and more or less trusts the changes without review. He's counting on those sub-maintainers to have filtered out the bad changes, and he has learned which ones actually do it. There are other people who send in patches, and Linus reads every line of the patch carefully.
If I've done a good job explaining, then you can see how this model can scale. It's log(N), and log(N) approaches can handle very big Ns easily. The goal of the model is to make sure that changes can happen quickly but be carefully filtered even with a large number of developers. Without BitKeeper doing a lot of the grunt work, a project has to choose between the faster commercial model and the more careful maintainer model, but with BitKeeper you get to have your cake and eat it too. The process moves fast, close to as fast as the commercial model, but without losing the quality control that is so crucial to any source base, large or small.
To some extent, Linus's job becomes one of working with sub-maintainers to make sure they are as good as he is at filtering. He still does a lot of "real work" himself but he is scaling by enabling other people to do part of his job.
NF: Linus, since the number of patches handled has gone up so dramatically, do you still have time to give them the same sort of attention you did the old way?
Torvalds: Larry already answered; I'll just throw in my 2 cents.
To me, the big thing BK allows me to do is to not worry about the people I trust, and who actively maintain their own subsystems. Merging with them is easier on both sides, without losing any history or commentary.
So the answer to your question is that to a large degree BK makes it much easier to give attention to those patches that need it, by allowing me to not have to care about every single patch. That, in turn, is what makes it possible for me to take many more patches.
So in that sense, I don't give the "same sort of attention" that I did in the old way. But that's the whole point -- allowing me (and others, for that matter) to scale better, exactly because I can direct the attention.
A lot of my time used to be taken up by the "obvious" patches -- patches that came in from major subsystem maintainers that I trusted. That has always been the bulk of the work, and the patches that require attention are comparatively few. But when everything was done with patches, I basically needed to do the same thing for the "hard" cases as for the "easy" ones. And a fair amount of the work was just looking at the email to decide into which category it fell.
That's where BK helps.
There is another part to it too -- BK allows me to give much more control to the people I trust, without losing track of what is going on.
Traditionally, when you have multiple people working on the same source tree, they all have "write access" to whatever source control management system they use. That in turn requires strict "commit" rules, since nobody wants anybody else to make changes without having had a chance to review them. That in turn tends to mean that the limiting point becomes the "commit review" process. Either the process is very lax ("we'll fix the problems later," which never works), or the process is so strict that it puts a brake on everybody.
In contrast, the distributed nature of BK means that I don't give any "write access" to anybody up-front, but once they are done and explain what they did, we can both just synchronize, and there is no issue of patches being "stuck" in the review process.
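As a rough illustration of the pull-based workflow Torvalds is describing, here is a toy model in Python. It is not BitKeeper's actual implementation or command set; the class, method, and changeset names are hypothetical, and a real system tracks full history and merges file contents rather than sets of IDs.

```python
class Repo:
    """Toy distributed repository: just a named set of changeset IDs.
    (Illustrative only; a real tool like BK keeps full history and
    merges file contents.)"""
    def __init__(self, name, changesets=()):
        self.name = name
        self.changesets = set(changesets)

    def clone(self, name):
        # Everyone gets a complete copy; nobody needs write access
        # to anyone else's tree.
        return Repo(name, self.changesets)

    def commit(self, cset):
        # Committing only touches the developer's own clone, so there
        # is no central "commit review" gate to get stuck in.
        self.changesets.add(cset)

    def pull(self, other):
        # The maintainer reviews and merges when the work is done and
        # explained, instead of blocking every commit up front.
        self.changesets |= other.changesets

linus = Repo("linus", {"base"})
davem = linus.clone("davem")           # sub-maintainer clones the tree
davem.commit("net: hypothetical fix")  # works independently in his own tree
linus.pull(davem)                      # synchronize once the work is ready
print(sorted(linus.changesets))        # ['base', 'net: hypothetical fix']
```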
So not only does BK allow me to concentrate my attention on stuff I feel I need to think about (or the other side asks me to think about, which is actually more common), but it also allows me to literally give people more control. That makes it much easier to pass the maintenance around more, which is, after all, what it's all about.
Copyright 2004