So what's not to love?
Really, Github does seem to be the cat's meow of the open source world at the moment, with a nod to BitBucket, where Mercurial-based projects are hosted. As I've said, I've only recently begun to use Github, and as someone just joining the ecosystem I've found it has some significant oddities. I'm surprised there hasn't been more discussion of this; a quick search of the interwebs only really turned up this post by Andrew Wilkinson.
I'll summarize his points and then add some spice.
- Coders are treated as "rock stars" and emphasized over their projects, and the interface is designed from a contributor's point of view. He later points out that projects of any decent size typically have more users than contributors, so the interface is a bit quirky for someone who just wants to use the project.
- Each fork gets its own issues and wiki, making it confusing where to discuss the project.
- Determining which fork to use is not trivial.
The second point is definitely a problem, but I think it is a necessary one. A fork potentially has its own features and bugs, so these need to be recorded somewhere. However, anything not specific to the fork should go in the wiki/issue tracker/whatever of the main project. I think this would be less of an issue if the next point were fixed.
Finally, I come to the heart of my Github confusion: which fork does one use? By 'use' I mean either forking it and contributing to it or, as a user, downloading and installing it. The developers have said they are working on a way to better identify the "main" project, but the solution is yet to be seen.
I'll show how I find the "main" project, and then examine the issue itself. There are two "find main" methods that probably need to be used together. The Network graph (read up on it here; you really need to understand these graphs to understand my later figures) is not where I start, because it is relative to the current project; I'll come back to it in a bit. By "relative to the current project" I mean that if you're looking at, say, a fork of the original project, all of the original's commits up to the point of the fork are placed in the forked project's timeline. Thus, I first try to find the "grandfather", the original project that started the chain. I do this by following the "forked from" links until I reach a project that is not a fork of another.
Currently looking at dcramer's version of the project, a fork of robhudson's, which happens to be the grandfather.
Okay, so hopefully you can see this is about as clear as mud and a rather inexact science. Wilkinson's "rock star" description is apt: projects are identified first by the coder (robhudson's django-debug-toolbar) rather than by the project itself. Admittedly this makes sense given how git and forks work, but it leaves the interface muddied. Whom do I trust, robhudson or dcramer? Side note: I'm glad people generally use their names or sensible nicknames as identifiers; if "l33tskillz393" had a fork, I don't think I'd even give it the time of day.
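The mechanical half of that procedure, following "forked from" links until reaching a repository that is not itself a fork, amounts to walking parent pointers up to a root. Here is a minimal sketch, assuming the links have already been collected by hand into a dictionary; the `forks` mapping and the `find_grandfather` helper are hypothetical illustrations, not anything Github provides. Note that it only finds the grandfather. Deciding whether the grandfather is actually the project to use is the part that stays inexact.

```python
from typing import Dict, List, Optional


def find_grandfather(repo: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Follow 'forked from' links from repo up to the original project.

    parent_of maps each repository to the repository it was forked from,
    or None if it is not a fork (hypothetical, hand-collected data).
    Returns the chain of repositories visited, ending at the grandfather.
    """
    chain = [repo]
    seen = {repo}
    parent = parent_of.get(repo)
    while parent is not None:
        # Guard against malformed data that loops back on itself.
        if parent in seen:
            raise ValueError(f"cycle in fork links at {parent}")
        chain.append(parent)
        seen.add(parent)
        parent = parent_of.get(parent)
    return chain


# Hypothetical mapping mirroring the django-debug-toolbar example above.
forks = {
    "robhudson/django-debug-toolbar": None,
    "dcramer/django-debug-toolbar": "robhudson/django-debug-toolbar",
}

chain = find_grandfather("dcramer/django-debug-toolbar", forks)
print(" -> ".join(chain))
print("grandfather:", chain[-1])
```

Starting from the grandfather itself returns a one-element chain, since it has no "forked from" link to follow.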
To clarify further, let's look at the Network graphs for a couple of projects I've looked at recently.
(django-pagination)
What I have identified as the grandfather and main branch is the line at the top. There are more forks not shown here, but none below hgrimelid's have any "recent" commits. It is pretty clear here that the grandfather branch is the "main" version of the project: past forks have either died or merged their changes back in (merges can be seen on the blue and neon green lines in the upper left). There are some recent forks off the latest grandfather commit, possibly containing important bug fixes or features, which makes the decision a bit less clear. Go with the main branch and assume important changes will be merged into a future version, or go with a fork and hope it doesn't turn into a dead end?
Let's look at another project with a slightly different situation.
(django-sorting)
Github has highlighted an unforeseen problem with distributed version control when the participants aren't under some guiding light, such as working for the same company. In the traditional model of a project, some entity (a person, a committee, a company) determines the project's versions, features, and so on. Distributed source control may be used to develop the project, but at some point someone says "this is the next version" and everyone trusts that authority. Commercially, this can be seen in, say, how Microsoft releases a new version of Windows every so often. For an example from the more complicated world of distributed version control, look at what git was created to develop in the first place: the Linux kernel. It gets all kinds of forks, but in the end the idea is that they get merged back into the mainline kernel, Torvalds baptizes it, and distributions push this authoritative version out to users. In this traditional model, users and programmers really only care about the project as a whole, not the forks that went into it.
Github flips this on its head, I assume because of the "rock star" approach. Each programmer is given equal billing and there is no definitive project. Without an authority, people go on their merry way and we get these spider webs of Network graphs.
I'm going to eagerly await Github's "find the main branch" solution. My admittedly rather warm-and-fuzzy suggestion is to strongly encourage merging forks. Traditionally, forks have been a big deal because they split a project's development over legal issues, disagreements about vision, or whatever. The assumption is that there is no other recourse and no reconciliation. Perhaps this trained behavior is one reason for the lack of merges? In any case, traditional forks usually make significant changes to a project that are not necessarily meant to play nice with the original. In the Github world, forks are THE way to update projects. Thus forks are not splits; they are how you do even small-scale things like fixing bugs and adding minor features. These are changes that should bubble up to the main project, not languish in a soon-to-be-forgotten fork. From my own experience, it seems that people fork, fix the bugs they need for their personal use, and then forget about the project altogether. If you look at the figures I've provided, you'll see an abundance of forks, but merges are rare.
If Github could convince all the forkers to be mergers it would be a much happier place.