On projects and their goals

Posted Apr 6, 2010 23:32 UTC (Tue) by iabervon (subscriber, #722)
In reply to: On projects and their goals by vonbrand
Parent article: On projects and their goals

Allow a user to prepare a commit that includes some directories without disclosing the contents of those directories to the user (where the user leaves them unchanged relative to some commit).

Allow a user to defer downloading the content of some files which do not compress well (even against the rest of the project) until that content is actually required.

Support users communicating with each other the intent to change a particular unmergeable file in a particular branch of a particular repository such that a user can be sure that it will be unnecessary to resolve merge conflicts in order to push changes to that branch of that repository. (And other users may make changes to these files, but will not be unaware that they may be forced to redo their work because of a conflict.)

There's also the issue that, if a project has an enormous SVN installation already, such that reading the whole thing is impractically slow, DVCSes currently don't support only importing (and reading) the portion necessary for some particular operations.

There are also sites using SVN as a distribution mechanism and namespace for large binary files, where they want to keep a history of what was there. Sure, SVN is the wrong tool for the job, but a DVCS is even more wrong, and they should be moving, when they move, to something else entirely.

There are probably more odd usages that haven't come up yet because the users haven't tried anything but SVN. These are generally nothing that can't be solved, but they require development targeted at usage that would be bad practice for software development but may be appropriate or even required for other sorts of content.


On projects and their goals

Posted Apr 7, 2010 0:13 UTC (Wed) by dlang (guest, #313) [Link] (7 responses)

quote: Allow a user to prepare a commit that includes some directories without disclosing the contents of those directories to the user (where the user leaves them unchanged relative to some commit).

explain this a bit more please.

deferring downloading of some files can be done with git narrow/shallow clone options

as for this one:
Support users communicating with each other the intent to change a particular unmergeable file in a particular branch of a particular repository such that a user can be sure that it will be unnecessary to resolve merge conflicts in order to push changes to that branch of that repository. (And other users may make changes to these files, but will not be unaware that they may be forced to redo their work because of a conflict.)

how does SVN do this? isn't all the communication between users done outside of SVN? or are you referring to locking when you check something out?

your next point: There's also the issue that, if a project has an enormous SVN installation already, such that reading the whole thing is impractically slow, DVCSes currently don't support only importing (and reading) the portion necessary for some particular operations.

has nothing to do with SVN being better, merely with it being painful to switch away from SVN

re: large binary files, how large does SVN support? I thought I saw comments here indicating that it also has a fairly small limit.

On projects and their goals

Posted Apr 7, 2010 2:04 UTC (Wed) by iabervon (subscriber, #722) [Link] (6 responses)

First one: there are projects that want to have some files that some users can't read in the same directory as other files that the user can modify (and make commits to change). SVN doesn't need the user to be able to read all of the files in a revision in order to create the revision, simply because the client side doesn't need the information and the server side has it. Git (for example) could deal with this, but gets upset about not having access to all of the content in the commit.

Second: Last I checked, git's shallow clone support wasn't sufficient to let you get the arbitrary set of blobs that you actually care about, and there's only narrow checkout, not narrow clone. Furthermore, you can't clone the whole history, figure out (from the commit messages and changed files) which blobs matter to you, and download those particular blobs.

Third: Locking. As I just mentioned in http://lwn.net/Articles/382416/, there are situations where development tasks can't be parallelized with the available tools, and users want to avoid wasting their time by getting (advisory) locks and making sure that the user who got the lock doesn't have to redo their work. SVN offers mandatory locks, which is not optimal for users that choose to ignore them, but better than doing pointless work accidentally.

Fourth: I didn't say that other systems weren't better than SVN. I said that there were things that other systems didn't deal with as well as SVN does. One of these is your existing SVN server being unable to produce the whole history in a timely fashion.

Re file size: I've never stored a large file in SVN, but I've heard of people trying to switch to git and having problems, and it turning out that what they were version-controlling was video recordings that they were editing, and they had a many-TB fileserver and a laptop that could store maybe three copies of the video. I think it was SVN that they were using, but I may be mixing up stories.
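
To illustrate the first point: in a content-addressed store, a new revision can in principle be assembled from the id of a directory without ever reading what is inside it. A minimal Python sketch, assuming a simplified tree format (this is not git's actual object encoding, and `blob_id` and `tree_id` are hypothetical names made up for the example):

```python
import hashlib
from typing import Dict


def blob_id(data: bytes) -> str:
    """Content address of a file: a hash of its bytes."""
    return hashlib.sha256(b"blob\0" + data).hexdigest()


def tree_id(entries: Dict[str, str]) -> str:
    """Content address of a directory, computed only from child names and ids."""
    listing = "".join(f"{name}\0{child}\n" for name, child in sorted(entries.items()))
    return hashlib.sha256(b"tree\0" + listing.encode()).hexdigest()


# Ids recorded in the parent revision. The user is told the id of "secret"
# but is never given the files underneath it.
secret_dir = tree_id({"release.key": blob_id(b"not visible to this user")})
old_root = tree_id({"README": blob_id(b"v1\n"), "secret": secret_dir})

# The user edits README and builds the new root, reusing the opaque id as-is.
new_root = tree_id({"README": blob_id(b"v2\n"), "secret": secret_dir})
```

Computing `new_root` needs only the id of the unchanged directory, which is why the client does not strictly need the content.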

On projects and their goals

Posted Apr 7, 2010 20:06 UTC (Wed) by vonbrand (subscriber, #4458) [Link] (5 responses)

Re: "Unreadable files": I just fail to see the point. If I develop, I need to see what I'm working with. Else, I don't need it, and shouldn't have to include it in my commits.

Re: "Shallow clone in git": Right, git is designed to work having all history available. Disk space is cheap nowadays... if it becomes a real problem, it could be ~~kludg~~worked around, I suppose. Not a show killer to me.

Re: "Locking": Sorry, but this is needed in SVN because there is a central repository (lest you lose all version control). With a DVCS, each one works in her own space, and integration (and conflict resolution, etc) don't need to happen during development.

On the others, I'm unable to comment.

On projects and their goals

Posted Apr 7, 2010 20:23 UTC (Wed) by njs (subscriber, #40338) [Link] (2 responses)

Yes, sure, we all know that most people on LWN don't need these features, that's why git is so successful :-). But all that's being claimed is that other people -- not you -- do. Is that really so unbelievable? :-)

BTW, I used to scoff at locking too, but it actually does make sense for files where merging is simply not possible. (Think PNGs, or, like, CAD documents with an undocumented format and no vendor-provided merge tool.) Any simultaneous development requires you to throw away one side and re-do that work from scratch, so delaying integration is a terrible idea.

On projects and their goals

Posted Apr 8, 2010 16:24 UTC (Thu) by vonbrand (subscriber, #4458) [Link] (1 responses)

I do know my set of requirements isn't the same as everybody else's, that is precisely why I'm asking.

What I'm trying to say is that such coordination among developers can't be just done by the tool: If we both start work on some unmergeable file, and then you lock it, I won't become aware of that until I try to lock it myself (perhaps much later, after much work has been wasted). This has to be handled in some other way, AFAICS.

On projects and their goals

Posted Apr 8, 2010 17:10 UTC (Thu) by farnz (subscriber, #17727) [Link]

Normally, the tool support for locking is good enough; working copies of files that need locking are read-only, so that when you come to save, the underlying OS says "no, file is read-only". You then get the lock before you continue working.

What's more, most tools don't even let you start work before you make the file in your working copy read-write. Because it's part of your working copy, your training makes you do that via the version control tool, which grabs the lock for you. Thus, normally it's a matter of seconds between opening a file with intent to work on it, and the tool telling you that you've forgotten a workflow step.
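
The workflow described here can be sketched roughly as follows. This is a toy in-memory model, not how SVN or any particular tool implements it; `LockServer` and `WorkingFile` are hypothetical names introduced for illustration.

```python
class LockServer:
    """Central registry mapping a path to the user currently holding its lock."""

    def __init__(self):
        self._holders = {}

    def acquire(self, path, user):
        # The first user to ask becomes the holder; later requests by others fail.
        holder = self._holders.setdefault(path, user)
        return holder == user

    def release(self, path, user):
        if self._holders.get(path) == user:
            del self._holders[path]


class WorkingFile:
    """A working-copy file that stays read-only until its lock is held."""

    def __init__(self, server, path, user):
        self.server, self.path, self.user = server, path, user
        self.writable = False
        self.content = b""

    def checkout_for_edit(self):
        # Done through the version control tool, so the lock is requested
        # at the moment the file would be made writable.
        self.writable = self.server.acquire(self.path, self.user)
        return self.writable

    def save(self, data):
        if not self.writable:
            raise PermissionError(f"{self.path} is read-only; get the lock first")
        self.content = data


server = LockServer()
alice = WorkingFile(server, "board.cad", "alice")
bob = WorkingFile(server, "board.cad", "bob")

alice_got_lock = alice.checkout_for_edit()  # alice now holds the lock
bob_got_lock = bob.checkout_for_edit()      # refused before bob has done any work
```

The point of the design is that the refusal arrives when the file is opened for editing, not when the finished work is committed.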

On projects and their goals

Posted Apr 8, 2010 16:44 UTC (Thu) by jschrod (subscriber, #1646) [Link] (1 responses)

You should note that version control systems are used for more purposes than software development. These other use cases have constraints that make your counterpoints invalid; for them you *need* support of processes that include locks, or unreadable files.

On projects and their goals

Posted Apr 12, 2010 1:05 UTC (Mon) by vonbrand (subscriber, #4458) [Link]

Yes, I'm interested in uses other than "software development". But just stating that there are applications that require some strange features doesn't help understanding if they are really needed, just an artifact of the tool currently being used, or even "that's how we are used to doing it since way back when".

On what SVN can do that DVCSes can't

Posted Apr 7, 2010 0:36 UTC (Wed) by vonbrand (subscriber, #4458) [Link] (5 responses)

Allow a user to prepare a commit that includes some directories without disclosing the contents of those directories to the user (where the user leaves them unchanged relative to some commit).

I fail to see any use for this... said committing user is (by definition) unable to check if the result works, or even makes any sense.

Plus if you distrust your users with commit rights that badly, you should rethink your workflow and organization assumptions very carefully indeed.

Allow a user to defer downloading the content of some files which do not compress well (even against the rest of the project) until that content is actually required.

Good point. But that kills one of the advantages of DVCSes: any operation can be done offline. Plus disks are getting cheaper (I very much doubt I can really fill a terabyte with meaningful data I'll ever use), and being able to get said contents at leisure when doing the first clone, and getting it from the nearest clone (not necessarily the "home repo") does offset that somewhat.

Support users communicating with each other the intent to change a particular unmergeable file in a particular branch of a particular repository such that a user can be sure that it will be unnecessary to resolve merge conflicts in order to push changes to that branch of that repository. (And other users may make changes to these files, but will not be unaware that they may be forced to redo their work because of a conflict.)

I don't follow you here. What do you mean by "unmergeable files"? If branches are laid out right, "unmergeable files" on the same branch touched by different developers just can't arise (the only possibility for merge conflicts in git is between branches, when you decide to merge them). And with git, once you resolve a particular conflict on a particular branch, it just can't arise again there. The merge algorithms in git have been tuned to minimize noise conflicts, and it is even possible to use one's own if need be.

There's also the issue that, if a project has an enormous SVN installation already, such that reading the whole thing is impractically slow, DVCSes currently don't support only importing (and reading) the portion necessary for some particular operations.

This was handled e.g. by X.org by breaking up the all-in-one monster into manageable pieces (which they wanted to do anyway before migrating, AFAIU). Different tools, different ways of using them right. In a sense, this is relevant only for interacting with repositories in legacy VCSes or for (shortish?) transition periods, it is not a downside of DVCSes per se.

The others are arguably (mis)uses of a VCS, there is no reason to cater for them with any DVCS; as you correctly state, they should probably use other tools.

On what SVN can do that DVCSes can't

Posted Apr 7, 2010 1:44 UTC (Wed) by iabervon (subscriber, #722) [Link] (4 responses)

(1) An organization may have a secret key included in the project, which is used for making a signed release, but which developers are not allowed to have access to. It doesn't make sense in open-source setups, and only sort of makes sense to arrange that way in a closed-source organization, but there are organizations that do this. I never said that these use cases were a good idea, but there are sites that work like this.

(2) Try putting all of the CAD models of all the revisions of all of the parts of a car on every single engineer's workstation. A workstation could handle the recent revisions of all of the parts that are important to the particular engineer, but that's a tiny fraction of the whole history, particularly because the files don't compress well, even when there are (semantically) small changes between different revisions of the same part. Furthermore, it wouldn't be hard to download all of the commit messages and enough of the revisions of enough of the parts to be able to work offline.

(3) The "unmergeable files" are binary files for some closed-source program, often one where there aren't known diff/merge algorithms or an easy way of looking at multiple different options and preparing a merge result. The problem isn't branches, it's that there's just no way available to do a three-way merge of versions of a file that isn't more work than just starting over and making your modification again.

Obviously, git is perfectly sufficient for reasonable software development as it is. But there are a number of important uses of version control for applications that aren't software. Companies designing cars or circuit boards can't do it entirely in easily-merged text-based files that compress well, diff meaningfully with current tools, and generally are what you'd recognize as "source", but which are, nonetheless, the preferred form for making changes to what they work on.
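
Point (2) above, having all the commit metadata locally while fetching file contents only on demand, might be modelled like this. A minimal sketch under stated assumptions: `LazyRepo` and `fetch` are hypothetical names, and the "server" is just a dictionary.

```python
class LazyRepo:
    """Keeps full history metadata locally; file contents are fetched on first use."""

    def __init__(self, history, fetch):
        self.history = history  # list of (message, {path: blob id}); small
        self._fetch = fetch     # callable: blob id -> bytes, talks to the server
        self._cache = {}        # blob id -> bytes already downloaded

    def log(self):
        # Works offline: needs no file content at all.
        return [message for message, _ in self.history]

    def read(self, revision, path):
        blob = self.history[revision][1][path]
        if blob not in self._cache:
            self._cache[blob] = self._fetch(blob)
        return self._cache[blob]


server_blobs = {"a1": b"axle rev 1", "a2": b"axle rev 2", "d1": b"door rev 1"}
downloads = []


def fetch(blob):
    """Stand-in for a network request; records what was downloaded."""
    downloads.append(blob)
    return server_blobs[blob]


repo = LazyRepo(
    [
        ("first axle", {"axle.cad": "a1", "door.cad": "d1"}),
        ("stiffer axle", {"axle.cad": "a2", "door.cad": "d1"}),
    ],
    fetch,
)
messages = repo.log()
latest_axle = repo.read(1, "axle.cad")
```

Only the one blob the engineer actually opened is transferred; the older axle revision and the door stay on the server until asked for.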

On what SVN can do that DVCSes can't

Posted Apr 7, 2010 3:40 UTC (Wed) by dlang (guest, #313) [Link] (2 responses)

part of this depends on if you are looking for ways to make things work, or looking for ways that they won't work.

there were two cases mentioned as "git can't do this" that I see as reasonably easy to deal with

for your #1 (secret key) you could do a publicly accessible repository without the key, then a private repository that uses the subproject feature to refer to specific states in the public tree and also has the key in it.

I'm not finding it, but someone mentioned circuits changing and the effort to re-route the circuit board when schematic changes were made. This could be a similar thing, one project where multiple people can change the schematic, but then someone (project lead, whoever) decides that this version of the schematic is likely enough to be useful to have someone spend the day routing the board. that version of the schematic would be checked in as a subproject to the repository that contains the board routing.

or you could just only do the routing work when assigned to do so, rather than after any change.

an engineer's workstation with a TB of space could probably handle the CAD files without much trouble. but is that really what the person cares about? In reality you don't have an engineer who cares about every detail of every part on the car. You have the engineers working on the axle who care about all the details of that, but the engineer working on the car as a whole is going to select axle version X and use it (or more likely, axle model X, with specific external attachments, and variations of that model are invisible to that engineer)

This sounds like another perfect case for subprojects.

your final case (proprietary formats that can't be diffed) is a case where any VCS can do no better than storing copies of each one, but it may not be appropriate for these files to be in the VCS in the first place. Just like databases have the concept of large objects where the database contains how to get to the object, not a copy of the object itself, the VCS may be better off storing such things elsewhere and just keeping a link to it. Git doesn't do this today, but it's been discussed, just nobody has wanted the feature badly enough to code it (or pay someone else to do so)
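
The "keep a link instead of the object" idea might look something like this. A minimal sketch, assuming a plain directory as the external store and a made-up one-line pointer format; `store_large_file` and `load_large_file` are hypothetical helpers, not an existing git feature.

```python
import hashlib
import tempfile
from pathlib import Path


def store_large_file(store: Path, data: bytes) -> str:
    """Put data in the external store; return the small pointer text to version."""
    digest = hashlib.sha256(data).hexdigest()
    (store / digest).write_bytes(data)
    return f"sha256:{digest} size:{len(data)}\n"


def load_large_file(store: Path, pointer: str) -> bytes:
    """Resolve a pointer back to the content, verifying it on the way."""
    digest = pointer.split()[0].split(":", 1)[1]
    data = (store / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("external object does not match its pointer")
    return data


store = Path(tempfile.mkdtemp())  # stand-in for a file server
video = bytes(1_000_000)          # stand-in for a large binary file
pointer = store_large_file(store, video)
```

The VCS would then only ever see `pointer`, which is under a hundred bytes no matter how large the file is.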

On what SVN can do that DVCSes can't

Posted Apr 10, 2010 13:17 UTC (Sat) by jpnp (guest, #63341) [Link] (1 responses)

> (propriatary formats that can't be diffed) is a case where any VCS can do no better than storing copies of each one, but it may not be appropriate for these files to be in the VCS in the first place. Just like datbases have the concept of large objects where the database contains how to get to the object, not a copy of the object itself

Many binary formats can be usefully diffed using a binary diff algorithm with large savings in storage space over separate files. SVN has done this from the start in its storage system (built on xdelta IIRC). The point with binary files is that the diffs have no semantic meaning and are not useful for merging.
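
As a rough illustration of the storage saving, zlib's preset-dictionary support can stand in for a binary delta algorithm: the new revision is compressed against the previous one. This is not xdelta and makes no claim about what SVN does internally; it is the same idea in miniature, with made-up data.

```python
import random
import zlib

rng = random.Random(0)
# 16 KiB of pseudo-random bytes: a stand-in for a file that does not compress.
old = bytes(rng.getrandbits(8) for _ in range(16 * 1024))
# The new revision differs from the old one in 16 bytes near the middle.
new = old[:8000] + bytes(16) + old[8016:]

# Storing the new revision on its own saves nothing.
full = zlib.compress(new)

# Storing it relative to the old revision is far smaller.
comp = zlib.compressobj(zdict=old)
delta = comp.compress(new) + comp.flush()

# The old revision is needed to get the new one back.
decomp = zlib.decompressobj(zdict=old)
restored = decomp.decompress(delta) + decomp.flush()
```

The delta is just a byte-level recipe for rebuilding `new` from `old`. It says nothing about what changed in the model or image, which is why it helps storage but not merging.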

While some DB systems do offer the ability to store BLOBs externally, all that I can think of also allow BLOBs to be kept within the DBMS, and this is widely used too as there are management/tooling advantages to one cohesive system.

Just because git and other DVCSs have been a phenomenal success for the OSS project use-case, and many developers are keen to leave SVN behind, doesn't mean that there isn't a place for the technology. Just because git could be made to do it doesn't mean SVN isn't the better fit for some use cases.

In my professional job I have just moved a development project from SVN to mercurial. Most developers were sceptical as they only have experience of CVS & SVN, but I think it'll be worth it. However, I also have projects which are storing scientific data in a VCS, that's just as valid a use for version control (we no longer call them Source Code Control Systems, do we), and I would not consider moving that away from SVN.

On what SVN can do that DVCSes can't

Posted Apr 12, 2010 1:09 UTC (Mon) by vonbrand (subscriber, #4458) [Link]

One thing to consider is that it is better all around to use one tool than having each one be up to snuff with several.

On what SVN can do that DVCSes can't

Posted Apr 8, 2010 16:27 UTC (Thu) by vonbrand (subscriber, #4458) [Link]

(1) Secret keys (or such confidential data) have no place in a shared repo. They should be way better protected than "regular files" at the repo servers.