A few bits about the RCS backends

[[toc ]]

Terminology

``web-edit'' means that a page is edited using the web (CGI) interface, as opposed to using an editor and the RCS interface.

[[svn]]

Subversion was the first RCS to be supported by ikiwiki.

How does it work internally?

Master repository M.

RCS commits from the outside are installed into M.

There is a working copy of M (a checkout of M): W.

HTML is generated from W. rcs_update() will update from M to W.

CGI operates on W. rcs_commit() will commit from W to M.

For all the gory details of how ikiwiki handles this behind the scenes, see [[commit-internals]].

You browse and web-edit the wiki on W.

W "belongs" to ikiwiki and should not be edited directly.

darcs (not yet included)

Support for using darcs as a backend is being worked on by Thomas Schwinge, although development is currently on hold. There is a patch in [[todo/darcs]].

How will it work internally?

``Master'' repository R1.

RCS commits from the outside are installed into R1.

HTML is generated from R1. HTML is automatically generated (by using a ``post-hook'') each time a new change is installed into R1. It follows that rcs_update() is not needed.

There is a working copy of R1: R2.

CGI operates on R2. rcs_commit() will push from R2 to R1.

You browse the wiki on R1 and web-edit it on R2. This means, for example, that R2 needs to be updated from R1 before you web-edit a page, as the user might otherwise be irritated...

How do changes get from R1 to R2? Currently only internally in rcs_commit(). Is rcs_prepedit() suitable?

It follows that the HTML rendering and the CGI handling can be completely separated parts in ikiwiki.

What repository should [[RecentChanges]] and History work on? R1?

Rationale for doing it differently than in the Subversion case

darcs is a distributed RCS, which means that every checkout of a repository is equal to the repository it was checked-out from. There is no forced hierarchy.

R1 is nevertheless called the master repository. It's used for collecting all the changes and publishing them: on the one hand via the rendered HTML and on the other via the standard darcs RCS interface.

R2, the repository the CGI operates on, is just a checkout of R1 and doesn't really differ from the other checkouts that people will branch off from R1.

(To be continued.)

Another possible approach

Here's what I (tuomov) think would be a “cleaner” approach:

  1. Upon starting to edit, Ikiwiki gets a copy of the page and the output of darcs changes --context. This context and the present version of the page are stored as the “version” of the page in a hidden control of the HTML. Thus the HTML includes all that is needed to generate a patch with respect to the state of the repository at the time the edit was started. This is of course all that darcs needs.
  2. Once the user is done with editing, Ikiwiki generates a patch bundle for darcs. This should be easy with existing Text::Diff or somesuch modules, as the Web edits only concern single files. The reason why the old version of the page is stored in the HTML (possibly compressed) is so that the diff can be generated.
  3. Now this patch bundle is applied with darcs apply, or sent by email for moderation… there are many possibilities.
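
Steps 2 and 3 can be imitated with ordinary diff and patch. These stand in for a darcs patch bundle and darcs apply, an assumption made only so the sketch runs without darcs; the file names are invented for illustration too.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Version of the page when the edit started (what the form would carry along).
printf '1\n2\n3\n4\n5\n6\n' > page.base
# The repository copy, changed by someone else in the meantime.
printf '1\n2\n3\n4\n5SOMEONE\n6\n' > page
# The text the web user submitted.
printf '1\n2ME\n3\n4\n5\n6\n' > page.new

# Step 2: build the patch from the stored old version and the submitted text.
# diff exits with status 1 when the files differ, which is expected here.
diff -U1 page.base page.new > edit.patch || [ $? -eq 1 ]

# Step 3: apply the patch to the current state of the page.
patch page < edit.patch > /dev/null
cat page
```

The edit applies cleanly because the two changes touch different lines, so the result contains both 2ME and 5SOMEONE. Overlapping changes would make patch reject the hunk, which is the conflict case.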

This approach avoids some of the problems of concurrent edits that the previous one may have, although there may be conflicts, which may or may not propagate to the displayed web page. (Unfortunately there is not an option to darcs apply to generate some sort of ‘conflict resolution bundle’.) Also, only one repository is needed, as it is never directly modified by Ikiwiki.

This approach might be applicable to other distributed VCSs as well, although they're not as oriented towards transmitting changes with standalone patch bundles (often by email) as darcs is.

The mercurial plugin seems to just use one repo and edit it directly - is there some reason that's okay there but not for darcs? I agree with tuomov that having just the one repo would be preferable; the point of a dvcs is that there's no difference between one repo and another. I've got a darcs.pm based on mercurial.pm, that's almost usable... --bma

IMHO it comes down to whatever works well for a given RCS. Seems like the darcs approach could be done with most any distributed system, but it might be overkill for some (or all?) While there is the incomplete darcs plugin in [[todo/darcs]], if you submit one that's complete, I will probably accept it into ikiwiki.. --[[Joey]]

I'd like to help make a robust darcs (2) backend. I also think ikiwiki should use exactly one darcs repo. I think we can simplify and say conflicting web edits are not allowed, like most current wiki engines. I don't see that saving (so much) context in the html is necessary, then. bma, I would like to see your code. --[[Simon_Michael]] PS ah, there it is. Let's continue on the [[todo/darcs]] page.

[[Git]]

Regarding the Git support, Recai says:

I have been testing it for the past few days and it seems satisfactory. I haven't observed any race condition regarding the concurrent blog commits and it handles merge conflicts gracefully as far as I can see.

(After about a year, git support is nearly as solid as subversion support --[[Joey]])

As you may notice from the patch size, GIT support is not so trivial to implement (for me, at least). It has some drawbacks (especially wrt merge, which was the hard part). GIT doesn't have functionality similar to 'svn merge -rOLD:NEW FILE' (please see the relevant comment in _merge_past for more details), so I had to invent an ugly hack just for the purpose.

I was looking at this, and WRT the problem of uncommitted local changes, it seems to me you could just git-stash them now that git-stash exists. I think it didn't when you first added the git support.. --[[Joey]]

Yes, git-stash did not exist back then. What about something like the below? It seems to work (I haven't given much thought to the specific implementation details). --[[roktas]]

 # create test files
 cd /tmp
 seq 6 >page
 cat page
 1
 2
 3
 4
 5
 6
 sed -e 's/2/2ME/' page >page.me # my changes
 cat page.me
 1
 2ME
 3
 4
 5
 6
 sed -e 's/5/5SOMEONE/' page >page.someone # someone's changes
 cat page.someone
 1
 2
 3
 4
 5SOMEONE
 6

 # create a test repository
 mkdir t
 cd t
 cp ../page .
 git init
 git add .
 git commit -m init

 # save the current HEAD
 ME=$(git rev-parse HEAD)
 $EDITOR page # assume that I'm starting to edit page via web

 # simulates someone's concurrent commit
 cp ../page.someone page
 git commit -m someone -- page

 # My editing session ended, the resulting content is in page.me
 cp ../page.me page
 cat page
 1
 2ME
 3
 4
 5
 6

 # let's start to save my uncommitted changes
 git stash clear
 git stash save "changes by me"
 # we've reached a clean state
 cat page
 1
 2
 3
 4
 5SOMEONE
 6

 # roll-back to the $ME state
 git reset --soft $ME
 # now, the file is marked as modified
 git stash save "changes by someone"

 # now, we're at the $ME state
 cat page
 1
 2
 3
 4
 5
 6
 git stash list
 stash@{0}: On master: changes by someone
 stash@{1}: On master: changes by me

 # first apply my changes
 git stash apply stash@{1}
 cat page
 1
 2ME
 3
 4
 5
 6
 # ... and commit
 git commit -m me -- page

 # apply someone's changes
 git stash apply stash@{0}
 cat page
 1
 2ME
 3
 4
 5SOMEONE
 6
 # ... and commit
 git commit -m me+someone -- page

By design, the Git backend uses a "master-clone" repository pair approach, in contrast to the single repository approach (here, the clone may be considered the working copy of a fictitious web user). Even though a single repository implementation is possible, it somewhat increases the code complexity of the backend (I couldn't yet figure out a uniform method which doesn't depend on the preferred repository model). By exploiting the fact that the master repo and the web user's repo (srcdir) are on the same local machine, I suggest creating the latter with the "git clone -l -s" command to save disk space.
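
A minimal sketch of that layout. It assumes a bare master (the text above does not say whether the master is bare) and scratch paths under a temporary directory; the names seed, wiki.git and srcdir are made up for the example.

```shell
set -e
tmp=$(mktemp -d)

# Seed a master with one commit so the clone has objects to share.
git init -q "$tmp/seed"
cd "$tmp/seed"
git config user.name "Wiki Admin"
git config user.email admin@example.invalid
echo "hello" > index.mdwn
git add index.mdwn
git commit -q -m init
git clone -q --bare "$tmp/seed" "$tmp/wiki.git"

# srcdir for the web user: -l treats the source as a local repository,
# -s shares the master's object store instead of copying it.
git clone -q -l -s "$tmp/wiki.git" "$tmp/srcdir"

# The shared clone records where the master's objects live.
cat "$tmp/srcdir/.git/objects/info/alternates"
```

Since the clone borrows objects from the master, it stays small, but it also depends on those objects remaining available in the master.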

Note that, as a rule of thumb, you should always put the rcs wrapper (post-update) into the master repository (.git/hooks/), as can be seen in the Git wrappers of the sample [[ikiwiki.setup]].

Here is how a web edit works with ikiwiki and git:

  • ikiwiki cgi modifies the page source in the clone
  • git-commit in the clone
  • git push origin master, pushes the commit from the clone to the master repo
  • the master repo's post-update hook notices this update, and runs ikiwiki
  • ikiwiki notices the modified page source, and compiles it

Here is how a commit from a remote repository works:

  • git-commit in the remote repository
  • git-push, pushes the commit to the master repo on the server
  • the master repo's post-update hook notices this update, and runs ikiwiki
  • ikiwiki notices the modified page source, and compiles it
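
Both flows can be tried out with a stand-in hook. This is only a sketch: the post-update script below merely logs the updated refs where the real wrapper would run ikiwiki, the master is assumed to be bare, and all paths and user names are invented for the example.

```shell
set -e
tmp=$(mktemp -d)

# Master repository (assumed bare here), with its branch fixed to master.
git init -q --bare "$tmp/wiki.git"
git --git-dir="$tmp/wiki.git" symbolic-ref HEAD refs/heads/master

# Stand-in for the ikiwiki wrapper: log the refs that were updated.
mkdir -p "$tmp/wiki.git/hooks"
cat > "$tmp/wiki.git/hooks/post-update" <<EOF
#!/bin/sh
echo "refresh: \$*" >> "$tmp/hook.log"
EOF
chmod +x "$tmp/wiki.git/hooks/post-update"

# Web edit: the CGI changes the page in the clone, commits, and pushes.
git clone -q "$tmp/wiki.git" "$tmp/srcdir" 2>/dev/null
cd "$tmp/srcdir"
git symbolic-ref HEAD refs/heads/master
git config user.name "Web User"
git config user.email web@example.invalid
echo "edited on the web" > index.mdwn
git add index.mdwn
git commit -q -m "web commit"
git push -q origin master

# Remote commit: someone else clones, commits, and pushes.
git clone -q "$tmp/wiki.git" "$tmp/remote"
cd "$tmp/remote"
git config user.name "Remote User"
git config user.email remote@example.invalid
echo "edited remotely" >> index.mdwn
git commit -q -a -m "remote commit"
git push -q origin master

# One log line per push: the hook fired for the web edit and the remote commit.
cat "$tmp/hook.log"
```

The hook runs in the master in both cases, which is why the wrapper belongs there and not in the clone.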

[[Mercurial]]

The Mercurial backend is still in an early phase, so it may not be mature enough, but it should be simple to understand and use.

As Mercurial is a distributed RCS, it lacks the distinction between repository and working copy (every wc is a repo).

This means that the Mercurial backend uses the repository directly as the working copy (the master M and the working copy W described in the svn example are the same thing).

You only need to specify 'srcdir' (the repository M) and 'destdir' (where the HTML will be generated).

Master repository M.

RCS commits from the outside are installed into M.

M is directly used as working copy (M is also W).

HTML is generated from the working copy in M. rcs_update() will update to the last committed revision in M (the same as 'hg update'). If you use an 'update' hook, you can automatically generate the HTML in the destination directory each time 'hg update' is called.

CGI operates on M. rcs_commit() will commit directly in M.

If you have any questions or suggestions about the Mercurial backend, please refer to Emanuele.

[[tla]]

rcs

There is a patch that needs a bit of work linked to from [[todo/rcs]].

[[Monotone]]

In normal use, monotone has a local database as well as a workspace/working copy. In ikiwiki terms, the local database takes the role of the master repository, and the srcdir is the workspace. As all monotone workspaces point to a default database, there is no need to tell ikiwiki explicitly about the "master" database. It will know.

The backend currently supports normal committing and getting the history of the page. To understand the parallel commit approach, you need to understand monotone's approach to conflicts:

Monotone allows multiple micro-branches in the database. There is a command, mtn merge, that takes the heads of all these branches and merges them back together (turning the tree of branches into a DAG). Conflicts in monotone (at time of writing) need to be resolved interactively during this merge process. It is important to note that having multiple heads is not an error condition in a monotone database. This condition will occur in normal use. In this case 'update' will choose a head if it can, or complain and tell the user to merge.

The monotone ikiwiki plugin borrows some ideas from the svn ikiwiki plugin. On prepedit() we record the revision that this change is based on (I'll refer to this as the prepedit revision). When the web user saves the page, we check if that is still the current revision. If it is, then we commit. If it isn't, then we check to see if there were any changes by anyone else to the file we're editing while we've been editing (a diff between the prepedit revision and the current rev). If there were no changes to the file we're editing, then we commit as normal.

It is only if there have been parallel changes to the file we're trying to commit that things get hairy. In this case the current approach is to commit the web changes as a branch from the prepedit revision. This will leave the repository with multiple heads. At this point, all data is saved. The system then tries to merge the heads with a merger that will fail if it cannot resolve the conflict. If the merge succeeds then everything is ok.

If that merge failed then there are conflicts. In this case, the current code calls merge again with a merger that inserts conflict markers. It commits this new revision with conflict markers to the repository. It then returns the text to the user for cleanup. This is less neat than it could be, in that a conflict marked revision gets committed to the repository.
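
The decision flow described above, written out as a small shell function. It only names the action to take and makes no mtn calls; the function name, its arguments and the action labels are invented for illustration.

```shell
# commit_strategy PREPEDIT_REV CURRENT_REV FILE_CHANGED
#   FILE_CHANGED is "yes" if the edited file differs between the two
#   revisions, "no" otherwise.
commit_strategy() {
    prepedit=$1
    current=$2
    file_changed=$3

    if [ "$prepedit" = "$current" ]; then
        # Nothing was committed since the edit began.
        echo "commit"
    elif [ "$file_changed" = "no" ]; then
        # Other files changed, but not the one being edited.
        echo "commit"
    else
        # Parallel change to the same file: commit as a branch from the
        # prepedit revision, then try a merge that fails on conflict and,
        # if that fails, merge again with conflict markers.
        echo "branch-then-merge"
    fi
}

commit_strategy r1 r1 no
commit_strategy r1 r2 no
commit_strategy r1 r2 yes
```

Only the last call reaches the hairy case; the first two print commit.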

[[bzr]]