summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--doc/tips/convert_mediawiki_to_ikiwiki.mdwn101
1 files changed, 97 insertions, 4 deletions
diff --git a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn
index f03703b46..c522eaec3 100644
--- a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn
+++ b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn
@@ -1,4 +1,97 @@
-[[sabr]] explains how to [import MediaWiki content into
-git](http://u32.net/Mediawiki_Conversion/index.html?updated), including
-full edit hostory. The [[plugins/contrib/mediawiki]] plugin can then be
-used by ikiwiki to build the wiki.
+Mediawiki is a dynamically-generated wiki which stores it's data in a
+relational database. Pages are marked up using a proprietary markup. It is
+possible to import the contents of a Mediawiki site into an ikiwiki,
+converting some of the Mediawiki conventions into Ikiwiki ones.
+
+The following instructions describe ways of obtaining the current version of
+the wiki. We do not yet cover importing the history of edits.
+
+## Step 1: Getting a list of pages
+
+The first bit of information you require is a list of pages in the Mediawiki.
+There are several different ways of obtaining these.
+
+### Parsing the output of `Special:Allpages`
+
+Mediawikis have a special page called `Special:Allpages` which list all the
+pages for a given namespace on the wiki.
+
+If you fetch the output of this page to a local file with something like
+
+ wget -q -O tmpfile 'http://your-mediawiki/wiki/Special:Allpages'
+
+You can extract the list of page names using the following python script. Note
+that this script is sensitive to the specific markup used on the page, so if
+you have tweaked your mediawiki theme a lot from the original, you will need
+to adjust this script too:
+
+ from xml.dom.minidom import parse, parseString
+
+ dom = parse(argv[1])
+ tables = dom.getElementsByTagName("table")
+ pagetable = tables[-1]
+ anchors = pagetable.getElementsByTagName("a")
+ for a in anchors:
+ print a.firstChild.toxml().\
+ replace('&,'&').\
+ replace('&lt;','<').\
+ replace('&gt;','>')
+
+Also, if you have pages with titles that need to be encoded to be represented
+in HTML, you may need to add further processing to the last line.
+
+### Querying the database
+
+If you have access to the relational database in which your mediawiki data is
+stored, it is possible to derive a list of page names from this.
+
+## Step 2: fetching the page data
+
+Once you have a list of page names, you can fetch the data for each page.
+
+### Method 1: via HTTP and `action=raw`
+
+You need to create two derived strings from the page titles already: the
+destination path for the page and the source URL. Assuming `$pagename`
+contains a pagename obtained above, and `$wiki` contains the URL to your
+mediawiki's `index.php` file:
+
+ src=`echo "$pagename" | tr ' ' _ | sed 's,&,&amp;,g'`
+ dest=`"$pagename" | tr ' ' _ | sed 's,&,__38__,g'`
+
+ mkdir -p `dirname "$dest"`
+ wget -q "$wiki?title=$src&action=raw" -O "$dest"
+
+### Method 2: via HTTP and `Special:Export`
+
+Mediawiki also has a special page `Special:Export` which can be used to obtain
+the source of the page and other metadata such as the last contributor, or the
+full history, etc.
+
+You need to send a `POST` request to the `Special:Export` page. See the source
+of the page fetched via `GET` to determine the correct arguments.
+
+You will then need to write an XML parser to extract the data you need from
+the result.
+
+### Method 3: via the database
+
+It is possible to extract the page data from the database with some
+well-crafted queries.
+
+## Step 2: format conversion
+
+The next step is to convert Mediawiki conventions into Ikiwiki ones. These
+include
+
+ * convert Categories into tags
+ * ...
+
+## External links
+
+[[sabr]] used to explain how to [import MediaWiki content into
+git](http://u32.net/Mediawiki_Conversion/index.html?updated), including full
+edit history, but as of 2009/10/16 that site is not available.
+
+The [[plugins/contrib/mediawiki]] plugin can then be used by ikiwiki to build
+the wiki.