From 0a2e4e167dc0a6b9fc0b038c4174694117b74628 Mon Sep 17 00:00:00 2001 From: Jon Dowland Date: Fri, 16 Oct 2009 10:44:14 +0100 Subject: substantially expand the mediawiki tip with some of the steps. More to come --- doc/tips/convert_mediawiki_to_ikiwiki.mdwn | 101 +++++++++++++++++++++++++++-- 1 file changed, 97 insertions(+), 4 deletions(-) (limited to 'doc/tips') diff --git a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn index f03703b46..c522eaec3 100644 --- a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn +++ b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn @@ -1,4 +1,97 @@ -[[sabr]] explains how to [import MediaWiki content into -git](http://u32.net/Mediawiki_Conversion/index.html?updated), including -full edit hostory. The [[plugins/contrib/mediawiki]] plugin can then be -used by ikiwiki to build the wiki. +Mediawiki is a dynamically-generated wiki which stores it's data in a +relational database. Pages are marked up using a proprietary markup. It is +possible to import the contents of a Mediawiki site into an ikiwiki, +converting some of the Mediawiki conventions into Ikiwiki ones. + +The following instructions describe ways of obtaining the current version of +the wiki. We do not yet cover importing the history of edits. + +## Step 1: Getting a list of pages + +The first bit of information you require is a list of pages in the Mediawiki. +There are several different ways of obtaining these. + +### Parsing the output of `Special:Allpages` + +Mediawikis have a special page called `Special:Allpages` which list all the +pages for a given namespace on the wiki. + +If you fetch the output of this page to a local file with something like + + wget -q -O tmpfile 'http://your-mediawiki/wiki/Special:Allpages' + +You can extract the list of page names using the following python script. Note +that this script is sensitive to the specific markup used on the page, so if +you have tweaked your mediawiki theme a lot from the original, you will need +to adjust this script too: + + from xml.dom.minidom import parse, parseString + + dom = parse(argv[1]) + tables = dom.getElementsByTagName("table") + pagetable = tables[-1] + anchors = pagetable.getElementsByTagName("a") + for a in anchors: + print a.firstChild.toxml().\ + replace('&,'&').\ + replace('<','<').\ + replace('>','>') + +Also, if you have pages with titles that need to be encoded to be represented +in HTML, you may need to add further processing to the last line. + +### Querying the database + +If you have access to the relational database in which your mediawiki data is +stored, it is possible to derive a list of page names from this. + +## Step 2: fetching the page data + +Once you have a list of page names, you can fetch the data for each page. + +### Method 1: via HTTP and `action=raw` + +You need to create two derived strings from the page titles already: the +destination path for the page and the source URL. Assuming `$pagename` +contains a pagename obtained above, and `$wiki` contains the URL to your +mediawiki's `index.php` file: + + src=`echo "$pagename" | tr ' ' _ | sed 's,&,&,g'` + dest=`"$pagename" | tr ' ' _ | sed 's,&,__38__,g'` + + mkdir -p `dirname "$dest"` + wget -q "$wiki?title=$src&action=raw" -O "$dest" + +### Method 2: via HTTP and `Special:Export` + +Mediawiki also has a special page `Special:Export` which can be used to obtain +the source of the page and other metadata such as the last contributor, or the +full history, etc. + +You need to send a `POST` request to the `Special:Export` page. See the source +of the page fetched via `GET` to determine the correct arguments. + +You will then need to write an XML parser to extract the data you need from +the result. + +### Method 3: via the database + +It is possible to extract the page data from the database with some +well-crafted queries. + +## Step 2: format conversion + +The next step is to convert Mediawiki conventions into Ikiwiki ones. These +include + + * convert Categories into tags + * ... + +## External links + +[[sabr]] used to explain how to [import MediaWiki content into +git](http://u32.net/Mediawiki_Conversion/index.html?updated), including full +edit history, but as of 2009/10/16 that site is not available. + +The [[plugins/contrib/mediawiki]] plugin can then be used by ikiwiki to build +the wiki. -- cgit v1.2.3 From 811394ec92205112d3fec0d4f6aac380202f114b Mon Sep 17 00:00:00 2001 From: Jon Dowland Date: Fri, 16 Oct 2009 11:20:32 +0100 Subject: add snippet for converting categories --- doc/tips/convert_mediawiki_to_ikiwiki.mdwn | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) (limited to 'doc/tips') diff --git a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn index c522eaec3..ce0c684e7 100644 --- a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn +++ b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn @@ -1,3 +1,5 @@ +[[!toc levels=2]] + Mediawiki is a dynamically-generated wiki which stores it's data in a relational database. Pages are marked up using a proprietary markup. It is possible to import the contents of a Mediawiki site into an ikiwiki, @@ -85,7 +87,23 @@ The next step is to convert Mediawiki conventions into Ikiwiki ones. These include * convert Categories into tags - * ... + + import sys, re + pattern = r'\[\[Category:([^\]]+)\]\]' + + def manglecat(mo): + return '[[!tag %s]]' % mo.group(1).strip().replace(' ','_') + + for line in sys.stdin.readlines(): + res = re.match(pattern, line) + if res: + sys.stdout.write(re.sub(pattern, manglecat, line)) + else: sys.stdout.write(line) + +## Step 3: Mediawiki plugin + +The [[plugins/contrib/mediawiki]] plugin can be used by ikiwiki to interpret +most of the Mediawiki syntax. ## External links @@ -93,5 +111,3 @@ include git](http://u32.net/Mediawiki_Conversion/index.html?updated), including full edit history, but as of 2009/10/16 that site is not available. -The [[plugins/contrib/mediawiki]] plugin can then be used by ikiwiki to build -the wiki. -- cgit v1.2.3 From 58287a929b0ea209a062d11940232a3d31805885 Mon Sep 17 00:00:00 2001 From: Jon Dowland Date: Fri, 16 Oct 2009 11:21:26 +0100 Subject: explanatory text --- doc/tips/convert_mediawiki_to_ikiwiki.mdwn | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'doc/tips') diff --git a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn index ce0c684e7..3f771ed9b 100644 --- a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn +++ b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn @@ -83,10 +83,13 @@ well-crafted queries. ## Step 2: format conversion -The next step is to convert Mediawiki conventions into Ikiwiki ones. These -include +The next step is to convert Mediawiki conventions into Ikiwiki ones. - * convert Categories into tags +### categories + +Mediawiki uses a special page name prefix to define "Categories", which +otherwise behave like ikiwiki tags. You can convert every Mediawiki category +into an ikiwiki tag name using a script such as import sys, re pattern = r'\[\[Category:([^\]]+)\]\]' -- cgit v1.2.3 From 0bee92347020e3034faea5d3cc045ca443e4221c Mon Sep 17 00:00:00 2001 From: Jon Dowland Date: Fri, 16 Oct 2009 11:23:55 +0100 Subject: minor reworking of page fetch section --- doc/tips/convert_mediawiki_to_ikiwiki.mdwn | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'doc/tips') diff --git a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn index 3f771ed9b..4a750882b 100644 --- a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn +++ b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn @@ -53,7 +53,7 @@ Once you have a list of page names, you can fetch the data for each page. ### Method 1: via HTTP and `action=raw` -You need to create two derived strings from the page titles already: the +You need to create two derived strings from the page titles: the destination path for the page and the source URL. Assuming `$pagename` contains a pagename obtained above, and `$wiki` contains the URL to your mediawiki's `index.php` file: @@ -64,6 +64,9 @@ mediawiki's `index.php` file: mkdir -p `dirname "$dest"` wget -q "$wiki?title=$src&action=raw" -O "$dest" +You may need to add more conversions here depending on the precise page titles +used in your wiki. + ### Method 2: via HTTP and `Special:Export` Mediawiki also has a special page `Special:Export` which can be used to obtain -- cgit v1.2.3 From f6d08ceb358f6d5e51c54ab1b9e90cc4a86fad37 Mon Sep 17 00:00:00 2001 From: Jon Dowland Date: Fri, 16 Oct 2009 11:44:13 +0100 Subject: notes about namespaces --- doc/tips/convert_mediawiki_to_ikiwiki.mdwn | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'doc/tips') diff --git a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn index 4a750882b..87b1ebc48 100644 --- a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn +++ b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn @@ -42,6 +42,16 @@ to adjust this script too: Also, if you have pages with titles that need to be encoded to be represented in HTML, you may need to add further processing to the last line. +Note that by default, `Special:Allpages` will only list pages in the main +namespace. You need to add a `&namespace=XX` argument to get pages in a +different namespace. The following numbers correspond to common namespaces: + + * 10 - templates (`Template:foo`) + * 14 - categories (`Category:bar`) + +Note that the page names obtained this way will not include any namespace +specific prefix: e.g. `Category:` will be stripped off. + ### Querying the database If you have access to the relational database in which your mediawiki data is @@ -67,6 +77,12 @@ mediawiki's `index.php` file: You may need to add more conversions here depending on the precise page titles used in your wiki. +If you are trying to fetch pages from a different namespace to the default, +you will need to prefix the page title with the relevant prefix, e.g. +`Category:` for category pages. You probably don't want to prefix it to the +output page, but you may want to vary the destination path (i.e. insert an +extra directory component corresponding to your ikiwiki's `tagbase`). + ### Method 2: via HTTP and `Special:Export` Mediawiki also has a special page `Special:Export` which can be used to obtain -- cgit v1.2.3 From 7a9512918d981762016bdaf7e6196fe2d46c10c7 Mon Sep 17 00:00:00 2001 From: Jon Dowland Date: Fri, 16 Oct 2009 11:59:27 +0100 Subject: fix step numbering --- doc/tips/convert_mediawiki_to_ikiwiki.mdwn | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'doc/tips') diff --git a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn index 87b1ebc48..1e5b912a9 100644 --- a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn +++ b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn @@ -100,7 +100,7 @@ the result. It is possible to extract the page data from the database with some well-crafted queries. -## Step 2: format conversion +## Step 3: format conversion The next step is to convert Mediawiki conventions into Ikiwiki ones. @@ -122,7 +122,7 @@ into an ikiwiki tag name using a script such as sys.stdout.write(re.sub(pattern, manglecat, line)) else: sys.stdout.write(line) -## Step 3: Mediawiki plugin +## Step 4: Mediawiki plugin The [[plugins/contrib/mediawiki]] plugin can be used by ikiwiki to interpret most of the Mediawiki syntax. -- cgit v1.2.3