Archive for March, 2007

I recently tried to benchmark our MediaWiki setup. I used Apache JMeter and everything worked well enough. My test consisted of several simulated users accessing pages at random from a pool of 1000 pages. I made sure to let the test run long enough before measuring to be certain that every page had been touched once and the MediaWiki parser caches were “hot”.

Then I wanted to try different caching strategies: no cache, an accelerator cache, and so on. I would measure again for each strategy to see which one benefited us most. And there was the next MediaWiki pitfall lurking around the corner, waiting to hit me:

Whenever I tried a new cache strategy, the response time went through the roof. The cause was soon found: all parser caches were suddenly empty and every page had to be reparsed.

It took me a while, and some reading of the caching source code, to eventually find the culprit in LocalSettings.php. By default it contains these lines:

# When you make changes to this configuration file, this will make
# sure that cached pages are cleared.
$configdate = gmdate( 'YmdHis', @filemtime( __FILE__ ) );
$wgCacheEpoch = max( $wgCacheEpoch, $configdate );

There. That was it. DUH!! MediaWiki invalidates all caches whenever you update any tiny little detail in your LocalSettings.php. Now this is not only a problem for benchmarks, it’s more of a problem for a live site.

As a matter of fact, we recently had quite serious performance issues, and looking back now it may well have been this anti-feature of MediaWiki. Invalidating caches is a sane and common concept. However, invalidating all caches at once (especially with a parser as slow as MediaWiki’s) can quickly become a homemade DoS attack on your own server. Suddenly every access to a page fires up the parser and takes considerably more CPU power than before. Do that with enough simultaneous users on an already strained server and you soon have a dead server.

It becomes worse if you are about to despair and think you could do something about the speed issue by tweaking things in your LocalSettings.php, because by doing that you reset the cache YET AGAIN.

I commented out that reset code in my settings file, and in the future I will look more carefully at what software with a track record similar to MediaWiki’s uses as “sane defaults”.
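For illustration, here is roughly what that part of the settings file could look like afterwards. This is only a sketch: the explicit $wgCacheEpoch assignment and its timestamp are my own assumption, not something the default file contains. The idea is to bump the epoch by hand only when a change really requires cached pages to be re-rendered.

```php
<?php
# LocalSettings.php (fragment)

# Default lines, commented out so that merely saving this file
# no longer moves the cache epoch forward:
# $configdate = gmdate( 'YmdHis', @filemtime( __FILE__ ) );
# $wgCacheEpoch = max( $wgCacheEpoch, $configdate );

# Assumption: a fixed epoch in the same YmdHis format, changed manually
# when cached pages really must be thrown away. The value is illustrative.
$wgCacheEpoch = '20070301000000';
```

With this in place, editing an unrelated setting leaves the parser cache alone. The price is that you have to remember to bump the timestamp yourself after a change that affects rendered output.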

MediaWiki is really not straightforward software for extension writers. Even where you think it is straightforward, it is not. So, as a piece of advice: if it looks just too easy to do … think twice!

As Timmy wrote in his blog, I just ended an unpleasant dance with MediaWiki.

I had written a small extension for the spottingworld wikis that would copy page templates to new users’ pages as a welcome message. I was happy to see that there was the mentioned AddNewAccount hook available to get a wedge into the registration process. And it was really easy enough to use the hook, by copying the broad scheme from the Newuserlog extension.

But I had underestimated the pitfalls. And sometimes it’s really the minute details that are the deepest pitfalls. I had not put in a return true; statement at the end of my hook handling function. And as a result the Newuserlog extension was not called anymore.

It was easy enough to fix, but it makes me wonder what the sense behind this statement is. As part of the registration process it would be useful if an extension could veto a registration by simply returning false. But that is not what happened. All our registrations went through fine. All my mistake did was prevent other extensions using the same hook from being called. I don’t see the reasoning behind this, except to annoy fellow developers.
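The behaviour described above can be reproduced with a tiny stand-in dispatcher. This is a simplified model and not MediaWiki's actual hook code; runHooksSketch and the three handlers are hypothetical names made up for this sketch. It shows how a handler without a return statement (which yields null in PHP) silently stops the handlers registered after it.

```php
<?php
// Simplified stand-in for a hook dispatcher; NOT MediaWiki's real code.
function runHooksSketch( array $handlers, array $args ) {
    foreach ( $handlers as $handler ) {
        $result = call_user_func_array( $handler, $args );
        if ( !$result ) {
            // A falsy result (false, or null from a missing return)
            // ends the chain; later handlers are never called.
            return false;
        }
    }
    return true;
}

$called = array();

// Hypothetical welcome-message handler that forgets "return true;".
$welcomeBroken = function ( $user ) use ( &$called ) {
    $called[] = 'welcome';
    // no return statement, so PHP returns null
};

// The same handler with the missing statement added.
$welcomeFixed = function ( $user ) use ( &$called ) {
    $called[] = 'welcome';
    return true;
};

// Stand-in for another extension listening on the same hook.
$newUserLog = function ( $user ) use ( &$called ) {
    $called[] = 'newuserlog';
    return true;
};

$called = array();
runHooksSketch( array( $welcomeBroken, $newUserLog ), array( 'NewUser' ) );
$brokenOrder = $called; // only 'welcome' ran

$called = array();
runHooksSketch( array( $welcomeFixed, $newUserLog ), array( 'NewUser' ) );
$fixedOrder = $called; // 'welcome', then 'newuserlog'
```

In this model the caller is free to ignore the dispatcher's return value, which would explain why the registrations themselves still went through.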

Yes, another frustrating experience with MediaWiki. I found by accident a script in the maintenance directory of MediaWiki called genSitemap.php. It seemed to work well enough. Created all the sitemaps per namespace and even an index sitemap to link them all together.
But when I tried to submit it to Google Webmaster Tools I found the catch. The index sitemap file does not comply with Google’s idea of well-formed syntax and hence is flagged with errors all over.
The problem is that genSitemap links to the actual sitemap files with relative paths, while Google wants absolute URLs. For example:

<loc>sitemap-spottingworld-NS_0-0.xml.gz</loc>

should really be

<loc>http://www.spottingworld.com/sitemap-spottingworld-NS_0-0.xml.gz</loc>
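One way to patch an already generated index file is to prefix every relative loc entry with the site's base URL. The following is a minimal sketch of that idea, not necessarily the fix I applied and not a change to genSitemap.php itself; absolutizeSitemapLocs is a hypothetical helper name.

```php
<?php
// Hypothetical helper: turn relative <loc> entries into absolute URLs.
function absolutizeSitemapLocs( $xml, $baseUrl ) {
    // Normalise the base so it ends in exactly one slash.
    $base = rtrim( $baseUrl, '/' ) . '/';

    return preg_replace_callback(
        '#<loc>\s*(.*?)\s*</loc>#s',
        function ( $m ) use ( $base ) {
            $loc = $m[1];
            // Entries that already carry a scheme are left untouched.
            if ( preg_match( '#^https?://#i', $loc ) ) {
                return '<loc>' . $loc . '</loc>';
            }
            return '<loc>' . $base . ltrim( $loc, '/' ) . '</loc>';
        },
        $xml
    );
}

$before = '<loc>sitemap-spottingworld-NS_0-0.xml.gz</loc>';
$after = absolutizeSitemapLocs( $before, 'http://www.spottingworld.com' );
```

A regular expression is enough here because the index file is machine-generated and very regular; for arbitrary XML a real parser would be the safer choice.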

Ok. It was easy enough to fix. But it was another of those “NOTHING works out of the box there” experiences. And more interestingly, I wonder if Wikipedia is using that same maintenance script and thus has not submitted a valid sitemap to Google for a while?!? Questions, questions, and more questions in the absurd world of MediaWiki…

First of all, I like Wikipedia. I like it as a user, I like looking things up and using that information. What I don’t like at all is MediaWiki as software. I think all the hype about it comes from it being used to power Wikipedia. But does that really say anything about the quality of the software? I don’t think so.


The first post is always a bit awkward because there is no reason to post except that it’s a threshold that has to be passed before writing something useful.

It will be a bit different here because this blog starts several weeks after the work on www.spottingworld.com began, and soon after the work on train.spottingworld.com began, so I still have some things fresh in my mind that annoyed me quite a bit. And yes, I will post them here, in a more compressed timeframe than they originally happened.

So. This is post #1. Good we have that behind us and can start doing some real work.