Archive for the ‘wikitech’ Category

update

Wednesday, November 18th, 2009

In the past few months I had lots of changes going on – I left the Sun/MySQL job, my term on the Wikimedia Board of Trustees ended, I joined Facebook, and now I got appointed to the Wikimedia Advisory Board. This also probably means that I will do slightly less hands-on work on Wikipedia technology (I’ll be mostly in “relaxed maintenance mode”), though I don’t know yet how much less – time will show :)

P.S. I also quit World of Warcraft. ;-)

GDB 7!

Thursday, October 8th, 2009

I wasn’t prepared for this. After spending months playing with GDB development trees I somehow entirely missed that the 7.0 release was getting close, and it took me more than an hour to spot it.

My favorite features are Python scripting and non-stop debugging. I was toying around with Python scripting for a while, and was planning to make backtraces make sense. Having hands that free means that one can see PHP backtraces when gdb’ing Apache, see table names and states when a MySQL thread accesses handler interfaces, or see remote IPs and users when it is writing to the network. Process inspection can simply rock if the right tools are created using these new capabilities, and I’m way too excited when I think about those. “Always have debugging symbols” gets way more meaning now.
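A minimal sketch of what such a scripted backtrace could look like, assuming a Python-enabled GDB 7.0 or later. The command name `abt`, the class `AnnotatedBacktrace` and the helper `format_frame` are made up for illustration; only `gdb.Command`, `gdb.selected_frame()`, `Frame.name()` and `Frame.older()` come from GDB's Python API. The application-specific part (reading a table name or a PHP function out of a frame) is left as a comment, since it depends entirely on the binary being inspected.

```python
# Sketch of a custom GDB command; load it inside gdb with: source annotate_bt.py
try:
    import gdb  # only importable inside a Python-enabled GDB
except ImportError:
    gdb = None  # lets the formatting helper be exercised outside GDB


def format_frame(level, name, note=None):
    """Render one backtrace line, optionally with an application-level note."""
    line = "#%-3d %s" % (level, name or "??")
    return line + ("  [" + note + "]" if note else "")


if gdb is not None:

    class AnnotatedBacktrace(gdb.Command):
        """Print function names for the current stack (hypothetical 'abt')."""

        def __init__(self):
            super(AnnotatedBacktrace, self).__init__("abt", gdb.COMMAND_STACK)

        def invoke(self, arg, from_tty):
            frame, level = gdb.selected_frame(), 0
            while frame is not None:
                # A real script would read frame-local variables here
                # (a table name, a PHP function name) and pass them as note.
                print(format_frame(level, frame.name()))
                frame, level = frame.older(), level + 1

    AnnotatedBacktrace()  # registers the command with GDB
```

Once sourced, typing `abt` at the GDB prompt would walk the selected thread's stack from the innermost frame outwards.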

Another issue I’ve been trying to resolve lately is avoiding long locking periods for running processes (directly attaching to a process can freeze its work for a second or so, which isn’t that tolerable in production environments). GDB is getting closer to async debugging capabilities, where one can run a debugger without actually stopping anything.

So, congratulations GDB team, now it is a job for us to find all the uses of the tool. It has been invaluable so far, but this is much, much more.

Spikes are not fun anymore

Thursday, August 20th, 2009

English Wikipedia just scored “three million articles”, so I thought I’d give some more numbers and perspectives :) Four years ago we observed an impressive +50% traffic spike on Wikipedia – people came in to read about the new pope. Back then it was probably twenty additional page views a second, and we were quite happy to sustain that additional load :)

Nowadays big media events can cause some trouble, but generally they don’t bring huge traffic spikes anymore. Say, Michael Jackson’s English Wikipedia article had a peak hour of one million page views (2009-06-25 23:00-24:00) – and that was merely a 10% increase on one of our projects (English Wikipedia got 10.4m pageviews that hour). Our problems back then were caused by the complexity of page content – and costs got inflated because of the lack of rendering farm concurrency control.

Other interesting sources of attention are custom Google logos leading to search results leading to Wikipedia (of course!). The last ones, for the Perseids and Hans Christian Ørsted, sent over 1.5m daily visitors each – but that’s a mere 20 article views a second or so.
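To put the figures quoted above on one scale, here is a small back-of-the-envelope conversion to per-second rates. It uses only the numbers from the text; the variable names are made up, and treating "visitors" as "article views" is the same simplification the text makes.

```python
HOUR = 3600
DAY = 24 * HOUR


def per_second(count, period_seconds):
    """Average rate over a period, in events per second."""
    return count / period_seconds


# Michael Jackson article, peak hour: one million page views
mj_peak = per_second(1_000_000, HOUR)        # about 278 views/s

# All of English Wikipedia during that same hour: 10.4m page views
enwiki_hour = per_second(10_400_000, HOUR)   # about 2889 views/s

# Share of that hour's English Wikipedia traffic going to one article
mj_share = 1_000_000 / 10_400_000            # a bit under 10%

# A Google logo sending 1.5m visitors over a whole day
doodle = per_second(1_500_000, DAY)          # about 17 views/s

print(round(mj_peak), round(enwiki_hour), round(mj_share * 100, 1), round(doodle, 1))
```

Spread over a day, even 1.5m visitors is a far smaller rate than a single article's peak hour.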

What makes those spikes boring nowadays is simply the length of the long tail. Our projects serve over five million different articles over the course of an hour (and 20m article views) – around 3.5m articles are opened just once. If our job were serving just hot news, our cluster setup and software infrastructure would be very, very different – but we have to accommodate millions of articles that aren’t just stored in archives, but are also constantly read, even if only once an hour (and the daily hot set is much larger too).
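A rough reading of those long-tail numbers, taking the rounded hourly figures from the paragraph at face value (five million distinct articles, 20m views, 3.5m articles opened once). The variable names are illustrative and the results are only as precise as the "over" and "around" in the text.

```python
distinct_articles = 5_000_000   # different articles served in an hour
total_views = 20_000_000        # article views in the same hour
opened_once = 3_500_000         # articles that got exactly one view

# Fraction of the hour's distinct articles that were read only once
tail_share_of_articles = opened_once / distinct_articles   # 0.7

# Fraction of all views that went to those once-read articles
tail_share_of_views = opened_once / total_views            # 0.175

# Average views per hour for every other article
avg_views_rest = (total_views - opened_once) / (distinct_articles - opened_once)  # 11.0

print(tail_share_of_articles, tail_share_of_views, avg_views_rest)
```

So roughly 70% of the articles touched in an hour account for under a fifth of the views, which is why the setup cannot be tuned for hot content alone.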

All this viewership data is available in raw form, as well as nice visualizations at trendingtopics, wikirank and stats.grok.se. It is amazing to hear about all the research that is built on this kind of data, and I guess it needs some improved interfaces and APIs already for all the future uses ;-)

Board again (perhaps)

Monday, July 27th, 2009

Tomorrow voting in the Wikimedia Foundation Board of Trustees election starts – and yours truly is a candidate.

You can find most of my views on various issues in our question pages (I was somewhat boiling when answering the What will you do about the WMF mishandling it’s funding? one – it probably takes great effort to phrase such a bad question, and so easy to answer it :), as well as Wikipedia Signpost ‘interview’.

I was appointed to the Board back in January 2008, after holding various other volunteer (at some point in time – ‘officer’) positions within the organization since 2004 – and brought the core technology and operational efficiency skill set there. The appointment was supposed to be somewhat temporary, but the board restructuring turned out to be a much longer process than we expected – both the chapters part and the nomination committee work. As a community member, after the restructuring I was in a ‘community-elected’ seat, though I never participated in any election – so that wasn’t too fair to the actual community, and I need to fix that :)

So, even though I wasn’t too visible to the actual community (people would notice me mostly when things go wrong, and I’m usually not in the best mood then :-), I feel that the values I’ve worked on, evangelized and supported for all these years – efficiency and general availability of our projects – can win the mindshare not only of the read-only users I mostly work for, but also of eligible voters.

And I do think that internal technology expertise has to be represented on the board, as the things we’ve been doing, and the methods we’ve been using, are very much unique in the technology world. Oh, and as I mentioned somewhere, our technology spending is close to 50%, and that has to be represented too :-)

embarrassment

Friday, June 26th, 2009

So, we had a major embarrassment last night. It consisted of multiple factors:

  • We don’t have a parallelism coordinator for our most CPU-intensive task at Wikipedia, so it can work on the same job in ten, a hundred, or a thousand threads across the cluster at the same time.
  • Some parts of our parsing process ended up extremely CPU-intensive, and that happened not in our code, but in ‘templates’, which are in user-space. We don’t have profiling for templates, so we can only guess which one is slow and which one is fast, and we don’t know their overall aggregates.
  • Some parts of pages are extremely template-heavy, making page rendering cost a lot (e.g. citations – see this discussion).
  • In order to avoid content integrity race conditions, the editing process releases locks and invalidates objects early, separately from the ‘virgin parse’ which populates caches.
  • It takes quite some time to refill the cache, as rendering is CPU-bound for quite a while in certain cases.
  • During that short time when caches are empty, a stampede of users on a single article causes lots of redundant work across the cluster/grid/cloud.
  • The Michael Jackson article on English Wikipedia alone had a million views in one hour.

So, in summary, we had havoc in our cluster because a stampede of heavy requests between cache purge and cache population was consuming all available CPU resources, mostly working on rendering the references section of the Michael Jackson article.

Oh well, quick operations hack looked like this:

Index: ParserCache.php
===================================================================
--- ParserCache.php	(revision 52088)
+++ ParserCache.php	(working copy)
@@ -63,6 +63,7 @@
  if ( is_object( $value ) ) {
    wfDebug( "Found.\n" );
    # Delete if article has changed since the cache was made
    // temp hack!
+   if( $article->mTitle->getPrefixedText() != 'Michael Jackson' ) {
    $canCache = $article->checkTouched();
    $cacheTime = $value->getCacheTime();
    $touched = $article->mTouched;

It is embarrassing, as the actual pageview count was way below our usual capacity; whenever we have problems, it is because of some narrow expensive problem, not because of an overall unavoidable resource shortage. We can afford many more edits and many more pageviews. We could have handled this load way better if our users weren’t creating complex logic in articles. We could have handled this way better if we had more aggressive redundant job elimination.
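One common shape for redundant job elimination is request coalescing: the first request for a key does the expensive work, and every concurrent request for the same key waits for that result instead of repeating it. The sketch below shows this within a single process using threads. It is not MediaWiki code; `SingleFlight`, `do` and `render` are hypothetical names, and a real deployment would need the same coordination across many servers (for example via a shared lock service), which this does not attempt.

```python
import threading
import time


class SingleFlight:
    """Let one caller compute a value per key while concurrent callers wait."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> (done event, result holder)

    def do(self, key, compute):
        with self._lock:
            entry = self._in_flight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._in_flight[key] = entry
        done, holder = entry
        if leader:
            try:
                holder["value"] = compute()
            except Exception as exc:  # hand the failure to the waiters too
                holder["error"] = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                done.set()
        else:
            done.wait()  # someone else is already rendering this key
        if "error" in holder:
            raise holder["error"]
        return holder["value"]


# Usage: twenty simultaneous readers of one freshly purged article.
calls = []
results = []


def render():
    calls.append(1)      # count how often the expensive work actually runs
    time.sleep(0.3)      # stand-in for a CPU-heavy parse
    return "<html>rendered</html>"


flight = SingleFlight()
threads = [
    threading.Thread(
        target=lambda: results.append(flight.do("Michael Jackson", render))
    )
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), "responses from", len(calls), "render(s)")
```

All twenty callers get a response, while the expensive render should normally run only once for the whole burst. The sketch keeps nothing after the flight completes, so it complements a cache rather than replacing one.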

That’s the real story of operations: though headlines like “High profile event brought down Wikipedia” may sound nice, the real story is “shit happens”.

on tools and operating systems

Tuesday, March 31st, 2009

Sometimes people ask why I use MacOSX as my main work platform (isn’t that something to do with beliefs?). My answer is “good foundation with a great user interface”. Though that can be read as “he must like the unix kernel and the look&feel!”, it is not exactly that.

What I like is that I can have a good, stable graphical environment with some mandatory tools (yes, I use the OS-supplied browser, mail, etc.), but besides that maintain a bleeding-edge open-source space (provided by MacPorts).

Another thing I like is the OS-supplied development and performance tools. Having DTrace included is awesome, yes, but Apple did put some special touch on it too. This is a visualization environment for dtrace probes and other profiling/debugging tools:

[Image: memory usage profiling]

Even the web browser (well, I upgraded to Safari4.0 ;-) provides some impressive debugging and profiling capabilities:

[Image: Safari web inspector]

Of course, I end up running a plethora of virtual machines (switching from Parallels to VirtualBox lately), and I even got a KDE/Aqua build (for kcachegrind mostly). I don’t really need Windows apps, and I can run ‘Linux’ ones natively on MacOSX, and I can run MacOSX ones on MacOSX.

There’s a full web stack for my MediaWiki work, there’re dozens of MySQL builds around, there’re photo albums, dtrace tools, World of Warcraft, a bunch of toy projects, a few different office suites, Skype, NetBeans, Eclipse, Xcode, integrated address books and calendars, all major scripting languages, revision control systems – git, svn, mercurial, bzr, bitkeeper, cvs, etc.

All that on a single machine, running for three years, without too much clutter, and with nearly zero effort to make it all work. That’s what I want from a desktop operating system – extreme productivity without too much tinkering.

And if anyone blames me that I’m using non-open-source software, my reply is very simple – my work output is open-sourced.

I loved Encarta

Monday, March 30th, 2009

That happened long before Wikipedia. I loved Encarta. Well, before Encarta, I used to read this thing a lot:

But then Encarta arrived and I loved it. It fit on a single CD and didn’t take too much space on disk. I could look up all these articles in it fast, without having to use expensive dialup. I remember my school buddies coming over and watching those tiny movies in it. I could rip it off for my school work, and look incredibly smart (now people rip off Wikipedia and don’t get too much credit for that :).

It is dead.

People on the interwebs suggest that employees at Wikipedia and Encyclopaedia Britannica will be throwing parties tonight. Oh well, Wikipedia is already up to date about this. Every encyclopedia out there was an inspiration for Wikipedia, more so than any technology or “web-two-oh” hype. There’s not much joy seeing good things die.

Ten years ago I imagined that once I had my own home, I’d have a place to put a full set of dead-tree Britannica, like my parents had the “Lithuanian Soviet Encyclopaedia”. Wikipedia changed my plans (now there’re two flat panels staring at Wiki, inside and outside), but it seems it is already changing the world around it way more. RIP Encarta. You were inspiring, and really too young to die. If it was us, we didn’t mean it, really. By the way, that content of yours, I’d be glad to see it free. *wink*

I’m a creative commoner

Saturday, March 28th, 2009

Lately Creative Commons is becoming a very dominant topic in my life. First of all, I see all the people in the free culture world holding their breath and waiting for Wikipedia’s switch to a CC license. I’m waiting for that too – and personally I really endorse it. Though usually people do not really notice licenses on web content, they really do once they see something they really want to reuse. Wikipedia ends up being an isolated island if it doesn’t go after sharing and exchanging information with other projects.

It takes time to understand that one is a ‘creative commoner’. I do have a t-shirt with such a caption, but it is much more comfortable once you start feeling the real power of use and reuse of information. A few anecdotes…

Rasmus vs me

Monday, February 9th, 2009

Rasmus (of PHP fame) and I exchanged these nice words on Freenode’s #php (when discussing some PHP execution efficiency issues):

 
<Rasmus_> if that is your bottleneck, you are the world's best
          PHP developer
<Rasmus_> domas: then you are writing some very odd code.
          I think you will find you spend most of your time in
          syscalls and db stuff

<domas> Rasmus_: I can tell you're the best database developer, if
        you spend most of your time in db stuff :)
 

You can immediately see the different application engineering perspectives :)

Tim is now vocal

Tuesday, December 16th, 2008

[Image: Tim at the datacenter]
Tim is one of the most humble and intelligent developers I’ve ever met – and we’re extremely happy to have him at Wikimedia. Now he has a blog, where the first entry is already epic by any standards. I mentioned the IE bug, and Tim has done a thorough analysis of this one, and of similar problems.

I hope he continues to disclose the complexity of real web applications – and that will always be a worthy read.