Archive for the ‘misc’ Category

When bad things happen!

Tuesday, December 29th, 2009

When bad things happen, like… ’strace -f’ or ‘gdb’ or any other process inspection tool decides to hang your precious processes (they show up in state T in process lists), there’s always help:

#include <sys/ptrace.h>
#include <signal.h>
main(int ac, char **av) {
int pid; if (ac>1) pid=atoi(av[1]);
ptrace(PTRACE_ATTACH,pid,0,0);
ptrace(PTRACE_DETACH,pid,0,0);
kill(pid, SIGCONT); }

update

Wednesday, November 18th, 2009

In past few months I had lots of changes going on – left the Sun/MySQL job, my term on Wikimedia Board of Trustees ended, I joined Facebook and now I got appointed to Wikimedia Advisory Board. This also probably means that I will have slightly less hands-on work on Wikipedia technology (I’ll be mostly in “relaxed maintenance mode“), though I don’t know yet how much less – time will show :)

P.S. I also quit World of Warcraft. ;-)

Knight’s Cross!

Sunday, July 5th, 2009

Celebrating the Knight's Cross
We had very special State Award Ceremony today. Special, as it happens at the year we celebrate “thousand years of Lithuania”, special as it is the last one given by our very special President Valdas Adamkus.

Though for me, it was really special – Alvydas Mituzas, my Dad, got a Knight’s Cross, for his lifetime merits and service for our country, spanning forty years of creativity and dedication. Anyone knowing my Dad really know what the award is for, but for me it is also a symbol of support for virtues that I’ve seen in my whole life – hard work and imagination combined for public service and public good, no matter how difficult it is to start, or finish. Stories of his past are fascinating, and even I can learn more and more of them over the years. We’re proud of him, and we’re lucky to be his family.

I’m a creative commoner

Saturday, March 28th, 2009

Lately Creative Commons is becoming very dominant topic in my life. First of all, I see all the people in free culture world holding their breath and waiting for Wikipedia switch to CC license. I’m waiting for that too – and personally I really endorse it. Though usually people do not really notice licenses on web content, they really do once they see something they really want to reuse. Wikipedia ends up being isolated island, if it doesn’t go after sharing and exchanging information with other projects.

It takes time to understand one is ‘creative commoner’. I do have a t-shirt with such caption, but it is much more comfortable once you start feeling real power of use and reuse of information. Few anecdotes…
(more…)

Many needles in a haystack

Sunday, February 8th, 2009

This is probably quite useless experiment I’ve been doing, but it all started when someone in #mysql@freenode gave a very simple task:

  • Table A has two million rows
  • Table B has two thousand rows
  • Find all rows in A, which have any B row as substring in some field.

So, my colleague Scott was there, and gave the answer which satisfied the guy:

SELECT * FROM a JOIN b ON a.field LIKE CONCAT('%',b.field,'%');

I started telling Scott, that this will result in too many nested loops, and that better combined pattern matching should be used. Well, my actual words were something like “use RLIKE”. So, folks replaced LIKE in above query with RLIKE, didn’t see too much of improvement and made some fun of me.

So, I thought I should provide some decent response to mocking :-) I downloaded ‘most common American last names of 19th century’ from some website out there, took ~2000 of them, also built a list of all Wikipedia article titles, that have space in them (just to reduce dataset a bit, and make more humans show up there).

My initial poking showed around double speed increase when using combined pattern of RLIKE, and using PCRE UDFs provided double speed over RLIKE. I have no idea what I did wrong back then (or doing wrong now), but simple LIKE with nested row lookup is faster on my current test. Still, there’s something else I wanted to show :)

GNU grep has ‘-F’ functionality, which Interprets PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. Actually, it is quite well optimized, and uses nice algorithm, located in file kwset.c. This is what some comment in that file tells:

The algorithm implemented by these routines bears a startling resemblence to one discovered by Beate Commentz-Walter, although it is not identical.

See “A String Matching Algorithm Fast on the Average,” Technical Report, IBM-Germany, Scientific Center Heidelberg, Tiergartenstrasse 15, D-6900 Heidelberg, Germany. See also Aho, A.V., and M. Corasick, “Efficient String Matching: An Aid to Bibliographic Search,” CACM June 1975, Vol. 18, No. 6, which describes the failure function used below.

So, let’s try standing on shoulders of giants. Actually, I’m not even that smart to find this, it was actually Tim who researched and implemented this as PHP extension to make some of Wikipedia’s code faster.

So I shamelessly took few files from grep, and wrote some crappy MySQL UDF glue (it is just for demonstration, would need proper engineering to make it usable for general purposes).

So, what kind of performance would a custom-tailored algorithm for the task give…

Simple LIKE matching:

select straight_join * from names join lasts on
binary name like concat("%",last,"%") limit 1000;
1000 rows in set (3.80 sec)

Combined RLIKE matching (I don’t know why it is slower – it was much better on some previous test):

SELECT CONCAT("(",GROUP_CONCAT(last SEPARATOR "|"),")")
from lasts into @re;

SELECT * FROM names WHERE BINARY name RLIKE @re;
1000 rows in set (25.99 sec)

Combined PECL UDF matching:

SELECT CONCAT("/(",GROUP_CONCAT(last SEPARATOR "|"),")/")
from lasts into @re;

select * from names where preg_rlike(@re,name) limit 1000;
1000 rows in set (8.10 sec)

Algorithm theft:

SELECT fss_prepare(last) FROM lasts;
SELECT fss_search(name) FROM names LIMIT 1000;
1000 rows in set (0.02 sec)

Damn it, too fast, this must be some mistake, let’s try more rows:

10000 rows in set (0.07 sec)
100000 rows in set (0.62 sec)
551971 rows in set (3.50 sec)

So, by using customized algorithm we got 600x performance. What does Scott say about this?

domas: my answer took about 10 minutes.
       yours has taken what, week and a half?

.. Bastard… Oh well, priorities priorities :)

Memcached for small objects

Thursday, December 25th, 2008

Memcached quite often ends up as a store for very small objects (small key and some integer value), though it isn’t really designed to do this kind of work by default. Current memory management is based on slabs (200 of them), where objects are grouped by similar size – though actual sizes are pre-defined at startup based on few configuration parameters.

By default memcached would have slabs based on assumption, that smallest object size will have 48 bytes of data (thats without item header), and will increase the slab sizes in +25% steps:

slab class   1: chunk size    104 perslab 10082
slab class   2: chunk size    136 perslab  7710
slab class   3: chunk size    176 perslab  5957
slab class   4: chunk size    224 perslab  4681
...

So, in this case, it allocates at least 104 bytes per object, and next steps are way behind. Fortunately, there’re some quick steps to have better efficiency: (more…)

tax

Friday, December 19th, 2008

I love my country. Yesterday a law came in, in two weeks I’ll start paying triple income taxes (for all expenses too). I’m just knocked out today.

Computer art

Friday, August 29th, 2008

One immediately realizes Apple computers are very artsy, and why they’re loved by designers and such. Instead of showing BSODs, they start making abstract art.

Thats how it shows me my mail:
computer art

Talks

Monday, August 25th, 2008

I’ve opened a talks page, which is supposed to list slides from my talks at various conferences (well, mysqlconf mostly :) – I had to miss few conferences simply because of time clashes already, so having these public may help someone, maybe. :)

RAID

Friday, August 22nd, 2008

DBA rule #34: Never forget upgrading RAID firmware. It may be more worthy than upgrading anything else.

P.S. Can anyone make an island out of Portugal?