Posts Tagged ‘replication’

On replication, some more

Sunday, December 13th, 2009

Dear MySQL,

I feel ashamed that I ever wanted you to support 4.0->5.1 replication, and apologize for that. I really understand that it was really egoistic of me even to consider you should be involved in this.

I even understand that 5.0 is running out of active support (I’m not questioning that you’ll stop supporting 4.1 entirely too), and you’ll stop doing pretty much anything to 5.0, except “critical security fixes” (w00t, I managed to get one into 4.1, 8 year old MITM flaw :).

I really understand that supporting more than one release is very very very difficult, and people should do only adjacent version upgrade.

I’m not asking you much, but, maybe you could then support 5.1 to 5.1 replication? I don’t want much, just:

  • Gracefully recover after slave crashes.
  • Don’t have single serial reading of pages for replication stream as a bottleneck – either read-ahead properly (you can do that with RBR!!!), or apply events in parallel (you can do that with RBR too!)
  • Allow to edit replication filters without restarting servers.
  • Allow to enable and disable binary logging of events received from master, as well as enabling and disabling binary logging without restarting the instance.

I hope it isn’t too much too ask! It is just supported replication between two same version instances.

Thanks!

on replication compatibility

Saturday, December 5th, 2009

Dear MySQL,

I will do this to rest of your code, if you continue breaking replication for me.

– Domas

Evil replication management

Thursday, July 30th, 2009

When one wants to script automated replication chain building, certain things are quite annoying, like immutable replication configuration variables. For example, at certain moments log_slave_updates is more than needed, and thats what the server says:

mysql> show variables like 'log_slave_updates';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| log_slave_updates | OFF   |
+-------------------+-------+
1 row in set (0.00 sec)

mysql> set global log_slave_updates=1;
ERROR 1238 (HY000): Variable 'log_slave_updates' is a read only variable

Of course, there are few options, roll in-house fork (heheeeee!), restart your server, and keep warming up your tens of gigabytes of cache arenas, or wait for MySQL to ship a feature change in next major release. Then there are evil tactics:

mysql> system gdb -p $(pidof mysqld)
                       -ex "set opt_log_slave_updates=1" -batch
mysql> show variables like 'log_slave_updates';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| log_slave_updates | ON    |
+-------------------+-------+
1 row in set (0.00 sec)

I don’t guarantee safety of this when slave is running, but… stopping and starting slave threads is somewhat cheaper, than stopping and starting big database instance, right?

What else can we do?

mysql> show slave status \G
...
     Replicate_Do_DB: test
...
mysql> system gdb -p $(pidof mysqld)
          -ex 'call rpl_filter->add_do_db(strdup("hehehe"))' -batch
mysql> show slave status \G
...
      Replicate_Do_DB: test,hehehe
...

It is actually possible to add all sorts of filters this way, rpl_filter.h can be good reference :) So now that you want to throw out some data from your slaves, restart isn’t needed. Unfortunately, deleting entries isn’t possible via rpl_filter methods, but you can always edit base_ilists, can’t you?

P.S. having this functionality inside server would definitely be best.

5.0 journal: various issues, replication prefetching, our branch

Saturday, June 7th, 2008

First of all, I have to apologize about some of my previous remark on 5.0 performance. I passed ‘-g’ CFLAGS to my build, and that replaced default ‘-O2′. Compiling MySQL without -O2 or -O3 makes it slower. Apparently, much slower.

Few migration notes – once I loaded the schema with character set set to binary (because we treat it as such), all VARCHAR fields were converted to VARBINARY, what I expected, but more annoying was CHAR converted to BINARY – which pads data with \0 bytes. Solution was converting everything into VARBINARY – as actually it doesn’t have much overhead. TRIM('\0' FROM field) eventually helped too.

The other problem I hit was paramy operation issue. One table definition failed, so paramy exited immediately – though it had few more queries remaining in the queue – so most recent data from some table was not inserted. The cheap workaround was adding -f option, which just ignores errors. Had to reload all data though…

I had real fun experimenting with auto-inc locking. As it was major problem for initial paramy tests, I hacked InnoDB not to acquire auto-inc table-level lock (that was just commenting out few lines in ha_innodb.cc). After that change CPU use went to >300% instead of ~100% – so I felt nearly like I’ve done the good thing. Interesting though – profile showed that quite a lot of CPU time was spent in synchronization – mutexes and such – so I hit SMP contention at just 4 cores. Still, the import was faster (or at least the perception), and I already have in mind few cheap tricks to make it faster (like disabling mempool). The easiest way to make it manageable is simply provide a global variable for auto-inc behavior, though elegant solutions would attach to ‘ALTER TABLE … ENABLE KEYS’ or something similar.

Once loaded, catching up on replication was another task worth few experiments. As the data image was already quite a few days old, I had at least few hours to try to speed up replication. Apparently, Jay Janssen’s prefetcher has disappeared from the internets, so the only one left was maatkit’s mk-slave-prefetch. It rewrites UPDATEs into simple SELECTs, but executes them just on single thread, so the prefetcher was just few seconds ahead of SQL thread – and speedup was less than 50%. I made a quick hack that parallelized the task, and it managed to double replication speed.

Still, there’re few problems with the concept – it preheats just one index, used for lookup, and doesn’t work on secondary indexes. Actually analyzing the query, identifying what and where changes, and sending a select with UNIONs, preheating every index affected by write query could be more efficient. Additionally it would make adaptive hash or insert buffers useless – as all buffer pool pages required would be already in memory – thus leading to less spots of mutex contention.

We also managed to hit few optimizer bugs too, related to casting changes in 5.0. Back in 4.0 it was safe to pass all constants as strings, but 5.0 started making poor solutions then (like filesorting, instead of using existing ref lookup index, etc). I will have to review why this happens, does it make sense, and if not – file a bug. For now, we have some workarounds, and don’t seem to be bitten too much by the behavior.

Anyway, in the end I directed half of this site’s core database off-peak load to this machine, and it was still keeping up with replication at ~8000 queries per second. The odd thing yet is that though 5.0 eats ~30% more CPU, it shows up on profiling as faster-responding box. I guess we’re just doing something wrong.

I’ve published our MySQL branch at launchpad. Do note, release process is somewhat ad-hoc (or non-existing), and engineer doing it is clueless newbie. :)

I had plans to do some more scalability tests today, but apparently the server available is just two-core machine, so there’s nothing much I can do on it. I guess another option is grabbing some 8-core application server and play with it. :)

Trainwreck: external MySQL replication agent

Wednesday, May 14th, 2008

I wanted to work more on the actual project before writing about it, but I’m lazy, and dear community may be not.

At Wikimedia we have one database server which replicates from multiple (like 15!) masters. It even splits replication streams by database, and applying changes in parallel.

All this stuff is done by external replication agent, Trainwreck. It is public-domain software, which was written by River, doesn’t have much documentation, works only on Solaris (River likes Solaris), unless you comment out all process management blocks, which use doors and other Solaris specific API.

It lives in Wikimedia SVN, and can be checked out using:

svn co http://svn.wikimedia.org/svnroot/mediawiki/trunk/tools/trainwreck/

It sits there, maintained just for needs of that specific single server (ok, there might be two or three), so if anyone wants to make it available for broader audience, feel free to fork a project to some community-oriented place, add all nice features you need. :)

Replication will live!

Saturday, April 12th, 2008

Brian exposed some of his internal letters about death of replication (caused by memcached). Back when he wrote this, I responded back a bit too. Now as quite a few people really want to burry replication, let me point out some of reasoning why it will live.
(more…)