<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>domas mituzas &#187; mydumper</title>
	<atom:link href="http://mituzas.lt/tag/mydumper/feed/" rel="self" type="application/rss+xml" />
	<link>http://mituzas.lt</link>
	<description></description>
	<lastBuildDate>Fri, 30 Jul 2010 07:36:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1-alpha</generator>
		<item>
		<title>after the conference, mydumper, parallelism, etc</title>
		<link>http://mituzas.lt/2009/05/18/after-the-conference-mydumper-parallelism-etc/</link>
		<comments>http://mituzas.lt/2009/05/18/after-the-conference-mydumper-parallelism-etc/#comments</comments>
		<pubDate>Mon, 18 May 2009 20:00:57 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[mydumper]]></category>
		<category><![CDATA[mysqlconf]]></category>
		<category><![CDATA[mysqluc]]></category>

		<guid isPermaLink="false">http://dammit.lt/?p=490</guid>
		<description><![CDATA[Though slides for my MySQL Conference talks were on the O&#8217;Reilly website, I placed them in my talks page too, for both dtrace and security presentations. I also gave a lightning talk about mydumper. Since my original announcement mydumper has &#8230; <a href="http://mituzas.lt/2009/05/18/after-the-conference-mydumper-parallelism-etc/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Though slides for my MySQL Conference talks were on the O&#8217;Reilly website, I placed them in my <a href="http://dammit.lt/talks">talks</a> page too, for both <a href="http://dammit.lt/uc/mysqluc2009dtrace.pdf">dtrace</a> and <a href="http://dammit.lt/uc/mysqluc2009security.pdf">security</a> presentations.</p>
<p>I also gave a lightning talk about <a href='http://launchpad.net/mydumper'>mydumper</a>. Since my <a href='http://dammit.lt/2009/02/03/mydumper/'>original</a> announcement mydumper has changed a bit. It now writes compressed files, detects and kills slow queries that could block table flushes, supports regular expressions for table names, and trunk is slowly moving towards understanding that storage engines differ :)</p>
<p>I&#8217;ve been using mydumper quite a lot in my deployments (and observing 10x faster dumps). Now, the sad part is how to do faster recovery. It is quite easy to parallelize the data load (apparently, xargs supports running parallel processes):</p>
<pre>
# print one file name per line, so that -I % starts a separate loader per file
printf '%s\n' *.sql.gz | xargs -P 16 -I % sh -c 'zcat % | mysql dbname'
</pre>
<p>Still, that doesn&#8217;t scale much &#8211; it only doubles the load speed compared to a single-threaded load, even on quite a powerful machine. The problem lives in the log_sys mutex &#8211; it is acquired for every InnoDB <b>row</b> operation to grab log sequence numbers (LSNs), so neither batching nor differentiation strategies really help, and LOAD DATA hits the same problem too. In certain cases I saw quite some spinning on other mutexes, and it seems that InnoDB currently doesn&#8217;t scale that well with lots of small row operations. Maybe someone some day will pick this up and fix it, that&#8217;s why we go to conferences and share our findings :) </p>
]]></content:encoded>
			<wfw:commentRss>http://mituzas.lt/2009/05/18/after-the-conference-mydumper-parallelism-etc/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>mydumper</title>
		<link>http://mituzas.lt/2009/02/03/mydumper/</link>
		<comments>http://mituzas.lt/2009/02/03/mydumper/#comments</comments>
		<pubDate>Tue, 03 Feb 2009 12:22:57 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[mydumper]]></category>

		<guid isPermaLink="false">http://dammit.lt/?p=309</guid>
		<description><![CDATA[Last weekend I ended up working on small pet project &#8211; and today I&#8217;m kind of releasing it. So, I had that idea that there&#8217;s no good tool to do logical dump of MySQL data for large sites &#8211; mysqldump &#8230; <a href="http://mituzas.lt/2009/02/03/mydumper/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Last weekend I ended up working on a small pet project &#8211; and today I&#8217;m kind of releasing it. </p>
<p>So, I had that idea that there&#8217;s no good tool to do a logical dump of MySQL data for large sites &#8211; mysqldump doesn&#8217;t put much I/O pressure on the storage, mk-parallel-dump is closer, but it doesn&#8217;t do consistent snapshots, uses the same mysqldump, and is written in Perl (haha!), and&#8230; I just wanted something new to hack on, as a proof of concept. For a while one had to edit constants in the code to use it, but my colleague Mark contributed options support, so it doesn&#8217;t need recompiling anymore :)</p>
<p>So, let me introduce <a href='https://launchpad.net/mydumper'>mydumper</a>. It doesn&#8217;t dump table definitions; all it does is extract data and write it to files, fast. <span id="more-309"></span></p>
<p>I took the ~20GB French Wikipedia core database (<a href='http://p.defau.lt/?OFdwYHWaPEOhH6EB1_hqyA'>SHOW TABLE STATUS</a>), and tried dumping it with three different methods &#8211; mysqldump, mk-parallel-dump and mydumper (using 32 threads and chunked backup settings for the last two). </p>
<p>Dump times, smaller is better:</p>
<pre>
mysqldump:        75m18s
mk-parallel-dump:  8m13s
mydumper:          6m44s \o/ WINNER \o/
</pre>
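<p>For a rough sense of scale, the timings above can be turned into speedup factors. This is only a sketch of the arithmetic; the <code>to_seconds</code> helper and the variable names are made up for illustration and are not part of mydumper:</p>

```shell
# Convert an "XmYs" wall-clock time into seconds (illustrative helper).
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ print $1 * 60 + $2 }'
}

mysqldump_s=$(to_seconds 75m18s)
maatkit_s=$(to_seconds 8m13s)     # mk-parallel-dump, part of maatkit
mydumper_s=$(to_seconds 6m44s)

# Speedup of mydumper over the other two tools, to one decimal place.
vs_mysqldump=$(awk -v a="$mysqldump_s" -v b="$mydumper_s" 'BEGIN { printf "%.1f", a / b }')
vs_maatkit=$(awk -v a="$maatkit_s" -v b="$mydumper_s" 'BEGIN { printf "%.1f", a / b }')

echo "mydumper vs mysqldump: ${vs_mysqldump}x"
echo "mydumper vs mk-parallel-dump: ${vs_maatkit}x"
```

<p>With these numbers mydumper comes out about 11x faster than mysqldump and about 1.2x faster than mk-parallel-dump.</p>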
<p>There&#8217;s no cache skew &#8211; I restarted mysqld before every test, and it is using O_DIRECT.</p>
<p>At certain moments it seemed like the gigabit network wasn&#8217;t enough for the test&#8230; It seems it was using the underlying I/O properly too:</p>
<pre>
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.58    0.00    3.28   48.14    0.00   36.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
sda              19.00    26.00 3077.00    6.00 151720.00   306.00    

   avgrq-sz avgqu-sz   await  svctm  %util
      49.31    32.09   10.27   0.32 100.00
</pre>
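<p>The sample above can be converted into more familiar units. A quick sketch, assuming iostat&#8217;s usual 512-byte sector unit for the rsec/s and avgrq-sz columns (the variable names are illustrative only):</p>

```shell
rsec_per_s=151720   # rsec/s column from the sample above
avgrq_sz=49.31      # avgrq-sz column, in sectors
sector_bytes=512    # assumption: iostat counts 512-byte sectors

# Read throughput in MiB/s and average request size in KiB.
read_mib_s=$(awk -v s="$rsec_per_s" -v b="$sector_bytes" 'BEGIN { printf "%.1f", s * b / 1048576 }')
avg_req_kib=$(awk -v q="$avgrq_sz" -v b="$sector_bytes" 'BEGIN { printf "%.1f", q * b / 1024 }')

echo "read throughput: ${read_mib_s} MiB/s"
echo "average request: ${avg_req_kib} KiB"
```

<p>That works out to roughly 74 MiB/s of reads at about 25 KiB per request, on a device that is 100% utilized.</p>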
<p>Though, once I tried from warm caches, and saw 2 million rows read per second, I had a warm fuzzy feeling :) </p>
<p>Apparently the trick to a successful fast MySQL dump is applying lots of pressure to the underlying storage as well as using multiple processors. So we do, so can you!</p>
<p>Oh, and the easiest way to start is:</p>
<pre>
bzr co lp:mydumper/0.1 mydumper
cd mydumper
make
./mydumper --help
</pre>
<p>Alternatively, one can use &#8216;lp:mydumper&#8217; to get trunk &#8211; though various things (like startup options) can change there. Feel free to file bugs, ask questions, and contribute anything you think is worth contributing (that&#8217;s why it ended up on Launchpad). </p>
<p>The <a href='https://answers.launchpad.net/mydumper/+faqs'>FAQs page</a> may have answers to questions that might arise too :)</p>
<p>Update: I also added a downloadable archive for the bzr-impaired on the <a href='https://launchpad.net/mydumper/+download'>downloads</a> page. </p>
]]></content:encoded>
			<wfw:commentRss>http://mituzas.lt/2009/02/03/mydumper/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>
