<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>domas mituzas &#187; mysql</title>
	<atom:link href="http://mituzas.lt/tag/mysql/feed/" rel="self" type="application/rss+xml" />
	<link>http://mituzas.lt</link>
	<description></description>
	<lastBuildDate>Fri, 30 Jul 2010 07:36:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1-alpha</generator>
		<item>
		<title>dtrace!</title>
		<link>http://mituzas.lt/2008/10/03/dtrace/</link>
		<comments>http://mituzas.lt/2008/10/03/dtrace/#comments</comments>
		<pubDate>Fri, 03 Oct 2008 11:13:06 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[dtrace]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>

		<guid isPermaLink="false">http://dammit.lt/2008/10/03/218/</guid>
		<description><![CDATA[At the MySQL developer conference I accidentally showed off some of the things we&#8217;ve been doing with dtrace (I had used it in a few cases and realized the power it has), and saw some jaws drop. Then I ended up doing small demos &#8230; <a href="http://mituzas.lt/2008/10/03/dtrace/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>At the MySQL developer conference I accidentally showed off some of the things we&#8217;ve been doing with <a id="gonk" href="http://www.sun.com/bigadmin/content/dtrace/" title="dtrace">dtrace</a> (I had used it in a few cases and realized the power it has), and saw some jaws drop. Then I ended up doing small demos around the event. What most people know about dtrace is that there are some probes and you can trace them. What people don&#8217;t know is that you can actually create lots of probes dynamically, and use them with lots of flexibility.</p>
<p>One of the major things not really grasped by many is that dtrace is a combination of a tracing tool, a debugger, a programming language and a database, offering a small but very valuable slice of each. It can attach to any place in code, get stacks and function arguments, traverse structures, do some evaluations and aggregate data, and in the end &#8211; that&#8217;s all compiled code executed by the kernel (or programs).</p>
<p>Sometimes a probe may not look that useful (strace would show file writes too, right?), but it becomes interesting once combined with the ability to grab the stack on the spot, as well as to set or read context variables (a previous probe on any other event could have saved some important information, e.g. host, user or table names) &#8211; so the final result can be statistics correlated with many other activities.</p>
<p>One developer (a traitor who left support for an easier life in the engineering department) listened to all this, and I asked what his current project was &#8211; apparently he was adding static dtrace probes to MySQL. It ended up being quite an interesting discussion, as static probes provide two kinds of value. First, a static probe is a stable interface, whereas dynamic probes can change with code changes (though that doesn&#8217;t happen too often :). Second, a static probe can do additional calculations that run only on demand (when the probe is enabled).</p>
<p>So a static probe that maps directly to an easy dynamic one (it is straightforward to attach to a function, and quite easy to read its arguments) is a bit of a waste, both of development time and of the few instructions it adds to the binary. Dynamic tracing generally modifies binaries on the fly, so it carries no cost while nothing is being traced. An example where a static probe would be awesome, though, is a &#8220;query start&#8221; event that carries the query string canonicalized with all literals removed &#8211; this would allow on-demand profiling of query groups, rather than of stand-alone queries.</p>
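<p>To make the idea of a query string with its literals removed concrete, here is a rough Python sketch of one way to fold queries into groups. It is only an illustration: the regular expressions, the canonicalize name and the sample queries are made up for this example, none of it is MySQL code, and a real implementation would more likely reuse the server&#8217;s own lexer than regexes.</p>
<pre>
import re
from collections import Counter

# Quoted strings (single or double quotes, with backslash escapes) and bare numbers.
_STRING = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
# A list of placeholders such as (?, ?, ?) folds into (?).
_IN_LIST = re.compile(r"\(\s*\?(?:\s*,\s*\?)*\s*\)")
_SPACE = re.compile(r"\s+")

def canonicalize(query):
    """Replace every literal with ? so that similar queries share one shape."""
    q = _STRING.sub("?", query)   # strings first, they may contain digits
    q = _NUMBER.sub("?", q)       # digits inside identifiers (t1) are left alone
    q = _IN_LIST.sub("(?)", q)
    return _SPACE.sub(" ", q).strip()

# Made-up sample queries, grouped by shape
queries = [
    "SELECT * FROM page WHERE page_id = 42",
    "SELECT * FROM page WHERE page_id = 7",
    "SELECT * FROM user WHERE user_name = 'Domas'",
]
groups = Counter(canonicalize(q) for q in queries)
for shape, hits in groups.most_common():
    print(hits, shape)
</pre>
<p>Presumably a static probe would pass such a string as an argument, and a dtrace aggregation keyed on it would then count or time whole query groups.</p>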
<p>The other major value is the ability to set thread-specific context variables in different probes, so that probes can read each other&#8217;s data. When a packet comes in, one can tag the thread with whatever information is needed &#8211; then any subsequent probe can reuse that information to pick out the important events. That also removes the need for static probes that provide multiple layers of information &#8211; it can all be achieved by chaining events, without too much complexity.</p>
<p>I took a bit of a trollish stance when I approached a developer implementing internal performance statistics. We played a game &#8211; he&#8217;d tell me what kind of performance information he&#8217;d like to extract, and I&#8217;d show a way to get it with dtrace. More people from the monitoring field joined, and we ended up discussing what the perfect performance monitoring and analysis system would be. It is quite easy to see that different people need different kinds of metrics. For MySQL development work, a performance engineer will need mutex contention information, someone fixing a leak will need heap profiling, someone writing a feature will want an easy way to trace how the server executes their code &#8211; and all of that is far from anything an actual user or DBA needs. Someone who writes a query just wants to see the query plan with some easy-to-understand costs (we just need to pump more steroids into EXPLAIN). DBAs may want to see resource consumption per user, per table, etc. (something the <a id="j46_" href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches" title="Google patch">Google patch</a>  provides). It is interesting to find a balance between external tools and what should be supported internally out of the box &#8211; it is way easier to force the internal crowd to have proper tools, and it is always nice to provide as much instrumentation as possible for anyone external.</p>
<p>Of course, there&#8217;s a poor guy stuck between the two camps &#8211; the support engineer &#8211; who needs performance metrics that are easy to get from clients, but needs way more depth than standard tools provide. In the ideal case dtrace would be everywhere (someone recently said that it&#8217;s one of the coolest things Sun has ever brought) &#8211; then we&#8217;d be able to retrieve on-demand performance metrics anywhere, and would be tempted to write a bunch of <a id="q8p:" href="http://opensolaris.org/os/community/dtrace/dtracetoolkit/" title="DTraceToolkit">DTraceToolkit</a>-like stuff (DTraceToolkit being a suite of programs that give lots and lots of information based on dtrace) for MySQL internals analysis.</p>
<p>I already made <a id="ntzb" href="http://p.defau.lt/?XFyyeBIFiQZLmn7XPXpPWQ" title="one very very simple tool">one very very simple tool</a>  which visualizes dtrace output, so we can get a graphviz-based SVG call graph for pretty much any type of probe (like, who in the application does the expensive file reads) &#8211; all from a single dtrace one-liner. It seems I can sell the tool to Sun&#8217;s performance engineering team &#8211; they liked it. :) </p>
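<p>For a feel of how little such a visualizer needs, here is a minimal Python sketch (not the tool linked above) that turns aggregated stacks into graphviz DOT. It assumes the usual layout of a dtrace aggregation keyed on ustack(): frames listed callee first, followed by the count on a line of its own. The function names and the sample input are made up.</p>
<pre>
import re
from collections import Counter

# Input is assumed to come from a one-liner along the lines of:
#   dtrace -n 'syscall::read:entry /execname == "mysqld"/ { @[ustack()] = count(); }'

def parse_stacks(text):
    """Yield (frames, count); frames are callee first, the way dtrace prints them."""
    frames = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.isdigit():
            # a bare number is the aggregated count that closes a stack
            yield frames, int(line)
            frames = []
        else:
            # drop the +0x.. offset so that all call sites in a function merge
            frames.append(re.sub(r"\+0x[0-9a-f]+$", "", line))

def to_dot(text):
    """Render caller -> callee edges, weighted by how often each was seen."""
    edges = Counter()
    for frames, count in parse_stacks(text):
        # each frame is called by the frame printed right below it
        for callee, caller in zip(frames, frames[1:]):
            edges[(caller, callee)] += count
    out = ["digraph callgraph {"]
    for (caller, callee), count in sorted(edges.items()):
        out.append('  "%s" -> "%s" [label="%d"];' % (caller, callee, count))
    out.append("}")
    return "\n".join(out)

# Made-up sample, shaped like dtrace output (hypothetical function names)
SAMPLE = """
  libc.so.1`read+0x7
  mysqld`read_rows+0x4a
  mysqld`scan_table+0x10
    3

  libc.so.1`read+0x7
  mysqld`read_rows+0x52
  mysqld`sort_rows+0x1
    2
"""
print(to_dot(SAMPLE))
</pre>
<p>Piping the printed text through dot -Tsvg should then give an SVG call graph with hit counts on the edges.</p>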
<p>Some people even installed Solaris afterwards for their performance tests. Great, I won&#8217;t have to (haha!).</p>
<p>Though the lack of dtrace on Linux is currently a blocker for the technology, lots of engineers already have it on their laptops &#8211; Mac OS X 10.5 ships it. It even has a visual toolkit that allows building some dtrace stuff in a GUI. </p>
<p>I&#8217;m pretty sure now that any engineer would love dtrace (or dtrace-based tools) &#8211; they just don&#8217;t know it yet.</p>
]]></content:encoded>
			<wfw:commentRss>http://mituzas.lt/2008/10/03/dtrace/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Notes from land of I/O</title>
		<link>http://mituzas.lt/2008/08/11/notes-from-land-of-io/</link>
		<comments>http://mituzas.lt/2008/08/11/notes-from-land-of-io/#comments</comments>
		<pubDate>Mon, 11 Aug 2008 10:52:00 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[directio]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[io]]></category>
		<category><![CDATA[jfs]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[xfs]]></category>

		<guid isPermaLink="false">http://dammit.lt/?p=184</guid>
		<description><![CDATA[A discussion on IRC sparked some interest in how various I/O things work in Linux. I wrote a small microbenchmarking program (where all configuration is in the source file, and I/O modes can be changed by editing various places in the code ;-), &#8230; <a href="http://mituzas.lt/2008/08/11/notes-from-land-of-io/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A discussion on IRC sparked some interest in how various I/O things work in Linux. I wrote a small microbenchmarking <a href='http://noc.wikimedia.org/~midom/raidbench.c.txt'>program</a> (where all configuration is in the source file, and I/O modes can be changed by editing various places in the code ;-), and started playing with performance.</p>
<p>The machine for this testing was a 16-disk RAID10 box with a 2.6.24 kernel. I tried to understand how O_DIRECT works and how fsync() works, and ended up digging into some other stuff.</p>
<p>My notes for now are:</p>
<ul>
<li>O_DIRECT serializes writes to a file on ext2, ext3 and jfs, so I got at most 200-250w/s.</li>
<li>xfs allows parallel (and out-of-order, if that matters) DIO, so I got 1500-2700w/s (depending on file size &#8211; seek time changes.. :) of random I/O without write-behind caching. There are a few outstanding bugs that lock this back down to 250w/s (<i>#xfs@freenode: &#8220;yeah, we drop back to taking the i_mutex in teh case where we are writing beyond EOF or we have cached pages&#8221;</i>, so
<pre>posix_fadvise(fd, 0, filesize, POSIX_FADV_DONTNEED)</pre>
<p>helps).</li>
<li>fsync(), sync() and fdatasync() wait if there are any ongoing writes; the bad part is that they can wait forever. Filesystem people say that&#8217;s a bug &#8211; a sync shouldn&#8217;t wait for I/O that was issued after it was called. I tend to believe them, as it causes stuff like InnoDB semaphore waits and such. </li>
</ul>
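<p>The write pattern behind these notes can be sketched in a few lines. This is Python rather than the C of the benchmark program linked above, and it only shows the call sequence (aligned buffer, O_DIRECT open, dropping cached pages, random writes inside EOF); the block size, the file size and the function name are assumptions, and it measures nothing.</p>
<pre>
import mmap
import os
import random
import tempfile

BLOCK = 4096  # assumed block size; O_DIRECT wants aligned offsets, sizes and buffers

def random_direct_writes(path, file_size, writes):
    """Issue random block-sized writes, bypassing the page cache where possible."""
    direct = getattr(os, "O_DIRECT", 0)  # the flag is missing on some platforms
    try:
        fd = os.open(path, os.O_WRONLY | direct)
    except OSError:
        # some filesystems (tmpfs, for one) reject O_DIRECT; fall back to buffered I/O
        fd = os.open(path, os.O_WRONLY)
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping, which is page-aligned
    buf.write(b"x" * BLOCK)
    try:
        # drop cached pages first, so that direct writes do not run into them
        os.posix_fadvise(fd, 0, file_size, os.POSIX_FADV_DONTNEED)
        blocks = file_size // BLOCK
        for _ in range(writes):
            offset = random.randrange(blocks) * BLOCK  # never beyond EOF
            os.pwrite(fd, buf, offset)
        os.fdatasync(fd)
    finally:
        buf.close()
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bench.dat")
    size = 256 * BLOCK
    with open(path, "wb") as f:
        f.truncate(size)  # size the file up front, so no write extends it
    random_direct_writes(path, size, writes=64)
</pre>
<p>Running several such writers against one file at the same time is what should expose the serialization on ext3 or jfs and the parallelism on xfs; that part is left out here.</p>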
<p>Of course, having write-behind caching at the controller (or disk, *shudder*) level allows filesystems to be lazy (and benchmarks are no longer that different), but having the upper layers work efficiently is quite important too, to avoid bottlenecks. </p>
<p>It is interesting that write-behind caching isn&#8217;t needed that much anymore for random writes once the filesystem parallelizes I/O, even the direct, unbuffered kind. </p>
<p>Anyway, now that I have found some of the I/O properties and issues, I should probably start thinking about how they apply to upper layers like InnoDB.. :) </p>
]]></content:encoded>
			<wfw:commentRss>http://mituzas.lt/2008/08/11/notes-from-land-of-io/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Crashes, complicated edition</title>
		<link>http://mituzas.lt/2008/08/05/complicated-crashes/</link>
		<comments>http://mituzas.lt/2008/08/05/complicated-crashes/#comments</comments>
		<pubDate>Tue, 05 Aug 2008 09:22:00 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[wikitech]]></category>
		<category><![CDATA[crash]]></category>
		<category><![CDATA[gcc]]></category>
		<category><![CDATA[inline]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[opteron]]></category>

		<guid isPermaLink="false">http://dammit.lt/?p=177</guid>
		<description><![CDATA[Usually our 4.0.40 (aka &#8216;four oh forever&#8217;) build doesn&#8217;t crash, and if it does, it is always a hardware problem or a kernel/filesystem bug, or whatever else. So, we have a very calm life, until crashes start to happen&#8230; As we used &#8230; <a href="http://mituzas.lt/2008/08/05/complicated-crashes/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Usually our <a href='http://svn.wikimedia.org/viewvc/mysql/trunk/server/'>4.0.40</a> (aka &#8216;four oh forever&#8217;) build doesn&#8217;t crash, and if it does, it is always a hardware problem or a kernel/filesystem bug, or whatever else. So, we have a very calm life, until crashes start to happen&#8230;</p>
<p>As we used to run RAID0, a disk failure usually means a system wipe and reinstall once it is fixed &#8211; so our machines all run relatively new kernels and OS releases (except some boxes which just refuse to die ;-), and we&#8217;re usually way ahead of the whole bunch of conservative RHEL users. </p>
<p>We had one machine which was reporting CPU/northbridge/RAM problems, and every MySQL crash was accompanied by <a href='http://en.wikipedia.org/wiki/Machine_Check_Exception'>MCEs</a>, so after replacing the RAM, the CPU and the motherboard itself, we just sent the machine back for servicing, and asked them to do whatever it takes to fix it. </p>
<p>So this machine, with the proud name of &#8216;db1&#8217;, comes back and, after entering service, starts crashing every day. I reduced the InnoDB log file size to make recovery faster, and ran it under &#8216;gdb&#8217;. The stack trace on crash pointed to a bunch of check-summing (aka folding) functions, so the initial assumption was &#8216;here we get memory errors again&#8217;. So for a while I thought that &#8216;db1&#8217; needed some more hardware work, and just left it as is, as we were waiting for a new batch of database hardware to deploy and there was other work to do.</p>
<p>We started deploying the new database hardware, and it started crashing every few hours instead of every few days. Here again, a reduced InnoDB transaction log size and an attached gdb allowed me to <a href='http://p.defau.lt/?pMUxpWwiGwwDOA1daO3Tiw'>trap the segfault</a>, and it pointed again to the very same adaptive hash key calculation (folding!). </p>
<p>Unfortunately, it was a non-trivial chain of inlined functions (InnoDB is full of these), so I made a &#8216;-g -fno-inline&#8217; build and keenly waited for a crash to happen, so I could investigate what gets corrupted and where. It never crashed. Then I looked at our <a href='http://p.defau.lt/?A6y0ZFUttppM_5_rNlmpmQ'>zoo</a> just to find out we have lots of different builds. On one hand it was a bit messy; on the other hand, it allowed a few conclusions:</p>
<ul>
<li>Only Opterons crashed (though there&#8217;s something like a three-year gap between revisions)</li>
<li>Only Ubuntu 8.04 crashed</li>
<li>Only GCC-4.2 build crashed</li>
</ul>
<p>After thinking a bit, I noted that:</p>
<ul>
<li>We have Opterons that don&#8217;t crash (older gcc builds)</li>
<li>Xeons didn&#8217;t crash.</li>
<li>We have Ubuntu 8.04 machines that don&#8217;t crash (they either are Xeons or run older gcc-4.1 builds)</li>
<li>We have GCC-4.2 builds that run nicely (all on Xeons, all on Ubuntu 8.04). </li>
</ul>
<p>The next test was taking gcc-4.1 builds and running them on our new machines. No crash for the next two days.<br />
One new machine did have a gcc-4.2 build and didn&#8217;t crash for a few days of replication-only load, but once it got some parallel load, it crashed within the next few hours. </p>
<p>I tried to chat about it on Freenode&#8217;s #gcc, and I got just:</p>
<pre>
noshadow&gt;	domas: almost everything that fails when
		optimized (as inlining opens many new
		optimisation possibilities)
noshadow&gt;	i.e: const misuse, relying on undefined
		behaviour, breaking aliasing rules, ...
domas&gt;		interesting though, I hit it just with
		gcc 4.2.3 and opterons only
noshadow&gt;	domas: that makes it more likely that
		it is caused by optimisation unveiling
		programming bugs
</pre>
<p>In the end, what I know is that there&#8217;s a programming bug in ancient code using inlined functions, which causes memory corruption under multithreaded load if compiled with gcc-4.2 and run on an Opteron. As this is our own fork for now, pretty much everyone will point at each other and won&#8217;t try to fix it :) </p>
<p>And me? I can always do:</p>
<pre>env CC=gcc-4.1 CXX=g++-4.1 ./configure ... </pre>
<p>I&#8217;m too lazy to learn how to disassemble and check compiled code differences, especially when every test takes a few hours. I already destroyed my weekend with this :-) I&#8217;m just waiting for people to hit this with stock mysql &#8211; it would be one of those things we love debugging ;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://mituzas.lt/2008/08/05/complicated-crashes/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
