<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>domas mituzas &#187; opteron</title>
	<atom:link href="http://mituzas.lt/tag/opteron/feed/" rel="self" type="application/rss+xml" />
	<link>http://mituzas.lt</link>
	<description></description>
	<lastBuildDate>Thu, 12 Aug 2010 14:09:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1-alpha</generator>
		<item>
		<title>Crashes, complicated edition</title>
		<link>http://mituzas.lt/2008/08/05/complicated-crashes/</link>
		<comments>http://mituzas.lt/2008/08/05/complicated-crashes/#comments</comments>
		<pubDate>Tue, 05 Aug 2008 09:22:00 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[wikitech]]></category>
		<category><![CDATA[crash]]></category>
		<category><![CDATA[gcc]]></category>
		<category><![CDATA[inline]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[opteron]]></category>

		<guid isPermaLink="false">http://dammit.lt/?p=177</guid>
		<description><![CDATA[Usually our 4.0.40 (aka &#8216;four oh forever&#8217;) build doesn&#8217;t crash, and if it does, it is always a hardware problem, a kernel/filesystem bug, or whatever else. So, we have a very calm life, until crashes start to happen&#8230; As we used &#8230; <a href="http://mituzas.lt/2008/08/05/complicated-crashes/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Usually our <a href='http://svn.wikimedia.org/viewvc/mysql/trunk/server/'>4.0.40</a> (aka &#8216;four oh forever&#8217;) build doesn&#8217;t crash, and if it does, it is always a hardware problem, a kernel/filesystem bug, or whatever else. So, we have a very calm life, until crashes start to happen&#8230;</p>
<p>As we used to run RAID0, a disk failure usually means a system wipe and reinstall once fixed &#8211; so our machines all run relatively new kernels and OS releases (except some boxes which just refuse to die ;-), and we&#8217;re usually well ahead of the whole bunch of conservative RHEL users. </p>
<p>We had one machine which was reporting CPU/northbridge/RAM problems, and every MySQL crash was accompanied by <a href='http://en.wikipedia.org/wiki/Machine_Check_Exception'>MCEs</a>, so after replacing the RAM, the CPU and the motherboard itself, we just sent the machine back to service, and asked them to do whatever it takes to fix it. </p>
<p>So, this machine, with the proud name of &#8216;db1&#8217;, comes back and after entering service starts crashing every day. I reduced the InnoDB log file size to make recovery faster, and ran it under &#8216;gdb&#8217;. The stack trace on crash pointed to a bunch of checksumming (aka folding) functions, so the initial assumption was &#8216;here we get memory errors again&#8217;. So, for a while I thought that &#8216;db1&#8217; needed some more hardware work, and just left it as is, as we were waiting for a new batch of database hardware to deploy and there was a bit more work around.</p>
<p>We started deploying new database hardware, and it started crashing every few hours instead of every few days. Here again, a reduced InnoDB transaction log size and an attached gdb allowed me to <a href='http://p.defau.lt/?pMUxpWwiGwwDOA1daO3Tiw'>trap the segfault</a>, and it was pointing again to the very same adaptive hash key calculation (folding!). </p>
<p>Unfortunately, it was a non-trivial chain of inlined functions (InnoDB is full of these), so I made a &#8216;-g -fno-inline&#8217; build, and was keenly waiting for a crash to happen, so I could investigate what gets corrupted and where. It did not crash. Then I looked at our <a href='http://p.defau.lt/?A6y0ZFUttppM_5_rNlmpmQ'>zoo</a> just to find out we have lots of different builds. On one hand it was a bit messy; on the other hand, it allowed a few conclusions:</p>
<ul>
<li>Only Opterons crashed (though there&#8217;s like a three-year gap between the revisions)</li>
<li>Only Ubuntu 8.04 crashed</li>
<li>Only GCC-4.2 build crashed</li>
</ul>
<p>After thinking a bit, I also noted that:</p>
<ul>
<li>We have Opterons that don&#8217;t crash (older gcc builds)</li>
<li>Xeons didn&#8217;t crash.</li>
<li>We have Ubuntu 8.04 machines that don&#8217;t crash (they either are Xeons or run older gcc-4.1 builds)</li>
<li>We have GCC-4.2 builds that run fine (all of them on Xeons, all on Ubuntu 8.04). </li>
</ul>
<p>The next test was taking gcc-4.1 builds and running them on our new machines. No crash for the next two days.<br />
One new machine did have a gcc-4.2 build and didn&#8217;t crash for a few days of replication-only load, but once it got some parallel load, it crashed within the next few hours. </p>
<p>I tried to chat about it on Freenode&#8217;s #gcc, and I got just:</p>
<pre>
noshadow&gt;	domas: almost everything that fails when
		optimized (as inlining opens many new
		optimisation possibilities)
noshadow&gt;	i.e: const misuse, relying on undefined
		behaviour, breaking aliasing rules, ...
domas&gt;		interesting though, I hit it just with
		gcc 4.2.3 and opterons only
noshadow&gt;	domas: that makes it more likely that
		it is caused by optimisation unveiling
		programming bugs
</pre>
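<p>To make one of the bug classes noshadow lists concrete, here is a minimal C sketch of &#8216;breaking aliasing rules&#8217; and a well-defined alternative. It is an illustration only: the function name &#8216;read_u32&#8217; is hypothetical, the code is not taken from InnoDB, and nothing here claims that this is the actual bug behind the crashes.</p>

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper for illustration, not taken from InnoDB.
 *
 * The tempting way to read a 32-bit value out of a byte buffer is
 *
 *     return *(const uint32_t *)p;
 *
 * but reading a byte array through a uint32_t lvalue breaks the C
 * aliasing rules (and p may be misaligned), so the behaviour is
 * undefined. Such code can appear to work for years and then break
 * once a newer compiler inlines it and optimises across the call.
 *
 * Copying the bytes into a real uint32_t has defined behaviour, and
 * compilers typically turn the memcpy into a plain load.
 */
static inline uint32_t read_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

<p>If aliasing is a suspect, gcc&#8217;s &#8216;-fno-strict-aliasing&#8217; flag switches off the optimisations that rely on those rules, which could be a cheap experiment before reading any disassembly.</p>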
<p>In the end, I know that there&#8217;s a programming bug in ancient code using inlined functions, which causes memory corruption under multithreaded load if compiled with gcc-4.2 and run on an Opteron. As for now it is our fork, so pretty much everyone will point at each other and nobody will try to fix it :) </p>
<p>And me? I can always do:</p>
<pre>env CC=gcc-4.1 CXX=g++-4.1 ./configure ... </pre>
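<p>That pin only exists in whoever types the configure line. A hedged sketch of a source-level reminder could look like the following; the macro and function names are hypothetical, while the version macros are the standard GCC ones. Note that clang by default also reports itself as GCC 4.2, so it has to be excluded explicitly.</p>

```c
/*
 * Hypothetical guard, for illustration only: records whether this
 * translation unit was compiled by a real GCC 4.2.x, the compiler
 * series that produced the crashing builds.
 *
 * clang defines __GNUC__ and __GNUC_MINOR__ as 4 and 2 by default for
 * compatibility, hence the extra __clang__ check.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    __GNUC__ == 4 && __GNUC_MINOR__ == 2
#define BUILT_WITH_GCC_4_2 1
#else
#define BUILT_WITH_GCC_4_2 0
#endif

/* Returns 1 if the build used GCC 4.2.x, 0 otherwise. */
static inline int built_with_gcc_4_2(void)
{
    return BUILT_WITH_GCC_4_2;
}
```

<p>A server could log this value at startup, or the &#8216;#if&#8217; branch could be turned into an &#8216;#error&#8217; to refuse such a build outright.</p>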
<p>I&#8217;m too lazy to learn how to disassemble and compare the compiled code, especially when every test takes a few hours. I already destroyed my weekend with this :-) I&#8217;m just waiting for people to hit this with stock mysql &#8211; it would be one of those things we love debugging ;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://mituzas.lt/2008/08/05/complicated-crashes/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
