Posts Tagged ‘profiling’

PMP!

Sunday, December 27th, 2009

It is a glorious day today – Poor Man’s Profiler (previously introduced here) just got its own website. Do visit it at http://poormansprofiler.org/ – and contribute to better tomorrow.

on tools and operating systems

Tuesday, March 31st, 2009

Sometimes people ask why do I use MacOSX as my main work platform (isn’t that something to do with beliefs?). My answer is “good foundation with great user interface”. Though that can be treated as “he must like unix kernel and look&feel!”, it is not exactly that.

What I like is that I can have good graphical stable environment with some mandatory tools (yes, I used OS-supplied browser, mail, etc), but beside that maintain the bleeding edge open-source space (provided by MacPorts).

Also what I like, is OS-supplied development and performance tools. DTrace included is awesome, yes, but Apple did put some special touch on it too. This is visualization environment for dtrace probes and other profiling/debugging tools:

memory usage profiling

Even the web browser (well, I upgraded to Safari4.0 ;-) provides some impressive debugging and profiling capabilities:

safari web inspector

Of course, I end up running plethora of virtual machines (switching from Parallels to VirtualBox lately), but even got a KDE/Aqua build (for kcachegrind mostly). I don’t really need Windows apps, and I can run ‘Linux’ ones natively on MacOSX, and I can run MacOSX ones on MacOSX.

There’s full web stack for my MediaWiki work, there’re dozens of MySQL builds around, there’re photo albums, dtrace tools, World of Warcraft, bunch of toy projects, few different office suites, Skype, NetBeans, Eclipse, Xcode, integrated address books and calendars, all major scripting languages, revision control systems – git, svn, mercurial, bzr, bitkeeper, cvs, etc.

All that on single machine, running for three years, without too much clutter, and nearly zero effort to make it all work. Thats what I want from desktop operating system – extreme productivity without too much tinkering.

And if anyone blames me that I’m using non-open-source software, my reply is very simple – my work output is open-sourced.

Poor man’s contention profiling

Sunday, February 15th, 2009

I wrote already about poor man’s query profiling techniques last summer. Today I’m going to share poor man’s contention profiling tool, for all these poor souls which do not have Solaris with dtrace or pstack, don’t want to run tcmalloc contention profiler, and simply need some easy hack to see ‘what the heck is going on in my server’. Here it is:

gdb \
    -ex "set pagination 0" \
    -ex "thread apply all bt" \
    --batch -p $(pidof mysqld)

Run few times, and you will have enough samples to start judging. Do note, this may stop the process execution for a second, so do not spam it in too tight loop.
Once you have results it is just a matter of 20-liner script to extract any useful calculations :)

P.S. I’d love to see efficient pstack implementation for 64-bit Linux :)

update: this now lives at http://poormansprofiler.org

dtrace!

Friday, October 3rd, 2008

At the MySQL developer conference I accidently showed up some of things we’ve been doing with dtrace (I used it in few cases and realized the power it has), and saw some jaws drop. Then I ended up doing small demos around the event. What most people know about dtrace, is that there’re some probes and you can trace them. What people don’t know is that you can actually create lots of probes dynamically, and use them with lots of flexibility.

One of major things not really grasped by many is that dtrace is a combination of a tracing tool, debugger, programming language and a database, having minor, but very valuable functionality for each. It can attach to any place in code, it can get stacks, function arguments, traverse structures, do some evaluations, aggregate data, and in the end – thats all compiled code executed by kernel (or programs). 

Sometimes a probe may look not that useful (strace would provide file writes too?), but once combined with ability to get immediate stack, as well as set or read context variables (a previous probe on any other event could have saved some important information, e.g. host,user,table names, etc) – so final result may tell statistics correlated to many other activities. 

One developer (a traitor who has left support for easier life in engineering dept) listened to all this, and I asked what his current project was – apparently he was adding static dtrace probes to MySQL. It ended up being quite interesting discussion, as static probes provide two value points. First of all, it provides an interface – whereas dynamic probes can change with code changes (though, that doesn’t happen too often :) Second value – one can do additional calculations on a specific probe, which would be done only on-demand (when the probe is attached). 

So, having a static probe that directly maps to easy-mode dynamic one (it is straightforward to attach to a function, and quite easy to read its arguments), is a bit of waste (both in development time, as well as few instructions are actually written there). Dynamic tracing generally modifies binaries on fly – so it does not carry additional costs otherwise. Though an example where static probe would be awesome – having “query start” event, which would have query string canonized with all literals removed – this would allow on-demand query profiling for query groups, rather than stand-alone queries.

The other major value is ability to set thread-specific context variables in different probes, so they can read each other data. At the type of incoming packet one can tag the thread with whichever information needed – then any subsequent actions can reuse such information to filter out important events. That also removes the need of static probes providing multiple-layer information – it all can be achieved by chaining the events – without too much complexity. 

I took a bit of trollish stance when approached a developer implementing internal performance statistics. We were playing a game – he’d tell me what kind of performance information he’d like to extract, and I’d show a method to do that with dtrace. More people from monitoring field joined, and we ended up discussing what is the perfect performance monitoring and analysis system. It is quite easy to understand, that different people will need different kinds of metrics. For MySQL development work performance engineer will need mutex contention information, someone fixing a leak will need heap profiling, someone writing a feature will want an easy way to trace how server executes their code – and all that is way far from any needs actual user or DBA has. Someone who writes a query just wants to see the query plan with some easy-to-understand costs (just need to pump more steroids into EXPLAIN). DBAs may want to see resource consumption per-user, per-table, etc (something Google patch  provides). It is interesting to find a balance, between external tools and what should be supported out-of-the-box internally – and it is way easier to force internal crowd to have proper tools, and it is always nice to provide a much as possible instrumentation for anyone externally. 

Of course, there’s poor guy in the middle of two camps – a support engineer – who needs easy performance metrics to be accessible from clients, but needs way more depth than standard tools provide. In ideal case dtrace would be everywhere (someone recently said, thats one of coolest things Sun has ever brought) – then we’d be able to retrieve on-demand performance metrics from everywhere, and would be tempted to write DTraceToolkit  (a suite of programs that give lots and lots of information based on dtrace) like bunch of stuff for MySQL internals analysis.

I already made one very very simple tool  which visualizes dtrace output, so we can have graphviz based SVG callgraph for pretty much any type of probe (like, who in application does expensive file reads) – all from a single dtrace oneliner. It seems I can sell the tool to Sun’s performance engineering team – they liked it. :) 

Some people even installed Solaris afterwards for their performance tests. Great, I won’t have to (haha!).

Though lack of dtrace in Linux is currently a blocker for the technology, lots of engineers already have it on their laptops – MacOSX 10.5 ships it. It even has visual toolkit, that allows building some dtrace stuff in a GUI. 

I’m pretty sure now, any engineer would love dtrace (or dtrace based tools), they just don’t know that yet.

Poor man’s query profiling

Thursday, May 22nd, 2008

There is one very easy way to find database queries that might be performance bottlenecks on properly running servers, without using any sophisticated tools or using precision slowlogs,

Just overload your server, and spam ‘SHOW FULL PROCESSLIST’. :)

On valgrind and tcmalloc

Thursday, March 27th, 2008

I already wrote about tcmalloc, and how it helped with memory fragmentation. This time had some experience with extended features – memory profiling and leak checker. With tcmalloc it is possible to get an overview as well as detailed reports of what areas of memory are allocated for what uses. Even more, it can detect and report any memory that leaked. Of course, valgrind does that too. With one slight difference:

  • valgrind slows down applications 20-40 times (my tests ran 4s instead of 14ms)
  • tcmalloc does not. Same 14ms.

I wrote some MySQL UDFs for profiling and heap checking management, so can extract per-thread single-test stuff. Will try to clean up and release. Would be shame not to.

mediawiki graphic profile

Wednesday, January 18th, 2006

So I used amazing tool named KCacheGrind to parse XDebug output of MediaWiki. Of course, I immediately started refactoring some places in code. It is a random web hit, I don’t know even what it is, but it looks nasty and waiting for a gentle touch. Kill remote interwiki caches! :)

Here is what I got (triple click to zoom):
mediawiki graphic profile thumb

Here is what I used:
kcachegrind thumb

Yummy!

profiling web applications

Wednesday, January 11th, 2006

I’ve experimented a bit with wikipedia’s existing profiling framework lately, and extracted some of bits from that. The way we do it now is having our internal profiler written in PHP, and wfProfileIn() / wfProfileOut() function calls around the code. As there’re quite a lot of profiling entry places, overhead isn’t tolerable at high loads, so it’s turned on only for every 50th page view.

One of interesting entry points is in Database::query(), we have a tiny set of regexps in Database::generalizeSQL, that convert literal arguments into ‘X’ for string and ‘N’ for integer. This way we end up with all similar queries grouped together in profiling report, showing us which queries need some love or bring down our cluster.

Profiler may have different personalities, as result can be written to database table, printed out to a user, or sent over a network packed into UDP packets. I made a special daemon for handling those messages, which is just in-memory hashtable, that is updated on every incoming UDP message and any incoming TCP connection gets full XML export. Later this can be served in human output by reporting script. It is very easy to scale such profiling system, as state preservation is not needed, and XMLs can be aggregated. I guess it was possible to do that with HEAP tables in MySQL, but writing tiny daemons is sooo fun :-)

One of plans now is to rewrite profiler class into C (that would provide ability to run large-scale all-request profiling again), and merge that with our setproctitle extension, which currently allows to see what part of PHP code is being executed:

httpd: wfMsgGetKey [enwiki]
httpd: main-cleanup [dewiki]
httpd: requestfinish
httpd: query: SELECT page_id FROM `page` WHERE page_namespace = 'X' LIMIT N  [enwiki]
httpd: query: SELECT blob_text FROM `blobs` WHERE blob_id = 'X' LIMIT N  [enwiki]

Anyway, do not think of in-code profiling as of a replacement for in-core profilers such as APD or xdebug, as these can tell much more accurately where you’re fast and where you’re slow (sometimes these results may surprise!). But what you can win with code-controlled profiler – a general view for deployed application in distributed environments, that relies not on call-tree, but rather your own defined profiling blocks.

And now we know that :

  • 20-25% of our MediaWiki execution real time is spent waiting for MySQL (3.5ms per query)
  • 16% for memcached (or actually, Tugela, my memcached+BerkeleyDB hack :)
  • 20% of requests to backend are searches, and we spend 70ms average waiting for Lucene and Mono based mwdaemon to respond
  • Saving of an article takes around 0.8s, page view is around 0.1s
  • Average times (and deviations, if someone hacks reporting :) for every query…

In most cases developers without tools do miss real performance hotspots. This is what tools are for! :-)