I’m not sure if I’m the first to coin ‘LAMPS’ (a scaled-out LAMP environment with Squid in front), but it sounds cool. Squid is a major component in content distribution systems, dramatically reducing the load on all the backend systems (especially with proper caching rules). We have had various issues in the past where we used code nobody else seemed to be using: cache coordination, purges and, of course, load.
Quite a few problems resulted in memory leaks, but one was particularly nasty: Squid processes under high load started leaking CPU cycles somewhere. After deploying profiling for Squid we ended up seeing that the problem was inside libc. Once we started profiling libc, one of our initial assumptions turned out to be true: our heap was awfully fragmented, slowing down malloc().
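To give a feel for why a fragmented heap hurts, here is a toy model in Python. It is only a sketch under loud assumptions: a plain first-fit free list, with hypothetical names (`first_fit`, `fragmented_heap`), and it is not how glibc's malloc actually works. It shows the general effect: plenty of free memory in total, yet a slightly larger request has to walk the whole free list and still finds nothing.

```python
def first_fit(free_blocks, size):
    """Scan the free list in order; return (index, steps).

    index is the position of the first block that fits, or None.
    steps counts how many free blocks had to be inspected.
    """
    steps = 0
    for i, (_offset, length) in enumerate(free_blocks):
        steps += 1
        if length >= size:
            return i, steps
    return None, steps


def fragmented_heap(slots, slot_size):
    """Free list left after allocating `slots` equal chunks back to back
    and then freeing every other one.

    The surviving neighbours keep the holes apart, so no two free
    blocks are adjacent and nothing can be merged.
    """
    return [(i * slot_size, slot_size) for i in range(0, slots, 2)]


# Toy numbers, chosen only for illustration.
free_blocks = fragmented_heap(slots=1000, slot_size=64)
total_free = sum(length for _offset, length in free_blocks)
index, steps = first_fit(free_blocks, 128)
print(total_free, index, steps)
```

With these toy numbers there are 32000 bytes free, but a 128-byte request inspects all 500 holes and fails, while a 64-byte request succeeds on the first one.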
Here comes our steroids part: Google has developed a drop-in malloc replacement, tcmalloc, that is really efficient: space efficient, CPU efficient, lock efficient. malloc() is probably the most used (and most sophisticated) libc function, and it was suffering from performance issues that not many people actually wanted to tackle. The description sounded really nice, so we ended up using it for our suffering Squids.
The results were what we expected: awesome :) Now the nice part is that the library is optimized for multi-threaded applications that do lots of small-object allocations, without too much lock contention, and it uses spinlocks for large allocations. MySQL fits that definition exactly, so just by using a simple drop-in replacement you may achieve increased performance over standard libc implementations.
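One common way to try a drop-in allocator without relinking is the dynamic linker's LD_PRELOAD variable. The snippet below is a minimal sketch of building such an environment in Python; the library path and the `preload_env` helper are assumptions for illustration, not something taken from our setup.

```python
# Hypothetical install path; check where the library lives on your system.
TCMALLOC_PATH = "/usr/lib/libtcmalloc.so"


def preload_env(base_env, library=TCMALLOC_PATH):
    """Return a copy of base_env with `library` placed first in LD_PRELOAD.

    Any libraries that were already being preloaded are kept after it.
    The original mapping is left untouched.
    """
    env = dict(base_env)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env
```

The result can be passed as `env=preload_env(os.environ)` to `subprocess.run` when starting the server binary. Linking the library in at build time is the other option.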
For any developers working on high-performance applications, Google performance tools provide easy ways to access information that was a PITA to get at before. Another interesting toy they have is an embedded HTTP server providing run-time profiling info. I’m already wondering if we should combine that with our profiling framework. Yummy. Steroids.
Thanks for using tcmalloc! We’re proud of it too.
This is pretty damn interesting. So what version of Squid did you link against tcmalloc, and were you using aufs? Or were you using FreeBSD?
2.6 squid (it has been under performance engineering lately).
COSS is the storage we use, on Linux (Ubuntu and Fedora were both having the issue).
[...] as most programmers know, the system (glibc) malloc is, to put it delicately, too general. Less delicately: useless. It does fine in programs where performance is not critical. The problem is well known. Some projects write their own versions of malloc; postgres, for example, has palloc. As friendly admins reported to me, Google has released, and the Wikipedia guys have tested, a new version of malloc. This one, written by Google, is called tcmalloc and is much faster. You can see, for example, the drop in CPU usage of squids recompiled with tcmalloc; the difference is staggering. tcmalloc is a completely stress-free replacement for the standard malloc. You can either link it in at build time or load it with LD_PRELOAD. I recommend taking a look at it; it looks very promising. [...]
What does tcmalloc give you over the Hoard allocator? It’s been around for ages, it is very cross-platform, and it has been shown to outperform ptmalloc by a huge amount…
tcmalloc is awesome! We had very similar problems, with CPU spikes on our MySQL server… Queries started taking ages and the replication got waaay behind the master, it just couldn’t keep up.
Using tcmalloc with MySQL really improved things, and the CPU load is now much more even! :)
[...] Domasz Mituzas stated Now the nice part is that the library is optimized for multi-threaded applications, doing lots of [...]
[...] One of the web apps I work on has a really fantastic test suite. We have unit tests, system tests, page tests, the whole thing makes it a system I’m proud to have a small part in. However, the test suite takes 59 minutes to run on my laptop. It runs against a local postgres database loaded with sample data, and I occasionally try various things to see if I can make it run faster. This is about my experiment with tcmalloc, something I’ve been wanting to try for months now and finally got around to doing. The pound load balancer uses tcmalloc, and Domas blogged about using tcmalloc with squid and MySQL a while ago. [...]
[...] already wrote about tcmalloc, and how it helped with memory fragmentation. This time had some experience with [...]
Did you try comparing it to other allocators optimized for multi-threading such as hoard, mtmalloc or ptmalloc?