Tuesday, June 1, 2010

Drupal Caching

If your content were all static, you would not need Drupal. However, even the most dynamic websites have chunks of content that are shown to users over and over, and that’s where caching comes in.

Beyond Drupal – before we begin.

This post does not cover database-level query caching or PHP opcode caching: those are two separate topics deserving an analysis by experts. For those curious about the subject, there is an excellent presentation “Accelerating PHP Applications” (the only option it misses is my beloved eAccelerator).

memcached is another interesting mechanism: it works very much like a database cache, but it can be shared over a network (imagine a “cache server”), and it does not suffer from MySQL’s habit of dropping a table’s cached queries every time the table changes. There is a price, though: it’s up to you to make sure that memcached data has not grown stale. There is a module for using memcache with Drupal instead of the standard cache.

Drupal itself has several built-in and optional caching mechanisms, so I decided to sort them out here.

An ode to Drupal developers: much thinking and discussion has gone into the impossible task of creating this infinitely configurable yet still speedy CMS. Before you complain, check out the prior discussions and efforts like this, and then go ahead and propose an improvement (if you are in it just for the impotent bitching, go buy yourself a commercial product).

As of version 5.0 Drupal enjoys a pluggable cache system.
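To make “pluggable” concrete, here is a minimal sketch in Python (not Drupal’s PHP code). The point is that callers only ever use a small get/set/clear interface, so the storage behind it can be swapped out. The class and method names are my own illustration, not Drupal’s API.

```python
class DictBackend:
    """Hypothetical in-memory backend; stands in for a database cache table."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def clear(self, prefix=""):
        # Drop every entry whose key starts with the prefix ("" drops all).
        for key in [k for k in self._data if k.startswith(prefix)]:
            del self._data[key]


class Cache:
    """Callers talk to this class only; the backend behind it is swappable."""

    def __init__(self, backend):
        self.backend = backend

    def get_or_build(self, key, build):
        value = self.backend.get(key)
        if value is None:
            value = build()  # the expensive part, e.g. rendering a menu
            self.backend.set(key, value)
        return value


calls = []

def render_menu():
    calls.append(1)
    return "<ul>menu</ul>"

cache = Cache(DictBackend())
first = cache.get_or_build("menu:1", render_menu)
second = cache.get_or_build("menu:1", render_menu)  # served from the cache
```

Swapping the storage then means passing a different backend object with the same three methods; the calling code does not change.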

What is cached by Drupal?

  • sessions
  • variables
  • menus
  • blocks (not by default)
  • nodes ( = content) and comments

It would appear that the site admin cannot directly control caching of sessions and menus.

Partial solution: the Devel module’s block has an “Empty cache” item. This is a “total purge” option: it clears all caches. Very useful for developers.

Standard Drupal Caching assumes that only logged-in users need to see up-to-the-second content, and so the nodes (but not the blocks) seen by the casual visitors are cached.
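As a rough sketch of that rule (in Python, with made-up function names, and ignoring cache expiry entirely): anonymous requests are answered from a page cache, while logged-in requests always get a fresh render.

```python
page_cache = {}
render_count = {"n": 0}

def render_page(path):
    # Stand-in for the expensive work of building a full page.
    render_count["n"] += 1
    return "<html>%s v%d</html>" % (path, render_count["n"])

def serve(path, logged_in):
    if logged_in:
        return render_page(path)  # logged-in users always see fresh content
    if path not in page_cache:
        page_cache[path] = render_page(path)
    return page_cache[path]

a = serve("/node/1", logged_in=False)
b = serve("/node/1", logged_in=False)  # same cached copy as a
c = serve("/node/1", logged_in=True)   # rendered again
```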

The “Aggressive” caching option is incompatible with many modules, and Drupal does a very good job of making the danger clear to the admin. If just one of your modules is incompatible with aggressive caching, you have to switch it off.

However, this compatibility test is not perfect. It is possible for a module to raise the flag, yet pose no danger to aggressive caching (this has to do with some contributors using a potentially invasive hook for rather benign, yet extremely useful, applications). So, if you discover that your favorite module is “incompatible” with aggressive caching, do check around before mourning.

The cache itself can be a source of slowdown: the most popular MySQL storage engine, MyISAM, only does table-level locking, and if your site really grows this will become a problem. The solution for high-traffic sites is to move frequently updated tables to InnoDB or a similar engine.
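In MySQL the switch is an ALTER TABLE ... ENGINE=InnoDB statement per table. Here is a small Python helper that only prints the statements so you can review them before running anything. The table names are examples, not a recommendation: check them against your own schema (and table prefix) first, and back up your database.

```python
def innodb_statements(tables):
    """Build one ALTER TABLE statement per table name."""
    return ["ALTER TABLE %s ENGINE=InnoDB;" % name for name in tables]

# Example names only; use the tables that are actually written to often on your site.
busy_tables = ["cache", "cache_page", "sessions"]
statements = innodb_statements(busy_tables)

for statement in statements:
    print(statement)
```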

Throttling is the ideological opposite of caching: instead of showing your content faster, it throttles (switches off) some blocks when your site is under high load, thus making Drupal work less. Throttling is built into the standard distribution, but you have to enable it on a per-module basis (under Site Building -> Modules). It is a good last line of defense.
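The idea fits in a few lines. This is a Python sketch under my own assumptions: load is measured as a count of current users, and each block carries a hypothetical "throttle" flag saying whether it may be switched off.

```python
def visible_blocks(blocks, current_users, threshold):
    """Return the names of the blocks to render.

    Blocks flagged 'throttle' are skipped once the load reaches the threshold.
    """
    overloaded = current_users >= threshold
    return [b["name"] for b in blocks if not (overloaded and b["throttle"])]

blocks = [
    {"name": "navigation", "throttle": False},    # always shown
    {"name": "who_is_online", "throttle": True},  # nice to have, costly
]

calm = visible_blocks(blocks, current_users=10, threshold=50)
busy = visible_blocks(blocks, current_users=80, threshold=50)
```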

Contributed Modules

Block Cache

This module creates a cached version of each block. Block caching happens separately from page caching and so it will work for logged-in users whether or not page caching is enabled for the site. Cached blocks make just one call to the database and can therefore reduce the overhead required to render blocks that require complex queries and/or calculations such as some Views or Taxonomy-related blocks.

The module is very flexible in terms of fine tuning. I have yet to hear about a high volume site that does not use it for at least some blocks.
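Here is a minimal sketch of the principle in Python. It is not the module’s code: caching per role and the max_age setting are my assumptions about what “fine tuning” might look like. The rendered HTML of a block is stored under a key and reused until it is too old.

```python
import time

block_cache = {}

def cached_block(name, role, build, max_age, now=None):
    """Return cached HTML for (block, role), rebuilding it once it is too old."""
    now = time.time() if now is None else now
    key = (name, role)
    hit = block_cache.get(key)
    if hit is not None and now - hit[0] < max_age:
        return hit[1]
    html = build()  # the complex query or calculation happens only here
    block_cache[key] = (now, html)
    return html

builds = []

def build_recent():
    builds.append(1)
    return "<div>recent %d</div>" % len(builds)

a = cached_block("recent", "member", build_recent, max_age=300, now=1000)
b = cached_block("recent", "member", build_recent, max_age=300, now=1200)  # cached
c = cached_block("recent", "member", build_recent, max_age=300, now=1400)  # rebuilt
```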

fastpath_fscache

This module uses Drupal’s pluggable cache system, but instead of a database it caches static pages as local files. This can give your site one hell of a performance boost.

The cached pages are intelligently expired when the page contents are updated. However, block updates are not caught in the same manner, which is why the module also enforces a regular cache expiration triggered by cron. This means that, as far as the visitors are concerned, the content of the blocks on your site will change only so often (once an hour if your cron runs every hour, more often if you shorten your cron gaps). It’s a classic trade-off of speed versus flexibility. You decide what your needs are.
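To illustrate the cron side of this, here is a sketch in Python of a file cache with a periodic purge. The file naming scheme and the function names are mine, not the module’s, and flattening URLs this way can make two different paths collide, so treat it as a toy.

```python
import os
import tempfile
import time

def cache_path(cache_dir, url_path):
    # Flatten the URL into a single file name (toy scheme, collisions possible).
    safe = url_path.strip("/").replace("/", "_") or "index"
    return os.path.join(cache_dir, safe + ".html")

def write_page(cache_dir, url_path, html):
    with open(cache_path(cache_dir, url_path), "w", encoding="utf-8") as f:
        f.write(html)

def cron_purge(cache_dir, max_age, now=None):
    """Delete cached files older than max_age seconds; return how many went."""
    now = time.time() if now is None else now
    removed = 0
    for name in os.listdir(cache_dir):
        full = os.path.join(cache_dir, name)
        if now - os.path.getmtime(full) > max_age:
            os.remove(full)
            removed += 1
    return removed

cache_dir = tempfile.mkdtemp()
write_page(cache_dir, "/node/1", "<html>node 1</html>")
removed_now = cron_purge(cache_dir, max_age=3600)  # file is fresh, kept
removed_later = cron_purge(cache_dir, max_age=3600, now=time.time() + 7200)
```

The second call pretends that two hours have passed, so the stale page is dropped and would be rebuilt, blocks included, on the next visit.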

Boost

On the surface Boost is very similar to fastpath_fscache. It uses file caching and has been documented in great detail here, here, and here. Boost uses the same pluggable cache system and, just like the other modules, limits itself to anonymous users, but its author has intentionally left room in the architecture for caching for logged-in users (possibly with a personal per-user cache). I am very curious what comes out of this.

Boost promises higher performance than the similar fastpath_fscache because it uses Apache mod_rewrite: when a new request arrives and the page has already been cached, Apache serves it without ever touching Drupal. So for non-authenticated users it’s just Apache serving static pages bloody fast. (The downside is that you cannot run it on Windows or on any Web server that does not fully support mod_rewrite. Sorry.)
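The decision the rewrite rules make can be sketched like this. It is Python rather than Apache configuration, and the exact conditions (GET only, no session cookie, file exists) are my reading of the idea, not a copy of Boost’s rules.

```python
import os
import tempfile

def choose_handler(cache_dir, method, url_path, has_session_cookie):
    """Mimic the rewrite decision: a static file if possible, else Drupal."""
    if method != "GET" or has_session_cookie:
        return "drupal"
    # Flatten the URL into one file name (hypothetical naming scheme).
    name = url_path.strip("/").replace("/", "_") or "index"
    candidate = os.path.join(cache_dir, name + ".html")
    return candidate if os.path.isfile(candidate) else "drupal"

cache_dir = tempfile.mkdtemp()
with open(os.path.join(cache_dir, "about.html"), "w", encoding="utf-8") as f:
    f.write("<html>about</html>")

hit = choose_handler(cache_dir, "GET", "/about", has_session_cookie=False)
miss = choose_handler(cache_dir, "GET", "/contact", has_session_cookie=False)
member = choose_handler(cache_dir, "GET", "/about", has_session_cookie=True)
```

Only the first request is answered from the file; the other two fall through to Drupal.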

It is also possible to have Boost file cache shared by several Web servers in a cluster, over, say, NFS. (Just remember to balance the IO).

There has been an interesting report of Boost working together with Memcache module where Boost handled anonymous user pages and Memcache took care of the logged-in users.

APC – Alternative PHP Cache

APC is two things. First, it is an opcode cache, and second, it is a memory-based cache. This module only handles the latter: it overrides Drupal’s built-in caching and uses memory-based caching instead of database-based caching.

Many people love and use APC – it is reliable and mature.

The only questions I would have before installing this module are:

  • can I also use APC as an opcode cache? How?
  • will it interfere with a different opcode cache?

Memcache API and Integration

The concept of memcache is deceptively simple: it’s a daemon that listens on a TCP port and accepts (or serves) key/value data. The data itself is stored in RAM. That’s it.
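A toy version of the idea, minus the daemon and the TCP port, looks like this in Python. It is my own stand-in and not a real memcache client; the ttl argument shows one common way to keep data from going stale.

```python
import time

class TinyKV:
    """Toy stand-in for a memcached-style store: keys, values, optional expiry."""

    def __init__(self):
        self._items = {}

    def set(self, key, value, ttl=0, now=None):
        now = time.time() if now is None else now
        expires = now + ttl if ttl else None  # ttl=0 means "never expire"
        self._items[key] = (value, expires)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        item = self._items.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and now >= expires:
            del self._items[key]  # stale entry, drop it
            return None
        return value

    def delete(self, key):
        self._items.pop(key, None)

kv = TinyKV()
kv.set("node:1", "<p>hello</p>", ttl=10, now=100)
fresh = kv.get("node:1", now=105)
stale = kv.get("node:1", now=110)
```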

From this simple idea great possibilities arise: you can replace or augment a database cache, create and manage multiple cache “buckets” shared among clustered servers over a network, et cetera, et cetera.

The Drupal module takes an existing PECL extension and plugs it into Drupal’s caching interface. Since the interface is standardized, memcache plays nicely with Boost and Advanced Cache (see below).

Even more importantly, it does not force you to give up your database cache – you can still have both (thus feeling safe about a power outage).
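Having both can be pictured as a two-tier lookup. This is a Python sketch with plain dictionaries standing in for memcache and for the database cache table; how the actual module wires the two together is not something I am claiming here.

```python
class TwoTierCache:
    def __init__(self):
        self.ram = {}  # stands in for memcache: fast, lost on restart
        self.db = {}   # stands in for the database cache: slower, durable

    def set(self, key, value):
        self.ram[key] = value
        self.db[key] = value

    def get(self, key):
        if key in self.ram:
            return self.ram[key]
        if key in self.db:
            self.ram[key] = self.db[key]  # warm the fast tier again
            return self.db[key]
        return None

    def simulate_power_outage(self):
        self.ram.clear()

cache = TwoTierCache()
cache.set("variables", {"site_name": "example"})
cache.simulate_power_outage()
after_outage = cache.get("variables")  # recovered from the durable tier
```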

The installation looks scary, but it really is not, as long as you follow the instructions. And the effort is worth it, since memcache can help you with some items not handled by file cache solutions, and thus improve the logged-in user experience.

Advanced Cache

The advanced caching module is mostly a set of patches and a supporting module to bring caching to Drupal core in places where it is needed yet currently unavailable. These include caching nodes, comments, taxonomy (terms, trees, vocabularies and terms-per-node), path aliases, and search results.

Without the Advanced Cache module, once a user logs in, the caching stops, because the content is considered too user-specific to cache effectively. Just think of all the permutations of blocks, personalization, keywords and stats! Now add the dangers of accidentally giving users access to cached content they are not authorized to view: a disaster in the making!

Advanced Cache patches Drupal to add very sophisticated caching of information “atoms”, and keeps track of a whole lot of relevant restriction data. In my honest opinion, this functionality truly belongs in Drupal Core.
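I do not know the module’s internals well enough to reproduce them, so take this Python sketch purely as an illustration of the principle: if the cache key includes the restriction data (here, the user’s roles), a cached “atom” built for one audience can never be served to another.

```python
def node_cache_key(node_id, roles):
    # Sort and de-duplicate so the same set of roles always yields the same key.
    return "node:%d:roles:%s" % (node_id, ",".join(sorted(set(roles))))

node_cache = {}

def view_node(node_id, roles, render):
    key = node_cache_key(node_id, roles)
    if key not in node_cache:
        node_cache[key] = render(node_id, roles)
    return node_cache[key]

def render(node_id, roles):
    # Hypothetical renderer: editors see extra notes, everyone else does not.
    body = "public text"
    if "editor" in roles:
        body += " + editor notes"
    return body

anon = view_node(7, ["anonymous"], render)
editor = view_node(7, ["member", "editor"], render)
editor_again = view_node(7, ["editor", "member"], render)  # same key, cached
```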

The module not only allows you to improve the experience of your logged-in (i.e. regular) users, it actually works well with the existing caching backends like memcache.

The installation is not for the timid, but if you are building a highly (but not overly) dynamic site with lots of logged-in users… you have got to consider Advanced Cache.

The Bottom Line

  • Caching can work miracles and turn an apparently sluggish site into a playground of joy.
  • You can apply caching outside Drupal: at the Database, for PHP code, and in the Web Server (see Apache2 mod_cache).

In Drupal:

  1. Start by playing with the built-in cache (normal and aggressive) and throttling
  2. Deploy Block cache (intelligently!)
  3. For a site with lots of anonymous visitors consider fastpath_fscache.
  4. For a more sophisticated installation, Boost is its bigger brother.
  5. For memory caching APC is very straightforward.
  6. Memcache is state of the art, but you must be prepared to actually focus on the deployment – it is not a 3-minute fix.
  7. Advanced Cache, possibly with memcache (and even Boost?), is as sophisticated as it gets these days. But if only a few of your visitors ever log in, it may be overkill. Think of it this way: are you desperate enough to patch the core code?

I’d love to know what worked for you. Drop me a note!

3 comments:

Mike Carper said...

Nice article. I would like to point out that with the 6.x version of Boost it works with windows & is better behaved.

Unique Cameo Necklaces said...

Thanks for the tips. They worked great. Needed the extra support to get the job done. thanks again.

Richzendy said...

Very good article. This confirms for me that Drupal caches variables, thanks a lot