Tuesday, December 14, 2010

scaling drupal - an open-source infrastructure for high-traffic drupal sites

the authors of drupal have paid considerable attention to performance and scalability. consequently, even a default install running on modest hardware can easily handle the demands of a small website. my four-year-old pc in my garage, running a full lamp install, will happily serve up 50,000 page views in a day, providing solid end-user performance without breaking a sweat.
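to put that figure in perspective, here is a quick back-of-the-envelope calculation of the request rate it implies. the peak factor is purely an assumption of mine for illustration, not a measurement; real traffic is bursty and your ratio will differ.

```python
# Average request rate implied by 50,000 page views per day.
PAGE_VIEWS_PER_DAY = 50_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

average_rate = PAGE_VIEWS_PER_DAY / SECONDS_PER_DAY

# Assumption (not from the article): peak traffic runs at 5x the daily average.
ASSUMED_PEAK_FACTOR = 5
peak_rate = average_rate * ASSUMED_PEAK_FACTOR

print(f"average: {average_rate:.2f} page views/second")
print(f"assumed peak: {peak_rate:.2f} page views/second")
```

so the daily total works out to well under one page view per second on average, which is why modest hardware copes.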



when the time comes for scalability: moving out of the garage

if you are lucky, eventually the time comes when you need to service more users than your system can handle. your initial steps should clearly focus on getting the most out of the built-in drupal optimization functionality, evaluating drupal performance modules, optimizing your php (including op-code caching) and working on database performance. John VanDyk and Matt Westgate have an excellent chapter on this subject in their new book, "pro drupal development".

once these steps are exhausted, you'll inevitably start looking at your hardware and network deployment.


a well designed deployment will not only increase your scalability, but will also enhance your redundancy by removing single points of failure. implemented properly, an unmodified drupal install can run on this new deployment, blissfully unaware of the clustering, routing and caching going on behind the scenes.

incremental steps towards scalability

in this article, i outline a step-by-step process for incrementally scaling your deployment, from a simple single-node drupal install running all components of the system, all the way to a load-balanced, multi-node system with database-level optimization and clustering.

since you almost certainly don't want to jump straight from your single node system to the mother of all redundant clustered systems in one step, i've broken this down into 5 incremental steps, each one building on the last. each step along the way is a perfectly viable deployment.

tasty recipes

i give full step-by-step recipes for each deployment that, with a decent working knowledge of linux, should allow you to get a working system up and running. my examples are for apache2, mysql5 and drupal5 on debian etch, but may still be useful for other versions / flavors.

note that these aren't battle-hardened production configurations, but rather illustrative minimal configurations that you can take and iterate to serve your specific needs.

the 5 deployment configurations

the table below outlines the properties of each of the suggested configurations:















                                   step 0  step 1  step 2  step 3  step 4  step 5
separate web and db                no      yes     yes     yes     yes     yes
clustered web tier                 no      no      yes     yes     yes     yes
redundant load balancer            no      no      no      yes     yes     yes
db optimization and segmentation   no      no      no      no      yes     yes
clustered db                       no      no      no      no      no      yes
scalability                        poor-   poor    fair    fair    good    great
redundancy                         poor-   poor-   fair    good    fair    great
setup ease                         great   good    good    fair    poor    poor-


step 0 - a basic drupal install


in step 0, i outline how to install drupal, mysql and apache to get a basic drupal install up-and-running on a single node. i also go over some of the basic configuration steps that you'll probably want to follow, including cron scheduling, enabling clean urls, setting up a virtual host etc.





step 1 - a dedicated data server


in step 1, i go over a good first step to scaling drupal: creating a dedicated data server. by "dedicated data server" i mean a server that hosts both the database and a fileshare for node attachments etc. this splits the database server load from the web server, and lays the groundwork for a clustered web server deployment.




step 2 - sticky load balancing with apache mod_proxy


in step 2, i go over how to cluster your web servers. drupal generates a considerable load on the web server and can quickly become resource constrained there. having multiple web servers also increases the redundancy of your deployment.
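the idea behind "sticky" balancing can be sketched in a few lines. this is a toy model only, not how apache mod_proxy is implemented: the class name, the "route" cookie name and the backend names are all hypothetical. new visitors are handed out round-robin, and returning visitors are sent back to the server named in their cookie.

```python
from itertools import cycle


class StickyBalancer:
    """Toy sticky balancer: round-robin for new clients, cookie-pinned afterwards."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next_backend = cycle(self.backends)

    def route(self, cookies):
        """Return (backend, cookies) for one request.

        `cookies` is a plain dict standing in for the client's cookie jar.
        """
        backend = cookies.get("route")  # hypothetical cookie name
        if backend not in self.backends:
            # New client, or pinned to a server that no longer exists:
            # pick the next backend in rotation and pin the client to it.
            backend = next(self._next_backend)
            cookies = {**cookies, "route": backend}
        return backend, cookies


balancer = StickyBalancer(["web1", "web2"])  # hypothetical server names
first_backend, jar = balancer.route({})      # first visit: assigned by rotation
repeat_backend, jar = balancer.route(jar)    # repeat visit: same server
print(first_backend, repeat_backend)
```

pinning a visitor to one web server keeps any per-server state (local caches, uploaded temp files) consistent for that visitor, at the cost of slightly less even load distribution.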




step 3 - using heartbeat to implement a redundant load balancer


in step 3, i discuss clustering your load balancer. one way to do this is to use heartbeat to provide instant failover to a redundant load balancer should your primary fail. while this method doesn't increase load balancer scalability, which shouldn't be an issue for a reasonably sized deployment, it does increase your redundancy.
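the failover logic is easy to picture as a small state machine. the sketch below is my own simplification and not heartbeat's actual code or configuration: the class, the `deadtime` parameter and the automatic hand-back when the primary returns are all assumptions made for illustration. the standby listens for periodic "i'm alive" messages and claims the shared virtual ip when they stop.

```python
class StandbyBalancer:
    """Toy model of a standby load balancer watching its primary."""

    def __init__(self, deadtime):
        self.deadtime = deadtime        # seconds of silence before takeover
        self.last_heartbeat = 0.0       # time the primary was last heard from
        self.holds_virtual_ip = False   # True once the standby has taken over

    def on_heartbeat(self, now):
        """Record a heartbeat from the primary received at time `now`."""
        self.last_heartbeat = now
        # Assumption: the standby hands the address back when the primary returns.
        self.holds_virtual_ip = False

    def check(self, now):
        """Take over the virtual ip if the primary has been silent too long."""
        if now - self.last_heartbeat > self.deadtime:
            self.holds_virtual_ip = True
        return self.holds_virtual_ip


standby = StandbyBalancer(deadtime=10)
standby.on_heartbeat(now=0)
print(standby.check(now=5))    # primary heard from recently
print(standby.check(now=11))   # primary silent for longer than deadtime
```

times are passed in explicitly rather than read from the clock, which keeps the sketch deterministic and easy to reason about.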




step 4 - database segmentation using mysql proxy



in step 4, i discuss scaling the database tier up and out. i compare database optimization and different database clustering techniques. i go on to explore the idea of database segmentation as a possibility for moderate drupal scaling.
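one way to read "segmentation" is routing each query to a database server based on the table it touches. the sketch below illustrates that reading only; it is not mysql proxy code, and the server names, the table-to-server mapping and the simplistic regular expression are all assumptions of mine.

```python
import re

# Hypothetical mapping: busy, self-contained tables go to a second server.
SEGMENTS = {"cache": "db2", "watchdog": "db2", "sessions": "db2"}
DEFAULT_SERVER = "db1"  # hypothetical name for the main database server

# Naive table detection: the first identifier after FROM, INTO or UPDATE.
TABLE_RE = re.compile(r"\b(?:from|into|update)\s+`?(\w+)`?", re.IGNORECASE)


def pick_server(sql):
    """Return the server that should handle `sql`, based on its first table."""
    match = TABLE_RE.search(sql)
    table = match.group(1) if match else None
    return SEGMENTS.get(table, DEFAULT_SERVER)


print(pick_server("SELECT data FROM cache WHERE cid = 'variables'"))
print(pick_server("SELECT * FROM node WHERE nid = 1"))
```

note the obvious limitation: a query that joins tables living on different servers cannot be routed this way, so only tables that are queried on their own are good candidates for segmentation.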




step 5 - the holy grail?


the holy grail of drupal database scaling might very well be a drupal deployment on mysql cluster. if you've tried this, plan to try this or have opinions on the feasibility of an ndb "port" of drupal, i'd love to hear from you.

Reference: http://www.johnandcailin.com/blog/john/scaling-drupal-open-source-infrastructure-high-traffic-drupal-sites
