Friday, April 17, 2009

MySQL InnoDB: performance gains as well as some pitfalls

In the vast majority of cases, web sites are set up with the MyISAM engine for all the Drupal tables.

MyISAM Engine

MyISAM is the default engine on most MySQL installations, and is oriented towards read-heavy workloads. MyISAM started life for data warehousing and analytics, where transactions are rarely needed. It lacks features that relational databases take for granted nowadays, such as referential integrity, row level locking, etc.


For transactional workloads, MySQL relies so far on InnoDB from InnoBase. InnoDB has all the features of a "real" relational database engine, including row level locking, referential integrity, etc.

Drupal and locking

Drupal, by default, caters for MyISAM, since that is how it started, and that is what most people have on their hosting machines. So, Drupal does table level locking, as per this code:

function db_next_id($name) {
  $name = db_prefix_tables($name);
  db_query('LOCK TABLES {sequences} WRITE');
  $id = db_result(db_query("SELECT id FROM {sequences} WHERE name = '%s'", $name)) + 1;
  db_query("REPLACE INTO {sequences} VALUES ('%s', %d)", $name, $id);
  db_query('UNLOCK TABLES');
  return $id;
}

This code is used every time we assign a new id for an object, for example a user, a node, a comment, or a category term.

There is also this function:

function db_lock_table($table) {
  db_query('LOCK TABLES {'. db_escape_table($table) .'} WRITE');
}

And this one to go with it:

function db_unlock_tables() {
  db_query('UNLOCK TABLES');
}

These last functions are called in Drupal core (5.x) in the following places:

./includes/ db_lock_table('variable');
./includes/ db_lock_table($table);
./modules/block/block.module:190: db_lock_table('blocks');

The first call is when setting a variable, which is usually not a frequent operation. The second is in cache_set(), which can be a frequent operation for the filter cache and the page cache, and can cause contention.

There has been some work on eliminating locking in Drupal, and it should be available in Drupal 6.x.

How InnoDB can help with locking
InnoDB can help with locking because its locks are on the row level, not on the table level.

So, on a busy site with lots of people adding comments and browsing, the node and comment modules call check_markup(), which in turn calls cache_get() and cache_set().

On a large site, I commented out the filter cache code altogether so as to eliminate the contention for that table.

Similarly, if the page cache is turned on by the site administrator, there can be contention on the cache table if there are lots of inserts into it.
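If the contention is on specific tables, one option is to convert just those tables to InnoDB. A minimal sketch, assuming the default Drupal 5 cache table names and no table prefix:

```sql
-- Convert the contention-prone cache tables to row-level locking
ALTER TABLE cache ENGINE=InnoDB;
ALTER TABLE cache_page ENGINE=InnoDB;

-- Verify which engine each table now uses
SHOW TABLE STATUS LIKE 'cache%';
```

Note that an ALTER TABLE like this rebuilds the table and locks it for the duration, so it is best done during a maintenance window.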

Auto Increment and Locking in InnoDB
Note that InnoDB still locks the entire table when there is an auto increment column. This is to avoid two simultaneous inserts getting duplicate values for the auto increment column.

So, watch out for that caveat too.
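On MySQL 5.1 and later, this locking behavior is tunable via the innodb_autoinc_lock_mode server variable; a quick way to check it, assuming a 5.1+ server:

```sql
SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';
-- 0 = traditional: table-level auto-inc lock held for the whole statement
-- 1 = consecutive: lighter locking for simple inserts (the 5.1 default)
-- 2 = interleaved: no auto-inc table lock at all
```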

InnoDB Pitfalls
However, all is not rosy with InnoDB. Because of its transactional nature, it has bottlenecks of its own. On MyISAM, a query such as SELECT COUNT(*) FROM some_table is very fast, since MyISAM keeps an exact row count in the table's metadata.

On InnoDB, this info is not stored in an index, and even the index and the data are kept in the same file. So, doing the same query on a table can incur a significant performance penalty.

To check what overhead this has, I wrote a simple benchmark. I duplicated a client's node table, which has 20,243 rows, from MyISAM to InnoDB.
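The benchmark was along these lines; this is a sketch rather than the actual code used, with the copied table name assumed, and db_query()/db_result() being Drupal's own query wrappers:

```php
// Time a COUNT(*) against the copied table (table name assumed)
$start = microtime(TRUE);
$count = db_result(db_query('SELECT COUNT(*) FROM {node_innodb}'));
$elapsed = (microtime(TRUE) - $start) * 1000.0; // milliseconds
print "COUNT(*) = $count in $elapsed ms\n";
```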

On a quiescent AMD 64 machine with MySQL server 5.0.24, doing a SELECT COUNT(*) FROM node takes 0.835 milliseconds on MyISAM, while on InnoDB it takes 12.292 milliseconds!

Running the query a second time yields comparable numbers, with MyISAM taking 0.331 ms and InnoDB 0.488 ms. However, that is only because the query cache already holds the results.

In practice, this affects Drupal in some blocks, such as the "popular content" block. One client had to turn that block off to save some 1,200 milliseconds per page load after they converted to InnoDB.

Patch to remove database locking from Drupal.
InnoDB slow COUNT(*) by Peter Zaitsev.
InnoDB locks the entire table for statement duration for inserts with auto increment columns.
InnoDB count(*) limitation
Submitted by Morgan Tocker (not verified) on Tue, 2007/05/29 - 09:47.
> On InnoDB, this info is not stored in an index, and even the index and the data are kept in the same file. So, doing the same
> query on a table can incur a significant performance penalty.

I don't think having the index and data in separate files has much to do with the limitation. InnoDB keeps multiple versions of each record at any one time in order to boost concurrency[1]. Since in any given transaction the "number of rows" in {some_table} is going to be different, it's harder to maintain this sort of information.

Some applications can get around the SELECT count(*) FROM table issue by replacing it with a SELECT MAX(primary_key) FROM table. This is only going to be correct if your table has a primary key that increments one value at a time, and there are no deleted records.
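A hypothetical session showing where the MAX() substitution breaks down once a row is deleted:

```sql
CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t VALUES (1), (2), (3), (4), (5);
DELETE FROM t WHERE id = 3;

SELECT COUNT(*) FROM t;  -- 4
SELECT MAX(id) FROM t;   -- 5, no longer a valid row count
```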


InnoDB with Drupal
Submitted by Jesse (not verified) on Mon, 2007/03/05 - 17:28.
I work on a site which uses InnoDB for the row-locking and we had to change a lot of the core functions to get the proper performance. I lay this squarely at the feet of Drupal, whose "database abstraction" isn't really all that abstract -- it basically assumes MySQL+MyISAM. There's also no abstraction for creating/modifying tables, so that the burden of creating tables for multiple databases is placed on the shoulders of the module developers. I'd say 70% of third-party modules install correctly only for MySQL and often explicitly set the table type to MyISAM.

With InnoDB MySQL provides a way of calculating the found rows in a given query: the SQL_CALC_FOUND_ROWS keyword. For example,

To get the row count one then executes SELECT FOUND_ROWS() to get the previous row count. We replaced the logic in pager_query to do this rather than utilizing the "count SQL" parameter.
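A hypothetical pager query along those lines (the table and column names are assumed):

```sql
-- Fetch the first page and have MySQL remember the unlimited row count
SELECT SQL_CALC_FOUND_ROWS nid, title
FROM node
ORDER BY created DESC
LIMIT 0, 10;

-- Retrieve the row count the previous query would have matched without LIMIT
SELECT FOUND_ROWS();
```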

Sounds like a module to
Submitted by Damien McKenna (not verified) on Tue, 2008/11/18 - 13:50.
Sounds like a module to query_alter() all "SELECT COUNT(*)" statements with SQL_CALC_FOUND_ROWS is in order...

what about creating a MySQL+InnoDB abstraction layer...
Submitted by JimBroad (not verified) on Mon, 2008/05/05 - 10:53.
From what I've read, Drupal allows for the addition of module-like abstraction layers. Why not create such an abstraction layer, instead of modifying the core/current abstraction layer? Sure, you may then need to modify the .install file of the modules you are using, to ensure they are set up properly for the new InnoDB abstraction layer, but that would be quite trivial considering they are likely already set up for MySQL.

Abstraction has its limits
Submitted by Khalid on Mon, 2007/03/05 - 18:12.
Abstraction of table creation and modification has been discussed, and it seems that the general feeling is leaning towards it.

One has to note that abstraction has its limits, especially when it comes to highly optimized environments.

To squeeze out performance, it is necessary to have custom code or custom SQL, which may be overkill for small to medium sites. The very example you cite would not work on PostgreSQL, and other code is necessary.

Rather than write on blogs, why don't you share your modifications in the form of patches attached to issues, so everyone can see and use them, and they may even be accepted into core?
2bits -- Drupal consulting

