Friday, April 17, 2009

InnoDB COUNT(id) - Why so slow?

Asked by VoteyDisciple in MySQL Server
Tags: innodb, slow, count
I'm clearly missing something fundamental about how InnoDB tables work. I've created a table...

-- whole bunch of other fields

I've batch inserted the first, say, 300,000 records (so they all carry distinct and sequential IDs at this point). No problem.

Then I run the following (highly sophisticated) query:


It takes around 20 seconds to run. EXPLAIN indicates it's using the primary key (i.e. the index), but that with the index it still has to contemplate just under (or sometimes over) 300,000 rows.

OPTIMIZE TABLE has no impact. Explicitly creating an index has no impact. Me whacking the server with a hammer has no impact but to summon the systems group to get me out of the machine room.

Seriously, what am I missing that would make this (and, for that matter, other "real" queries) run so slow when they seem so simple?

(I note, incidentally, that after running ALTER TABLE t ENGINE MyISAM, queries run as fast as I want them to. I realize MyISAM inherently allows faster SELECT access, but surely not a 20 second vs. sub-millisecond difference on a query for the total number of rows in the table?)

It may have to do with buffering. InnoDB's primary key is a clustered index: its leaf pages hold the actual data rows. So for what seems to be a simple scan, it is not reading a small key-only index but loading all of the row data into RAM and then running your query on it. This may take some time the first time through. If you run further queries on the same table afterwards, they should be much faster, because the pages are already in the buffer pool.

MyISAM loads the indexes into RAM, runs its calculations over them, and returns a result. Since an index is generally much smaller than all the data in the table, you should see an immediate difference there.

Another factor may be the way InnoDB stores data on disk. The InnoDB files are a virtual tablespace and are not necessarily ordered by the data in your table, so a fragmented data file could create extra disk I/O and slow things down. MyISAM data files are generally sequential, so when you use an index to access data the system knows exactly where on disk the row is located. You do not have this luxury with InnoDB, but I do not think this particular issue comes into play with just a simple COUNT(*).
The MySQL manual's page on InnoDB restrictions explains this:

InnoDB does not keep an internal count of rows in a table. (In practice, this would be somewhat complicated due to multi-versioning.) To process a SELECT COUNT(*) FROM t statement, InnoDB must scan an index of the table, which takes some time if the index is not entirely in the buffer pool. To get a fast count, you have to use a counter table you create yourself and let your application update it according to the inserts and deletes it does. If your table does not change often, using the MySQL query cache is a good solution. SHOW TABLE STATUS also can be used if an approximate row count is sufficient. See Section 14.2.11, “InnoDB Performance Tuning Tips”.
todd_farmer: It actually does explain the difference - MyISAM understands that COUNT(ID) where ID is a PK column is the same as COUNT(*), which MyISAM keeps precalculated while InnoDB does not.
VoteyDisciple: What's really baffling is that I've never noticed this before. After playing with various configurations today I can see the caching in play, and I did re-read the InnoDB restrictions document (whatdya know, says right there 'don't try to count stuff'). I think the moral is, "Use MyISAM and hope I don't have to roll back a transaction."

1 comment:

ScW said...

I've found another case where you might want to consider InnoDB: its record-level locks. I have a fairly big DB that mostly gets read from. If all you're doing is a bunch of reads... MyISAM is great. But as soon as you do updates or inserts... you're dealing with table-level locking. So if you have someone running a long-ish query (for example: getting a copy of all the records), and then some updates get queued up to be executed... you can run into a lock situation where the big select query has frozen your entire server until it completes, because those updates aren't eligible to run until after the select is done. That's very, very bad.