Friday, April 17, 2009

COUNT(*) for Innodb Tables

COUNT(*) for Innodb Tables
Posted by peter
I guess note number one about MyISAM to Innodb migration is warning what Innodb is very slow in COUNT(*) queries. The part which I often however see omitted is fact it only applies to COUNT(*) queries without WHERE clause.

So if you have query like SELECT COUNT(*) FROM USER It will be much faster for MyISAM (MEMORY and some others) tables because they would simply read number of rows in the table from stored value. Innodb will however need to perform full table scan or full index scan because it does not have such counter, it also can’t be solved by simple singe counter for Innodb tables as different transactions may see different number of rows in the table.

If you have query like SELECT COUNT(*) FROM IMAGE WHERE USER_ID=5 this query will be executed same way both for MyISAM and Innodb tables by performing index rage scan. This can be faster or slower both for MyISAM and Innodb depending on various conditions.

In real applications there are much more queries of second type rather than first type so it is typically not as bad problem as it may look. Most typically count of rows is needed by admin tools which may show it in table statistics, it may also be used in application stats to show something like “We have 123.345 users which have uploaded 1.344.656 images” but these are normally easy to remove.

So remember Innodb is not slow for ALL COUNT(*) queries but only for very specific case of COUNT(*) query without WHERE clause. It does not mean I would not like to see it fixed though, it is pretty annoying.


COUNT ( column ) counts all non-NULL occurences (or should …



COUNT ( * ) counts all rows.


You say count(*), would count(id) result in a table scan (MyISAM) since it doesn’t know if id is NULL or not? I’m not even sure if it counts NULL or not, I guess i should check it out.

Comment :: December 1, 2006 @ 12:20 pm

2. peter

It depends on now field is defined if it is NOT NULL it will read the counter. If it can be NULL it will perform full table scan

Comment :: December 1, 2006 @ 1:10 pm

3. Martijn Tonies

Wesley,

COUNT ( column ) counts all non-NULL occurences (or should …

COUNT ( * ) counts all rows.

Comment :: December 2, 2006 @ 1:10 pm

4. Kevin Burton

This just needs to be fixed. I forgot about this when I migrated a couple of table sot MyISAM. We were computing COUNT in order to graph DB stats. Bad idea….

The issue is that most people don’t need a 100% accurate count. They just need lazy data. INNODB should add the ability to get this even if it’s not 100% correct.

Comment :: December 2, 2006 @ 3:59 pm

5. peter

Kevin,

I guess having counter such as MyISAM which would count “dirty” number of uncommitted rows could be possible but this would require some special option to enable as you can’t return wrong (approximate) result for query in default mode.

It is quite interesting where other transactional storage engines would stay on this.

Comment :: December 2, 2006 @ 4:10 pm

6. Sean

I agree with Kevin - unless you’re doing some sort of life or death calculation, it never needs to be totally accurate. Retarded decision on MySQL’s part…

Comment :: December 9, 2006 @ 6:23 pm

7. Sheeri Kritzer » Blog Archive » OurSQL Episode 2: Wild Performance Tips - The MySQL She-BA

[...] Peter Zaitsev’s blog post about count for InnoDB tables: http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/ [...]

Pingback :: December 13, 2006 @ 9:19 pm

8. peter

Sean,

The question in this case is semantics. SELECT COUNT(*) must return accurate value by standards, There is no way it would be made to return wrong data by default as this may break some application. So we’re speaking about option which will need to be specified such as inaccurate_count=1 ? Still not everyone will be able to use it because single instance can be shared by multiple applications. Having Some form of SQL FLAG, such as SELECT APPROXIMATE COUNT(*) would be better but will require application changes and at the same time one could use info from SHOW TABLE STATUS.

Comment :: December 28, 2006 @ 5:40 am

9. puRe aka Marcel Oelke » MySQL Performance: Use counter tables

[...] I guess many of you know, that using SELECT count(*) FROM table is problematic and slow when using Innodb tables. This actually only applies to COUNT(*) queries without WHERE a clause as mentioned in the MySQL Performance Blog. [...]

Pingback :: April 3, 2007 @ 4:16 am

10. John

I am seeing a query take up to 2 seconds (viewing the listing of members in my site) because at the top there’s a count. What steps can I take to speed this up? I’ve thought about doing a quick count total cache stored in a separate field, but that won’t work either, because depending on the search criteria, the count number can change at any time.

Comment :: June 17, 2007 @ 10:14 pm

11. kiran
How to find the top values of table row?

Comment :: August 15, 2007 @ 2:24 am

12. GSIY … Ruby-Rails Portal

[...] of transactions, foreign keys and other niceties, you might be aware of its limitations, like much slower count(*). Our DBAs are in a constant lookout for slow queries in production and the ways to keep DBs happy [...]

Pingback :: September 5, 2007 @ 9:00 am

13. Technocation, Inc.

OurSQL Episode 2: Wild Performance Tips…



Trackback :: September 11, 2007 @ 7:31 pm

14. John Swapceinski
If you want a quick and dirty count of the number of rows in a InnoDB table, this seems to work:

explain select count(*) from Table;

The result is fast and will give you a “rows” count that should be within 10% of the number of rows in the Table. So it seems the row count is being cached SOMEWHERE (I don’t know where). BTW, I am using MySQL v5.0.45.

Comment :: April 20, 2008 @ 11:20 am

15. Kishore
Dear Friends,
I am getting headache with MySql. I am new to .net and MYSQl.
At present lots of Issues occured in MySql …but the problem is slow..
“Whenever We are executing the Query it is taking lot’z of time to execute. Our tables contains nearly 3000 records…”
How can I optimize the Stored procedures..
(or)
How can I change this Sp into faster manner…
“Thanks In advance”

Comment :: June 24, 2008 @ 10:54 pm

16. Ries van Twisk

to Kishore.

I you need to read upon SQL, usage of indexes etc. 3000 records is nothing for any DB and shouldn’t be slow in any case.
Most likely your DB is not properly normalized and you don’t have proper indexes.

Ries

Comment :: July 18, 2008 @ 7:33 am

17. peter

Ries, Kishore

Indeed 3000 records is very small unless these are 10MB faxes stored directly in the database Wrong indexing or query which MySQL can’t optimize well is likely reason. If you report the EXPLAIN on our forums we might be able to help.

Comment :: July 20, 2008 @ 9:04 am

18. Fekke
3,000 records may be a problem. If you have three tables with 1,000 records each and you do a join involving the three tables with no indexes at all then you would have 1,000,000,000 records full scan to fetch the results.

So the optimization you acquire by creating the proper indexes not only relies on how many records your tables have but also what your queries are.

Comment :: December 20, 2008 @ 10:26 pm

19. nop
ready for a headache? try “select count($field) from $table;” and compare the time it takes to “select count($field) from $table use index($field);”
($field is the primary key (int autoincrement) in $table which was converted from myisam and has around 900k rows in it).

for me, the standard count without use index takes around 8 seconds - the count with forced index takes below 0.1 seconds.

any thoughts on why this is so?

Comment :: February 16, 2009 @ 6:18 pm

20. nop
correction: replace “8 seconds” with “5 minutes, 8 seconds”.

Comment :: February 16, 2009 @ 6:28 pm

21. Wallace
Why the performance gap?

Running this slow query took 138 seconds (2 min 18.09 sec)
SELECT COUNT(*) FROM dbmail_messageblks;
+———-+
| COUNT(*) |
+———-+
| 262788 |
+———-+
1 row in set (2 min 18.09 sec)

After optimizing the SQL, it took 0.27 seconds.
SELECT COUNT(*) FROM dbmail_messageblks use index(physmessage_id_index);
+———-+
| COUNT(*) |
+———-+
| 262796 |
+———-+
1 row in set (0.27 sec)

> SHOW CREATE TABLE dbmail_messageblks\G
*************************** 1. row ***************************
Table: dbmail_messageblks
Create Table: CREATE TABLE `dbmail_messageblks` (
`messageblk_idnr` bigint(21) NOT NULL auto_increment,
`physmessage_id` bigint(21) NOT NULL default ‘0′,
`messageblk` longblob NOT NULL,
`blocksize` bigint(21) NOT NULL default ‘0′,
`is_header` tinyint(1) NOT NULL default ‘0′,
PRIMARY KEY (`messageblk_idnr`),
KEY `physmessage_id_index` (`physmessage_id`),
KEY `physmessage_id_is_header_index` (`physmessage_id`,`is_header`),
CONSTRAINT `dbmail_messageblks_ibfk_1` FOREIGN KEY (`physmessage_id`) REFERENCES `dbmail_physmessage` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=602519 DEFAULT CHARSET=utf8

> EXPLAIN SELECT COUNT(*) FROM dbmail_messageblks\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: dbmail_messageblks
type: index
possible_keys: NULL
key: PRIMARY
key_len: 8
ref: NULL
rows: 1930308
Extra: Using index

> EXPLAIN SELECT COUNT(*) FROM dbmail_messageblks use index(physmessage_id_index)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: dbmail_messageblks
type: index
possible_keys: NULL
key: physmessage_id_index
key_len: 8
ref: NULL
rows: 1930310
Extra: Using index

No comments: