Friday, May 21, 2010

INSERT ON DUPLICATE KEY UPDATE and summary counters

INSERT ... ON DUPLICATE KEY UPDATE is a very powerful but often forgotten MySQL feature. It was introduced in MySQL 4.1, but I still constantly see people who are unaware of it.
Personally, I like this feature a great deal because it is designed in true MySQL style - a very efficient solution for a frequent task that stays beautiful and easy to use.
So what is this feature great for? Well, any kind of counter maintenance. If you're writing traffic accounting, it could be traffic and the number of packets passed for a given port or IP address. For web applications, it could be counting the number of visits per page or IP address, the number of times a particular keyword was searched, etc.
This functionality also makes it very easy to do incremental single-pass log file processing and build summary tables.
Here is an example:
SQL:

  CREATE TABLE ipstat(ip int UNSIGNED NOT NULL PRIMARY KEY,
                      hits int UNSIGNED NOT NULL,
                      last_hit timestamp);

  INSERT INTO ipstat VALUES(inet_aton('192.168.0.1'),1,now())
                     ON DUPLICATE KEY UPDATE hits=hits+1;
This example actually shows one more neat feature of MySQL - the inet_aton and inet_ntoa functions, which convert IP address strings to integers and back. This saves significant space: the address takes 4 bytes instead of up to 15 characters.
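As a small sketch of the conversion in both directions, the queries below read the counters back in dotted form. The column aliases and the LIMIT are only illustrative choices, not part of the original example.

```sql
-- String to integer: 192.168.0.1 is stored as 3232235521.
SELECT INET_ATON('192.168.0.1');

-- Integer back to string when reporting the busiest addresses.
SELECT INET_NTOA(ip) AS ip_address, hits, last_hit
FROM ipstat
ORDER BY hits DESC
LIMIT 10;
```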
The third feature this example takes advantage of is the TIMESTAMP column type. By default, the first TIMESTAMP column in a table has its value automatically set to the current timestamp on insert and update. We could actually have omitted now() in the insert clause, but this would require specifying the list of columns, which we skipped for the sake of the example.
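For reference, the variant with an explicit column list might look like the sketch below. It assumes the automatic TIMESTAMP behaviour described above is in effect on the server; newer MySQL versions may require the column to declare DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP explicitly.

```sql
-- last_hit is left out of the column list, so the server fills it in
-- (assumption: automatic TIMESTAMP initialization and update is active).
INSERT INTO ipstat (ip, hits)
VALUES (INET_ATON('192.168.0.1'), 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;
```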
So how does this example work? Well, just as you would expect. If there is no such IP address in the table, it is added with hits=1; if it is already there (note that ip is the PRIMARY KEY), hits is simply incremented and the last hit timestamp is updated.
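The same statement extends naturally to the log processing case mentioned earlier. The sketch below assumes the application has already aggregated a chunk of a log file into per-IP hit counts; the addresses and counts are made up for illustration. The VALUES() function refers to the value the row would have been inserted with, so each counter grows by that row's own count (recent MySQL 8.0 releases deprecate VALUES() here in favour of row aliases, but it remains the form available since 4.1).

```sql
-- One multi-row statement per processed log chunk.
-- New addresses are inserted; existing ones have their counters increased.
INSERT INTO ipstat (ip, hits, last_hit) VALUES
    (INET_ATON('192.168.0.1'), 3, NOW()),
    (INET_ATON('192.168.0.2'), 1, NOW()),
    (INET_ATON('10.0.0.7'), 12, NOW())
ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits);
```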
The benefit of using this feature instead of INSERT + UPDATE varies, depending on the number of new rows and the data set size. A 30% speedup should be typical. The performance increase is not the only benefit - even more important, the application code becomes simpler, less error prone and easier to read.
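For comparison, one common way to write the INSERT + UPDATE alternative is sketched below. The exact order of statements and the error handling are assumptions about how an application might do it, not something this post prescribes.

```sql
-- Step 1: try to increment the counter of an existing row.
UPDATE ipstat SET hits = hits + 1
WHERE ip = INET_ATON('192.168.0.1');

-- Step 2: only if the UPDATE reported 0 affected rows does the
-- application run the INSERT. It must also be ready for a duplicate-key
-- error here, in case another connection added the row in between.
INSERT INTO ipstat VALUES (INET_ATON('192.168.0.1'), 1, NOW());
```

The single-statement form removes both the affected-rows check and the extra round trip.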
