Monday, May 25, 2009

LENGTH() and UTF-8

> Hi!
>
> We are storing UTF-8 data in our MySQL database and we need to get the
> length of the data. But length() doesn't return the number of characters
> but the pure number of bytes.
>
> SELECT LENGTH('köter') => 6
>
> Currently we are doing something like this:
>
> SELECT LENGTH(CONVERT('köter' USING 'ucs2'))/2;
>
> This works fine but a "real" solution like CHAR_LENGTH() or something like
> that would be really appreciated.


From http://dev.mysql.com/doc/mysql/en/string-functions.html:

CHAR_LENGTH(str)

Returns the length of the string str, measured in characters. A multi-byte
character counts as a single character. This means that for a string
containing five two-byte characters, LENGTH() returns 10, whereas
CHAR_LENGTH() returns 5.

Look at OCTET_LENGTH() and CHAR_LENGTH(). (OCTET_LENGTH() is a synonym
for LENGTH(), but it is the SQL-standard way of getting the length of a
string in bytes.)

Jochem
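
As a rough illustration of the answer above, the three functions can be compared side by side on the string from the question. This is only a sketch: it assumes the client connection character set is UTF-8 (for example after SET NAMES utf8), so that 'köter' reaches the server as 6 bytes, as in the question. The column aliases are made up for readability.

```sql
-- Assumption: the connection character set is utf8, so 'ö' is sent as two bytes.
SELECT LENGTH('köter')       AS byte_len,   -- length in bytes
       OCTET_LENGTH('köter') AS octet_len,  -- synonym for LENGTH()
       CHAR_LENGTH('köter')  AS char_len;   -- length in characters
```

Under that assumption the first two columns should match the 6 reported in the question, while CHAR_LENGTH() should count the five characters, which makes the CONVERT(... USING 'ucs2') workaround unnecessary.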
