Tuesday, January 19, 2010

Matching Only Non-English Characters

<?php
//Non-English Character Matching:

//A character class (square brackets) lets us match a set of characters, but literal ranges such as a-z only cover English letters. One trick is to give the range as ASCII or Unicode code points instead:

//ASCII characters (0x00-0x7F) can be matched like this:
preg_match('/[\x00-\x7F]+/', $str);

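//Unicode code points are written as \x{...} and need the /u modifier (PCRE does not support \uXXXX). To match characters outside the ASCII range in a UTF-8 string:
preg_match('/[^\x{0000}-\x{007F}]+/u', $str);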

//In our case, to match only non-English (non-ASCII) characters, use the following (without /u it works byte by byte):
preg_match('/[^\x00-\x7F]+/', $str);

//To match ALL bytes (English and non-English characters, plus control characters), use the following (a-zA-Z is already covered by the range, so it is not needed):
preg_match('/[\x00-\xFF]+/', $str);

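// Chinese characters: U+4E00-U+9FA5 lies in the CJK Unified Ideographs block (the range does not cover every Chinese character). Matches a string of zero or more such characters:
preg_match('/^[\x{4e00}-\x{9fa5}]*$/u', $str);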
?>
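
The patterns above can be wrapped in small helper functions. The following is only a minimal sketch: the function names (hasNonAscii, isOnlyNonAscii, isOnlyHan) are made up for illustration and are not from the original snippet. It assumes the input is valid UTF-8, treats ASCII as 0x00-0x7F, and swaps the fixed Chinese range for the Unicode script property \p{Han}, which is a different (broader) technique than the range shown earlier.

```php
<?php
// Hypothetical helpers; names are illustrative only.
// Assumes $str is valid UTF-8 and PCRE has Unicode support enabled.

// True if $str contains at least one byte outside the ASCII range.
function hasNonAscii(string $str): bool
{
    return preg_match('/[^\x00-\x7F]/', $str) === 1;
}

// True if $str is non-empty and every character is non-ASCII.
// \z anchors at the very end, so a trailing newline is not ignored.
function isOnlyNonAscii(string $str): bool
{
    return preg_match('/^[^\x{00}-\x{7F}]+\z/u', $str) === 1;
}

// True if $str is non-empty and every character belongs to the Han script.
function isOnlyHan(string $str): bool
{
    return preg_match('/^\p{Han}+\z/u', $str) === 1;
}
```

With the /u modifier, preg_match returns false on invalid UTF-8, so the strict comparison with 1 makes these helpers return false for malformed input instead of treating it as a match.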
