To check whether a file starts with a UTF-8 BOM, dump its first three bytes:
# hexdump -n 3 -C 2.txt
00000000 ef bb bf
If the first three bytes are ef bb bf, the file has a UTF-8 BOM.
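The same check can be scripted. The following is a minimal Python sketch, not part of the original post; the function name has_utf8_bom is made up for illustration, and it only compares the first three bytes of the file against the standard library constant codecs.BOM_UTF8.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at `path` begins with the bytes EF BB BF.

    `has_utf8_bom` is a hypothetical helper name, not an existing API.
    """
    with open(path, "rb") as f:
        # codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf"
        return f.read(3) == codecs.BOM_UTF8
```

Calling has_utf8_bom("2.txt") would correspond to the hexdump check above.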
====================
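ISO8859-1 is almost identical to ISO8859-15. The latter differs in only
eight positions: it replaces the generic currency sign (0xA4) with the
Euro symbol and swaps seven rarely used symbols for letters needed in
French, Finnish and Estonian (Š, š, Ž, ž, Œ, œ, Ÿ). Neither encoding
marks itself in the file, so the only way to tell them apart is to look
at those symbols in context.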
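To see exactly where the two encodings disagree, a short Python sketch (an illustration added here, relying on Python's built-in iso8859-1 and iso8859-15 codecs) can decode every byte value both ways and list the ones that differ. The variable name differing is arbitrary.

```python
# Decode each possible byte with both codecs and keep the bytes
# whose decoded character differs between the two encodings.
differing = [
    b for b in range(256)
    if bytes([b]).decode("iso8859-1") != bytes([b]).decode("iso8859-15")
]

for b in differing:
    latin1 = bytes([b]).decode("iso8859-1")
    latin9 = bytes([b]).decode("iso8859-15")
    print(f"0x{b:02X}: ISO8859-1 {latin1!r}  ISO8859-15 {latin9!r}")
```

If a file contains none of these byte values, it reads the same under either encoding and the question does not matter for that file.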
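UTF-8 is identical to ISO8859 for the first 128 characters (the ASCII
range, 0x00 to 0x7F), which include all the standard keyboard characters.
Beyond that, UTF-8 encodes each character as a multi-byte sequence of two
to four bytes, whereas ISO8859 uses a single byte from 0x80 to 0xFF.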
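Because of this, non-ASCII text saved as ISO8859 is rarely valid UTF-8, so a strict decode makes a reasonable test. The sketch below is only a heuristic under that assumption; looks_like_utf8 is a hypothetical name, and pure ASCII input passes trivially since it is valid in both encodings.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8.

    Hypothetical helper: a successful decode is a strong hint, not proof.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "é" is one byte in ISO8859-1 but a two-byte sequence in UTF-8.
print("é".encode("iso8859-1"))  # b'\xe9'
print("é".encode("utf-8"))      # b'\xc3\xa9'
```

For example, looks_like_utf8("café".encode("iso8859-1")) is False, because the lone byte 0xE9 starts a multi-byte sequence that is never completed.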
What Windows tools call "Unicode" is usually UTF-16. If you're lucky,
the file starts with a BOM (Byte Order Mark): the two bytes FF FE for
little-endian or FE FF for big-endian. Otherwise, if the text consists
mostly of basic 7-bit ASCII characters, look for a 0x00 (null) byte as
every other byte.
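Putting the hints above together, a rough detector might look like the following Python sketch. It is an assumption-laden illustration, not a reliable detector: guess_encoding and its return labels are made up here, the null-byte test only works for text that is entirely basic ASCII, and UTF-32 (whose little-endian BOM also begins with FF FE) is not handled.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess an encoding from raw bytes (hypothetical helper and labels)."""
    # 1. Byte order marks.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"

    # 2. No BOM: ASCII text stored as UTF-16 has 0x00 in every other byte.
    if len(data) >= 2:
        even, odd = data[0::2], data[1::2]
        if odd.count(0) == len(odd) and even.count(0) == 0:
            return "utf-16-le"
        if even.count(0) == len(even) and odd.count(0) == 0:
            return "utf-16-be"

    # 3. Strict UTF-8 decode; plain ASCII also ends up here.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 4. Fallback: a single-byte encoding. ISO8859-1 and ISO8859-15
    #    cannot be told apart without looking at the text in context.
    return "iso8859-1 or iso8859-15"
```

The order matters: the BOM checks run first because they are the only unambiguous signal, and the ISO8859 answer is simply what remains when nothing else matches.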
http://www.xpheads.com/forums/microsoft-public-windowsxp-help_and_support/164700-how-detect-if-text-file-iso8859-1-iso8859-15-utf-8-unicode-encoded.html
Saturday, January 29, 2011