Monday, February 24, 2014

detect file encoding and convert it to UTF-8

# file -i input.txt
input.txt: text/html; charset=unknown-8bit
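If only the charset is wanted, file(1) can print it by itself: -b leaves out the file name and --mime-encoding prints just the encoding part of the -i output. This is a small sketch wrapped in a helper whose name, charset_of, is hypothetical:

```shell
# charset_of FILE (hypothetical helper, not part of any tool)
# -b omits the file name; --mime-encoding prints only the charset,
# e.g. "us-ascii", "utf-8" or "unknown-8bit".
charset_of() {
    file -b --mime-encoding "$1"
}
```

For the file above, `charset_of input.txt` should print unknown-8bit, matching the charset= field of `file -i`.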

# iconv -f unknown-8bit -t utf-8 input.txt > out.txt
iconv: iconv_open(utf-8, unknown-8bit): Invalid argument
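The call fails because unknown-8bit is file's way of saying it could not identify the charset; it is not an encoding name that iconv understands. The names a given iconv build accepts can be listed with -l, for example to check how it spells UTF-8:

```shell
# Print every encoding name this iconv accepts, keeping only UTF-8 variants.
# The layout of the list differs between GNU and BSD iconv.
iconv -l | grep -i 'utf-8'
```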

# cd /usr/ports/converters/enca ; make install clean

# enca -L none input.txt
Universal transformation format 8 bits; UTF-8
Surrounded by/intermixed with non-text data

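# iconv -c -f utf-8 -t utf-8 input.txt > out.txt

The -c flag drops the byte sequences that are not valid UTF-8; without it, iconv stops with an error at the first one.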
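To repeat the cleanup on other files and confirm the result, the conversion can be followed by a strict second iconv pass that only succeeds on valid UTF-8. This is a sketch: the function name clean_utf8 is hypothetical, and it assumes the input is UTF-8 apart from some stray invalid bytes, as enca reported here.

```shell
# clean_utf8 INPUT OUTPUT (hypothetical helper)
# Drops byte sequences that are not valid UTF-8, then verifies OUTPUT.
clean_utf8() {
    # Refuse to run on a missing or unreadable input file.
    [ -r "$1" ] || return 2
    # -c discards invalid sequences instead of stopping at the first one.
    # The exit status is deliberately ignored (|| :) because some iconv
    # builds return non-zero whenever -c had to discard something.
    iconv -c -f utf-8 -t utf-8 "$1" > "$2" || :
    # Strict pass: succeeds only if OUTPUT is now valid UTF-8.
    iconv -f utf-8 -t utf-8 "$2" > /dev/null 2>&1
}
```

Usage: `clean_utf8 input.txt out.txt && echo "out.txt is valid UTF-8"`.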

