Thursday, November 20, 2014

How to properly handle a gzipped page when using curl?

Method 1:

curl will automatically decompress the response if you set the --compressed flag:

$ curl --compressed ""

--compressed (HTTP) Request a compressed response using one of the algorithms libcurl supports, and save the uncompressed document. If this option is used and the server sends an unsupported encoding, curl will report an error.

gzip is most likely supported, but you can check this by running curl -V and looking for libz somewhere in the "Features" line:

$ curl -V
Protocols: ...
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz

Note that it's really the website in question that is at fault here. If curl did not send an Accept-Encoding: gzip request header, the server should not have sent a compressed response.
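
The libz check described above can also be scripted, for instance at the top of a setup script. This is only a minimal sketch: has_libz is a hypothetical helper name (not part of curl), and it looks at nothing but the "Features" line of whatever version text it reads on standard input.

```shell
#!/bin/sh
# has_libz: hypothetical helper, not part of curl.
# Reads `curl -V` output on stdin and succeeds only if the "Features:"
# line lists libz as a separate, whitespace-delimited word.
has_libz() {
    grep '^Features:' | grep -Eq '(^|[[:space:]])libz([[:space:]]|$)'
}

# Report whether the installed curl can decompress gzip responses.
if command -v curl >/dev/null 2>&1 && curl -V | has_libz; then
    echo "curl was built with libz"
else
    echo "curl is missing or was built without libz"
fi
```

Reading from standard input rather than calling curl inside the function keeps the check easy to try against saved `curl -V` output from another machine.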

Method 2:

Request gzip explicitly and decompress the response yourself:

$ curl -sH 'Accept-Encoding: gzip' "" | gunzip -
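
Piping straight into gunzip fails if the server ignores the header and answers uncompressed. The following is a hedged sketch of a more forgiving pipeline stage, not a curl or gzip feature: maybe_gunzip is a hypothetical helper that buffers its input in a temporary file, checks for the two-byte gzip magic number (0x1f 0x8b), and decompresses only when it is present.

```shell
#!/bin/sh
# maybe_gunzip: hypothetical helper, not part of curl or gzip.
# Copies stdin to stdout, decompressing it first if the data starts
# with the gzip magic bytes 0x1f 0x8b.
maybe_gunzip() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    # od prints the first two bytes as hex (e.g. " 1f 8b");
    # tr removes the whitespace so the result can be compared directly.
    magic=$(od -An -tx1 -N2 < "$tmp" | tr -d '[:space:]')
    if [ "$magic" = "1f8b" ]; then
        gunzip -c < "$tmp"
    else
        cat "$tmp"
    fi
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

With this in place, `| gunzip -` in the command above could be swapped for `| maybe_gunzip`. The trade-off is that the whole response is written to disk before any output appears, which is fine for pages but less so for very large downloads.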

Note: You can use -D to dump the response headers to a file, e.g. -D headers.txt. curl writes them to that file out of band, so they stay out of the output stream and do not corrupt the gzip data piped to gunzip.
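
Once the headers are in a file, they can confirm whether the server really compressed the body. A small sketch, assuming the headers were saved with -D headers.txt as in the note above; content_encoding is a hypothetical helper that prints the value of the last Content-Encoding header in such a file (the last one, because a dump can hold more than one response's headers, for example when redirects are followed).

```shell
#!/bin/sh
# content_encoding: hypothetical helper, not part of curl.
# Prints the lowercased value of the last Content-Encoding header found
# in a header dump file ($1), or prints nothing if there is none.
content_encoding() {
    # Header lines end in CRLF, so strip the carriage returns first.
    tr -d '\r' < "$1" |
        awk -F': *' '
            tolower($1) == "content-encoding" { v = tolower($2) }
            END { if (v != "") print v }
        '
}
```

For example, a script could run gunzip only when `[ "$(content_encoding headers.txt)" = gzip ]` holds. Header names are compared case-insensitively because HTTP/2 responses are dumped with lowercase names.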
