Friday, March 13, 2009

Parsing a binary file in PHP

Published by netty5, last update on Tuesday November 25, 2008 06:33:01 AM by netty5


When using low-level languages such as C or Pascal, it is common practice to store data in a binary file (raw bytes that cannot be read as text).


In C, for example, saving the value 500 to a file looks like this:


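#include <stdio.h>

int main()
{
    int val = 500;
    FILE *fp = fopen("file", "wb"); // open "file" for writing in binary mode
    if (fp == NULL)
        return 1; // could not create the file

    fwrite(&val, sizeof(int), 1, fp); // store the raw bytes of val in "file"
    fclose(fp);
    return 0;
}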


If you open this file in a text editor, it will look unreadable: the value is not stored as the text "500" but as the raw binary bytes of an int.

In PHP, you may occasionally need to read values stored in such a binary file. However, PHP's file functions such as fread() return the file contents as a string, not as numbers, so a specific function is needed to turn those bytes back into values.

The solution

The unpack() function solves this problem.
Its first argument is a format string describing the type of data to extract; its second argument is the binary string to extract it from.

The type of data is given as a format code, a single character. For example, to retrieve a signed integer (of machine-dependent size and byte order), use the i code.

So, to read back the value recorded in the file by the C example above:


<?php
$fp = fopen("file", "rb");
$data = fread($fp, 4); // 4 is the size in bytes of a C int on a 32-bit PC
fclose($fp);
$number = unpack("i", $data);
echo $number[1]; // displays 500
?>

Important notes:

Data sizes may differ between processor architectures (SPARC, ARM, PowerPC, etc.).

A C program may use integers of different sizes, for example 32 or 64 bits.

The byte order may also differ: some machines store data in big-endian order, others in little-endian order.

The data size can vary depending on the compiler.

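unpack() returns an array that can be more elaborate than the one in the example above: each format code can be given a name, which then becomes the array key (for example, unpack("ivalue", $data) returns the value under the key "value"). With a single unnamed value, as in our case, the value is stored at index 1 of the array.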


Format codes for C data types on a 32-bit PC


Here is a table mapping the C data types written by a program compiled for a 32-bit PC to the matching unpack() format codes:

char: c
unsigned char: C
short: s
unsigned short: S
int: i
unsigned int: I (L also works, since an unsigned int is 32 bits on a 32-bit PC)
float: f
double: d
