
Differences In Reading Binary Files Between R And Python

- 1 answer

I have a binary file, and I know the first 3 bytes should be 0x6c, 0x1b, and 0x01, in that order. I used Python and R to load the file.

Python:

f = open('pathToFile', 'rb')
f.read(3)

I get:

Out[2]: b'l\x1b\x01'

While in R:

readBin('pathToFile', what='raw', n=3)

I get:

[1] 6c 1b 01

I do not understand the b'l' in the Python output, since I am expecting 6c (as the R output shows).

What am I doing wrong?


Answer

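Binary values are just numbers. Use list(f.read(3)) to see them as integers in Python. You are not doing anything wrong: R and Python read the same bytes and only use different default representations when printing them. R prints raw vectors as two-digit hexadecimal values. Python prints a bytes object using ASCII: each byte that maps to a printable ASCII character is shown as that character, and every other byte is shown as a \xhh escape. 0x6c is the ASCII code for the letter l, so the first byte is displayed as l, while 0x1b and 0x01 are control characters and stay as \x1b and \x01. The likely reason for the difference is that Python is a general-purpose language in which bytes often carry ASCII text (many terminals and programs, old and new, use 7-bit ASCII for output), whereas R is normally only interested in the raw data.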
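As a minimal sketch of the point above, the three bytes from the question can be written as a literal (so no file is needed) and inspected as integers. The variable names are only illustrative.

```python
# The three bytes from the question, written as a literal instead of read from a file.
data = b'l\x1b\x01'

# Iterating over a bytes object yields plain integers.
values = list(data)

# 108 is 0x6c, which is also the ASCII code of the letter 'l'.
first_is_l = data[0] == 0x6c == ord('l')

# The same integers formatted as two-digit hex, matching what R prints.
hex_values = [format(b, '02x') for b in data]

print(values)       # [108, 27, 1]
print(first_is_l)   # True
print(hex_values)   # ['6c', '1b', '01']
```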

To get the hexadecimal representation in Python, you can use this bytes subclass:

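class BytesHex(bytes):
    # Override only the repr; the stored bytes are unchanged.
    def __repr__(self):
        # Format each byte as two lowercase hex digits, separated by spaces.
        return ' '.join('{:02x}'.format(b) for b in self)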

BytesHex(b'l\x1b\x01')
Out[158]: 6c 1b 01
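If a subclass is more than you need, the built-in bytes.hex() method is an alternative way to get a similar string. This is a sketch of a different approach from the one above, and it assumes Python 3.8 or later for the separator argument.

```python
data = b'l\x1b\x01'

# Hex digits with no separator.
compact = data.hex()

# Passing a separator requires Python 3.8+.
spaced = data.hex(' ')

# bytes.fromhex() skips the spaces, so the string converts back to the same bytes.
round_trip = bytes.fromhex(spaced)

print(compact)             # 6c1b01
print(spaced)              # 6c 1b 01
print(round_trip == data)  # True
```

Unlike the subclass, this leaves the default repr of bytes untouched and only produces a string when you ask for one.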
source: stackoverflow.com