Python Read Part Of Large Binary File

I have a large binary file (~2.5 GB). It contains a 336-byte header followed by seismic signal data (x, y and z channels) stored as int32. There are 223,200,000 samples in total. I need to read only part of the signal, for example samples 216,000,000 through 219,599,999. I wrote this function:

import numpy as np


def reading(path, start_moment, end_moment):
    channel_count = 3    # x, y and z
    header_size = 336    # bytes
    if start_moment is None:
        start_moment = 0  # read from the first sample
    with open(path, 'rb') as file_data:
        # Skip the header plus all earlier samples (3 int32 values each).
        file_data.seek(header_size + start_moment * 4 * channel_count)
        try:
            if end_moment is None:
                # Read everything up to the end of the file.
                signals = np.fromfile(file_data, dtype=np.int32)
            else:
                moment_count = end_moment - start_moment + 1
                signals = np.fromfile(file_data, dtype=np.int32,
                                      count=moment_count * 3)
        except MemoryError:
            return None
    # One row per sample, one column per channel.
    return signals.reshape(-1, channel_count)

When I run the script containing this function in PyCharm, I get this error:

Traceback (most recent call last):
  File "D:/AppsBuilding/test/testReadBaikal8.py", line 41, in <module>
    signal_2 = reading(path=path, start_moment=216000000, end_moment=219599999)
  File "D:/AppsBuilding/test/testReadBaikal8.py", line 27, in reading
    count=moment_count * 3)
OSError: obtaining file position failed

But with start_moment=7200000 and end_moment=10799999 everything works. My PC runs 32-bit Windows 7 with 1.95 GB of RAM. How can I fix this?
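The working and failing calls differ mainly in how far into the file they seek. A quick check of the byte offsets, assuming the layout described above (a 336-byte header followed by three int32 channels per sample), shows that only the failing call lands beyond 2**31 - 1 bytes, the largest value a signed 32-bit file position can hold. That makes a 32-bit file-position limit on this system a likely cause, though this is an inference and not something confirmed in the thread. The helper name byte_offset is hypothetical.

```python
HEADER_SIZE = 336            # bytes, from the question
BYTES_PER_SAMPLE = 3 * 4     # x, y, z channels, 4 bytes each (int32)
INT32_MAX = 2**31 - 1        # largest signed 32-bit offset


def byte_offset(start_moment):
    """Position that reading() seeks to for a given start sample (hypothetical helper)."""
    return HEADER_SIZE + start_moment * BYTES_PER_SAMPLE


file_size = byte_offset(223_200_000)   # offset just past the last sample, i.e. the file size
working = byte_offset(7_200_000)
failing = byte_offset(216_000_000)

print(file_size)                         # 2678400336
print(working, working > INT32_MAX)      # 86400336 False
print(failing, failing > INT32_MAX)      # 2592000336 True
```

The computed file size (about 2.49 GiB) also matches the ~2.5 GB stated in the question, which suggests the assumed layout is right.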

Answer

Read the file in small blocks, so that only one block is held in memory at a time and each block is released once it has been processed:

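def read_in_block(file_path, block_size=1024):
    # Binary mode: the file holds int32 samples, not text.
    with open(file_path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:  # an empty bytes object means end of file
                return
            yield block


# Each block is processed and then dropped before the next one is read.
for block in read_in_block(path):
    print(block)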
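The block generator streams the whole file. To pull out only the range from the question, a sketch along the same lines can do the seek and the read with Python's own file object and only then hand the bytes to NumPy through np.frombuffer, so np.fromfile never has to query the file position. This assumes the layout from the question (336-byte header, three interleaved int32 channels per sample, native byte order). It also assumes that Python's own seek accepts offsets above 2 GB on the asker's system. It has not been tested on 32-bit Windows. The function name read_range is hypothetical.

```python
import numpy as np

HEADER_SIZE = 336                                            # bytes, from the question
CHANNEL_COUNT = 3                                            # x, y, z
SAMPLE_BYTES = CHANNEL_COUNT * np.dtype(np.int32).itemsize   # 12 bytes per sample


def read_range(path, start_moment, end_moment):
    """Return samples start_moment..end_moment (inclusive) as an (n, 3) int32 array."""
    count = end_moment - start_moment + 1
    with open(path, "rb") as f:
        # The seek is done by Python's file object, not by NumPy.
        f.seek(HEADER_SIZE + start_moment * SAMPLE_BYTES)
        raw = f.read(count * SAMPLE_BYTES)
    # Ignore a trailing partial sample if the file ends early.
    usable = len(raw) - len(raw) % SAMPLE_BYTES
    # frombuffer reinterprets the bytes without copying; the result is
    # read-only, so call .copy() on it if the data must be modified.
    data = np.frombuffer(raw, dtype=np.int32, count=usable // 4)
    return data.reshape(-1, CHANNEL_COUNT)
```

For the question's interval, read_range(path, 216000000, 219599999) would read 3,600,000 samples, about 43 MB, which fits comfortably in 1.95 GB of RAM.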
source: stackoverflow.com