Is There A Way To Not Make A Copy When A Numpy Array Is Sliced?

I need to handle some large numpy arrays in my project. Once such an array is loaded from disk, it consumes over half of my computer's memory.

After the array is loaded, I take several slices of it (selecting almost half of the array), and then I get an error telling me that memory is insufficient.

A small experiment suggests that the error occurs because slicing a numpy array creates a copy:

import numpy as np

tmp = np.linspace(1, 100, 100)
inds = list(range(100))
tmp_slice = tmp[inds]  # index with a list of integer positions

assert id(tmp) == id(tmp_slice)

This raises an AssertionError.

Is there a way to make a slice of a numpy array refer only to the memory of the original array, so that the data entries are not copied?

Answer

In Python, slice is a well-defined class with start, stop and step values. It is used when we index a list, as in alist[1:10:2], which makes a new list with copies of the pointers from the original. In numpy, slices are used in basic indexing, e.g. arr[:3, -3:]. This creates a view of the original array. The view shares the data buffer but has its own shape and strides.
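A quick way to check the view behaviour yourself is sketched below. The array shape and the variable names are only illustrative; np.shares_memory and the .base attribute are the standard tools for this kind of check.

```python
import numpy as np

# arr owns its data buffer (np.zeros allocates a fresh array).
arr = np.zeros((4, 5))

# Basic indexing: only slices are used, so the result is a view.
view = arr[:3, -3:]          # rows 0..2, columns 2..4

shares_buffer = np.shares_memory(arr, view)   # True for a view
base_is_arr = view.base is arr                # the view points back at arr

# Writing through the view changes the original array,
# because both objects read from the same buffer.
view[0, 0] = 7.0
changed_in_original = arr[0, 2]

print(shares_buffer, base_is_arr, changed_in_original)
print(arr.shape, arr.strides)
print(view.shape, view.strides)
```

The view has its own shape, but since both slices use a step of 1 its strides match those of arr; no data entries are copied.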

But when we index an array with a list, an array or a boolean array (mask), numpy has to make a copy: a new array with its own data buffer. The selection of elements is too complex or irregular to express in terms of the shape and strides attributes.
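Applied to the example from the question, the difference can be checked roughly like this. Note that comparing id() values does not tell you whether data was copied: even a view is a new Python object, so the test would fail for a real slice as well. np.shares_memory is a more reliable check. The names picked, masked and sliced are just for illustration.

```python
import numpy as np

tmp = np.linspace(1, 100, 100)
inds = list(range(100))

picked = tmp[inds]        # list of indices -> copy with its own buffer
masked = tmp[tmp > 50]    # boolean mask    -> copy with its own buffer
sliced = tmp[0:100]       # slice object    -> view on tmp's buffer

picked_shares = np.shares_memory(tmp, picked)   # False
masked_shares = np.shares_memory(tmp, masked)   # False
sliced_shares = np.shares_memory(tmp, sliced)   # True

# A view is still a distinct ndarray object, so identity checks fail
# even though no data was copied.
same_object = sliced is tmp                     # False

print(picked_shares, masked_shares, sliced_shares, same_object)
```

So if the selection can be written as start:stop:step, using a slice instead of a list of indices avoids the copy.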

In some cases the index array is small (compared to the original) and the copy is also small. But if we are permuting the whole array, then the index array and the copy will both be as large as the original.
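To get a feel for the memory cost, you can compare nbytes for the two cases. This is only a sketch: the array size, the index values and the seeded random generator are arbitrary choices.

```python
import numpy as np

big = np.zeros(1_000_000)            # float64, 8 bytes per element

# Small index array: the copy holds only the selected elements.
few = np.array([3, 10, 500])
small_copy = big[few]

# Permuting the whole array: the index array has one entry per element,
# and the copy is as large as the original.
rng = np.random.default_rng(0)
perm = rng.permutation(big.size)
full_copy = big[perm]

print(big.nbytes, small_copy.nbytes, full_copy.nbytes, perm.size)
```

In the permutation case the original, the index array and the copy all live in memory at the same time, which is how a machine that is already half full can run out.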

source: stackoverflow.com