
Faster Way To Threshold A 4-D Numpy Array


I have a 4D numpy array X of shape (98, 359, 256, 269) that I want to threshold. Right now, I have two separate lists of coordinates: mag_ang holds index pairs for the first two dimensions and indices holds index pairs for the last two.

shape of indices: (61821, 2)

shape of mag_ang: (35182, 2)

Currently, my code looks like this:

inner_points = []

for k in indices:
    x = k[0]
    y = k[1]
    for i,ctr in enumerate(mag_ang):
        mag = ctr[0]
        ang = ctr[1]
        if X[mag][ang][x][y] > 10:
            inner_points.append((y,x))

This code works, but it is quite slow. Is there a faster, more Pythonic way to do this?


Answer

(EDIT: added a second alternate method)

Use NumPy integer-array ("advanced") indexing:

import time

import numpy as np

n_mag, n_ang, n_x, n_y = 10, 12, 5, 6
shape = n_mag, n_ang, n_x, n_y
X = np.random.random_sample(shape) * 20

nb_indices = 100 # 61821
indices = np.c_[np.random.randint(0, n_x, nb_indices), np.random.randint(0, n_y, nb_indices)]

nb_mag_ang = 50 # 35182
mag_ang = np.c_[np.random.randint(0, n_mag, nb_mag_ang), np.random.randint(0, n_ang, nb_mag_ang)]

# original method
inner_points = []
start = time.time()
for x, y in indices:
    for mag, ang in mag_ang:
        if X[mag][ang][x][y] > 10:
            inner_points.append((y, x))
end = time.time()
print(end - start)

# faster method 1:
inner_points_faster1 = []
start = time.time()
for x, y in indices:
    if np.any(X[mag_ang[:, 0], mag_ang[:, 1], x, y] > 10):
        inner_points_faster1.append((y, x))
end = time.time()
print(end - start)

# faster method 2:
start = time.time()
# note: depending on the real sizes of mag_ang and indices, it may be cheaper to index with mag_ang first and indices second
found = X[:, :, indices[:, 0], indices[:, 1]][mag_ang[:, 0], mag_ang[:, 1], :] > 10
# 'found' shape is (nb_mag_ang x nb_indices)
assert found.shape == (nb_mag_ang, nb_indices)
matching_indices_mask = found.any(axis=0)
inner_points_faster2 = indices[matching_indices_mask, :]
end = time.time()
print(end - start)

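# finally assert equality of findings
# axis=0 compares whole (y, x) rows; without it np.unique flattens the arrays
inner_points = np.unique(np.array(inner_points), axis=0)
inner_points_faster1 = np.unique(np.array(inner_points_faster1), axis=0)
# method 2 kept its rows as (x, y), so swap the columns to (y, x) before comparing
inner_points_faster2 = np.unique(inner_points_faster2[:, ::-1], axis=0)
assert np.array_equal(inner_points, inner_points_faster1)
assert np.array_equal(inner_points, inner_points_faster2)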

yields

0.04685807228088379
0.0
0.0

(Of course, with larger arrays the second and third timings will no longer be zero.)
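The 0.0 readings probably reflect the coarse resolution of time.time() on the machine used rather than truly instantaneous code. A minimal sketch of a finer-grained comparison using the standard timeit module follows; the function names loop_method and vectorized_method, the fixed seed, and the repeat counts are illustrative assumptions, not part of the original answer.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
X = rng.random((10, 12, 5, 6)) * 20
indices = np.c_[rng.integers(0, 5, 100), rng.integers(0, 6, 100)]
mag_ang = np.c_[rng.integers(0, 10, 50), rng.integers(0, 12, 50)]


def loop_method():
    # Nested Python loops, as in the question.
    out = []
    for x, y in indices:
        for mag, ang in mag_ang:
            if X[mag, ang, x, y] > 10:
                out.append((y, x))
    return out


def vectorized_method():
    # Same indexing expression as "faster method 2".
    found = X[:, :, indices[:, 0], indices[:, 1]][mag_ang[:, 0], mag_ang[:, 1], :] > 10
    return indices[found.any(axis=0)]


# Take the best of several repeats and report the average time per call.
t_loop = min(timeit.repeat(loop_method, number=5, repeat=3)) / 5
t_vec = min(timeit.repeat(vectorized_method, number=50, repeat=3)) / 50
print(f"loop: {t_loop:.6f} s per call, vectorized: {t_vec:.6f} s per call")
```

timeit uses a high-resolution clock and averages over many calls, so even very short runs give a non-zero figure.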

Final note: here I apply np.unique at the end, but it would probably be wiser to apply it upfront to the indices and mag_ang arrays (unless you are sure they contain no duplicates already).
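As a rough sketch of that upfront deduplication, assuming the same toy sizes as in the example: the names indices_u, mag_ang_u and the fixed seed are hypothetical, and the single broadcast indexing call is a variant of method 2 rather than the exact code from the answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n_mag, n_ang, n_x, n_y = 10, 12, 5, 6  # toy sizes, as in the example
X = rng.random((n_mag, n_ang, n_x, n_y)) * 20

# Toy coordinate lists that very likely contain repeated rows.
indices = np.c_[rng.integers(0, n_x, 100), rng.integers(0, n_y, 100)]
mag_ang = np.c_[rng.integers(0, n_mag, 50), rng.integers(0, n_ang, 50)]

# axis=0 removes duplicate rows; without it np.unique flattens the array.
indices_u = np.unique(indices, axis=0)
mag_ang_u = np.unique(mag_ang, axis=0)

# Broadcast the (mag, ang) pairs against the (x, y) pairs in one indexing call.
# The result has shape (len(mag_ang_u), len(indices_u)).
found = X[mag_ang_u[:, 0, None], mag_ang_u[:, 1, None],
          indices_u[:, 0], indices_u[:, 1]] > 10

# Keep each (x, y) that exceeds the threshold for at least one (mag, ang) pair.
inner_points = indices_u[found.any(axis=0)]
print(inner_points.shape)
```

Deduplicating first shrinks the intermediate array and makes the final np.unique unnecessary. At the sizes given in the question the intermediate would still hold roughly 35182 × 61821 ≈ 2.2 billion entries, so processing indices in chunks may be needed.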

source: stackoverflow.com