# Finding The Distance To The Next Higher Value In Pandas Dataframe

## 15 October 2019 - 1 answer

I have a data frame containing floating point values

``````my_df = pd.DataFrame([1,2,1,4,3,2,5,4,7])
``````

I'm trying to find for each number, when (how many indices need to move forward) till I find the next number larger than the current number, if there is no larger number, I mark it with some value (like 999999).

So for the example above, the correct answer should be

``````result = [1,2,1,3,2,1,2,1,999999]
``````

Currently I've solved it by very slow double loop with `itertuples` (meaning O(n^2))

Is there a smarter way to do it ?

Here's a numpy based one leveraging broadcasting:

``````a = my_df.squeeze().to_numpy() # my_df.squeeze().values for versions 0.24.0.<

diff_mat = a - a[:,None]
result = (np.triu(diff_mat)>0).argmax(1) - np.arange(diff_mat.shape[1])
result[result <= 0] = 99999
``````

``````print(result)

array([    1,     2,     1,     3,     2,     1,     2,     1, 99999],
dtype=int64)
``````

Where `diff_mat` is the distance matrix, and we're looking for the values from the main diagonal onwards, which are greater than `0`:

``````array([[ 0,  1,  0,  3,  2,  1,  4,  3,  6],
[-1,  0, -1,  2,  1,  0,  3,  2,  5],
[ 0,  1,  0,  3,  2,  1,  4,  3,  6],
[-3, -2, -3,  0, -1, -2,  1,  0,  3],
[-2, -1, -2,  1,  0, -1,  2,  1,  4],
[-1,  0, -1,  2,  1,  0,  3,  2,  5],
[-4, -3, -4, -1, -2, -3,  0, -1,  2],
[-3, -2, -3,  0, -1, -2,  1,  0,  3],
[-6, -5, -6, -3, -4, -5, -2, -3,  0]], dtype=int64)
``````

We have `np.triu` for that:

``````np.triu(diff_mat)

array([[ 0,  1,  0,  3,  2,  1,  4,  3,  6],
[ 0,  0, -1,  2,  1,  0,  3,  2,  5],
[ 0,  0,  0,  3,  2,  1,  4,  3,  6],
[ 0,  0,  0,  0, -1, -2,  1,  0,  3],
[ 0,  0,  0,  0,  0, -1,  2,  1,  4],
[ 0,  0,  0,  0,  0,  0,  3,  2,  5],
[ 0,  0,  0,  0,  0,  0,  0, -1,  2],
[ 0,  0,  0,  0,  0,  0,  0,  0,  3],
[ 0,  0,  0,  0,  0,  0,  0,  0,  0]], dtype=int64)
``````

And by checking which are greater than `0`, and taking the `argmax` of the boolean `ndarray` we'll find the first value greater than `0` in each row:

``````(np.triu(diff_mat)>0).argmax(1)
array([1, 3, 3, 6, 6, 6, 8, 8, 0], dtype=int64)
``````

We only need to subtract the corresponding offset from the main diagonal to the beginning