
Perform Calculations Based On Signals In Array


I have two columns, 'close' and 'signals', in an array. I would like to perform calculations on the 'close' column based on the classified data in the 'signals' column. If the same signal appears consecutively (ignoring NaNs), do nothing; only perform a calculation when the signal at index n+t is the opposite of the preceding signal at index n.

This is for rudimentary back-testing code to test an algorithm I have come up with. I understand that a for-loop is likely needed, but I am not sure how to apply it correctly to specific index points of the data.

PSEUDOCODE

for n in range(len(signals)):
    if signals[n] == 1:
        # while signals[n+t] == 1 (or NaN): keep holding close[n]
        # when signals[n+t] == 2:
        calculations[n+t] = close[n+t] - close[n]

Here is the output I am looking to produce programmatically:

   close  signals  calculations
0  100    NAN      NAN
1  105    1        NAN
2  110    NAN      NAN
3  107    1        NAN
4  115    NAN      NAN
5  120    2        15

Thanks for any help and please let me know if any clarification is needed!


Answer

One way might be:

  1. Extract the rows where "signals" is not null using dropna
  2. Remove consecutive duplicate signals by comparing each row with the previous one using shift
  3. Set the output column with np.where(): if the signal is 2, store the difference between this close and the previous remaining close; otherwise store NaN
  4. Add this column back to the input dataframe using join
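
Step 2 is probably the least obvious one, so here is a minimal sketch of the shift comparison on its own. The series and its index values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical signal series with the NaN rows already dropped
signals = pd.Series([1, 1, 2, 2, 1], index=[1, 3, 5, 6, 9])

# shift() moves every value down one row, so each row is
# compared with the row just before it
changed = signals != signals.shift()

# Keep only the first row of each run of identical signals
deduped = signals[changed]
print(deduped.tolist())        # [1, 2, 1]
print(deduped.index.tolist())  # [1, 5, 9]
```

The first row is always kept, because comparing a value with the NaN produced by shift() evaluates as "not equal".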

Here is the code:

# Import modules
import pandas as pd
import numpy as np

# Build dataset (same values as in the question)
# Note: np.nan is used because the np.NaN alias was removed in NumPy 2.0
data = [[100, np.nan],
        [105, 1],
        [110, np.nan],
        [107, 1],
        [115, np.nan],
        [120, 2]]
df = pd.DataFrame(data, columns=["close", "signals"])

# Select rows where "signals" is not null
sub_df = df.dropna(subset=['signals'])

# Remove consecutive duplicates
# (.copy() avoids a SettingWithCopyWarning when the new column is added below)
sub_df = sub_df.loc[sub_df.signals.shift() != sub_df.signals].copy()

# If signal == 2, set diff between close and previous close
# Else: set NaN
sub_df['output'] = np.where(sub_df.signals == 2, sub_df.close - sub_df.close.shift(), np.nan)
print(sub_df)
#    close  signals  output
# 1    105      1.0     NaN
# 5    120      2.0    15.0

# Update dataframe with the new column
print(df.join(sub_df['output']))
#    close  signals  output
# 0    100      NaN     NaN
# 1    105      1.0     NaN
# 2    110      NaN     NaN
# 3    107      1.0     NaN
# 4    115      NaN     NaN
# 5    120      2.0    15.0
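
Since the question asks about a for-loop, here is a rough sketch of the same logic written as an explicit loop, closer to the pseudocode. The function name signal_diffs and its entry/exit_ parameters are my own naming, not anything from pandas, and it assumes the signals are only 1, 2 or NaN. For the sample data it should give the same result as the vectorized version.

```python
import numpy as np
import pandas as pd


def signal_diffs(close, signals, entry=1, exit_=2):
    """Hypothetical helper: close[exit] - close[entry] at each entry -> exit flip."""
    out = np.full(len(close), np.nan)
    entry_close = None  # close at the first `entry` signal of the current run
    for i, (c, s) in enumerate(zip(close, signals)):
        if pd.isna(s):
            continue  # ignore rows without a signal
        if s == entry and entry_close is None:
            entry_close = c  # remember the first entry, ignore repeats
        elif s == exit_ and entry_close is not None:
            out[i] = c - entry_close  # signal flipped: store the difference
            entry_close = None  # wait for the next entry signal
    return out


df = pd.DataFrame({
    "close": [100, 105, 110, 107, 115, 120],
    "signals": [np.nan, 1, np.nan, 1, np.nan, 2],
})
df["calculations"] = signal_diffs(df["close"], df["signals"])
print(df)
```

The loop is easier to extend (for example, to also handle 2 -> 1 flips), but on large data the dropna/shift approach will usually be faster.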
source: stackoverflow.com