Ad

Remove Previous Rows From Dataframe Based On Condition

I have two dataframe say df1 (primary dataframe) and df2. I want to drop all previous rows from df1 based on a condition from df2. My dataframe are like below:

df2

           tradingsymbol      Time
0                   AAAA  12:54:40
1                   BBBB  12:53:33
2                   CCCC  12:51:50

df1.head(20)

            tradingsymbol      Time  last_price
0                    AAAA  09:20:10       84.40
1                    AAAA  09:20:10       85.95
2                    AAAA  12:55:60       84.70 <-Valid Row
3                    AAAA  13:22:10       86.35 <-Valid Row
4                    AAAA  14:55:40       87.10 <-Valid Row

5                    BBBB  09:20:13       88.95
6                    BBBB  09:20:13       88.80
7                    BBBB  09:20:14       88.30
8                    BBBB  14:23:11       87.30 <-Valid Row

9                    CCCC  09:20:15       90.15
10                   CCCC  09:20:16       90.10
11                   CCCC  09:20:17       91.05
12                   CCCC  09:20:18       90.95

I want to remove all rows from df1 previous to time in Time column of df2 for each tradingsymbol. I want my result as below:

            tradingsymbol      Time  last_price
2                    AAAA  12:55:60       84.70
3                    AAAA  13:22:10       86.35
4                    AAAA  14:55:40       87.10
8                    BBBB  14:23:11       87.30
Ad

Answer

You can use pd.concat and sort values to put remove flags.

Code

import io
import numpy as np
import pandas as pd

# Sample creation
s1 = '''tradingsymbol,Time,last_price
AAAA,09:20:10,84.40
AAAA,09:20:10,85.95
AAAA,12:55:60,84.70
AAAA,13:22:10,86.35
AAAA,14:55:40,87.10
BBBB,09:20:13,88.95
BBBB,09:20:13,88.80
BBBB,09:20:14,88.30
BBBB,14:23:11,87.30
CCCC,09:20:15,90.15
CCCC,09:20:16,90.10
CCCC,09:20:17,91.05
CCCC,09:20:18,90.95'''

s2 = '''tradingsymbol,Time
AAAA,12:54:40
BBBB,12:53:33
CCCC,12:51:50'''

df1 = pd.read_csv(io.StringIO(s1), dtype={'last_pirce': np.float64})
df1.Time = pd.to_datetime(df1.Time, format='%H:%M:%S').dt.time

df2 = pd.read_csv(io.StringIO(s2))
df2.Time = pd.to_datetime(df2.Time, format='%H:%M:%S').dt.time

# Operations to remove specific rows
df = pd.concat([df1, df2], axis=0).sort_values(['tradingsymbol', 'Time'], ascending=[True, False])
df['flag'] = df.last_price.isnull()
df.flag = df.groupby('tradingsymbol').flag.cumsum()
df = df[df.flag==0].sort_values(['tradingsymbol', 'Time']).drop('flag', axis=1)

Output

tradingsymbolTimelast_price
2AAAA12:56:0084.7
3AAAA13:22:1086.35
4AAAA14:55:4087.1
8BBBB14:23:1187.3
Ad
source: stackoverflow.com
Ad