Find Index Of Max() After Groupby

Is it possible to find the index after a groupby? In other words, I use a groupby to find the maximum values, but I would also like to find the corresponding index in the original data set and add it to a separate column.

import pandas as pd

data = pd.DataFrame(
    [[pd.Timestamp('2022-08-05 10:11:04'), 140, 120],
    [pd.Timestamp('2022-08-05 10:11:05'), 400, 155],
    [pd.Timestamp('2022-08-05 10:13:06'), 400, 160],
    [pd.Timestamp('2022-08-05 10:15:07'), 100, 155],
    [pd.Timestamp('2022-08-05 10:15:08'), 430, 160],
    [pd.Timestamp('2022-09-05 10:17:09'), 430, 130],
    [pd.Timestamp('2022-09-07 10:17:10'), 430, 131],
    [pd.Timestamp('2022-09-07 10:17:11'), 430, 170],
    [pd.Timestamp('2022-09-07 10:18:06'), 430, 156],
    [pd.Timestamp('2022-09-07 10:19:07'), 130, 155],
    [pd.Timestamp('2022-09-07 10:19:08'), 130, 160],
    [pd.Timestamp('2022-09-07 10:19:09'), 430, 130],
    [pd.Timestamp('2022-09-07 10:20:10'), 430, 131],
    [pd.Timestamp('2022-09-07 10:20:11'), 130, 170]],
    columns=['timestamp', 'power', 'heart rate'])

So I want to know the maximum heart rate per date, together with the corresponding index in the data DataFrame.

Answer

Use GroupBy.transform with SeriesGroupBy.idxmax to get, for each row, the index of the maximum value of the heart rate column within its date group:

data['new'] = data.groupby(data['timestamp'].dt.date)['heart rate'].transform('idxmax')
print(data)
             timestamp  power  heart rate  new
0  2022-08-05 10:11:04    140         120    2
1  2022-08-05 10:11:05    400         155    2
2  2022-08-05 10:13:06    400         160    2
3  2022-08-05 10:15:07    100         155    2
4  2022-08-05 10:15:08    430         160    2
5  2022-09-05 10:17:09    430         130    5
6  2022-09-07 10:17:10    430         131    7
7  2022-09-07 10:17:11    430         170    7
8  2022-09-07 10:18:06    430         156    7
9  2022-09-07 10:19:07    130         155    7
10 2022-09-07 10:19:08    130         160    7
11 2022-09-07 10:19:09    430         130    7
12 2022-09-07 10:20:10    430         131    7
13 2022-09-07 10:20:11    130         170    7
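
If a compact per-date summary is more useful than a repeated column, the maximum and its index can probably be computed together with agg. This is a minimal sketch on a shortened copy of the question's data (so the row labels differ from the output above); the names by_date and summary are introduced here only for illustration:

```python
import pandas as pd

# Shortened copy of the question's data; row labels run 0..5 here.
data = pd.DataFrame(
    [[pd.Timestamp('2022-08-05 10:11:04'), 140, 120],
     [pd.Timestamp('2022-08-05 10:13:06'), 400, 160],
     [pd.Timestamp('2022-08-05 10:15:08'), 430, 160],
     [pd.Timestamp('2022-09-05 10:17:09'), 430, 130],
     [pd.Timestamp('2022-09-07 10:17:11'), 430, 170],
     [pd.Timestamp('2022-09-07 10:18:06'), 430, 156]],
    columns=['timestamp', 'power', 'heart rate'])

# Group key: the calendar date of each timestamp.
by_date = data['timestamp'].dt.date

# One row per date: the maximum heart rate and the label of the first row holding it.
summary = data.groupby(by_date)['heart rate'].agg(['max', 'idxmax'])
print(summary)
```

The idxmax column of summary can then be passed to data.loc to look up the full rows.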

If only the rows holding the maximum of each group are needed:

df = data.loc[data.groupby(data['timestamp'].dt.date)['heart rate'].idxmax()]
print(df)
            timestamp  power  heart rate
2 2022-08-05 10:13:06    400         160
5 2022-09-05 10:17:09    430         130
7 2022-09-07 10:17:11    430         170
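
Note that idxmax reports only the first row when several rows share the maximum: on 2022-08-05, rows 2 and 4 both have a heart rate of 160, and only row 2 is returned. If all tied rows are wanted, one possible approach is to compare each value against its group's maximum. This is a sketch on a small made-up subset of the data, with group_max and tied as illustrative names:

```python
import pandas as pd

# Small illustrative subset; rows 0 and 2 tie for the maximum on 2022-08-05.
data = pd.DataFrame(
    {'timestamp': pd.to_datetime(['2022-08-05 10:13:06',
                                  '2022-08-05 10:15:07',
                                  '2022-08-05 10:15:08',
                                  '2022-09-05 10:17:09']),
     'heart rate': [160, 155, 160, 130]})

# Broadcast each date's maximum back onto every row of that date.
group_max = data.groupby(data['timestamp'].dt.date)['heart rate'].transform('max')

# Keep every row whose value equals its group's maximum, ties included.
tied = data[data['heart rate'] == group_max]
print(tied.index.tolist())
```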
source: stackoverflow.com