How To Retain Certain Fields Of The Data Frame That I Don't Want To Group Over?


I have a dataframe with the following fields:


Key 1, Key 2, Key 3, Key 4, Value 1, Value 2

Step 1: First, I want to group over Keys 1, 2, 3, and 4 and find the mean of both Value 1 and Value 2.
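For reference, a minimal sketch of Step 1. The sample rows here are made up for illustration, and the column names without spaces (Key1, Value1, and so on) are an assumption taken from the example further down.

```python
import pandas as pd

# Hypothetical sample data, not from the real dataset.
df = pd.DataFrame({
    "Key1": ["a", "a", "a", "a"],
    "Key2": ["b", "b", "b", "b"],
    "Key3": ["c", "c", "c", "c"],
    "Key4": ["x", "x", "y", "y"],
    "Value1": [30, 20, 10, 50],
    "Value2": [10, 20, 40, 60],
})

# Step 1: mean of Value1 and Value2 for each (Key1, Key2, Key3, Key4) combination.
means = (
    df.groupby(["Key1", "Key2", "Key3", "Key4"], as_index=False)[["Value1", "Value2"]]
    .mean()
)
print(means)
```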

Step 2: My goal is to find the maximum of Value 1 when grouping over Keys 1, 2, and 3, so I then group over Keys 1, 2, and 3 and call max. However, I want the Value 2 that belongs to the row holding the max Value 1, meaning I want to keep the original Value 2 associated with that max value.

df.groupby(['Key 1', 'Key 2', 'Key 3'], as_index=False).max()

^ When the above is called, it also takes the max of Value 2 independently, while what I really want is the max Value 1 and its corresponding Value 2.

As an example: For df with fields

Key1, Key2, Key3, Key4, Value1, Value2:

k1, k2, k3, k4, 30, 10

k1, k2, k3, k4, 20, 20

When using groupby from above, this returns k1, k2, k3, 30, 20, while what I want is k1, k2, k3, 30, 10
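A small reproduction of the behaviour described, assuming the two-row example above is loaded with column names Key1 through Value2:

```python
import pandas as pd

df = pd.DataFrame({
    "Key1": ["k1", "k1"],
    "Key2": ["k2", "k2"],
    "Key3": ["k3", "k3"],
    "Key4": ["k4", "k4"],
    "Value1": [30, 20],
    "Value2": [10, 20],
})

# max() is computed for each non-key column on its own, so the resulting
# Value1 and Value2 can come from different rows of the group.
result = df.groupby(["Key1", "Key2", "Key3"], as_index=False).max()
print(result[["Key1", "Key2", "Key3", "Value1", "Value2"]])
```

Here Value1 comes from the first row and Value2 from the second, which is the mismatch in question.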

Any ideas on how this can be done?



You can go about it using transform:

df['Value1max'] = df.groupby(['Key1', 'Key2', 'Key3'])['Value1'].transform('max')

So if this is your dataframe:

  Key1 Key2 Key3 Key4  Value1  Value2
0   k1   k2   k3   k4      30      10
1   k1   k2   k3   k4      20      20

You'd get this output:

  Key1 Key2 Key3 Key4  Value1  Value2  Value1max
0   k1   k2   k3   k4      30      10         30
1   k1   k2   k3   k4      20      20         30
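The transform call only adds the group max as a new column, so one more step is needed to get a single row per group with the matching Value2. Below is a minimal sketch of two ways to finish, assuming the column names from the example dataframe. The variable names `keys`, `by_filter`, and `by_idxmax` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Key1": ["k1", "k1"],
    "Key2": ["k2", "k2"],
    "Key3": ["k3", "k3"],
    "Key4": ["k4", "k4"],
    "Value1": [30, 20],
    "Value2": [10, 20],
})
keys = ["Key1", "Key2", "Key3"]

# Option 1: keep the rows whose Value1 equals the group max from transform.
# If several rows tie for the max, all of them are kept.
df["Value1max"] = df.groupby(keys)["Value1"].transform("max")
by_filter = df[df["Value1"] == df["Value1max"]].drop(columns="Value1max")

# Option 2: idxmax gives the index label of the first max row in each group,
# and df.loc pulls those whole rows, so Value2 stays with its Value1.
by_idxmax = df.loc[df.groupby(keys)["Value1"].idxmax()].drop(columns="Value1max")

print(by_filter)
print(by_idxmax)
```

Both options return the row k1, k2, k3, k4, 30, 10 for this data. They differ only when a group has ties: the filter keeps every tied row, while idxmax keeps the first one.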