GroupBy With Ffill Deletes Group And Does Not Put Group In Index

- 1 answer

I've been running into a very strange issue ever since I ported my code from one computer to another. I'm using pandas version 0.25.1 on this system, but I'm not sure which pandas version I was using previously.

The issue is as follows:

I create a simple, unsorted (mock) dataframe on which I want to sort values and forward-fill all the NaN values.

In [1]: import pandas as pd
   ...: import numpy as np

In [2]: test = pd.DataFrame({"group" : ["A", "A", "A", "B", "B", "B", "C", "C"],
   ...:                      "count" : [2, 3, 1, 2, 1, 3, 1, 2],
   ...:                      "value" : [10, np.nan, 30, np.nan, 19, np.nan, 25, np.nan]})

In [3]: test
Out[3]:
  group  count  value
0     A      2   10.0
1     A      3    NaN
2     A      1   30.0
3     B      2    NaN
4     B      1   19.0
5     B      3    NaN
6     C      1   25.0
7     C      2    NaN

However, when I do that I lose the entire "group" column, and it does not reappear in my index either.

In [4]: test.sort_values(["group", "count"]).groupby("group").ffill()
Out[4]:
   count  value
2      1   30.0
0      2   10.0
1      3   10.0
4      1   19.0
3      2   19.0
5      3   19.0
6      1   25.0
7      2   25.0

I've also tried the following with fillna, but that gives me the same result:

In [5]: test.sort_values(["group", "count"]).groupby("group").fillna(method = "ffill")
Out[5]:
   count  value
2      1   30.0
0      2   10.0
1      3   10.0
4      1   19.0
3      2   19.0
5      3   19.0
6      1   25.0
7      2   25.0

Does anyone know what I am doing wrong? The issue seems to be with the ffill method, since I CAN use .mean() on the groupby and retain my groupings.

Answer

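IIUC, you can use `update` to get the results back into the dataframe. As your output shows, ffill on the groupby returns only the non-grouping columns, but it keeps the original row index, so `update` can align on that index and overwrite the NaN values in place: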

test.update(test.sort_values(["group", "count"]).groupby("group").ffill())
print(test)

Output

  group  count  value
0     A      2   10.0
1     A      3   10.0
2     A      1   30.0
3     B      2   19.0
4     B      1   19.0
5     B      3   19.0
6     C      1   25.0
7     C      2   25.0
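
If you would rather keep the sorted order together with the "group" column, one possible alternative is to forward-fill only the "value" column within each group and assign it back. This is a minimal sketch rather than the only way to do it; the name `result` is just an illustrative choice, and the data is the mock dataframe from the question.

```python
import numpy as np
import pandas as pd

# Mock dataframe from the question
test = pd.DataFrame({"group": ["A", "A", "A", "B", "B", "B", "C", "C"],
                     "count": [2, 3, 1, 2, 1, 3, 1, 2],
                     "value": [10, np.nan, 30, np.nan, 19, np.nan, 25, np.nan]})

# Sort first so the forward-fill follows the "count" order inside each group
result = test.sort_values(["group", "count"]).copy()

# Select the single column before ffill; the returned Series keeps the row
# index, so assigning it back leaves "group" and "count" untouched
result["value"] = result.groupby("group")["value"].ffill()

print(result)
```

Because only the "value" column is replaced, the "group" column stays in the frame no matter what the grouped ffill itself returns.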
source: stackoverflow.com