Ad

Shifting Timeseries Data Using Per Group Using Shift() And Groupby() Results In NaN

Given the following dataset

df = pd.DataFrame(
    {
        'YearMo': ['01', '02', '01', '02', '01', '02'],
        'Prod': ['a', 'a', 'b', 'b', 'c', 'c'],
        'Value': [1, 2, 3, 4, 5, 6]
    }
)

I'm trying to lag data per year/prod group. It is my understanding this can be done using shift() after a groupby(), as follows:

df['shifteddata'] = df.groupby(['YearMo', 'Prod'])['Value'].shift()

The result of this however is a column containing NaN.

print(df)
  YearMo Prod  Value  shifteddata
0     01    a      1          NaN
1     02    a      2          NaN
2     01    b      3          NaN
3     02    b      4          NaN
4     01    c      5          NaN
5     02    c      6          NaN

Whereas I am expecting

  YearMo Prod  Value  shifteddata
0     01    a      1          NaN
1     02    a      2          1
2     01    b      3          NaN
3     02    b      4          3
4     01    c      5          NaN
5     02    c      6          5

What am I missing?

Ad

Answer

I guess you are missing the fact that groups defined by .groupby(['YearMo', 'Prod']) are groups consisting of only one item. Shift of one item returns NaNs.

Your desired output can be reached by the following code:

df['shifteddata'] = df.groupby(['Prod'])['Value'].shift()
Ad
source: stackoverflow.com
Ad