Ad

Fillna/null Values Base On Below Line Based On Conditions In Pandas

- 1 answer

I have a huge pandas data frame just like below.

d = {'id1': ['85643', '85644','discription','8564312','8564314','85645','discription','85646','8564318','85647','85648','85649','85655','56731','34566','78931','78931'],'ID': ['G-00001', 'G-00001','G-00002','G-00002','G-00002','G-00001','G-00001','G-00001','G-00001','G-00001','G-00002','G-00002','G-00002','G-00002','G-00003','G-00003','G-00003'],'col1': [np.nan, np.nan,5,np.nan,4,np.nan,60,np.nan,0,6,3,2,4,32,3,1,23],'Goal': [np.nan, 56,78,np.nan,89,73,np.nan ,np.nan ,np.nan, np.nan, np.nan, 34,np.nan, 7, 84,np.nan,5 ],'col2': [793, np.nan,8,np.nan,43,np.nan,610,np.nan,0,16,23,72,48,3,28,5,3],'col3': [500, np.nan,89,np.nan,44,np.nan,60,np.nan,5,6,3,2,4,13,12,14,98],'Date': ['2021-06-13', '2021-06-13','2021-06-14','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16','2021-05-23','2021-05-13','2021-03-26','2021-05-13']}
df = pd.DataFrame(data=d)
df

    id1         ID      col1        Goal        col2        col3      Date
0   85643       G-00001 NaN         NaN         793.000     500.000   2021-06-13
1   85644       G-00001 NaN         56.0000     NaN         NaN       2021-06-13
2   discription G-00002 5.0000      78.0000     8.0000      89.0000   2021-06-14
3   8564312     G-00002 NaN         NaN         NaN         NaN       2021-06-13
4   8564314     G-00002 4.0000      89.0000     43.0000     44.0000   2021-06-14
5   85645       G-00001 NaN         73.0000     NaN         NaN       2021-06-15
6   discription G-00001 60.0000     NaN         610.000     60.0000   2021-06-15
7   85646       G-00001 NaN         NaN         NaN         NaN       2021-06-13
8   8564318     G-00001 0.0000      NaN         0.0000      5.0000    2021-06-16
9   85647       G-00001 6.0000      NaN         16.0000     6.0000    2021-06-13
10  85648       G-00002 3.0000      NaN         23.0000     3.0000    2021-06-13
11  85649       G-00002 2.0000      34.0000     72.0000     2.0000    2021-06-13
12  85655       G-00002 4.0000      NaN         48.0000     4.0000    2021-06-16
13  56731       G-00002 32.0000     7.0000      3.0000      13.0000   2021-05-23
14  34566       G-00003 3.0000      84.0000     28.0000     12.0000   2021-05-13
15  78931       G-00003 1.0000      NaN         5.0000      14.0000   2021-03-26
16  78931       G-00003 23.0000     5.0000      3.0000      98.0000   2021-05-13

So I want to fill the columns "col1", "col2" and "col3" when meet the below criteria. When "id1" column has word "discription", I want to fill na values in upper line for "col1", "col2" and "col3" using the values in "id1" "discription" line.

For example index 2 has word "discription" and i want to fill na values in index 1 for "col1", "col2" and "col3" using the values in "col1", "col2" and "col3" in "discription" line. All the other null values I don't want to fill in for "col1", "col2" and "col3".

So the final dataframe should be look like below.

    id1         ID      col1        Goal        col2        col3      Date
0   85643       G-00001 NaN         NaN         793.000     500.000   2021-06-13
1   85644       G-00001 5.0000      56.0000     8.0000      89.0000   2021-06-13
2   discription G-00002 5.0000      78.0000     8.0000      89.0000   2021-06-14
3   8564312     G-00002 NaN         NaN         NaN         NaN       2021-06-13
4   8564314     G-00002 4.0000      89.0000     43.0000     44.0000   2021-06-14
5   85645       G-00001 60.0000     73.0000     610.000     60.0000   2021-06-15
6   discription G-00001 60.0000     NaN         610.000     60.0000   2021-06-15
7   85646       G-00001 NaN         NaN         NaN         NaN       2021-06-13
8   8564318     G-00001 0.0000      NaN         0.0000      5.0000    2021-06-16
9   85647       G-00001 6.0000      NaN         16.0000     6.0000    2021-06-13
10  85648       G-00002 3.0000      NaN         23.0000     3.0000    2021-06-13
11  85649       G-00002 2.0000      34.0000     72.0000     2.0000    2021-06-13
12  85655       G-00002 4.0000      NaN         48.0000     4.0000    2021-06-16
13  56731       G-00002 32.0000     7.0000      3.0000      13.0000   2021-05-23
14  34566       G-00003 3.0000      84.0000     28.0000     12.0000   2021-05-13
15  78931       G-00003 1.0000      NaN         5.0000      14.0000   2021-03-26
16  78931       G-00003 23.0000     5.0000      3.0000      98.0000   2021-05-13

Is it possible to do this in python? Any suggestion would be appreciated. Thanks in advance!

Ad

Answer

Mask the rows which do not have discription in id1, then shift the masked dataframe one unit upwards and fill the nan values in required columns

c = ['col1', 'col2', 'col3']

mask = df['id1'].eq('discription')
df[c] = df[c].fillna(df[c].where(mask).shift(-1))

            id1       ID  col1  Goal   col2   col3        Date
0         85643  G-00001   NaN   NaN  793.0  500.0  2021-06-13
1         85644  G-00001   5.0  56.0    8.0   89.0  2021-06-13
2   discription  G-00002   5.0  78.0    8.0   89.0  2021-06-14
3       8564312  G-00002   NaN   NaN    NaN    NaN  2021-06-13
4       8564314  G-00002   4.0  89.0   43.0   44.0  2021-06-14
5         85645  G-00001  60.0  73.0  610.0   60.0  2021-06-15
6   discription  G-00001  60.0   NaN  610.0   60.0  2021-06-15
7         85646  G-00001   NaN   NaN    NaN    NaN  2021-06-13
8       8564318  G-00001   0.0   NaN    0.0    5.0  2021-06-16
9         85647  G-00001   6.0   NaN   16.0    6.0  2021-06-13
10        85648  G-00002   3.0   NaN   23.0    3.0  2021-06-13
11        85649  G-00002   2.0  34.0   72.0    2.0  2021-06-13
12        85655  G-00002   4.0   NaN   48.0    4.0  2021-06-16
13        56731  G-00002  32.0   7.0    3.0   13.0  2021-05-23
14        34566  G-00003   3.0  84.0   28.0   12.0  2021-05-13
15        78931  G-00003   1.0   NaN    5.0   14.0  2021-03-26
16        78931  G-00003  23.0   5.0    3.0   98.0  2021-05-13
Ad
source: stackoverflow.com
Ad