Pandas Dataframe Capturing The Row Having Multiple Columns Populated With Constant

I have a DataFrame like this:

import pandas as pd

data = {'ledger_1': [3, -99, -99], 'ledger_2': [0.1, 1.2, -99], 'geo_3': [2.3, 4.5, 1.0]}
df_data = pd.DataFrame.from_dict(data)

  geo_3 ledger_1    ledger_2
0  2.3   3            0.1
1  4.5   -99          1.2
2  1.0   -99         -99.0

Now I want to capture only the rows where neither ledger_1 nor ledger_2 holds the constant value -99.

I've been trying this:

cols = ['ledger_1','ledger_2']
df_data[df_data[cols]!= -99]

but this gives

   geo_3    ledger_1    ledger_2
0   NaN       3.0        0.1
1   NaN       NaN        1.2
2   NaN       NaN        NaN

whereas the result I want is just the first record:

   geo_3    ledger_1    ledger_2
0   2.3      3.0         0.1

How can I filter a DataFrame on a set of columns (ledger_1, ledger_2, ..., ledger_n) with one common condition (!= -99), rather than testing each column individually? The list of columns is fairly large.

Answer

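Indexing with a boolean DataFrame masks individual cells instead of selecting rows, which is why the attempt returns NaN. Use DataFrame.all with axis=1 to test whether all values in each row are True, then index with the resulting boolean Series: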

cols = ['ledger_1','ledger_2']
df = df_data[(df_data[cols]!= -99).all(axis=1)]
print (df)
   ledger_1  ledger_2  geo_3
0         3       0.1    2.3

Details:

print (df_data[cols]!= -99)
   ledger_1  ledger_2
0      True      True
1     False      True
2     False     False

print ((df_data[cols]!= -99).all(axis=1))
0     True
1    False
2    False
dtype: bool
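
Since the question mentions that the column list is large, here is a minimal sketch that builds the list from the column names instead of typing it out. It assumes every column to be checked shares the 'ledger_' prefix, which is an assumption based on the names in the question; adjust the selection if the real names differ. DataFrame.ne(-99) is the method form of != -99.

```python
import pandas as pd

data = {'ledger_1': [3, -99, -99], 'ledger_2': [0.1, 1.2, -99], 'geo_3': [2.3, 4.5, 1.0]}
df_data = pd.DataFrame(data)

# Assumption: all columns to test contain the substring 'ledger_'.
cols = df_data.filter(like='ledger_').columns

# One boolean per row: True only if no ledger column equals -99.
mask = df_data[cols].ne(-99).all(axis=1)

# Boolean Series indexing selects whole rows, so geo_3 is kept intact.
result = df_data[mask]
print(result)
```

With the sample data this keeps only the row at index 0. Replacing .all(axis=1) with .any(axis=1) would instead keep every row where at least one ledger column differs from -99.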
source: stackoverflow.com