How To Count Length Of Missing Values For String Variables As Zero?

I'm trying to count the length of the values in the object columns of a dataframe with Python. Many of my variables are strings with missing values, and unfortunately when I count the length of a missing value it comes out as 3 (because "NaN" is counted as a 3-character string).

Here's the code that I'm using:

df_string_mean_with_na = pd.DataFrame(df_string.applymap(len).astype(int).mean().to_dict(), index=[df_string.index.values[0]])

where df_string is my starting dataframe and I'm trying to calculate the average length of the values in each column. I would like the length of missing values in object columns to count as 0. Is there a way to do this?

Answer

I think you need DataFrame.fillna to replace missing values with empty strings before counting the length:

print (Table1)
       A      B    C
0  hello     hi  NaN
1   good     hi   so
2   home  hello   no

Test missing values:

print (Table1.isna())
       A      B      C
0  False  False   True
1  False  False  False
2  False  False  False

df = Table1.fillna('').applymap(len).mean().to_frame().T
print (df)
          A    B         C
0  4.333333  3.0  1.333333

Detail:

print (Table1.fillna('').applymap(len))
   A  B  C
0  5  2  0
1  4  2  2
2  4  5  2
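
For anyone who wants to reproduce this end to end, here is a minimal sketch that builds Table1 by hand (the constructor call is my assumption about how the sample data was created) and measures lengths with Series.str.len per column rather than applymap. That choice is mine, not part of the answer's code: DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map, and going through .str.len avoids depending on either name.

```python
import numpy as np
import pandas as pd

# Sample data mirroring Table1 above; column C holds one real NaN.
Table1 = pd.DataFrame({
    "A": ["hello", "good", "home"],
    "B": ["hi", "hi", "hello"],
    "C": [np.nan, "so", "no"],
})

# Replace missing values with empty strings, then take the string length
# of every value, one column at a time.
lengths = Table1.fillna("").apply(lambda col: col.str.len())

# Average length per column, reshaped into a single-row DataFrame.
df = lengths.mean().to_frame().T
print(df)
```

The missing value in column C contributes 0 to the sum, so the column mean is (0 + 2 + 2) / 3.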

If the missing values are stored as strings such as 'NaN' rather than real NaN (so isna() returns False everywhere), use DataFrame.replace instead:

print (Table1.isna())
       A      B      C
0  False  False  False
1  False  False  False
2  False  False  False

df = Table1.replace('NaN', '').applymap(len).mean().to_frame().T
print (df)
          A    B         C
0  4.333333  3.0  1.333333
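
A self-contained sketch of the string-placeholder case follows. The sample frame and the list of placeholder spellings are assumptions for illustration; check which spellings actually occur in your data before relying on it.

```python
import pandas as pd

# Hypothetical data where the missing value was stored as the text "NaN".
Table1 = pd.DataFrame({
    "A": ["hello", "good", "home"],
    "B": ["hi", "hi", "hello"],
    "C": ["NaN", "so", "no"],
})

# Assumed placeholder spellings; adjust to whatever your data contains.
placeholders = ["NaN", "nan"]

# Swap every placeholder for an empty string, then count characters.
cleaned = Table1.replace(placeholders, "")
lengths = cleaned.apply(lambda col: col.str.len())

# Average length per column as a single-row DataFrame.
df = lengths.mean().to_frame().T
print(df)
```

Note that replace with a plain string matches whole cell values only, so a real word that merely contains "nan" (for example "banana") is left untouched.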
source: stackoverflow.com