How To Quickly Label Int Ranges With A String?

I want to label "Fare" quantile bands automatically, as shown below.

My data looks like:

df.head()


PassengerId Survived    Pclass  Name    Sex Age SibSp   Parch   Ticket  Fare    Cabin   Embarked
0   1   0   3   Braund, Mr. Owen Harris male    22.0    1   0   A/5 21171   7.2500  NaN S
1   2   1   1   Cumings, Mrs. John Bradley (Florence Briggs Th...   female  38.0    1   0   PC 17599    71.2833 C85 C
2   3   1   3   Heikkinen, Miss. Laina  female  26.0    0   0   STON/O2. 3101282    7.9250  NaN S
3   4   1   1   Futrelle, Mrs. Jacques Heath (Lily May Peel)    female  35.0    1   0   113803  53.1000 C123    S
4   5   0   3   Allen, Mr. William Henry    male    35.0    0   0   373450  8.0500  NaN S

I did:

df['FareBin'] = pd.qcut(df['Fare'], 4)
df[['FareBin', 'Survived']].groupby(['FareBin'], as_index=False).mean().sort_values(by='FareBin', ascending=True)


FareBin Survived
0   (-0.001, 7.896] 0.197309
1   (7.896, 14.454] 0.303571
2   (14.454, 31.275]    0.441048
3   (31.275, 512.329]   0.600000

Now, I want to replace bands like (-0.001, 7.896] with string labels in some intelligent way.

I've tried:

df.loc[ df['Fare'] <= 7.91, 'Fare'] = 'Low'
df.loc[(df['Fare'] > 7.91) & (df['Fare'] <= 14.454), 'Fare'] = 'Mid low'
...

Is there a way to do this without listing every condition by hand? Thanks.

Answer

You can pass the labels parameter to the qcut() function. It takes one label per bin, listed from the lowest bin to the highest:

pd.qcut(range(5), 3, labels=["good", "medium", "bad"])

Output:

[good, good, medium, bad, bad]
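
Applied to the question's Fare column, the same idea might look like the sketch below. The small DataFrame is only an illustrative stand-in for the real data, and the "Mid high" and "High" label names are assumptions (the question names only "Low" and "Mid low").

```python
import pandas as pd

# Illustrative stand-in for the question's DataFrame (not the full dataset).
df = pd.DataFrame({
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583, 51.8625, 21.075],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0],
})

# One label per quartile, ordered from the lowest band to the highest.
# "Mid high" and "High" are assumed names, not taken from the question.
labels = ["Low", "Mid low", "Mid high", "High"]

# qcut computes the quartile edges and assigns the matching label to each row.
df["FareBin"] = pd.qcut(df["Fare"], 4, labels=labels)

# Mean survival per labelled band; observed=False keeps every category.
summary = df.groupby("FareBin", observed=False)["Survived"].mean()
print(summary)
```

The result is an ordered categorical, so sorting by FareBin keeps the bands in Low-to-High order rather than alphabetical order.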
source: stackoverflow.com