Pandas Dataframe: Find Unique Value From One Column Which Has The Largest Number Of Unique Values In Another Column

I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame([[99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99], list('DAABBBBABCBDDD'), ['***','**','****','*','***','*','**','***','*','*','****','**','**','****']]).T
df.columns = ['col1','col2','col3']

Assume that col1 holds companies and col2 holds product types. I am looking for the company with the largest number of different product types.

In other words, I am looking for the unique value in col1 that has the largest number of unique values in col2.

I have tried the following:

df.groupby(['col1'])['col2'].nunique()

which returns:

col1
66    2
77    3
88    1
99    2

Now I would like to get the value from col1 with the highest number of unique values in col2, which is:

77    3

I have tried

df.groupby(['col1'])['col2'].nunique().max()

However, this only returns the maximum number of unique values in col2:

3

Instead, I would like to know both the max value from col2 and to which value in col1 this belongs. I.e.

 77    3

Thank you for your help!

Answer

I would like to know both the max value from col2 and to which value in col1 this belongs.

With your result, call both:

result = df.groupby(['col1'])['col2'].nunique()
result.idxmax()  # 77
result.max()  # 3
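
For reference, here is a self-contained sketch of the same two calls that can be run as-is. It rebuilds only col1 and col2 from the question (col3 is left out because it plays no role here), and the names top_company and top_count are just illustrative. Note that idxmax() returns the first label if several groups tie for the maximum.

```python
import pandas as pd

# Same col1/col2 data as in the question; col3 is omitted because it is not used.
df = pd.DataFrame({
    "col1": [99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99],
    "col2": list("DAABBBBABCBDDD"),
})

# Number of distinct col2 values for each col1 value.
result = df.groupby("col1")["col2"].nunique()

top_company = result.idxmax()  # index label (col1 value) with the largest count
top_count = result.max()       # the largest count itself

print(top_company, top_count)
```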

You could also convert the result to a DataFrame and then call .loc[lambda d: d.idxmax()] to get the label and the count as a single row, but I don't know why you would want to do that.
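
If you do want the label and the count together in one object, the DataFrame route mentioned above could look roughly like this. This is only a sketch on the question's col1/col2 data; Series.nlargest(1) is added here as a possible alternative that is not part of the original answer, and top_row and top_series are illustrative names.

```python
import pandas as pd

# Same col1/col2 data as in the question.
df = pd.DataFrame({
    "col1": [99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99],
    "col2": list("DAABBBBABCBDDD"),
})
result = df.groupby("col1")["col2"].nunique()

# DataFrame route: idxmax() on a frame returns one index label per column,
# and .loc then selects the row(s) carrying those labels.
top_row = result.to_frame().loc[lambda d: d.idxmax()]

# Series route (an alternative, not from the answer): nlargest(1) keeps the
# index label next to the count in a one-element Series.
top_series = result.nlargest(1)

print(top_row)
print(top_series)
```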

source: stackoverflow.com