Pandas Dataframe: Find Unique Value From One Column Which Has The Largest Number Of Unique Values In Another Column
I have the following pandas DataFrame:
import pandas as pd
df = pd.DataFrame([[99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99],
                   list('DAABBBBABCBDDD'),
                   ['***', '**', '****', '*', '***', '*', '**', '***', '*', '*', '****', '**', '**', '****']]).T
df.columns = ['col1', 'col2', 'col3']
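The frame above is built from rows and then transposed with `.T`, so every column ends up with object dtype. As an optional sketch, the same data can instead be built column by column so that col1 stays an integer column. This is an assumption of this edit, not part of the original question.

```python
import pandas as pd

# Same values as in the question, built column by column so that
# col1 keeps an integer dtype instead of object.
df = pd.DataFrame({
    'col1': [99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99],
    'col2': list('DAABBBBABCBDDD'),
    'col3': ['***', '**', '****', '*', '***', '*', '**', '***', '*', '*',
             '****', '**', '**', '****'],
})

print(df.dtypes)
```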
Assume that col1 holds companies and col2 holds product types. I am looking for the company with the largest number of different product types.
In other words, I am looking for the unique value in col1 that has the largest number of unique values in col2.
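Counting the unique col2 values for each col1 value is the same as counting the distinct (col1, col2) pairs for each col1 value. Below is a minimal sketch of that view using `drop_duplicates` and `value_counts`. It is an alternative assumed for illustration, not the approach taken in the question or the answer.

```python
import pandas as pd

# Only col1 (company) and col2 (product type) matter for this count.
df = pd.DataFrame({
    'col1': [99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99],
    'col2': list('DAABBBBABCBDDD'),
})

# Keep each (company, product type) pair once, then count rows per company.
pair_counts = (
    df.drop_duplicates(['col1', 'col2'])['col1']
    .value_counts()
    .sort_index()  # value_counts() orders by count; sort by company instead
)

best_company = pair_counts.idxmax()  # first company with the highest count
print(pair_counts)
print(best_company, pair_counts[best_company])
```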
What I have tried gives me the number of unique col2 values for each col1 value:
col1
66    2
77    3
88    1
99    2
Now I would like to get the value from col1 with the highest count, which is 77.
I have tried taking the maximum, but I only get the maximum number of unique values in col2 (3).
Instead, I would like to know both the maximum count and the col1 value it belongs to, i.e. 77 and 3.
Thank you for your help!
I would like to know both the max value from col2 and to which value in col1 this belongs.
With your result, call both:
result = df.groupby(['col1'])['col2'].nunique()
result.idxmax()  # 77
result.max()     # 3
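`idxmax()` returns only the first label that reaches the maximum. If several companies could share the top count, a boolean mask keeps all of them. Here is a minimal sketch, assuming the column-wise construction of the data so that col1 is numeric.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99],
    'col2': list('DAABBBBABCBDDD'),
})

# Number of distinct product types (col2) per company (col1).
result = df.groupby('col1')['col2'].nunique()

# Keep every company whose count equals the maximum, not just the first one.
top_count = result.max()
top_companies = result[result == top_count].index.tolist()

print(top_companies, top_count)
```

With this data only company 77 reaches the maximum. If another company also had 3 product types, it would appear in `top_companies` as well.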
You could also convert it to a DataFrame before calling .loc[lambda d: d.idxmax()], but I don't know why you would want to do that.
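A sketch of the DataFrame route, which keeps the company and its count together in one row. As a variant of the call above, it selects the col2 column explicitly and wraps the label in a list, so `.loc` returns a one-row DataFrame rather than a scalar.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99],
    'col2': list('DAABBBBABCBDDD'),
})

result = df.groupby('col1')['col2'].nunique()

# to_frame() gives a one-column DataFrame whose column is named 'col2'.
# The list around the label makes .loc return a DataFrame row:
# index = company (col1), column = number of distinct product types.
best = result.to_frame().loc[lambda d: [d['col2'].idxmax()]]
print(best)
```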