Pandas Groupby With Length Of Lists
I need to display both the user_id and the length of content_id, which is a list object, as columns of a dataframe, but I am struggling to do this with groupby. Please help with the groupby as well as with the question at the bottom of this post (how do I get the results along with user_id in a dataframe?).
Dataframe types:
df.dtypes
output:
user_id object
content_id object
dtype: object
Sample Data:
user_id content_id
0 user_18085 [cont_2598_4_4, cont_2738_2_49, cont_4482_2_19...
1 user_16044 [cont_2738_2_49, cont_4482_2_19, cont_4994_18_...
2 user_13110 [cont_2598_4_4, cont_2738_2_49, cont_4482_2_19...
3 user_18909 [cont_3170_2_28]
4 user_15509 [cont_2598_4_4, cont_2738_2_49, cont_4482_2_19...
Pandas query:
df.groupby('user_id')['content_id'].count().reset_index()
df.groupby(['user_id'])['content_id'].apply(lambda x: get_count(x))
output:
user_id content_id
0 user_10013 1
1 user_10034 1
2 user_10042 1
When I try without grouping, I get the correct lengths, as shown below:
df['content_id'].apply(lambda x: len(x))
0 11
1 9
2 11
3 1
But how do I get these results along with user_id in a dataframe? I want the output in the format below:
user_id content_id
some xxx 11
some yyy 6
Answer
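DataFrame.groupby returns a grouped object, not the contents of each cell. That is why count() reports the number of rows per user (1 in your output) rather than the number of elements in each list. As such, it is not possible (without a lot of workarounds) to get what you want from the groupby alone. Instead, simply overwrite the column with the list lengths first (as suggested by @ifly6).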
Using
df_agg = df.copy()
df_agg.content_id = df_agg.content_id.apply(len)
df_agg = df_agg.groupby('user_id').sum()
gives the dataframe you described, with user_id as the index (call reset_index() if you want it as a regular column).
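As a minimal runnable sketch of the steps above: the sample rows here are made up for illustration (the user ids and list contents are placeholders, not the asker's data), and one user appears twice so that the difference between counting rows and summing list lengths is visible.

```python
import pandas as pd

# Made-up sample data; ids and list contents are placeholders.
df = pd.DataFrame({
    "user_id": ["user_a", "user_b", "user_a"],
    "content_id": [["c1", "c2", "c3"], ["c4"], ["c5", "c6"]],
})

# count() counts rows per group, not the elements inside each list.
row_counts = df.groupby("user_id")["content_id"].count()

# Replace each list by its length, then sum the lengths per user.
df_agg = df.copy()
df_agg["content_id"] = df_agg["content_id"].apply(len)
df_agg = df_agg.groupby("user_id").sum().reset_index()

print(row_counts)
print(df_agg)
```

Here row_counts only says how many rows each user has, while df_agg holds the total number of content ids per user, with user_id as a regular column.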
For completeness' sake, the same can be done in a single groupby:
df.groupby('user_id').agg(lambda x: x.apply(len).sum())
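If each user_id occurs only once, as the sample data suggests, a groupby may not be needed at all. The sketch below is an alternative that is not part of the answer above: it relies on Series.str.len(), which also works on list elements, and uses the same made-up sample rows (placeholders, not the asker's data).

```python
import pandas as pd

# Made-up sample data; ids and list contents are placeholders.
df = pd.DataFrame({
    "user_id": ["user_a", "user_b", "user_a"],
    "content_id": [["c1", "c2", "c3"], ["c4"], ["c5", "c6"]],
})

# Per-row list lengths next to user_id, without any groupby.
per_row = df.assign(content_id=df["content_id"].str.len())

# Per-user totals: compute the lengths first, then group them by user_id.
per_user = (
    df["content_id"].str.len()
    .groupby(df["user_id"])
    .sum()
    .reset_index()
)

print(per_row)
print(per_user)
```

per_row keeps one line per original row, which matches the requested output when users are unique; per_user is the variant to use when a user can appear on several rows.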