Groupby Within Groups
I have data like this:
import pandas as pd

df = pd.DataFrame({
    'a': ['milk', 'eggs', 'eggs', 'butter', 'butter',
          'milk', 'eggs', 'eggs', 'butter', 'butter'],
    'b': ['billy', 'bob', 'frank', 'frank', 'sue',
          'frank', 'sue', 'sue', 'sue', 'sue'],
    'c': ['1/30', '1/30', '1/31', '1/31', '1/31',
          '3/31', '3/31', '3/31', '5/31', '5/31'],
}, index=list('ABCDEFGHIJ'))
For each row, I want the inverse of the number of distinct values of c within that row's group in b. Billy and Bob each have one distinct value in c, so their results are both 1. Frank has two distinct dates, so his is 0.5, and so on.
Desired output:
A 1.000000
B 1.000000
C 0.500000
D 0.500000
E 0.333333
F 0.500000
G 0.333333
H 0.333333
I 0.333333
J 0.333333
dtype: float64
I think I need to manipulate groupby(some group).count() and/or groupby(some group).transform('count'), but I'm not sure how to manipulate them and what else I need (if anything), or if there's a better way.
I tried variations on the following (based on aggregating within a groupby), to no avail:
df.groupby(['b', 'c'], as_index=False)['c'].transform('count').reset_index()
I could probably figure out an "ugly" way but I'd very much like to know how to do this in 1-2 lines (if possible).
Thanks!
Answer
There is probably a better way, since I'm not familiar with much beyond the basics of pandas, but this seems to do what you want. It takes the inverse of the number of distinct c values for each b, then joins that back onto df by b, which keeps each row on its original index label:
df.join(1 / df.groupby("b")["c"].nunique(), on="b", lsuffix="_x", rsuffix="_y")
Output:
        a      b   c_x       c_y
A    milk  billy  1/30  1.000000
B    eggs    bob  1/30  1.000000
C    eggs  frank  1/31  0.500000
D  butter  frank  1/31  0.500000
E  butter    sue  1/31  0.333333
F    milk  frank  3/31  0.500000
G    eggs    sue  3/31  0.333333
H    eggs    sue  3/31  0.333333
I  butter    sue  5/31  0.333333
J  butter    sue  5/31  0.333333
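If only the Series from the question is needed, a shorter alternative may be to skip joining a second frame back onto df and let transform broadcast the per-group result onto the original rows. The following is a minimal sketch rather than the definitive approach; the variable name result is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['milk', 'eggs', 'eggs', 'butter', 'butter',
          'milk', 'eggs', 'eggs', 'butter', 'butter'],
    'b': ['billy', 'bob', 'frank', 'frank', 'sue',
          'frank', 'sue', 'sue', 'sue', 'sue'],
    'c': ['1/30', '1/30', '1/31', '1/31', '1/31',
          '3/31', '3/31', '3/31', '5/31', '5/31'],
}, index=list('ABCDEFGHIJ'))

# Count the distinct values of c within each b group, broadcast that
# count back onto every row of the group, then take the inverse.
# "result" is a hypothetical name chosen for this sketch.
result = 1 / df.groupby('b')['c'].transform('nunique')
print(result)
```

Because transform returns a Series aligned to df.index, rows stay in their original order and keep their labels A through J, so the result can be assigned straight to a new column if needed.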