Pandas Groupby To Calculate Percentage Of Groupby Columns
I want to calculate the rate_death percentage as below - (new_deaths / population) * 100 after grouping by location and summing new_deaths.
Example: for Afghanistan, rate_death must calculate as ((1+4+10) / 38928341) * 100 And for Albania, it must calculate as ((0+0+1) / 2877800) * 100
Below is the data and approaches which I tried but not working -
df_data
location date new_cases new_deaths population 0 Afghanistan 4/25/2020 70 1 38928341 1 Afghanistan 4/26/2020 112 4 38928341 2 Afghanistan 4/27/2020 68 10 38928341 3 Albania 4/25/2020 15 0 2877800 4 Albania 4/26/2020 34 0 2877800 5 Albania 4/27/2020 14 1 2877800
Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 location 6 non-null object 1 date 6 non-null object 2 new_cases 6 non-null int64 3 new_deaths 6 non-null int64 4 population 6 non-null int64
Approach 1:
df_res = df_data[['location','new_deaths','population']].groupby(['location']).sum()
location new_deaths population Afghanistan 15 116785023 Albania 1 8633400
df_res['rate_death'] = (df_res['new_deaths'] / df_res['population'] * 100.0)
location new_deaths population rate_death Afghanistan 15 116785023 0.000 Albania 1 8633400 0.000
I know that the population is summing up twice due to the above groupby with 'sum' operation, but still I wonder why is the rate_death not calculating the percentage as expected but rather showing as 0.000
Approach 2: (tried as mentioned in this post - Pandas percentage of total with groupby)
location_population = df_data.groupby(['location', 'population']).agg({'new_deaths': 'sum'})
location = df_data.groupby(['location']).agg({'population': 'mean'})
location_population.div(location, level='location') * 100
location population new_deaths population Afghanistan 38928341 NaN NaN Albania 2877800 NaN NaN
But it is coming as NaN.
Please help if anything wrong in these approaches or how to resolve this. Thanks!
Answer
You can do -
df = df.groupby(['location']).agg({'new_deaths': sum, 'population': max})
df['rate_death'] = df['new_deaths'] / df['population'] * 100
Result
new_deaths population rate_death
location
Afghanistan 15 38928341 0.000039
Albania 1 2877800 0.000035
Related Questions
- → What are the pluses/minuses of different ways to configure GPIOs on the Beaglebone Black?
- → Django, code inside <script> tag doesn't work in a template
- → React - Django webpack config with dynamic 'output'
- → GAE Python app - Does URL matter for SEO?
- → Put a Rendered Django Template in Json along with some other items
- → session disappears when request is sent from fetch
- → Python Shopify API output formatted datetime string in django template
- → Can't turn off Javascript using Selenium
- → WebDriver click() vs JavaScript click()
- → Shopify app: adding a new shipping address via webhook
- → Shopify + Python library: how to create new shipping address
- → shopify python api: how do add new assets to published theme?
- → Access 'HTTP_X_SHOPIFY_SHOP_API_CALL_LIMIT' with Python Shopify Module