# Pandas Groupby To Calculate Percentage Of Groupby Columns

## 12 January 2022 - 1 answer

I want to calculate the rate_death percentage as below - (new_deaths / population) * 100 after grouping by location and summing new_deaths.

Example: for Afghanistan, rate_death must calculate as ((1+4+10) / 38928341) * 100 And for Albania, it must calculate as ((0+0+1) / 2877800) * 100

Below is the data and approaches which I tried but not working -

``````df_data
``````
```    location    date        new_cases   new_deaths  population
0   Afghanistan 4/25/2020   70          1           38928341
1   Afghanistan 4/26/2020   112         4           38928341
2   Afghanistan 4/27/2020   68          10          38928341
3   Albania    4/25/2020    15          0           2877800
4   Albania    4/26/2020    34          0           2877800
5   Albania    4/27/2020    14          1           2877800
```
```Data columns (total 5 columns):
#   Column      Non-Null Count  Dtype
---  ------      --------------  -----
0   location    6 non-null      object
1   date        6 non-null      object
2   new_cases   6 non-null      int64
3   new_deaths  6 non-null      int64
4   population  6 non-null      int64
```

Approach 1:

``````df_res = df_data[['location','new_deaths','population']].groupby(['location']).sum()
``````
```location    new_deaths  population
Afghanistan 15          116785023
Albania     1           8633400
```
``````df_res['rate_death'] = (df_res['new_deaths'] / df_res['population'] * 100.0)
``````
```location    new_deaths  population  rate_death
Afghanistan 15          116785023   0.000
Albania     1           8633400     0.000
```

I know that the population is summing up twice due to the above groupby with 'sum' operation, but still I wonder why is the rate_death not calculating the percentage as expected but rather showing as 0.000

Approach 2: (tried as mentioned in this post - Pandas percentage of total with groupby)

``````location_population = df_data.groupby(['location', 'population']).agg({'new_deaths': 'sum'})
location = df_data.groupby(['location']).agg({'population': 'mean'})
location_population.div(location, level='location') * 100
``````
```location    population  new_deaths  population

Afghanistan 38928341    NaN         NaN
Albania     2877800     NaN         NaN
```

But it is coming as NaN.

You can do -

``````df = df.groupby(['location']).agg({'new_deaths': sum, 'population': max})
df['rate_death'] = df['new_deaths'] / df['population'] * 100
``````

Result

``````             new_deaths  population  rate_death
location
Afghanistan          15    38928341    0.000039
Albania               1     2877800    0.000035
``````