Generate Descriptive Statistics For Each Row Value And Transpose Dynamically
I have a dataframe like the one shown below:
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1,1,1,1,2,2,2,2,3,3,4,4,4,4,4],
    'readings': ['READ_1','READ_2','READ_1','READ_3','READ_1','READ_5','READ_6','READ_8','READ_10','READ_12','READ_11','READ_14','READ_09','READ_08','READ_07'],
    'val': [5,6,7,11,5,7,16,12,13,56,32,13,45,43,46],
})
What I would like to get is the descriptive statistics (a summarized form) of the existing columns instead of the original columns. I expect to see min, max, 25%, 75%, std, and var as new columns for each subject.
I tried the following, but the output isn't what I expect:
df.groupby(['subject_id','readings']).describe().reset_index() #this gives some output but it isn't exact
df.groupby(['subject_id','readings']).pivot_table(values='val', index='subject_id', columns='readings').describe() # this throws error
I expect my output to be a wide and sparse matrix.
Answer
Use DataFrame.unstack to reshape after describe, then DataFrame.swaplevel to put the readings on the first column level, and DataFrame.reindex to keep the readings in their original order:
df = (df.groupby(['subject_id','readings'])['val']
        .describe()
        .unstack()
        .swaplevel(0,1,axis=1)
        .reindex(df['readings'].unique(), axis=1, level=0))
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print(df)
   subject_id  READ_1_count  READ_1_mean  READ_1_std  READ_1_min  READ_1_25%  \
0           1           2.0          6.0    1.414214         5.0         5.5
1           2           1.0          5.0         NaN         5.0         5.0
2           3           NaN          NaN         NaN         NaN         NaN
3           4           NaN          NaN         NaN         NaN         NaN

   READ_1_50%  READ_1_75%  READ_1_max  READ_2_count  ...  READ_08_75%  \
0         6.0         6.5         7.0           1.0  ...          NaN
1         5.0         5.0         5.0           NaN  ...          NaN
2         NaN         NaN         NaN           NaN  ...          NaN
3         NaN         NaN         NaN           NaN  ...         43.0

   READ_08_max  READ_07_count  READ_07_mean  READ_07_std  READ_07_min  \
0          NaN            NaN           NaN          NaN          NaN
1          NaN            NaN           NaN          NaN          NaN
2          NaN            NaN           NaN          NaN          NaN
3         43.0            1.0          46.0          NaN         46.0

   READ_07_25%  READ_07_50%  READ_07_75%  READ_07_max
0          NaN          NaN          NaN          NaN
1          NaN          NaN          NaN          NaN
2          NaN          NaN          NaN          NaN
3         46.0         46.0         46.0         46.0

[4 rows x 105 columns]
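
Note that describe returns count, mean, std, min, 25%, 50%, 75% and max, so var from the question is not part of this output. A minimal sketch of one way to get exactly the statistics listed in the question, assuming it is acceptable to swap describe for agg with a list of functions. The names q25, q75, stats and wide are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1,1,1,1,2,2,2,2,3,3,4,4,4,4,4],
    'readings': ['READ_1','READ_2','READ_1','READ_3','READ_1','READ_5','READ_6',
                 'READ_8','READ_10','READ_12','READ_11','READ_14','READ_09',
                 'READ_08','READ_07'],
    'val': [5,6,7,11,5,7,16,12,13,56,32,13,45,43,46],
})

# Hypothetical helpers: agg uses the function name as the column label.
def q25(s):
    return s.quantile(0.25)

def q75(s):
    return s.quantile(0.75)

# Statistics requested in the question (var is not produced by describe).
stats = ['min', 'max', q25, q75, 'std', 'var']

wide = (df.groupby(['subject_id', 'readings'])['val']
          .agg(stats)                   # one row per (subject_id, readings)
          .unstack()                    # readings move into the columns
          .swaplevel(0, 1, axis=1)      # column levels become (readings, stat)
          .reindex(df['readings'].unique(), axis=1, level=0))
wide.columns = wide.columns.map('_'.join)   # e.g. READ_1_var
wide = wide.reset_index()
print(wide.shape)
```

With a single observation in a group, std and var come out as NaN, just like std in the describe output above.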
source: stackoverflow.com