Pandas, Transform Innermost Index Into A JSON String Column Or A List Of Dictionaries

For every unique combination of the first two index levels, I want all of the rows of the third index level (along with its name) transformed into a JSON string column.

For example


import pandas as pd

recs = [{'id':123,'color':'orange','store':'big_mart', 'price':6}, {'id':123,'color':'orange','store':'Buckyz','price':5}, {'id':234,'color':'blue', 'store':'Gmart','price':7}]
d3 = pd.DataFrame(recs)
# set_index returns a new frame; d3 itself keeps 'id', 'color' and 'store' as columns
d3.set_index(['id','color','store'])

My intended output looks something like this:

id   color     new_col
123  'orange'  "[{'store':'big_mart','price':6}, {'store':'Buckyz','price':5}]"
234  'blue'    "[{'store':'Gmart','price':7}]"

The best I could do is this

d3.reset_index()
d3.groupby(['id','color']).apply(lambda x: x[['store','price']].to_json(orient='records'))

id   color 
123  orange    [{"store":"big_mart","price":6},{"store":"Buck...
234  blue                          [{"store":"Gmart","price":7}]
dtype: object

It works, but I'm having trouble attaching the resulting strings as a column to a deduplicated dataframe.

Answer

You can group by "id" and "color", then apply to_dict with the orient parameter set to "records" to each group:

out = d3.groupby(['id', 'color'])[['store', 'price']].apply(lambda x: x.to_dict('records')).astype(str).reset_index(name='new_col')

Or use to_json, which already returns a string and so skips the astype(str) conversion:

out = d3.groupby(['id', 'color'])[['store', 'price']].apply(lambda x: x.to_json(orient='records')).reset_index(name='new_col')

Output (from the to_dict version; the to_json version yields double-quoted JSON with no spaces):

    id   color                                            new_col
0  123  orange  [{'store': 'big_mart', 'price': 6}, {'store': 'Buckyz', 'price': 5}]
1  234    blue                   [{'store': 'Gmart', 'price': 7}]
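
The question also mentions attaching the result to a deduplicated dataframe. A minimal sketch of one way to do that, assuming the deduplicated frame is simply the distinct (id, color) pairs of d3; the names dedup, merged and parsed are hypothetical and not from the original post:

```python
import json
import pandas as pd

recs = [
    {'id': 123, 'color': 'orange', 'store': 'big_mart', 'price': 6},
    {'id': 123, 'color': 'orange', 'store': 'Buckyz', 'price': 5},
    {'id': 234, 'color': 'blue', 'store': 'Gmart', 'price': 7},
]
d3 = pd.DataFrame(recs)

# One JSON string per (id, color) pair, as in the to_json version above
out = (
    d3.groupby(['id', 'color'])[['store', 'price']]
      .apply(lambda x: x.to_json(orient='records'))
      .reset_index(name='new_col')
)

# Assumption: the "deduplicated dataframe" holds one row per (id, color)
dedup = d3[['id', 'color']].drop_duplicates()

# Attach the JSON strings by joining on the two grouping keys
merged = dedup.merge(out, on=['id', 'color'], how='left')

# The to_json strings are valid JSON, so they parse back into a list of dicts
parsed = json.loads(merged.loc[merged['id'] == 123, 'new_col'].iloc[0])
```

Joining on the keys avoids relying on row order. Note that only the to_json strings round-trip through json.loads; the astype(str) version produces Python-style single-quoted text, which is not valid JSON.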
source: stackoverflow.com