Nesting/grouping A Range Of Columns When Converting A Pandas DataFrame To A Dictionary

I've been trying to work out how to convert a Pandas DataFrame into a list of nested dictionaries and I haven't been having any luck.

My first thought was to convert the DataFrame into a list of dictionaries (with users = users.to_dict(orient='records')) and then move the address columns and the color columns into nested 'address' and 'color_preference' dictionaries, but there must be a better way to do it!
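For reference, the records-then-regroup idea does work. Here is a minimal sketch, assuming the column names from the DataFrame in this question. ADDRESS_COLS, COLOR_COLS and nest_record are hypothetical names introduced for illustration, and the email value is a placeholder:

```python
import pandas as pd

# Sample data mirroring the question; the email address is a placeholder
users = pd.DataFrame({
    "email_address": ["user@example.com"], "status": ["active"],
    "address": ["1 Eagle St"], "suburb": ["BROOKLYN"], "state": ["NY"],
    "postcode": ["11201"], "country": ["USA"],
    "red": [False], "orange": [True], "yellow": [True], "green": [True],
    "blue": [False], "indigo": [False], "violet": [False],
})

# Hypothetical column groupings, copied from the question's DataFrame
ADDRESS_COLS = ["address", "suburb", "state", "postcode", "country"]
COLOR_COLS = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]


def nest_record(record):
    """Return a copy of a flat record with address/color keys moved into nested dicts."""
    flat = dict(record)  # copy so the original record is left untouched
    # The comprehension pops every address key (including "address" itself)
    # before the nested dict is stored back under the "address" key.
    flat["address"] = {col: flat.pop(col) for col in ADDRESS_COLS}
    flat["color_preference"] = {col: flat.pop(col) for col in COLOR_COLS}
    return flat


records = users.to_dict(orient="records")  # one flat dict per row
nested = [nest_record(r) for r in records]
```

Because each nested key is re-inserted after its columns are popped, the resulting keys come out in the order email_address, status, address, color_preference.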

I have a dataframe like this:

import pandas as pd
users = pd.DataFrame({
    'email_address': ["[email protected]"], 'status': ["active"],
    'address': ["1 Eagle St"], 'suburb': ["BROOKLYN"], 'state': ["NY"],
    'postcode': ["11201"], 'country': ["USA"],
    'red': [False], 'orange': [True], 'yellow': [True], 'green': [True],
    'blue': [False], 'indigo': [False], 'violet': [False]})

and I'm trying to convert it into this format:

{  
   "email_address":"[email protected]",
   "status":"active",
   "address":{  
      "address":"1 Eagle St",
      "suburb":"BROOKLYN",
      "state":"NY",
      "postcode":"11201",
      "country":"USA"
   },
   "color_preference":{  
      "red":false,
      "orange":true,
      "yellow":true,
      "green":true,
      "blue":false,
      "indigo":false,
      "violet":false
   }
}

Answer

You can do this explicitly with apply, building the nested dictionary for each row by hand (I've only listed the first couple of address and color columns, but you can list all of them the same way):

def extract_json(row):
    # Build the nested dict for one row; extend the column lists
    # to cover every address/color column
    return {
        "email_address": row.loc["email_address"],
        "status": row.loc["status"],
        "address": row.loc[["address", "suburb"]].to_dict(),
        "color_preference": row.loc[["red", "orange"]].to_dict(),
    }

In [11]: users.apply(extract_json, axis=1)
Out[11]:
0    {'email_address': '[email protected]', 'status':...
dtype: object

In [12]: users.apply(extract_json, axis=1).tolist()
Out[12]:
[{'email_address': '[email protected]',
  'status': 'active',
  'address': {'address': '1 Eagle St', 'suburb': 'BROOKLYN'},
  'color_preference': {'red': False, 'orange': True}}]
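Another option, a different technique from the apply call above rather than part of this answer, is to call to_dict(orient='records') once per column group and zip the per-group lists back together, which avoids calling a Python function for every row. A sketch, assuming the column names from the question; top_cols, address_cols and color_cols are hypothetical names, and the email value is a placeholder:

```python
import pandas as pd

# Sample data mirroring the question; the email address is a placeholder
users = pd.DataFrame({
    "email_address": ["user@example.com"], "status": ["active"],
    "address": ["1 Eagle St"], "suburb": ["BROOKLYN"], "state": ["NY"],
    "postcode": ["11201"], "country": ["USA"],
    "red": [False], "orange": [True], "yellow": [True], "green": [True],
    "blue": [False], "indigo": [False], "violet": [False],
})

# Hypothetical column groups (names taken from the question)
top_cols = ["email_address", "status"]
address_cols = ["address", "suburb", "state", "postcode", "country"]
color_cols = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]

# One list of flat dicts per group; all three lists share the frame's row order
top = users[top_cols].to_dict(orient="records")
addresses = users[address_cols].to_dict(orient="records")
colors = users[color_cols].to_dict(orient="records")

# Stitch the groups back together row by row
nested = [
    {**base, "address": addr, "color_preference": color}
    for base, addr, color in zip(top, addresses, colors)
]
```

Since every group is taken from the same frame, the three lists line up row for row, so zip pairs each user with its own address and colors.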

Rather than typing out every name, you can pull out all the address and color columns by position:

In [21]: users.columns[2:7]
Out[21]: Index(['address', 'suburb', 'state', 'postcode', 'country'], dtype='object')

In [22]: users.columns[7:]
Out[22]: Index(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'], dtype='object')
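Plugging those slices into the extract_json function from above covers every address and color column without listing them by hand. A sketch, assuming the column order of the question's DataFrame (positional slices silently pick the wrong columns if that order changes). ADDRESS_COLS and COLOR_COLS are hypothetical names, and the email value is a placeholder:

```python
import pandas as pd

# Sample data mirroring the question; the email address is a placeholder
users = pd.DataFrame({
    "email_address": ["user@example.com"], "status": ["active"],
    "address": ["1 Eagle St"], "suburb": ["BROOKLYN"], "state": ["NY"],
    "postcode": ["11201"], "country": ["USA"],
    "red": [False], "orange": [True], "yellow": [True], "green": [True],
    "blue": [False], "indigo": [False], "violet": [False],
})

# Positional slices: these depend on the column order defined above
ADDRESS_COLS = users.columns[2:7]  # address .. country
COLOR_COLS = users.columns[7:]     # red .. violet


def extract_json(row):
    """Build the nested dict for one row (called by apply with axis=1)."""
    return {
        "email_address": row.loc["email_address"],
        "status": row.loc["status"],
        "address": row.loc[ADDRESS_COLS].to_dict(),
        "color_preference": row.loc[COLOR_COLS].to_dict(),
    }


nested = users.apply(extract_json, axis=1).tolist()
```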
source: stackoverflow.com