
Python Extract Json Structure From Html Page

- 1 answer

In Python I'm reading the content of an HTML page, which contains a lot of stuff. To do this I read the web page as a string like this:

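import urllib.request as req  # assumed: "req" in the question is urllib.request

url = 'https://myurl.com/'
# Send a browser-like User-Agent, since some sites reject the default one
reqq = req.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
reddit_file = req.urlopen(reqq)
reddit_data = reddit_file.read().decode('utf-8')  # page body as a str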

If I print reddit_data I can see the whole HTML content correctly. Inside it there is a JSON-like structure that I would like to read so I can extract some fields from it.

The structure is below:

"dealDetails" : {
      "f240141a" : {
         "egressUrl" : "https://ccc.com",
         "title" : "ZZZ",
         "type" : "ghi",
      },
      "5f9ab246" : {
         "egressUrl" : "https://www.bbb.com/",
         "title" : "YYY",
         "type" : "def",
      },
      "2bf6723b" : {
         "egressUrl" : "https://www.aaa.com//",
         "title" : "XXX",
         "type" : "abc",
      },
}

What I want to do is find the dealDetails field and then, for each key (f240141a, 5f9ab246, 2bf6723b), get the egressUrl, title and type values.

Thanks


Answer

Try this,

[nested_dict['egressUrl'] for nested_dict in reddit_data['dealDetails'].values()]

Once the JSON is parsed, you can treat it as a dictionary and use the same syntax to access its values.
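
As a minimal sketch of that dictionary access, assuming the data has already been parsed into a dict: the sample below mirrors part of the structure from the question, and the names page_data and list_deals are hypothetical, introduced only for illustration. It collects all three requested fields for every deal ID.

```python
# Hypothetical sample mirroring part of the structure shown in the question.
page_data = {
    "dealDetails": {
        "f240141a": {"egressUrl": "https://ccc.com", "title": "ZZZ", "type": "ghi"},
        "5f9ab246": {"egressUrl": "https://www.bbb.com/", "title": "YYY", "type": "def"},
    }
}


def list_deals(data):
    """Return (deal_id, egressUrl, title, type) tuples for every deal."""
    return [
        # .get() yields None instead of raising if a field is missing
        (deal_id, deal.get("egressUrl"), deal.get("title"), deal.get("type"))
        for deal_id, deal in data["dealDetails"].items()
    ]


for row in list_deals(page_data):
    print(row)
```

Iterating over .items() gives both the deal ID and its nested dictionary, which is useful when the ID itself is needed alongside the fields.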

Edit-1:

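Make sure reddit_data is a dictionary. If type(reddit_data) is str, you need to convert it first. Both options below expect the string to contain only the dictionary/JSON text, not the surrounding HTML: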

import ast
reddit_data = ast.literal_eval(reddit_data)

OR

import json
reddit_data = json.loads(reddit_data)
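
Since the page in the question is a full HTML string, the dealDetails block has to be cut out before either call can parse it. Here is a rough sketch of one way to do that by matching braces; extract_object and sample_html are hypothetical names, and the sample page is made up to resemble the structure in the question. It uses ast.literal_eval because the structure shown has trailing commas, which json.loads rejects. Note that ast.literal_eval will in turn fail if the block contains JSON's true, false or null.

```python
import ast


def extract_object(text, key):
    """Return the {...} block that follows "key" in text, as a string."""
    start = text.find('"%s"' % key)
    if start == -1:
        raise ValueError("key not found: " + key)
    open_pos = text.find("{", start)
    if open_pos == -1:
        raise ValueError("no object after key: " + key)
    depth = 0
    in_string = False
    escaped = False
    for i in range(open_pos, len(text)):
        ch = text[i]
        if in_string:
            # Braces inside string values must not affect the depth count
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[open_pos:i + 1]
    raise ValueError("unbalanced braces after key: " + key)


# Made-up page content, standing in for reddit_data from the question.
sample_html = '''<html><script>var state = {"dealDetails" : {
  "f240141a" : {"egressUrl" : "https://ccc.com", "title" : "ZZZ", "type" : "ghi",},
  "5f9ab246" : {"egressUrl" : "https://www.bbb.com/", "title" : "YYY", "type" : "def",},
}};</script></html>'''

deals = ast.literal_eval(extract_object(sample_html, "dealDetails"))
for deal_id, deal in deals.items():
    print(deal_id, deal["egressUrl"], deal["title"], deal["type"])
```

If the real page embeds strict JSON (no trailing commas), json.loads on the extracted block would be the more natural choice.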
source: stackoverflow.com