How to Crawl a Web Page After It Has Fully Loaded (JS and CSS All Loaded)

I would like to crawl pages on sites like Amazon or eBay to get the image paths of the items for sale. When I inspected a page, it looked like the image src is modified by JavaScript after the page has completely loaded.

There is a library called cheerio. It is simple, but it doesn't expose a way to run a check after the page has completely loaded; it only parses the HTML it is given and does not execute JavaScript. Does anyone have experience with this? Or is there a library I can use to get the real image path, since it is modified by JavaScript? Thanks for your help.

Answer

As mentioned in the comments, Puppeteer is probably the best way to scrape dynamic pages. It's a Node library that drives Chrome/Chromium and loads the page just as a regular Chrome instance would, including running the page's JavaScript.
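As a rough sketch of that approach (not a definitive implementation), the snippet below opens a page in Puppeteer, waits for network activity to settle, and then reads the src of every img element. It assumes the puppeteer package is installed; the function names scrapeImageSrcs and normalizeImageSrcs are made up for this example, and 'networkidle0' is only a heuristic for "fully loaded".

```javascript
// Hypothetical helper: resolve raw src values against the page URL and
// drop empty values, inline data: placeholders, and duplicates.
function normalizeImageSrcs(rawSrcs, pageUrl) {
  const seen = new Set();
  for (const raw of rawSrcs) {
    if (!raw || raw.startsWith('data:')) continue;
    seen.add(new URL(raw, pageUrl).href);
  }
  return [...seen];
}

// Hypothetical entry point: load the page in headless Chrome and collect
// image URLs after the page's scripts have had a chance to run.
async function scrapeImageSrcs(pageUrl) {
  // Required lazily so the helper above works without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // 'networkidle0' waits until there are no open network connections
    // for a short period; it approximates "everything has loaded".
    await page.goto(pageUrl, { waitUntil: 'networkidle0' });
    // $$eval runs the callback inside the page on all matching elements.
    const rawSrcs = await page.$$eval('img', (imgs) =>
      imgs.map((img) => img.getAttribute('src'))
    );
    return normalizeImageSrcs(rawSrcs, pageUrl);
  } finally {
    await browser.close();
  }
}
```

Calling scrapeImageSrcs with a product page URL returns a promise for an array of absolute image URLs. Sites that swap images in only after scrolling or user interaction may need extra steps beyond this.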

Inside your page.evaluate callback, you can use the MutationObserver browser API to watch the DOM and wait until the images you want have their final src.
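One way this could look, as a minimal sketch: the function below is meant to be passed to page.evaluate, so it runs inside the browser and may only use browser globals plus its own arguments. The name waitForImageSrcs, the selector, and the timeout are assumptions for illustration, as is the idea that a data: URI means "placeholder not yet replaced".

```javascript
// Runs in the page context. Resolves with the src values of the matching
// images as soon as at least one has a non-placeholder src, or with
// whatever is available once timeoutMs has elapsed.
function waitForImageSrcs(selector, timeoutMs) {
  return new Promise((resolve) => {
    // Collect current image URLs, skipping empty and inline data: values.
    const collect = () =>
      Array.from(document.querySelectorAll(selector))
        .map((img) => img.currentSrc || img.src)
        .filter((src) => src && !src.startsWith('data:'));

    // The images may already be in place by the time we start looking.
    const initial = collect();
    if (initial.length > 0) {
      resolve(initial);
      return;
    }

    let timer = null;
    const observer = new MutationObserver(() => {
      const found = collect();
      if (found.length > 0) {
        observer.disconnect();
        clearTimeout(timer);
        resolve(found);
      }
    });

    // Give up after timeoutMs and return whatever is there (possibly []).
    timer = setTimeout(() => {
      observer.disconnect();
      resolve(collect());
    }, timeoutMs);

    // Watch for added nodes and for src/srcset attribute changes.
    observer.observe(document.documentElement, {
      childList: true,
      subtree: true,
      attributes: true,
      attributeFilter: ['src', 'srcset'],
    });
  });
}
```

With a Puppeteer page object in hand, it could be invoked as `const srcs = await page.evaluate(waitForImageSrcs, 'img', 10000);` since page.evaluate forwards the extra arguments to the function and awaits the promise it returns.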

I've had good experiences with Apify, which will run Puppeteer instances for you and has a generous free tier.

source: stackoverflow.com