The "await" Property Of "async" Function Sleeps After An Instance - Javascript

I am working on a scraper using PhantomJS with Node.js. PhantomJS loads a page through an async call, like this: var status = await page.open(url). Sometimes, because of a slow internet connection, the page takes longer to load, and after a while no page status is returned, so I cannot check whether it loaded or not. The page.open() call hangs without ever returning anything, and all further execution waits on it.

So my basic question is: is there any way to keep page.open(url) alive, given that the rest of the code waits until the page is loaded?

My code is:

const phantom = require('phantom');

// await is only valid inside an async function in a CommonJS script
(async () => {
  const ph_instance = await phantom.create();
  const ph_page = await ph_instance.createPage();

  const status = await ph_page.open("https://www.cscscholarship.org/");

  if (status === 'success') {
    console.log("Page is loaded successfully !");
    //do more stuff
  }
})();

Answer

From your comment, it seems the page load might be timing out (because the internet is sometimes slow). You can verify this by adding an onResourceTimeout handler to your code (link: http://phantomjs.org/api/webpage/handler/on-resource-timeout.html).

It would look something like this:

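// onResourceTimeout is a page-level handler; the phantom npm package registers it with page.on()
await ph_page.on('onResourceTimeout', (request) => {
    console.log('Timeout caught: ' + JSON.stringify(request));
});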

And if that turns out to be the case, you can increase the resource timeout in the page settings (link: http://phantomjs.org/api/webpage/property/settings.html) like this:

await ph_page.setting('resourceTimeout', 60000); // 60 seconds, set on the page via the phantom npm package
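
If page.open() still never settles, one generic way to keep the rest of the script from waiting forever is to race the call against a timer. The following is only a minimal sketch: withTimeout is a hypothetical helper name (not part of PhantomJS or the phantom package), and the 60000 ms figure simply mirrors the value above.

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever promise settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch with the page object from the question (not executed here):
//   const status = await withTimeout(ph_page.open(url), 60000, 'page.open');
```

Note that this only stops your own code from waiting; it does not cancel the underlying page load, so you would probably still want to close the page or exit the instance in the catch branch.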

Edit: I know the question is about PhantomJS, but I also wanted to mention another framework I've used for scraping projects before, Puppeteer (link: https://pptr.dev/). I personally found its APIs easier to understand and code against, and it is an actively maintained project, unlike PhantomJS, which is no longer maintained (its last release was two years ago).

source: stackoverflow.com