
How to Process a Large Number of Requests with Promise.all


I have about 5000 links and I need to crawl all of them. I'm wondering if there is a better approach than this. Here is my code.

const request = require('request');

let urls = [ /* 5000 urls go here */ ];

const doms = await getDoms(urls); // called from inside an async function

// processing and storing the doms

// crawl every url concurrently and resolve once all requests have finished
async function getDoms(urls) {
  let data = await Promise.all(urls.map(url => getSiteCrawlPromise(url)));
  return data;
}

// wrap a single request in a Promise that always resolves,
// returning either the body or the error along with the cookie jar
function getSiteCrawlPromise(url) {
  return new Promise((resolve, reject) => {
    let j = request.jar();
    request.get({ url: url, jar: j }, function (err, response, body) {
      if (err)
        return resolve({ body: null, jar: j, error: err });
      return resolve({ body: body, jar: j, error: null });
    });
  });
}

Is there a mechanism implemented in promises that can divide the job across multiple threads, process them, and then return the output as a whole? I don't want to divide the urls into smaller fragments and process those fragments separately.


Answer

The Promise object represents the eventual completion (or failure) of an asynchronous operation, and its resulting value.

There is no built-in mechanism in Promises to "divide jobs into multiple threads and process them". If you must do that, you'll have to fragment the urls array into smaller arrays and queue the fragments onto separate crawler instances simultaneously.
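For completeness, a manual batching version might look roughly like the sketch below. It reuses getSiteCrawlPromise from the question; chunkSize is an illustrative value, not something from the original code, and the loop is assumed to run inside an async function.

// sketch: crawl the urls in sequential batches instead of all at once
const chunkSize = 50; // illustrative batch size
const results = [];
for (let i = 0; i < urls.length; i += chunkSize) {
  const chunk = urls.slice(i, i + chunkSize);
  // wait for the current batch to finish before starting the next one
  const chunkResults = await Promise.all(chunk.map(url => getSiteCrawlPromise(url)));
  results.push(...chunkResults);
}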

But there is absolutely no need to go that way. Since you're using Node.js and node-crawler, you can use node-crawler's maxConnections option. This is what it was built for, and the end result would be the same: you'll be crawling the urls concurrently, without wasting time and effort on manual chunking, without managing multiple crawler instances, and without depending on any extra concurrency libraries.
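A minimal sketch of that approach, assuming urls is the array from the question; maxConnections: 10 is just an illustrative value here.

const Crawler = require('crawler');

const c = new Crawler({
  maxConnections: 10, // number of requests kept in flight at any one time
  callback: function (error, res, done) {
    if (error) {
      console.log(error);
    } else {
      // res.body holds the raw HTML of the crawled page
      console.log(res.body.length);
    }
    done(); // tell the crawler this task is finished
  }
});

// queue all 5000 urls; node-crawler throttles them to maxConnections at a time
c.queue(urls);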

source: stackoverflow.com