Requests Suddenly Not Working Despite No Apparent Change From Scraping Source Or Code

I am currently trying to get better at scraping in JS, using request and cheerio. About two weeks ago I got a basic Amazon scrape to work, but this morning when I loaded my files it was no longer working. I made sure cheerio and request were installed in Node and tried scraping Wikipedia, which worked fine. On Amazon, my original source, the code no longer works. Nothing on their page seems to have changed, so I have no clue why none of my selectors are matching.

const request = require('request');
const cheerio = require('cheerio');

request(`http://amazon.com/dp/B07R7DY911`, (error, response, html) => {
    if (!error && response.statusCode == 200) {
        const $ = cheerio.load(html);
        const productTitle = $("#productTitle").html()
        const price = $("#priceblock_ourprice").text();
        const rating = $('#centerCol #acrPopover').text().replace(/\s\s+/g, '');
        const numReviews = $('#centerCol #acrCustomerReviewText').text().replace(/\s\s+/g, '');
        const prodImg = $('#landingImage').attr('data-old-hires');

        console.log(productTitle);
        console.log(price);
        console.log(rating);
        console.log(numReviews);
        console.log(prodImg)
    } else {
        console.log(error);
    }
})

After some playing around, I now get null and undefined where I got real values before.

Help me, Stack Overflow. You're my only hope!

Update:

I switched the code to axios and it works much better now.

app.get("/",(req,res)=>{
    axios.get(`${link}`)
      .then((response)=> {
        const html = response.data;
        const $ = cheerio.load(html);
    
        const productName = $("#productTitle").html().replace(/\s\s+/g, '');
        const amznPrice = $("#priceblock_ourprice").text();
        const rating = $('#centerCol #acrPopover').text().replace(/\s\s+/g, '');
        const numReviews = $('#centerCol #acrCustomerReviewText').text().replace(/\s\s+/g, '');
        const prodImg = $('#landingImage').attr('data-old-hires');
        res.render("home", {
            productTitle: productName,
            price:amznPrice,
            prod_Img:prodImg,
            azLink:links,
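        });
      })
      .catch((err) => {
        // A failed request, or a missing #productTitle (html() returns null), would otherwise leave the response hanging
        console.error(err);
        res.status(500).send("Scrape failed");
      });
});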

Answer

It appears that the response body is coming back compressed, and request() does not decompress it unless told to, so cheerio ends up parsing compressed bytes instead of HTML. If you add the gzip: true option to the request() call, the code starts working for me.

const request = require('request');
const cheerio = require('cheerio');

request({url: 'http://amazon.com/dp/B07R7DY911', gzip: true}, (error,response,html) => {
    if (!error && response.statusCode == 200) {
        const $ = cheerio.load(html);
        const productTitle = $("#productTitle").html()
        const price = $("#priceblock_ourprice").text();
        const rating = $('#centerCol #acrPopover').text().replace(/\s\s+/g, '');
        const numReviews = $('#centerCol #acrCustomerReviewText').text().replace(/\s\s+/g, '');
        const prodImg = $('#landingImage').attr('data-old-hires');

        console.log("productTitle", productTitle);
        console.log("price", price);
        console.log("rating", rating);
        console.log("numReviews", numReviews);
        console.log("prodImg", prodImg)
    } else {
        console.log(error);
    }
});
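
To see why the selectors came back empty, here is a minimal sketch that uses only Node's built-in zlib module, with no network access. The sampleHtml string and the looksGzipped and bodyToHtml helpers are made up for illustration; they are not part of request or cheerio, and this is not necessarily how request implements gzip: true internally. It only shows that a gzipped body is not usable as HTML until it has been decompressed.

```javascript
const zlib = require('zlib');

// Hypothetical stand-in for a product page; not real Amazon markup.
const sampleHtml = '<span id="productTitle">Example Product</span>';

// Simulate a server that answers with Content-Encoding: gzip.
const compressedBody = zlib.gzipSync(Buffer.from(sampleHtml, 'utf8'));

// gzip streams begin with the magic bytes 0x1f 0x8b.
function looksGzipped(buf) {
    return buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
}

// Decompress only when the body is actually gzipped, then decode to text.
function bodyToHtml(buf) {
    const raw = looksGzipped(buf) ? zlib.gunzipSync(buf) : buf;
    return raw.toString('utf8');
}

// What a parser receives if nobody decompresses the body first.
const undecoded = compressedBody.toString('utf8');
// What it receives after decompression.
const decoded = bodyToHtml(compressedBody);

console.log(looksGzipped(compressedBody)); // true
console.log(undecoded === sampleHtml);     // false
console.log(decoded === sampleHtml);       // true
```

If the undecoded string is handed to cheerio.load(), there is no #productTitle element for the selector to find, which would be consistent with the null and undefined values in the question.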
source: stackoverflow.com