What Is the Difference Between the Soup From Selenium and From Requests?

I was crawling some information from the web, but I got different results when using Selenium and when using requests.

Selenium

driver.get('https://www.jobplanet.co.kr/companies/322493/benefits/%EC%A7%80%EC%97%90%EC%9D%B4%EC%B9%98%EC%94%A8%EC%A7%80')
soup = BeautifulSoup(driver.page_source, 'html.parser')
sample = soup.find_all('div', class_='accord_hd')

requests

response= requests.get('https://www.jobplanet.co.kr/companies/322493/benefits/%EC%A7%80%EC%97%90%EC%9D%B4%EC%B9%98%EC%94%A8%EC%A7%80')
soup = BeautifulSoup(response.content, 'html.parser')
sample = soup.find_all('div', class_='accord_hd')

With Selenium, it returned an empty list, but with requests I got a list with some strings in it.

I have run into something similar before, so I wonder what is going on here.

Answer

requests obtains and returns the initial HTML source code, as the server sends it.

Selenium automates a real browser to open the web page, and you can then pull the HTML source that the browser used to render the page.

The difference between the two is that requests does not execute JavaScript, so it cannot render content that the site creates dynamically. Since Selenium actually opens the browser to display the page, it lets the page render its contents before you get the HTML source.

That is why you may get two different responses when using requests versus Selenium.
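To make the point concrete, here is a minimal sketch that runs the same search over two HTML strings. Both strings are made up for illustration (they are not the real jobplanet markup): one stands for the initial source that requests would see, the other for the source after a browser has run the page's script. The sketch uses the standard library's `html.parser` instead of BeautifulSoup only so that it runs without extra packages; `ClassCounter` and `count_divs` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Hypothetical markup: what a server might send before any JavaScript runs.
INITIAL_HTML = """
<html><body>
  <div id="benefits"></div>
  <script>/* a script would fill #benefits in the browser */</script>
</body></html>
"""

# Hypothetical markup: the same page after the browser has run the script.
RENDERED_HTML = """
<html><body>
  <div id="benefits">
    <div class="accord_hd">Insurance</div>
    <div class="accord_hd">Vacation</div>
  </div>
</body></html>
"""


class ClassCounter(HTMLParser):
    """Counts start tags with a given tag name and CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_divs(html, css_class="accord_hd"):
    """Return how many <div> tags in `html` carry `css_class`."""
    parser = ClassCounter("div", css_class)
    parser.feed(html)
    parser.close()
    return parser.count


print("initial source:", count_divs(INITIAL_HTML))
print("rendered source:", count_divs(RENDERED_HTML))
```

The search is identical in both calls; only the HTML handed to the parser differs, which is the same situation as comparing `response.content` with `driver.page_source`.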

However, with the particular code you posted above, I got exactly the same output from Selenium and from requests.

Code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import requests

URL = 'https://www.jobplanet.co.kr/companies/322493/benefits/%EC%A7%80%EC%97%90%EC%9D%B4%EC%B9%98%EC%94%A8%EC%A7%80'

# Selenium: let Chrome load the page, then parse the source the browser holds.
# Selenium 4 takes the chromedriver path through a Service object.
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
driver.get(URL)
soup = BeautifulSoup(driver.page_source, 'html.parser')
sample_selenium = soup.find_all('div', class_='accord_hd')
driver.quit()  # end the browser session

# requests: parse the HTML as the server returned it, with no rendering.
response = requests.get(URL)
soup = BeautifulSoup(response.content, 'html.parser')
sample_requests = soup.find_all('div', class_='accord_hd')

print('Selenium: %s items\nRequests: %s items' % (len(sample_selenium), len(sample_requests)))

Output:

Selenium: 11 items
Requests: 11 items
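
If Selenium gives you an empty list on your machine, one possible cause (a guess, not something I could reproduce) is that `driver.page_source` was read before the page had finished filling in its content. Selenium ships `WebDriverWait` for waiting on elements; the hand-rolled loop below is only a sketch of the same idea that runs without a browser. `wait_for_marker` and `FakeDriver` are hypothetical names, and the fake driver's HTML is made up.

```python
import time


def wait_for_marker(driver, marker, timeout=10.0, poll=0.5):
    """Poll driver.page_source until `marker` appears or `timeout` seconds pass.

    Returns the source that contained the marker, or the last source seen
    if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    source = driver.page_source
    while marker not in source and time.monotonic() < deadline:
        time.sleep(poll)
        source = driver.page_source
    return source


class FakeDriver:
    """Hypothetical stand-in for a browser: content shows up on the third read."""

    def __init__(self):
        self.reads = 0

    @property
    def page_source(self):
        self.reads += 1
        if self.reads < 3:
            return "<div id='benefits'></div>"
        return "<div id='benefits'><div class='accord_hd'>Insurance</div></div>"


html = wait_for_marker(FakeDriver(), "accord_hd", timeout=2.0, poll=0.01)
print("accord_hd" in html)
```

With a real driver you would call `wait_for_marker(driver, 'accord_hd')` after `driver.get(...)` and pass the returned string to `BeautifulSoup` in place of `driver.page_source`.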
source: stackoverflow.com