How To Find All Elements On The Webpage Through Scrolling Using Selenium WebDriver And Python
I can't seem to get all the elements on a webpage, no matter what I try with Selenium. I am sure I am missing something. Here's my code. The URL has at least 30 elements, yet whenever I scrape it only 6 elements are returned. What am I missing?
import requests
import webbrowser
import time
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
res = requests.get(url, headers = headers)
page_soup = bs(res.text, "html.parser")
containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})
print(len(containers))
#for each container find shoe model
shoe_colors = []
for container in containers:
    if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
        shoe_model = container.div.div.img["title"]
        review = container.find('div', {'class':'gl-product-card__reviews-number'})
        review = int(review.text)
driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)
shoe_prices = driver.find_elements_by_css_selector('.gl-price')
for price in shoe_prices:
    print(price.text)
print(len(shoe_prices))
Answer
The results seem to differ from run to run with your code trial:
- You found 30 items with requests and 6 items with Selenium
- Whereas I found 40 items with requests and 4 items with Selenium
The items on this website are generated dynamically through lazy loading, so you have to scroll down
and wait for the new elements to render within the HTML DOM. You can use the following solution:
Code Block:
import requests
import webbrowser
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException, TimeoutException

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
res = requests.get(url, headers = headers)
page_soup = bs(res.text, "html.parser")
containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})
print(len(containers))
shoe_colors = []
for container in containers:
    if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
        shoe_model = container.div.div.img["title"]
        review = container.find('div', {'class':'gl-product-card__reviews-number'})
        review = int(review.text)

options = Options()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
# Selenium 3 style constructor; Selenium 4 expects options=options and a Service object instead
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get(url)
# Wait for the first batch of prices to become visible
titles = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.gl-price")))
myLength = len(titles)
while True:
    # Scroll down to trigger lazy loading of the next batch
    driver.execute_script("window.scrollBy(0,400)", "")
    try:
        # Wait until more prices exist than were counted so far
        WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements(By.CSS_SELECTOR, "span.gl-price")) > myLength)
        titles = driver.find_elements(By.CSS_SELECTOR, "span.gl-price")
        myLength = len(titles)
    except TimeoutException:
        # No new prices appeared within 20 seconds, so assume the end of the page
        break
print(myLength)
for title in titles:
    print(title.text)
driver.quit()
Console Output:
47
$100
$100
$100
$100
$100
$100
$180
$180
$180
$180
$130
$180
$180
$130
$180
$130
$200
$180
$180
$130
$60
$100
$30
$65
$120
$100
$85
$180
$150
$130
$100
$100
$80
$100
$120
$180
$200
$130
$130
$100
$120
$120
$100
$180
$90
$140
$100
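The scroll-and-wait idea in the solution above can also be separated from the browser, which makes the stopping rule easier to see and to test. The following is only a minimal sketch under stated assumptions: `scroll_until_stable` and `FakeLazyPage` are hypothetical names introduced here for illustration (they are not part of Selenium), and the fake page merely imitates lazy loading by revealing a fixed batch of items per scroll.

```python
import time
from typing import Callable


def scroll_until_stable(
    count_items: Callable[[], int],
    scroll_once: Callable[[], None],
    timeout: float = 20.0,
    poll_interval: float = 0.5,
) -> int:
    """Scroll repeatedly until no new items appear within `timeout` seconds.

    `count_items` and `scroll_once` are caller-supplied callables
    (hypothetical helpers for this sketch). Returns the final item count.
    """
    seen = count_items()
    while True:
        scroll_once()
        deadline = time.monotonic() + timeout
        # Poll until the count grows or the deadline passes.
        while True:
            current = count_items()
            if current > seen:
                break  # new items were rendered, so scroll again
            if time.monotonic() >= deadline:
                return seen  # nothing new arrived in time: treat as the end
            time.sleep(poll_interval)
        seen = current


class FakeLazyPage:
    """Stand-in for a lazily loading page: each scroll reveals `batch` more items."""

    def __init__(self, total: int, batch: int, initial: int) -> None:
        self.total = total
        self.batch = batch
        self.visible = min(initial, total)

    def scroll(self) -> None:
        self.visible = min(self.visible + self.batch, self.total)

    def count(self) -> int:
        return self.visible


if __name__ == "__main__":
    page = FakeLazyPage(total=47, batch=4, initial=4)
    # Short timeout so the demonstration finishes quickly.
    print(scroll_until_stable(page.count, page.scroll, timeout=0.2, poll_interval=0.05))
```

With a real driver, the two callables could plausibly be `lambda: len(driver.find_elements(By.CSS_SELECTOR, "span.gl-price"))` and `lambda: driver.execute_script("window.scrollBy(0, 400)")`. Like the solution above, the loop treats a timeout with no new elements as the end of the page, so a slow network can make it stop early; a longer timeout trades speed for completeness.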
source: stackoverflow.com