Loop over pages and scrape Excel file paths using rvest
For the entries listed at this link, I need to open each entry and then scrape the URL of the Excel file shown in the bottom-left part of the page:
How can I achieve that with web-scraping packages in R such as rvest? Many thanks in advance.
    library(rvest)

    # Start by reading an HTML page with read_html():
    common_list <- read_html("http://www.csrc.gov.cn/csrc/c100121/common_list.shtml")

    common_list %>%
      # extract anchor (<a>) elements
      rvest::html_nodes("a") %>%
      # extract text
      rvest::html_text() -> webtxt

    # inspect
    head(webtxt)
My first question: how do I set html_nodes correctly to get the URL of each entry's page?
When I run the code, I get the following error:

    Error in checkError(res) : Undefined error in httr call. httr output: length(url) == 1 is not TRUE

First, use rvest to get the links:
    library(rvest)
    library(dplyr)
    library(RSelenium)

    # `url` must already hold the address of the listing page
    link = url %>%
      read_html() %>%
      html_nodes('.mt10')

    # take the href of every anchor and prepend the site root
    link = link[] %>%
      html_nodes("a") %>%
      html_attr('href') %>%
      paste0('http://www.csrc.gov.cn', .)

    # Output:
    # "http://www.csrc.gov.cn/csrc/c101921/c1758587/content.shtml"
    # "http://www.csrc.gov.cn/csrc/c101921/c1714636/content.shtml"
    # "http://www.csrc.gov.cn/csrc/c101921/c1664367/content.shtml"
    # "http://www.csrc.gov.cn/csrc/c101921/c1657437/content.shtml"
    # "http://www.csrc.gov.cn/csrc/c101921/c1657426/content.shtml"
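As a side note on the link-building step, here is a minimal offline sketch of the same idea using `xml2::url_absolute()` instead of `paste0()`, which also copes with hrefs that are already absolute. The HTML snippet is a made-up stand-in for the listing page (only the `.mt10` class and the href pattern are taken from the code above), so treat the markup as an assumption rather than the site's real structure.

```r
library(rvest)
library(xml2)

# Hypothetical snippet standing in for the listing page; the real markup may differ.
page <- read_html('
<html><body>
  <ul class="mt10">
    <li><a href="/csrc/c101921/c1758587/content.shtml">Entry 1</a></li>
    <li><a href="/csrc/c101921/c1714636/content.shtml">Entry 2</a></li>
  </ul>
</body></html>')

# Select every anchor inside the .mt10 container and read its href attribute
hrefs <- page %>%
  html_nodes(".mt10 a") %>%
  html_attr("href")

# Resolve relative paths against the site root
links <- url_absolute(hrefs, base = "http://www.csrc.gov.cn")
print(links)
```

The key point is that `html_text()` returns the visible link text, while `html_attr("href")` returns the target address, which is what is needed here.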
We can use RSelenium to loop over the links and download the Excel files.
It took me over a minute to fully load a single web page, so I will demonstrate here using a single link.
    url = "http://www.csrc.gov.cn/csrc/c101921/c1758587/content.shtml"

    # launch the browser
    driver = rsDriver(browser = c("chrome"))
    remDr <- driver[["client"]]

    # click on the excel file path
    remDr$navigate(url)
    remDr$findElement('xpath', '//*[@id="files"]/a')$clickElement()
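To extend the single-link demo to all links, a loop along the following lines might work. This is only a sketch: `click_all_files` and its `wait` argument are hypothetical names introduced here, the fixed pause is a crude assumption about page load time, and the XPath is the one used above, which may not match every entry page.

```r
# Hypothetical helper: visit each link and click the attachment anchor.
# `remDr` is the RSelenium client created above; `wait` is an assumed pause in seconds.
click_all_files <- function(remDr, links, wait = 5) {
  clicked <- logical(length(links))
  for (i in seq_along(links)) {
    remDr$navigate(links[[i]])
    Sys.sleep(wait)  # crude fixed wait for the page to finish loading
    clicked[i] <- tryCatch({
      remDr$findElement("xpath", '//*[@id="files"]/a')$clickElement()
      TRUE
    }, error = function(e) FALSE)  # element not found or click failed on this page
  }
  clicked
}

# Usage, assuming `link` holds the URLs collected with rvest:
# results <- click_all_files(remDr, link, wait = 60)
```

The returned logical vector shows which pages had a clickable file link, so failed entries can be retried with a longer wait. Where the downloaded files end up depends on the browser's download settings.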