How To Find Link Associated With Keyword Using Python, Requests, And Beautiful Soup
I am very new to Python, Requests, and Beautiful Soup, so my code is probably really bad.
What I have now:

    import requests
    from bs4 import BeautifulSoup

    f = open('sites.txt', 'r')
    sitelist = []
    for line in f:
        sitelist.append(line.strip())
    getsites = ['']
    print(sitelist)
    for i in range(len(sitelist)):
        getsites.append(sitelist[i])
    for i in range(len(sitelist)):
        temp = requests.get(sitelist[i])
        data = temp.text
        soup = BeautifulSoup(data, "html.parser")
        for url in soup.find_all("Yeezy"):
            print(element.find_previous_sibling('loc'))
            print(url.text)

Example of the XML file I am parsing:

    <url>
      <loc> https://www.a-ma-maniere.com/products/beanie-502805f16-black-white </loc>
      <lastmod>2016-12-24T22:25:05Z</lastmod>
      <changefreq>daily</changefreq>
      <image:image>
        <image:loc> https://cdn.shopify.com/s/files/1/0626/9065/products/502805F16-1.jpg?v=1472499019 </image:loc>
        <image:title>Alexander Wang: Beanie (Black/White)</image:title>
      </image:image>
    </url>

What I want to do is find a keyword in the <image:title> tag and then print the link associated with it, stored in the same <url> entry.
find_all needs a tag name to look for; passing it "Yeezy" searches for tags named Yeezy, which don't exist. If you only want tags of a given type whose text contains the word "Yeezy", loop over those tags and check whether each tag's text contains the string you are looking for. If it does, you have the element you want and can print its URL.
For ordinary HTML links this is simply:

    # href=True skips <a> tags that have no href attribute
    for url in soup.find_all('a', href=True):
        if "Yeezy" in url.get_text():
            print(url['href'])

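As an alternative to checking the text inside the loop, find_all can also filter on a tag's text directly through its string argument (available in recent Beautiful Soup versions). This is a minimal sketch; the function name keyword_links and the sample HTML are hypothetical, made up for illustration.

```python
import re

from bs4 import BeautifulSoup


def keyword_links(html, keyword="Yeezy"):
    """Return the href of every <a> tag whose own text contains keyword."""
    soup = BeautifulSoup(html, "html.parser")
    # re.escape so characters like "." or "(" in the keyword match literally
    pattern = re.compile(re.escape(keyword))
    # href=True skips <a> tags without an href; string= filters on the tag's text
    return [a["href"] for a in soup.find_all("a", href=True, string=pattern)]


# Hypothetical sample markup
html = (
    '<a href="/products/yeezy-350">Adidas Yeezy 350</a>'
    '<a href="/products/beanie">Beanie</a>'
    '<a>Yeezy (no href)</a>'
)
print(keyword_links(html))  # ['/products/yeezy-350']
```

One caveat: string= compares against the tag's .string, which is None when the link text is split across several child elements (for example <a><span>Yeezy</span> 350</a>), so such links won't match. The get_text() loop handles that case.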
For your sitemap it would be more like:
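
    for url in soup.find_all('url'):
        title = url.find('image:title')
        image_loc = url.find('image:loc')
        # check both tags exist before reading them, then match on the title text
        if title and image_loc and "Yeezy" in title.get_text():
            print(image_loc.get_text(strip=True))
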
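Here is a minimal, self-contained sketch of the same idea that returns both the product page (<loc>) and the image (<image:loc>) for each matching entry, so you can print whichever link you need. It assumes the sitemap layout shown in the question and sticks with "html.parser", which keeps "image:title" as a literal tag name; other parsers such as lxml's XML mode may handle the image: prefix differently. The function name find_keyword_links is hypothetical, and the case-insensitive match is a design choice of this sketch, not something the original code does.

```python
from bs4 import BeautifulSoup


def find_keyword_links(xml_text, keyword):
    """Return (page_url, image_url) pairs for <url> entries whose
    <image:title> contains keyword (case-insensitive)."""
    # "html.parser" keeps "image:title" / "image:loc" as literal tag names,
    # so find() can match them directly
    soup = BeautifulSoup(xml_text, "html.parser")
    matches = []
    for entry in soup.find_all("url"):
        title = entry.find("image:title")
        page_loc = entry.find("loc")  # does not match "image:loc" here
        if title is None or page_loc is None:
            continue  # skip entries without an image title or page link
        if keyword.lower() not in title.get_text().lower():
            continue
        image_loc = entry.find("image:loc")
        image_url = image_loc.get_text(strip=True) if image_loc else None
        # strip=True removes the spaces around the URLs in the sitemap
        matches.append((page_loc.get_text(strip=True), image_url))
    return matches
```

Inside the question's loop, each fetched page's temp.text could be passed as xml_text, e.g. find_keyword_links(temp.text, "Yeezy").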
For additional information, see the Beautiful Soup documentation for get_text().
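Because the URLs in the sitemap are padded with spaces, the strip argument of get_text() matters here. This small sketch uses a hypothetical snippet shaped like the question's XML (the example.com URLs are placeholders) to show the difference, and how a separator joins the text of several nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with the same layout as the question's sitemap
snippet = (
    "<url> <loc> https://example.com/products/a </loc>"
    " <image:image> <image:loc> https://example.com/a.jpg </image:loc>"
    " <image:title>Example Title</image:title> </image:image> </url>"
)
soup = BeautifulSoup(snippet, "html.parser")

raw = soup.find("loc").get_text()              # ' https://example.com/products/a '
clean = soup.find("loc").get_text(strip=True)  # 'https://example.com/products/a'

# The separator goes between each piece of descendant text; strip=True also
# drops the whitespace-only strings between tags
joined = soup.find("image:image").get_text(" | ", strip=True)
# 'https://example.com/a.jpg | Example Title'
```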
Because you are trying to get an image at this point, you might want to look at this answer as well. You'll need a library that can read and save images, rather than trying to access the image as a built-in Python object.
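If all you need is to save the image file, the raw bytes that Requests returns in response.content are enough; an imaging library such as Pillow only becomes necessary if you want to open or modify the picture. This is a minimal sketch under that assumption: the function names image_filename and download_image and the 10-second timeout are illustrative choices, and the file name is taken from the last segment of the URL path (dropping query strings like ?v=1472499019).

```python
import os
from urllib.parse import urlparse

import requests


def image_filename(image_url):
    """Derive a local file name from the URL path, ignoring any query string."""
    return os.path.basename(urlparse(image_url).path)


def download_image(image_url, dest_dir="."):
    """Fetch image_url and write its bytes into dest_dir; return the file path."""
    response = requests.get(image_url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    path = os.path.join(dest_dir, image_filename(image_url))
    with open(path, "wb") as fh:
        fh.write(response.content)  # raw bytes, not response.text
    return path
```

For the question's sample, image_filename on the <image:loc> URL gives "502805F16-1.jpg".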