
How To Find Link Associated With Keyword Using Python, Requests, And Beautiful Soup

I am very new to Python, Requests, and Beautiful Soup, so my code is probably really bad.

What I have now:

f = open('sites.txt','r')
sitelist = []
for line in f:
    sitelist.append(line.strip())
getsites = ['']
print(sitelist)
for i in range(len(sitelist)):
    getsites.append(sitelist[i])

for i in range(len(sitelist)):
    temp = requests.get(sitelist[i])
    data = temp.text
    soup = BeautifulSoup(data, "html.parser")
    for url in soup.find_all("Yeezy"):
        print(element.find_previous_sibling('loc'))
        print(url.text)

Example of XML File I am parsing:

<url>
<loc>
https://www.a-ma-maniere.com/products/beanie-502805f16-black-white
</loc>
<lastmod>2016-12-24T22:25:05Z</lastmod>
<changefreq>daily</changefreq>
<image:image>
<image:loc>
https://cdn.shopify.com/s/files/1/0626/9065/products/502805F16-1.jpg?v=1472499019
</image:loc>
<image:title>Alexander Wang: Beanie (Black/White)</image:title>
</image:image>
</url>

What I want to do is find a keyword in the <image:title> tag, then print the link associated with it, which is stored in <loc>.


Answer

find_all needs a tag name to look for, and "Yeezy" is text, not a tag. If you only want tags of a given type that contain the word "Yeezy", check inside your for loop whether the text of the tag contains the string you are looking for. If it does, you have the element you want and can print the URL.

For most URLs this is simply:

for url in soup.find_all('a'):
    # keep only links whose visible text mentions the keyword
    if "Yeezy" in url.get_text():
        print(url['href'])
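
As a self-contained illustration of the loop above, here is a minimal sketch that runs against an inline HTML string instead of a live page. The markup, the product paths, and the helper name links_for_keyword are all made up for the example. It also skips <a> tags that have no href and compares case-insensitively, which are assumptions about what you would want rather than anything the loop above requires.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = """
<ul>
  <li><a href="/products/yeezy-boost-350">Adidas Yeezy Boost 350</a></li>
  <li><a href="/products/beanie">Alexander Wang Beanie</a></li>
  <li><a>Yeezy placeholder without a link</a></li>
</ul>
"""


def links_for_keyword(markup, keyword):
    """Return the href of every <a> tag whose text contains keyword."""
    soup = BeautifulSoup(markup, "html.parser")
    keyword = keyword.lower()
    # href=True restricts the search to <a> tags that have an href attribute,
    # so a['href'] cannot raise KeyError.
    return [
        a["href"]
        for a in soup.find_all("a", href=True)
        if keyword in a.get_text().lower()
    ]


print(links_for_keyword(html, "Yeezy"))
```

The third <a> tag mentions the keyword but has no href, so it is left out of the result.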

For your XML it would be more like:

for url in soup.find_all('url'):
    # skip entries that lack a title or a <loc>
    if url.find('image:title') and url.loc:
        if "Yeezy" in url.find('image:title').get_text():
            print(url.find('image:loc').get_text())
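
If what you are after is the product page link stored in <loc> rather than the image URL, the same loop can return that instead. The sketch below is one way to do it, assuming the "html.parser" parser from the question (which keeps image:title as a literal tag name). The <urlset> wrapper, the second <url> entry, and the helper name product_links are invented for the example, and the match is case-insensitive by choice.

```python
from bs4 import BeautifulSoup

# The first <url> is the sample from the question; the second entry and the
# <urlset> wrapper are hypothetical, added to show an entry being skipped.
SITEMAP = """
<urlset>
<url>
<loc>
https://www.a-ma-maniere.com/products/beanie-502805f16-black-white
</loc>
<lastmod>2016-12-24T22:25:05Z</lastmod>
<image:image>
<image:loc>
https://cdn.shopify.com/s/files/1/0626/9065/products/502805F16-1.jpg?v=1472499019
</image:loc>
<image:title>Alexander Wang: Beanie (Black/White)</image:title>
</image:image>
</url>
<url>
<loc>https://www.a-ma-maniere.com/products/no-image</loc>
</url>
</urlset>
"""


def product_links(markup, keyword):
    """Return the <loc> text of every <url> whose image title contains keyword."""
    soup = BeautifulSoup(markup, "html.parser")
    keyword = keyword.lower()
    matches = []
    for url in soup.find_all("url"):
        title = url.find("image:title")
        loc = url.find("loc")
        # Entries without an image title or a <loc> cannot match.
        if title is None or loc is None:
            continue
        if keyword in title.get_text().lower():
            # strip=True drops the newlines around the URL inside <loc>.
            matches.append(loc.get_text(strip=True))
    return matches


print(product_links(SITEMAP, "beanie"))
```

Swapping "beanie" for "Yeezy" returns an empty list for this sample, since no title in it contains that word.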

For additional information, see the Beautiful Soup documentation for get_text().

Because you are trying to get an image at this point, you might want to look at this answer as well. You'll need a library that can read and store images rather than trying to access the image as a built-in Python object.

source: stackoverflow.com