
BeautifulSoup 4: Extracting Multiple Titles and Links from Different <p> Tags

HTML Code:

<div>
    <p class="title">
       <a target="_blank" rel="nofollow noreferrer" href="/news/123456">title_1</a> 
    </p>
</div>

<div>
    <p class="title">
       <a target="_blank" rel="nofollow noreferrer" href="/news/789000">title_2</a> 
    </p>
</div>

My Code:

from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

def web(WebUrl):
    site = urlparse(WebUrl)
    code = requests.get(WebUrl)
    plain = code.text
    s = BeautifulSoup(plain, "html.parser")
    p_containers = s.find('p', {'class':'title'})

    for title in s.find_all('p', {'class':'title'}):
        line = title.get_text()
        print(line)
        for link in p_containers.find_all('a'):
            line2 = link.get('href')
            print(site.netloc + str(line2))

Hi guys, I need some help with this. My task is to extract the titles and links from a webpage. I was able to extract the titles, but not the links: only the first link is scraped correctly, and every following title is printed with that same first link instead of its own.


Answer

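You have most of the pieces in your code, but one detail is off. p_containers comes from s.find(), which returns only the first matching <p>, so the inner loop searches that same first <p> on every iteration and prints its link each time. Search for the links inside the current title element instead. I think the simplest way to get the titles and links is the code below.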

from bs4 import BeautifulSoup

site = """<div>
    <p class="title">
       <a target="_blank" rel="nofollow noreferrer" href="/news/123456">title_1</a> 
    </p>
</div>

<div>
    <p class="title">
       <a target="_blank" rel="nofollow noreferrer" href="/news/789000">title_2</a> 
    </p>
</div>"""

s = BeautifulSoup(site, "html.parser")

for title in s.find_all('p', {'class':'title'}):
    links = [x['href'] for x in title.find_all('a', href=True)]
    line = title.get_text()
    print(line)
    print(links)

Note that links is a list. That covers the case where a single title contains more than one link.
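
If you also need full URLs, as in the question's print(site.netloc + str(line2)), here is a minimal sketch of one way to do it. It swaps in urljoin from the standard library in place of the netloc concatenation, since netloc alone drops the scheme. The function name extract_titles_and_links and the base URL https://example.com/ are placeholders I made up for illustration, not anything from the original page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_titles_and_links(html, base_url):
    """Return one (title, [absolute links]) pair per <p class="title">."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for p in soup.find_all("p", class_="title"):
        # strip=True drops the whitespace around the anchor text
        title = p.get_text(strip=True)
        # Search inside the current <p>, not a container found once before the loop
        links = [urljoin(base_url, a["href"]) for a in p.find_all("a", href=True)]
        results.append((title, links))
    return results


sample_html = """<div>
    <p class="title">
       <a target="_blank" rel="nofollow noreferrer" href="/news/123456">title_1</a>
    </p>
</div>

<div>
    <p class="title">
       <a target="_blank" rel="nofollow noreferrer" href="/news/789000">title_2</a>
    </p>
</div>"""

# "https://example.com/" is a placeholder; use the URL of the page you fetched
for title, links in extract_titles_and_links(sample_html, "https://example.com/"):
    print(title, links)
```

On a live page you would pass requests.get(WebUrl).text as html and WebUrl as base_url, assuming the page serves the same markup as the sample above.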

source: stackoverflow.com