Remove an HTML Tag From Scraped Text - BeautifulSoup

I am scraping data from a website. The page contains HTML like this:

<span class="demo-span">
    <b>Tag b:</b> 
    <a target="_blank" rel="nofollow noreferrer" href="...">Hello</a> 
     world!
</span>

This is what I tried:

new_data = data.find("span",{"class":"demo-span"})
print(new_data.get_text())

Expected output:

Hello world!

But the actual output is:

Tag b: Hello world!

Answer

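get_text() returns the text of every descendant of the span, including the <b> tag. You can use decompose() to remove that tag from the tree before extracting the text.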

from bs4 import BeautifulSoup

html = '''
<span class="demo-span">
    <b>Tag b:</b> 
    <a target="_blank" rel="nofollow noreferrer" href="...">Hello</a> 
     world!
</span>'''

soup = BeautifulSoup(html, 'html.parser')

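new_data = soup.find("span", {"class": "demo-span"})
# Remove the <b> tag (and its text) from the parsed tree.
new_data.b.decompose()
# Join the remaining strings with a space and strip surrounding whitespace.
print(new_data.get_text(' ', strip=True))
# Hello world!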
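
Note that decompose() modifies the parsed tree, so the <b> tag is gone for any later lookups. If you need to keep the tree intact, one possible alternative is to collect the strings yourself and skip those nested inside the unwanted tag. The following is only a sketch of that idea: text_without and its excluded parameter are hypothetical names introduced here, not part of BeautifulSoup, and it assumes the same html.parser setup as above.

```python
from bs4 import BeautifulSoup

html = '''
<span class="demo-span">
    <b>Tag b:</b>
    <a target="_blank" rel="nofollow noreferrer" href="...">Hello</a>
     world!
</span>'''


def text_without(tag, excluded=("b",)):
    """Join the text of `tag`, skipping strings nested inside excluded tags.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    parts = []
    for s in tag.strings:
        # Skip any string that has an excluded tag among its ancestors.
        if s.find_parent(list(excluded)) is not None:
            continue
        piece = s.strip()
        if piece:  # drop whitespace-only strings
            parts.append(piece)
    return " ".join(parts)


soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", {"class": "demo-span"})
result = text_without(span)
print(result)
# Hello world!
```

Because find_parent() walks all ancestors, this sketch would also skip the text if the span itself sat inside a <b> tag. For a one-off extraction, decompose() as shown above is simpler.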
source: stackoverflow.com