Finding A Specific String In A Page With BeautifulSoup

- 1 answer

I'm working with bs4 and want to return the description of what specific built-in Python functions do, taken from the docs. For example, from this page, for abs():

https://docs.python.org/2/library/functions.html

Would return this:

abs(x)

Return the absolute value of a number. The argument may be a plain or long integer or a floating point number. If the argument is a complex number, its magnitude is returned.

I'm stuck on what I should be looking for apart from just the <p> element, and on how to get only that <p> element and the text within it. I know I could do a findAll search, but I want to do this without relying on the text that is in the page (e.g. as if the user doesn't know what the text is beforehand):

import requests, bs4, re

res = requests.get('https://docs.python.org/2/library/functions.html')
res.raise_for_status()
abs_soup = bs4.BeautifulSoup(res.text)
abs_elems = abs_soup.body.findAll(text=re.compile('^abs$'))
print abs_elems
abs_desc = abs_soup.select   # this is the part I'm stuck on
print abs_desc

Answer

Well, the Python documentation puts each function inside a <dl class="function">, and there's a <dt id="name_of_the_function"> inside it.

So I'd suggest just using:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://docs.python.org/2/library/functions.html')
abs_soup = BeautifulSoup(res.text, "html.parser")

print(abs_soup.find('dt', {'id': 'abs'}).find_next('dd').text)

Output:

Return the absolute value of a number. The argument may be a plain or long integer or a floating point number. If the argument is a complex number, its magnitude is returned.

First, we use abs_soup.find('dt', {'id': 'abs'}) to find the <dt> tag that has abs as its id, and then we use .find_next('dd') to get the next <dd> tag after that <dt> tag.

Finally, use .text to get the text of that <dd> tag. You can also use .find_next('p').text instead; for abs() the output is the same, although for an entry whose <dd> holds several paragraphs it would return only the first one.
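
If you want this for any function name rather than just abs, the same lookup can be wrapped in a small helper. This is only a sketch: describe is a hypothetical name, and SAMPLE_HTML is a trimmed-down stand-in that I'm assuming mirrors the <dl class="function"> / <dt id="..."> / <dd> layout described above (it is not the real page), so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, assumed to mirror the layout of the docs page.
SAMPLE_HTML = """
<dl class="function">
<dt id="abs"><code class="descname">abs</code>(<em>x</em>)</dt>
<dd><p>Return the absolute value of a number.</p></dd>
</dl>
<dl class="function">
<dt id="all"><code class="descname">all</code>(<em>iterable</em>)</dt>
<dd><p>Return True if all elements of the iterable are true.</p></dd>
</dl>
"""


def describe(soup, name):
    """Return (signature, description) for the <dt> whose id is `name`.

    Returns None when no matching <dt> (or following <dd>) exists.
    """
    dt = soup.find('dt', {'id': name})
    if dt is None:
        return None
    dd = dt.find_next('dd')
    if dd is None:
        return None
    # strip=True trims each text fragment and drops the empty ones.
    signature = dt.get_text(strip=True)
    description = dd.get_text(" ", strip=True)
    return signature, description


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(describe(soup, "abs"))
print(describe(soup, "no_such_function"))
```

To run it against the live page, build the soup from res.text as in the snippet above instead of SAMPLE_HTML. Checking for None avoids an AttributeError when the user asks for a name that isn't on the page.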

source: stackoverflow.com