Finding A Specific String In A Page With BeautifulSoup
I'm working with bs4 and want to return the description of what specific built-in Python functions do, taken from the docs. For example, from this page for abs():
https://docs.python.org/2/library/functions.html
I would like it to return this:
abs(x)
Return the absolute value of a number. The argument may be a plain or long integer or a floating point number. If the argument is a complex number, its magnitude is returned.
I'm stuck on what I should be looking for apart from just the <p> element, and how I can get only that <p> element and the text within it. I know I could do a findAll search, but I want to do this without relying on the text that is in the page (e.g. as if the user doesn't know what the text is beforehand):
import requests, bs4, re
res = requests.get('https://docs.python.org/2/library/functions.html')
res.raise_for_status()
abs_soup = bs4.BeautifulSoup(res.text)
abs_elems = abs_soup.body.findAll(text=re.compile('^abs$'))
print abs_elems
abs_desc = abs_soup.select # this is the part Im stuck on
print abs_desc
Answer
Well, the Python documentation puts each function inside a <dl class="function"> element, and there's a <dt id="name_of_the_function"> inside it.
So I'd suggest just using:
import requests
from bs4 import BeautifulSoup
res = requests.get('https://docs.python.org/2/library/functions.html')
abs_soup = BeautifulSoup(res.text, "html.parser")
print(abs_soup.find('dt', {'id': 'abs'}).find_next('dd').text)
Output:
Return the absolute value of a number. The argument may be a plain or long integer or a floating point number. If the argument is a complex number, its magnitude is returned.
First, we use abs_soup.find('dt', {'id': 'abs'}) to find the <dt> tag that has abs as its id, and then we use .find_next('dd') to get the next <dd> tag after that <dt> tag.
Finally, we use .text to get the text of that <dd> tag. You can also use .find_next('p').text instead; the output here is the same.
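If you want this for any built-in rather than just abs, the same two lookups can be wrapped in a small helper. This is only a sketch: describe_builtin and SAMPLE_HTML are names I made up for illustration, and the sample markup is a trimmed imitation of the docs page structure described above, so the snippet runs without a network request. On the real page the <dt> may contain extra characters (such as a permalink marker) that you might want to strip.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down sample that imitates the structure of the docs
# page (<dl class="function"> containing a <dt id="..."> and a <dd>).
SAMPLE_HTML = """
<dl class="function">
<dt id="abs"><code>abs</code>(<em>x</em>)</dt>
<dd><p>Return the absolute value of a number.</p></dd>
</dl>
<dl class="function">
<dt id="all"><code>all</code>(<em>iterable</em>)</dt>
<dd><p>Return True if all elements of the iterable are true.</p></dd>
</dl>
"""


def describe_builtin(soup, name):
    """Return (signature, description) for the <dt> whose id is `name`.

    Returns None when no matching <dt> (or no following <dd>) exists.
    """
    dt = soup.find('dt', {'id': name})
    if dt is None:
        return None
    dd = dt.find_next('dd')
    if dd is None:
        return None
    # strip=True trims each text fragment and drops whitespace-only ones.
    signature = dt.get_text(strip=True)
    description = dd.get_text(' ', strip=True)
    return signature, description


if __name__ == '__main__':
    soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
    print(describe_builtin(soup, 'abs'))
    print(describe_builtin(soup, 'not_a_builtin'))
```

To run it against the live page, build the soup from res.text as in the answer above and pass the function name the user typed. Checking for None first avoids the AttributeError you would otherwise get when the id is not on the page.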