Python Regex Match Paragraphs
I have a string that looks like this:
...
Art. 15 Gegenstand Dieses Gesetz regelt die Bekämpfung der Geldwäscherei im Sinne von Artikel 305 bis des Strafgesetzbuches6 (StGB), die Bekämpfung der Terrorismusfinanzierung im Sinne von Artikel 260quinquies Absatz 1 StGB und die Sicherstellung der Sorgfalt bei Finanzgeschäften.
Art. 22 Geltungsbereich 1 Dieses Gesetz gilt: a. für Finanzintermediäre; b. für natürliche und juristische Personen, die gewerblich mit Gütern handeln und dabei Bargeld entgegennehmen (Händlerinnen und Händler).
...
I am trying to split the string into parts, each running from one Art. XX to the next Art. XX.
So, for example, the first match should contain the string:
Art. 15 Gegenstand Dieses Gesetz regelt die Bekämpfung der Geldwäscherei im Sinne von Artikel 305 bis des Strafgesetzbuches6 (StGB), die Bekämpfung der Terrorismusfinanzierung im Sinne von Artikel 260quinquies Absatz 1 StGB und die Sicherstellung der Sorgfalt bei Finanzgeschäften.
I tried this:
x = re.findall(r"Art\. (?s).*(?=Art)",text);
and this:
x = re.findall(r"Art\. .+(\n.*)*(?=Art)*",text);
But neither works as expected. Also, I am not sure whether I should use findall or split.
Answer
First of all, when a pattern contains capturing groups and is passed to re.findall, only the captured substrings appear in the output, not the whole match. Next, you should not quantify a lookaround: it makes no sense and is often treated as a user error. In Python, (?=Art)* behaves as if it were not there at all, since it means "there can be Art here, or there can be no Art", which is the same as having no lookahead.
You may use
result = re.findall(r'(?m)^Art\..*(?:\n(?!Art\.).*)*', text)
See the regex demo
Details
- (?m)^ - start of a line
- Art\. - the literal string Art.
- .* - the rest of the line
- (?:\n(?!Art\.).*)* - 0 or more lines that do not start with Art.
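The breakdown above can be tried out with a short sketch. The text below is a shortened stand-in for the string in the question, and it assumes each article starts on its own line, which is what the ^ anchor relies on.

```python
import re

# Shortened stand-in for the question's text (assumption: one article per
# line start; continuation lines do not begin with "Art.").
text = (
    "Art. 15 Gegenstand Dieses Gesetz regelt ... im Sinne von Artikel 305 bis ...\n"
    "Art. 22 Geltungsbereich 1 Dieses Gesetz gilt:\n"
    "a. für Finanzintermediäre;"
)

pattern = re.compile(r"(?m)^Art\..*(?:\n(?!Art\.).*)*")
articles = pattern.findall(text)

for article in articles:
    print(repr(article))
```

Because the pattern requires Art\. at the start of a line, the word Artikel in the middle of the first article does not start a new match, and the continuation line "a. für ..." stays attached to Art. 22.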