Python - Regex - Match Characters Between Certain Characters
I have a text file and I want to match all characters that sit between certain delimiters ([\n"text to match"\n]). The texts can differ a lot from each other in structure and in the characters they contain (they can contain any possible character).
I posted this question before (sorry for the duplicate), but so far the problem couldn't be solved, so now I am trying to be more precise about it.
The text in the file is built up like this:
test ="""
[
"this is a text and its supposed to contain every possible char."
],
[
"like *.;#]§< and many "" more."
],
[
"plus there are even
newlines
in it."
]"""
My desired output is a list with each text between the separators as an element, like the following:
['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even newlines in it.']
I tried to solve it with regex and came up with two attempts, shown here with their output:
my_list = re.findall(r'(?<=\[\n {8}\").*(?=\"\n {8}\])', test)
print (my_list)
['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.']
Well, this one was close. It lists the first two elements as it is supposed to, but unfortunately not the third one, because it contains newlines.
my_list = re.findall(r'(?<=\[\n {8}\")[\s\S]*(?=\"\n {8}\])', test)
print (my_list)
['this is a text and its supposed to contain every possible char."\n ], \n [\n "like *.;#]§< and many "" more."\n ], \n [\n "plus there are even\nnewlines\n \n in it.']
Okay, this time every text is included, but the list has only one element, and the lookahead doesn't seem to work as I thought it would.
So what is the right regex to get my desired output? Why does the second approach not stop at the lookahead?
Or is there an even cleaner, faster way to get what I want (BeautifulSoup or other methods)?
I am very thankful for any help and hints.
I am using Python 3.6.
Answer
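You should use the re.DOTALL flag so that . also matches newlines, combined with a non-greedy .*? so that each match stops at the first closing delimiter. Your second pattern returned a single element because [\s\S]* is greedy: it consumes as much as possible and only backs off to the last position where the lookahead succeeds, which is just before the final closing delimiter.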
print(re.findall(r'\[\n\s+"(.*?)"\n\s+\]', test, re.DOTALL))
Output
['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even\nnewlines\n\nin it.']
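To make the difference between the greedy and the non-greedy pattern concrete, here is a minimal, self-contained sketch. The sample string and its 8-space indentation are assumptions inferred from the ` {8}` in the question's patterns, and the pattern uses `\s*` instead of `\s+` so it also tolerates unindented input. The final step, which collapses newlines into single spaces, is an addition of mine to reproduce the list asked for in the question; it is not part of the answer above.

```python
import re

# Sample data modelled on the question. The 8-space indentation is an
# assumption, inferred from the " {8}" in the question's own patterns.
test = (
    '[\n'
    '        "this is a text and its supposed to contain every possible char."\n'
    '        ],\n'
    '        [\n'
    '        "like *.;#]§< and many "" more."\n'
    '        ],\n'
    '        [\n'
    '        "plus there are even\n'
    'newlines\n'
    'in it."\n'
    '        ]'
)

# Greedy: [\s\S]* runs as far as it can, so everything up to the last
# closing delimiter ends up in a single match.
greedy = re.findall(r'\[\n\s*"([\s\S]*)"\n\s*\]', test)

# Lazy: .*? stops at the first closing delimiter; DOTALL lets . match "\n".
lazy = re.findall(r'\[\n\s*"(.*?)"\n\s*\]', test, re.DOTALL)

# Collapse each run of whitespace (including newlines) into one space.
texts = [" ".join(item.split()) for item in lazy]

print(len(greedy))  # one big match
print(len(lazy))    # one match per bracketed block
print(texts)
```

With this sample, the greedy pattern yields one element and the lazy pattern yields three. Note that a text containing a quote followed by a newline and a closing bracket would still end a match early, so this only works as long as that sequence is reserved for the delimiter.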