Python Regex Find Everything Inside Curly Brackets After A Certain String


Hi, I'm struggling with regex a little bit. I have an .rpt file from which I need to extract specific data. The file looks something like this:

lots of text...
;Mass   % BPI
238.85  0.943
247.64  0.984
378.65  0.990
lots of text...

I want to get everything inside the curly brackets that follow the string [MS]. The problem is that there are many more curly brackets in this file, and they do not only surround the data I need.

What I already tried is this:

import re

file = input("Enter file path: ")
if len(file) < 1:
    file = "path"
handle = open(file)

pattern = r'^([-0-9\.eE+]+)[ \t]*(;|,)?[ \t]*([-0-9\.eE+]*)$'
findings = re.findall(pattern, handle)


# and then make a single dict of key-value pairs out of it

But that doesn't give me all I need: it returns some of the values, but not all of them.

In the end I want the numbers inside the curly brackets as a dictionary (example: key: 238.85, value: 0.943) so I can plot them afterwards.

Note: the spaces between the Mass and BPI 'columns' are tabs.



You may extract all blocks between { and } after [MS] and then extract the necessary data from each block:

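import re

path_to_file = "path"  # placeholder: replace with the path to your .rpt file
results = []

with open(path_to_file, 'r') as r:
    # Find every {...} block that directly follows [MS]
    for block in re.findall(r'\[MS\]\s*{([^{}]+)}', r.read()):
        # Collect (key, value) string pairs from the tab-separated numeric lines
        results.extend(re.findall(r'^(\d[\d.]*)\t(\d[\d.]*)$', block, re.M))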



Block matching regex

  • \[MS\] - a literal [MS] text
  • \s* - 0+ whitespaces
  • { - a { char
  • ([^{}]+) - Group 1 (this is what re.findall will return): any 1+ chars other than { and }
  • } - a } char.
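
As a minimal sketch of the block-matching step, the snippet below runs the pattern on a made-up string. The sample text and the [UV] section name are assumptions for illustration only, since the real file layout is not shown in full:

```python
import re

# Hypothetical sample: two brace blocks, only one of which follows [MS].
sample = "[UV]\n{\n1.0\t2.0\n}\n[MS]\n{\n;Mass\t% BPI\n238.85\t0.943\n}\n"

# Only the block right after [MS] is captured; the [UV] block is skipped.
blocks = re.findall(r'\[MS\]\s*{([^{}]+)}', sample)
print(blocks)
```

Because the pattern has exactly one capturing group, re.findall returns a list of strings (the block contents) rather than a list of tuples.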

Number extraction regex

  • ^ - start of a line (due to re.M)
  • (\d[\d.]*) - Group 1 (key): a digit and then any 0+ digits or dots
  • \t - a tab
  • (\d[\d.]*) - Group 2 (value): a digit and then any 0+ digits or dots
  • $ - end of a line (due to re.M).
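
Since the question asks for a dictionary, here is a rough sketch of how the number-extraction pattern could feed one. The block text is a hypothetical stand-in for what Group 1 of the block regex would capture, and the name mass_to_bpi is made up for this example:

```python
import re

# Hypothetical block text, as it might be captured from inside the braces.
block = "\n;Mass\t% BPI\n238.85\t0.943\n247.64\t0.984\n378.65\t0.990\n"

# The header line starts with ';', so it does not match and is skipped.
pairs = re.findall(r'^(\d[\d.]*)\t(\d[\d.]*)$', block, re.M)

# Convert the string pairs to floats so they can be plotted directly.
mass_to_bpi = {float(mass): float(bpi) for mass, bpi in pairs}
print(mass_to_bpi)
```

Note that a dict keeps only the last value for a repeated key, so if the same mass could appear twice in the file, keeping the list of pairs may be the safer choice.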