How To Match A Value None Or One Time In Regex Using Python re.findall


I want to match a pattern like java -c or java. Note that -c is optional: it can appear once or not at all. So far I am using


which is working fine. Please ignore the validity of the command; this is just a sample. I prefer this approach to using the pipe symbol, like


but when I use re.findall it returns an empty string, although it works fine with Since re.findall is compulsory for me, is grouping like (-c) the correct way, or can you suggest any changes to the above regex?


seq="java -c"

Output: ['-c']. I want to get java -c. As @9769953 pointed out, if seq="java", the output is an empty list, and if seq="java" (note the extra spaces), the output is ['']. @mozway, I tried what you suggested; when I use


it returns ['-c']. What am I doing wrong?



From what I understand, you want to find the whole pattern of java <possible option flag> option-value with re.findall, while also retaining the possibility to use (the latter will only find the first occurrence, if any).

I assume this means the input could be

text = "blah blah java -c blah blah java"

and you want to find the two occurrences.

When a pattern contains capturing groups, re.findall returns the contents of those groups rather than the whole match. So you need to group the relevant part, which in this case is the full pattern. To avoid also capturing the optional -c, you need to make that inner group non-capturing.

A normal group is surrounded by parentheses; a non-capturing group starts with (?: and ends with the usual closing ).
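The difference between the two kinds of group can be sketched as follows. This is only an illustration: the sample strings (including the file name 123.java) and the variable names are assumptions, not taken from the question.

```python
import re

# Hypothetical sample inputs; the file name is made up for illustration.
with_flag = "java -c 123.java"
without_flag = "java 123.java"

# Capturing group around the optional -c: findall returns only that group.
capturing = r"java\s+(-c\s+)?\d+\.java"
# Non-capturing group: findall returns the whole match instead.
non_capturing = r"java\s+(?:-c\s+)?\d+\.java"

captured_with = re.findall(capturing, with_flag)        # ['-c ']
captured_without = re.findall(capturing, without_flag)  # [''] (group did not participate)
whole_with = re.findall(non_capturing, with_flag)       # ['java -c 123.java']
whole_without = re.findall(non_capturing, without_flag) # ['java 123.java']

print(captured_with, captured_without)
print(whole_with, whole_without)
```

The [''] result in the second case is the empty string that re.findall reports for a group that did not take part in the match, which is consistent with the output described in the question.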

Allowing for a single whitespace character when -c is absent (rather than two consecutive \s+ matches, which would require at least two whitespace characters)[1], and using ? to mark the optional part, the pattern becomes:

pattern = r"(java\s+(?:-c\s+)?[\d]+\.java)"

This also uses a raw string (note the r prefix), which stops Python from interpreting backslashed characters as something special; such interpretation is rarely what one wants in a regular expression.
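As a small aside on why the r prefix matters, here is a sketch using \b, which Python reads as a backspace character in a normal string but which the re module reads as a word boundary. The sample string and variable names are assumptions chosen for illustration.

```python
import re

plain = "\bjava\b"   # Python turns each "\b" into a single backspace character
raw = r"\bjava\b"    # the backslashes reach the regex engine: word boundaries

sample = "java javac"  # hypothetical input

raw_hits = re.findall(raw, sample)      # ['java'] (the whole word only)
plain_hits = re.findall(plain, sample)  # [] (no backspace characters in sample)

print(len(plain), len(raw))  # 6 8
print(raw_hits, plain_hits)
```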

With the above input text and pattern, the results are now:

>>> import re
>>> regex = re.compile(pattern)
>>> regex.findall(text)
['java -c', 'java']
<re.Match object; span=(10, 26), match='java -c'>
'java -c'
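For reference, a self-contained version of this session might look like the sketch below. The file names 123.java and 456.java in the sample text are assumptions added so that the pattern has something to match, and re.search is used here as one way to get only the first occurrence.

```python
import re

# Assumed sample text: the two file names are made up for illustration.
text = "blah blah java -c 123.java blah blah java 456.java"
pattern = r"(java\s+(?:-c\s+)?[\d]+\.java)"
regex = re.compile(pattern)

# findall returns every occurrence as a string, because the only
# capturing group wraps the full pattern.
all_matches = regex.findall(text)

# search stops at the first occurrence and returns a Match object (or None).
first = regex.search(text)

print(all_matches)     # ['java -c 123.java', 'java 456.java']
print(first.span())    # (10, 26)
print(first.group(0))  # java -c 123.java
```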

[1] This pattern does not match the form in which -c is followed directly by its value with no whitespace in between, which is often standard for short (i.e., one-letter) options. If you want to also capture that possibility, change the second \s+ into \s*.
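The effect of that change can be sketched like this. The input strings and the names strict and relaxed are assumptions made up for this example.

```python
import re

# Original pattern: whitespace is required after -c.
strict = re.compile(r"(java\s+(?:-c\s+)?[\d]+\.java)")
# Variant from the footnote: whitespace after -c is optional.
relaxed = re.compile(r"(java\s+(?:-c\s*)?[\d]+\.java)")

attached = "java -c123.java"   # hypothetical: flag attached to its value
spaced = "java -c 123.java"    # hypothetical: flag separated by a space

strict_attached = strict.findall(attached)    # []
relaxed_attached = relaxed.findall(attached)  # ['java -c123.java']
relaxed_spaced = relaxed.findall(spaced)      # ['java -c 123.java']

print(strict_attached, relaxed_attached, relaxed_spaced)
```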