Difference between /(a|b)/ and /[ab]/ in JavaScript split() using regex


I ran into a situation where I wanted to split a string at each occurrence of ; or a newline. I wanted to make sure this would not leave any empty array elements, so I decided to use a regex that matches each run of separators as one match that is as long as possible.

I first tried foo.split(/(\n|\r|;)+/), but strangely this did not work as expected with double newlines. Using the (as far as I know) equivalent syntax foo.split(/[\n\r;]+/) did work as expected, though.

Here's an example:

var foo = "test;\nexample;\n\ndouble enters;\nand without semicolon\nend;\n";
console.log(foo.split(/[\n\r;]+/));
console.log(foo.split(/(\n|\r|;)+/));
//and to show that the matching strings are indeed equal:
console.log(foo.match(/[\n\r;]+/g));
console.log(foo.match(/(\n|\r|;)+/g));

document.write('<pre>'+foo+'</pre>'); //demonstrating what the string looks like.

As the console output shows when you run that snippet, both regexes match() exactly the same substrings, so I expected them to be interchangeable. The output when splitting the string is completely different, though.

The same thing happens when the separators are replaced with other characters, for example A for every newline and B for every semicolon. Even then, the two regex splits give different results.

So my question is: does anyone know what causes this mysterious behaviour?

Answer

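It's because when the separator regex contains a capturing group, the .split() method also inserts the text captured by that group into the resulting array, after each piece it splits off. Since the group here is repeated with +, only its last repetition is captured, which is why single separator characters show up between the pieces.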

From MDN:

If separator contains capturing parentheses, matched results are returned in the array.
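
To make the effect visible, here is a minimal sketch using a shortened string instead of the one from the question (the names sample, withClass and withGroup are made up for this illustration):

```javascript
// Shortened stand-in for the string in the question (an assumption for brevity).
const sample = "a;\nb;\n\nc";

// Character class: no capturing group, so only the pieces between
// separators end up in the array.
const withClass = sample.split(/[\n\r;]+/);
console.log(withClass); // ["a", "b", "c"]

// Capturing group: after each piece, split() also inserts what the group
// captured. Because the group is repeated with +, that is only the last
// repetition, which is "\n" for both separator runs here.
const withGroup = sample.split(/(\n|\r|;)+/);
console.log(withGroup); // ["a", "\n", "b", "\n", "c"]
```

The extra "\n" entries are the captured separators, not empty elements left over from the split itself.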

If you use a non-capturing group, the results will be the same:

foo.split(/(?:\n|\r|;)+/)

Example:

var foo = "test;\nexample;\n\ndouble enters;\nand without semicolon\nend;\n";
console.log(foo.split(/[\n\r;]+/));
console.log(foo.split(/(?:\n|\r|;)+/));
//and to show that the matching strings are indeed equal:
console.log(foo.match(/[\n\r;]+/g));
console.log(foo.match(/(?:\n|\r|;)+/g));
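
One thing worth noting about the original goal of having no empty array elements: because the string ends with a separator, both working regexes still leave one empty string at the end of the array. A possible way to drop it, sketched here with a plain filter() (the names parts and nonEmpty are made up for this illustration, and this step is not part of the answer above):

```javascript
const foo = "test;\nexample;\n\ndouble enters;\nand without semicolon\nend;\n";

// The non-capturing group returns only the pieces, but the trailing
// separator still produces a final empty string.
const parts = foo.split(/(?:\n|\r|;)+/);
console.log(parts);
// ["test", "example", "double enters", "and without semicolon", "end", ""]

// Remove empty strings after splitting.
const nonEmpty = parts.filter(part => part !== "");
console.log(nonEmpty);
// ["test", "example", "double enters", "and without semicolon", "end"]
```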

source: stackoverflow.com