
How To Split A String Based On Either Spaces Or Symbols While Maintaining The Delimiters?


I'm trying to split a string based on either spaces or certain symbols (presently *_-<>). I'll give some examples of input and output:

"Hello how are you" -> [ "Hello", " ", "how", " ", "are", " ", "you" ]

"Hello *how* are *you*" -> [ "Hello", " ", "*how*", " ", "are", " ", "*you*" ]

"Hello *how*are_you_" -> [ "Hello", " ", "*how*", "are", "_you_" ]

"*how*are _you_ \*doing*_today_ hm?" -> [ "*how*", "are", " ", "_you_", " ", "\*doing*", "_today_", " ", "hm?" ]

Splitting on space unfortunately turns cases like *how*_are_ into a single item in the array instead of multiple items.

I also tried splitting on a regex, but unfortunately that doesn't keep the symbols surrounding each word.

Sorry if this is a bit confusing. Is there a good way to handle this?


Answer

Rather than using split, one option is to use .match: either match one of the symbols, followed by characters that aren't that symbol, followed by that symbol again, or match non-space, non-symbol characters:

// Put the dash first so that it is treated as a literal inside the character set, not as a range:
const delims = '-*_<>';

// Construct a pattern like:
// ([-*_<>])(?:(?!\1).)+\1| |[^-*_<> ]+

const patternStr = String.raw`([${delims}])(?:(?!\1).)+\1| |[^${delims} ]+`;
const pattern = new RegExp(patternStr, 'g');

const doMatch = str => str.match(pattern);
console.log(doMatch("Hello how are you"));
console.log(doMatch("Hello *how*are_you_"));
console.log(doMatch("*how*are _you_ \*doing*_today_ hm?"));

([-*_<>])(?:(?!\1).)+\1| |[^-*_<> ]+ means:

  • ([-*_<>])(?:(?!\1).)+\1 - First alternation:
    • ([-*_<>]) - Match and capture initial delimiter
    • (?:(?!\1).)+ - Followed by one or more characters which are not that initial delimiter
    • \1 - Followed by that initial delimiter again
  • ' ' - Second alternation: match a single literal space (the pattern uses a plain space, not \s)
  • [^-*_<> ]+ - Third alternation: match anything which is not a delimiter or a space
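
If the delimiter list may change, the same pattern can be wrapped in a small helper. The following is only a sketch: the names tokenize and escapeForCharClass are hypothetical and not part of the answer above, and it assumes that every delimiter is a single character. It escapes the characters that are special inside a character set, so the order of the delimiters no longer matters, and it falls back to an empty array because match returns null when nothing matches.

```javascript
// Hypothetical helper: escape the characters that are special inside a
// character set (backslash, closing bracket, caret, dash).
const escapeForCharClass = chars => chars.replace(/[\\\]^-]/g, '\\$&');

// Hypothetical helper: split a string into delimited words, plain words and
// single spaces. Assumes each delimiter is exactly one character.
const tokenize = (str, delims = '*_-<>') => {
  const set = escapeForCharClass(delims);
  const pattern = new RegExp(
    `([${set}])(?:(?!\\1).)+\\1| |[^${set} ]+`,
    'g'
  );
  // match() returns null when there is no match at all.
  return str.match(pattern) || [];
};

console.log(tokenize('Hello *how*are_you_'));
// [ 'Hello', ' ', '*how*', 'are', '_you_' ]
console.log(tokenize(''));
// []
```

One thing to be aware of: a delimiter with no closing partner matches none of the three alternations, so it is silently skipped. For example, tokenize('a*b') gives [ 'a', 'b' ]. If unpaired symbols need to be kept, the pattern would need a further alternation for them.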
source: stackoverflow.com