How To Use `strsplit` To Split Before Every Capital Letter Of A Camel-Case String?

I want to use strsplit to split before every capital letter, using a positive lookahead. However, it also splits after every capital letter, and I'm confused about that. Is this regex incompatible with strsplit? Why does this happen, and what should I change?

strsplit('AaaBbbCcc', '(?=\\p{Lu})', perl=TRUE)[[1]]
strsplit('AaaBbbCcc', '(?=[A-Z])', perl=TRUE)[[1]]
strsplit('AaaBbbCcc', '(?=[ABC])', perl=TRUE)[[1]]
# [1] "A"  "aa" "B"  "bb" "C"  "cc"

Expected result:

# [1] "Aaa" "Bbb" "Ccc"

In the demo, the matches actually look fine.
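Checking the pattern on its own in R, with gregexpr() rather than strsplit(), gives the same picture as an online regex tester: one zero-width match right before each capital, including one at position 1. A minimal sketch of that check:

```r
# List every position where the lookahead matches, much like a regex tester does.
m <- gregexpr('(?=[A-Z])', 'AaaBbbCcc', perl = TRUE)[[1]]
as.integer(m)              # start positions of the matches
# [1] 1 4 7
attr(m, "match.length")    # the lookahead consumes no characters
# [1] 0 0 0
```

So the regex itself matches where expected; the extra pieces come from how strsplit() consumes the string between matches.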

Ideally, it should split only at camel-case boundaries, i.e. before an uppercase letter that is followed by a lowercase one (Aa, but not AA). There's \\p{Lt}, but it doesn't seem to work at all.

strsplit('AaaABbbBCcc', '(?=\\p{Lt})', perl=TRUE)[[1]]
# [1] "AaaABbbBCcc"

Expected result:

# [1] "AaaA" "BbbB" "Ccc" 

Answer

Adding (?!^) ("not at the start of the string") gives the desired result:

strsplit('AaaBbbCcc', "(?!^)(?=[A-Z])", perl=TRUE)[[1]]
# [1] "Aaa" "Bbb" "Ccc"
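Why the extra pieces appear, and why (?!^) removes them: strsplit() does not collect all matches up front. It matches against the remaining part of the string, takes the text to the left of the match, drops the match, and repeats on what is left. In R's implementation, an empty match at the very start of that remainder makes it emit a single character and advance by one, so a lookahead that can match at position 1 peels off "A", then "aa", and so on. Because ^ refers to the start of each remainder, (?!^) rules out exactly that case. The helper `split_like_strsplit()` below is hypothetical, a plain-R re-creation of that loop written only to illustrate the behaviour; it is not strsplit()'s actual code.

```r
# Hypothetical helper for illustration only: a plain-R re-creation of the
# loop strsplit() runs on a single string with perl = TRUE.
split_like_strsplit <- function(x, pattern) {
  out <- character(0)
  while (nchar(x) > 0) {
    m <- regexpr(pattern, x, perl = TRUE)
    start <- as.integer(m)
    len <- attr(m, "match.length")
    if (start == -1) {
      # No match left: the remainder is the last piece.
      out <- c(out, x)
      break
    }
    if (start == 1 && len == 0) {
      # Empty match at the very start of the remainder:
      # emit one character and advance by one.
      out <- c(out, substr(x, 1, 1))
      x <- substring(x, 2)
    } else {
      # Emit the text left of the match, then drop the match and everything
      # before it. The regex is re-run on what is left, so ^ again means
      # "start of the remainder".
      out <- c(out, substr(x, 1, start - 1))
      x <- substring(x, start + len)
    }
  }
  out
}

split_like_strsplit('AaaBbbCcc', '(?=[A-Z])')
# [1] "A"  "aa" "B"  "bb" "C"  "cc"
split_like_strsplit('AaaBbbCcc', '(?!^)(?=[A-Z])')
# [1] "Aaa" "Bbb" "Ccc"
```

The first call reproduces the question's unexpected output; the second shows (?!^) suppressing the empty match at the start of each remainder.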

For the camel-case boundary (an uppercase letter followed by a lowercase one), we may do:

strsplit('AaaABbbBCcc', '(?!^)(?=\\p{Lu}\\p{Ll})', perl=TRUE)[[1]]
## or, for ASCII letters only:
strsplit('AaaABbbBCcc', '(?!^)(?=[A-Z][a-z])', perl=TRUE)[[1]]
# [1] "AaaA" "BbbB" "Ccc"
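Why \\p{Lt} never matched: in Unicode, Lt is the titlecase-letter category, a small set of digraph characters such as ǅ (U+01C5), not ordinary capitals like A, which belong to Lu. A quick check, assuming R's PCRE build has Unicode property support (the \\p{Lu} pattern above already relies on it):

```r
# \p{Lu} = uppercase letter; \p{Lt} = titlecase letter, e.g. U+01C5 ("ǅ")
x <- c("A", "a", "\u01C5")
grepl("\\p{Lu}", x, perl = TRUE)
# [1]  TRUE FALSE FALSE
grepl("\\p{Lt}", x, perl = TRUE)
# [1] FALSE FALSE  TRUE
```

That is why the camel-case boundary has to be spelled out as an uppercase letter followed by a lowercase one, \\p{Lu}\\p{Ll}.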
source: stackoverflow.com