JavaScript Substring Without Splitting Emoji


In my JS I am trying to use substring() on text, which generally works but unfortunately decapitates emojis.

usaText = "AπŸ‡ΊπŸ‡ΈZ"
splitText = usaText.substring(0,2) //"AοΏ½"
splitText = usaText.substring(0,3) //"AπŸ‡Ί"
splitText = usaText.substring(0,4) //"AπŸ‡ΊοΏ½"
splitText = usaText.substring(0,5) //"AπŸ‡ΊπŸ‡Έ"

Is there a way to use substring without breaking emoji? In my production code I cut at about 40 characters and I wouldn't mind if it was 35 or 45. I have thought about simply checking whether the 40th character is a number or between a-z but that wouldn't work if you got a text full of emojis. I could check whether the last character is one that "ends" an emoji by pattern matching but this also seems a bit weird performance-wise.

Am I missing something? With all the bloat that JavaScript carries, is there no built-in count that sees emoji as one?

Regarding the question "Split JavaScript string into array of codepoints? (taking into account "surrogate pairs" but not "grapheme clusters")", this is what that approach gives me:

chrs = Array.from( usaText )
(4) ["A", "🇺", "🇸", "Z"]
0: "A"
1: "🇺"
2: "🇸"
3: "Z"
length: 4

That's one too many unfortunately.



So this isn't really an easy thing to do, and I'm inclined to tell you that you shouldn't write this on your own. You should use a library like runes.

Just a simple npm i runes, then:

const runes = require('runes');
const usaText = "A🇺🇸Z";
runes.substr(usaText, 0, 2); // "A🇺🇸"
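
If you would rather not add a dependency, recent browsers and Node.js versions ship a built-in Intl.Segmenter that can iterate over grapheme clusters. Below is a minimal sketch, assuming your runtime supports Intl.Segmenter; truncateGraphemes is a hypothetical helper name I made up for illustration, not part of any library:

```javascript
// Sketch: cut a string after at most `maxGraphemes` user-perceived characters.
// Assumes Intl.Segmenter is available in the runtime.
// `truncateGraphemes` is a hypothetical helper, not a built-in.
function truncateGraphemes(text, maxGraphemes) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let result = "";
  let count = 0;
  // Each iteration yields one grapheme cluster, so a flag (two code points,
  // four UTF-16 code units) is appended whole or not at all.
  for (const { segment } of segmenter.segment(text)) {
    if (count >= maxGraphemes) break;
    result += segment;
    count += 1;
  }
  return result;
}

const usaText = "A🇺🇸Z";
console.log(truncateGraphemes(usaText, 2)); // "A🇺🇸"
```

For the "about 40 characters" case you would call truncateGraphemes(text, 40). The loop stops as soon as the limit is reached, so long inputs are not segmented all the way to the end.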