
JavaScript Substring Without Splitting Emoji


In my JS I am trying to substring() text, which generally works but unfortunately decapitates emoji.

usaText = "AπŸ‡ΊπŸ‡ΈZ"
splitText = usaText.substring(0,2) //"AοΏ½"
splitText = usaText.substring(0,3) //"AπŸ‡Ί"
splitText = usaText.substring(0,4) //"AπŸ‡ΊοΏ½"
splitText = usaText.substring(0,5) //"AπŸ‡ΊπŸ‡Έ"

Is there a way to use substring() without breaking emoji? In my production code I cut at about 40 characters, and I wouldn't mind if it were 35 or 45. I have thought about simply checking whether the 40th character is a digit or a letter from a to z, but that wouldn't work for a text full of emoji. I could also pattern-match whether the last character is one that "ends" an emoji, but that seems a bit wasteful performance-wise.
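Since a cut anywhere between 35 and 45 is acceptable, one option is to round the cut down to the nearest grapheme-cluster boundary. The sketch below is not from the original post: it swaps in the built-in Intl.Segmenter (assuming a runtime that supports it, such as Node 16+ or a current browser), and the helper name sliceAtGraphemeBoundary is hypothetical. "Characters" here means UTF-16 code units, the same unit substring() counts.

```javascript
// Hypothetical helper (not from the original post): cut `text` to at most
// `maxUnits` UTF-16 code units without splitting a grapheme cluster.
// Assumes the runtime supports Intl.Segmenter.
function sliceAtGraphemeBoundary(text, maxUnits) {
  if (text.length <= maxUnits) return text;
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  // containing() returns the grapheme cluster that covers code unit
  // `maxUnits`; its start index is the last safe cut point at or before it.
  const { index } = segmenter.segment(text).containing(maxUnits);
  return text.slice(0, index);
}

const usaText = "A🇺🇸Z";
console.log(sliceAtGraphemeBoundary(usaText, 2)); // prints A (the partial flag is dropped)
console.log(sliceAtGraphemeBoundary(usaText, 5)); // prints A🇺🇸
```

Because it always rounds down, a call like sliceAtGraphemeBoundary(text, 40) may return a few code units fewer than 40, which fits the stated tolerance.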

Am I missing something? With all the bloat that JavaScript carries, is there no built-in way to count an emoji as one character?
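For counting, newer runtimes do ship a built-in that treats a flag as one unit: Intl.Segmenter with granularity "grapheme", which splits text into extended grapheme clusters. This is a minimal sketch assuming the runtime supports it; graphemeCount is a hypothetical name. It contrasts the three notions of length for the string from the question.

```javascript
// Sketch: count user-perceived characters (grapheme clusters) with the
// built-in Intl.Segmenter, assuming a runtime that supports it.
function graphemeCount(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let count = 0;
  for (const _ of segmenter.segment(text)) count++; // one iteration per cluster
  return count;
}

const usaText = "A🇺🇸Z";
console.log(usaText.length);             // 6 (UTF-16 code units)
console.log(Array.from(usaText).length); // 4 (code points)
console.log(graphemeCount(usaText));     // 3 (grapheme clusters)
```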

Regarding the suggested duplicate, "Split JavaScript string into array of codepoints? (taking into account "surrogate pairs" but not "grapheme clusters")":

const chrs = Array.from(usaText);
// ["A", "🇺", "🇸", "Z"]  (length: 4)

Unfortunately that's one too many: the flag still comes out as two entries.
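The reason is that 🇺🇸 is not a single code point: it is two Regional Indicator Symbols (U+1F1FA for "U" and U+1F1F8 for "S") that renderers combine into one flag. Array.from() splits by code point, so the pair comes apart. This short sketch prints the code points in hex to make that visible.

```javascript
// Sketch: list the code points of the string in hex to show why
// Array.from() yields two entries for the flag.
const usaText = "A🇺🇸Z";
const codePoints = Array.from(usaText, (ch) =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints); // [ 'U+0041', 'U+1F1FA', 'U+1F1F8', 'U+005A' ]
```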


Answer

So this isn't really an easy thing to do, and I'm inclined to tell you that you shouldn't write this on your own. You should use a library like runes.

Just run npm i runes, then:

const runes = require('runes');
const usaText = "AπŸ‡ΊπŸ‡ΈZ";
runes.substr(usaText, 0, 2); // "A🇺🇸"
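If adding a dependency is not an option and the runtime supports Intl.Segmenter, a similar start/width slice can be sketched on top of the built-in segmenter. This is an assumption-laden alternative, not part of runes: graphemeSubstr is a hypothetical name, and because Intl.Segmenter follows Unicode's extended grapheme cluster rules while runes applies its own emoji handling, the two may not agree on every input.

```javascript
// Hypothetical dependency-free alternative to runes.substr(), assuming a
// runtime with Intl.Segmenter. `start` and `width` are counted in grapheme
// clusters, not UTF-16 code units.
function graphemeSubstr(text, start, width) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  // Materialize the clusters as plain strings, then slice that array.
  const graphemes = Array.from(segmenter.segment(text), (s) => s.segment);
  return graphemes.slice(start, start + width).join("");
}

const usaText = "A🇺🇸Z";
console.log(graphemeSubstr(usaText, 0, 2)); // prints A🇺🇸
```

Building the full array is simple but costs memory proportional to the text; for ~40-character cuts that is negligible.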
source: stackoverflow.com