How To Convert A String Into Its Real Binary Representation (UTF-8 Or Whatever Is Currently Used)?

I want to experiment with UTF-8 and Unicode. To do that, I want to build a small website that helps me understand the encoding better.

First, I want to be able to enter some text and then get the actual binary encoding of the string. For that I'm looking for an equivalent to ".GetBytes" from C# or Java. I do not want the resolved char codes!

Here is a C# function I would like to reproduce in JavaScript:

string ToBinary(string input)
{
    //this is the part I am looking for in JavaScript
    var utf8Bytes = Encoding.UTF8.GetBytes(input);

    var bytesFormattedToBin = utf8Bytes.Select(b => Convert.ToString(b, 2).PadLeft(8, '0'));
    return string.Join(" ", bytesFormattedToBin);
}

Here are some sample results:

  • "abc" => "01100001 01100010 01100011"
  • "@©®" => "01000000 11000010 10101001 11000010 10101110"
  • "😀😄" => "11110000 10011111 10011000 10000000 11110000 10011111 10011000 10000100"

Is there a way to achieve this in JavaScript?

Thanks. Marc

Edit: Fixed truncated sample result.

Answer

String.prototype.charCodeAt(...) returns UTF-16 code units, so its results match the UTF-8 bytes only when the string contains nothing but ASCII characters. To deal with other characters, use the standard TextEncoder:

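// TextEncoder always produces UTF-8, so no encoding label is needed
const te = new TextEncoder()
function toBinaryRepr(str) {
    // encode() returns a Uint8Array of UTF-8 bytes
    return Array.from(te.encode(str))
        // format each byte as an 8-digit binary string
        .map(i => i
            .toString(2)
            .padStart(8, '0'))
        .join(' ')
}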
// '01100001 01100010 01100011'
toBinaryRepr('abc')
// '01000000 11000010 10101001 11000010 10101110'
toBinaryRepr('@©®')
// '11110000 10011111 10011000 10000000 11110000 10011111 10011000 10000100'
toBinaryRepr('😀😄')
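To see why charCodeAt(...) is not enough, it can help to print both views of the same string side by side. The sketch below is only an illustration: the helper names codeUnits and utf8Bytes are made up for this example and are not part of any API.

```javascript
// Hypothetical helpers for illustration; only charCodeAt and TextEncoder
// are standard APIs here.
const encoder = new TextEncoder()

// UTF-16 code units, one per index of the JavaScript string
function codeUnits(str) {
    return Array.from({ length: str.length }, (_, i) => str.charCodeAt(i))
}

// UTF-8 bytes of the same string
function utf8Bytes(str) {
    return Array.from(encoder.encode(str))
}

// 'a': code unit 97, byte 97 (identical for ASCII)
console.log(codeUnits('a'), utf8Bytes('a'))
// '©': code unit 169, bytes 194 169
console.log(codeUnits('©'), utf8Bytes('©'))
// '😀': code units 55357 56832 (a surrogate pair), bytes 240 159 152 128
console.log(codeUnits('😀'), utf8Bytes('😀'))
```

Only the ASCII case gives the same numbers in both views, which is why the byte-level output has to come from TextEncoder.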

Warning: TextEncoder is not a global constructor in older versions of Node.js (before v11). If you get an error saying that TextEncoder is not defined, import it from the util module:

const { TextEncoder } = require('util')
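
If the same script has to run both where TextEncoder is a global and where it is not, one possible approach is to pick the constructor at startup. This is a minimal sketch, assuming a CommonJS environment whenever the global is missing; the names Encoder and utf8Binary are hypothetical.

```javascript
// Use the global TextEncoder when it exists; otherwise fall back to the
// one exported by Node's util module (assumes require is available then).
const Encoder = typeof TextEncoder !== 'undefined'
    ? TextEncoder
    : require('util').TextEncoder

const fallbackEncoder = new Encoder()

// Hypothetical helper: UTF-8 bytes of str as space-separated 8-bit groups
function utf8Binary(str) {
    return Array.from(fallbackEncoder.encode(str))
        .map(b => b.toString(2).padStart(8, '0'))
        .join(' ')
}

console.log(utf8Binary('abc'))
```

The require call sits in the branch that is only evaluated when the global is missing, so the snippet should also load in a browser that has no require.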
source: stackoverflow.com