Node.js: How to decode ISO-8859-1 into UTF-8?

- 1 answer

I'm querying a database in ISO-8859-1, but since Node works with UTF-8, I must convert the data returned by this particular DBMS.

I tried iconv but I can't figure out how to get the desired output. For example, I got 0xc2 0x80 when I expected 0xe2 0x82 0xac to be returned.

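var iconv = require('iconv-lite');

// A single byte 0x80, as stored in the database
var buffer = Buffer.from([0x80]);
var str = iconv.decode(buffer, 'iso-8859-1');
console.log({str});
// Buffer.from replaces the deprecated new Buffer(...)
console.log(Buffer.from(str, 'utf8'));
// iconv.encode takes a string, so '€' is passed directly
iconv.encode('€', 'iso-8859-1');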

/*
Which outputs
{ str: '' }
<Buffer c2 80>*/
  • In UTF-8, € is represented by 0xe2 0x82 0xac
  • In ISO-8859-1, I assumed € was represented by 0x80 (but see the updates below)

Updates:

  • Expected value for € is 0xe2 0x82 0xac and not 0xdb as I mentioned initially by mistake
  • As stated in the comments ISO-8859-1 doesn't contain a € character.
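
The byte values quoted in the question can be checked with Node's built-in Buffer alone, without iconv. This is only a minimal sketch; it assumes Node's 'latin1' encoding, which maps each byte to the code point of the same value, as ISO-8859-1 does. The variable names are illustrative.

```javascript
// Decode the single byte 0x80 as ISO-8859-1 (Node calls this 'latin1').
const decoded = Buffer.from([0x80]).toString('latin1');

// ISO-8859-1 maps 0x80 to the C1 control character U+0080, not to the euro sign.
console.log(decoded.codePointAt(0).toString(16)); // 80

// Re-encoding U+0080 as UTF-8 gives the two bytes seen in the question.
const reencoded = Buffer.from(decoded, 'utf8');
console.log(reencoded); // <Buffer c2 80>

// The euro sign itself is U+20AC, which UTF-8 encodes as three bytes.
const euroUtf8 = Buffer.from('\u20ac', 'utf8');
console.log(euroUtf8); // <Buffer e2 82 ac>
```

So 0xc2 0x80 is a correct UTF-8 encoding, just of a different character than the one expected.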

Answer

Thanks to the comments above, I realized that despite my database having a character set of "ISO8859_1", under the hood IBEXPERT is using, and presenting me the data in, WINDOWS-1252 (often called ANSI) encoding, which explains why I was seeing 0x80 in their HEX viewer.

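WINDOWS-1252 does extend ISO8859_1 in a sense: the two agree on every byte except 0x80 to 0x9F, which are control characters in ISO-8859-1 and mostly printable characters in WINDOWS-1252, including € at 0x80.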
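
The relationship between the two code pages is small enough to write out by hand, since only the 32 bytes from 0x80 to 0x9F decode differently. Below is a rough sketch of a Windows-1252 decoder built on that idea. The names `CP1252_HIGH` and `decodeWindows1252` are made up for illustration, and in practice a library such as iconv-lite does this for you.

```javascript
// Windows-1252 characters for bytes 0x80..0x9F. The five bytes that
// Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) are kept
// as the matching C1 control characters in this sketch.
const CP1252_HIGH =
  '\u20ac\u0081\u201a\u0192\u201e\u2026\u2020\u2021' + // 0x80..0x87
  '\u02c6\u2030\u0160\u2039\u0152\u008d\u017d\u008f' + // 0x88..0x8F
  '\u0090\u2018\u2019\u201c\u201d\u2022\u2013\u2014' + // 0x90..0x97
  '\u02dc\u2122\u0161\u203a\u0153\u009d\u017e\u0178';  // 0x98..0x9F

// Hypothetical helper: decode a Buffer of Windows-1252 bytes into a string.
function decodeWindows1252(buffer) {
  let out = '';
  for (const byte of buffer) {
    out += byte >= 0x80 && byte <= 0x9f
      ? CP1252_HIGH[byte - 0x80]     // the only range where the code pages differ
      : String.fromCharCode(byte);   // identical to ISO-8859-1 everywhere else
  }
  return out;
}

const euro = decodeWindows1252(Buffer.from([0x80]));
console.log(euro === '\u20ac');          // true
console.log(Buffer.from(euro, 'utf8'));  // <Buffer e2 82 ac>
```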

For example, running the code below works fine: € is correctly decoded.

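var iconv = require('iconv-lite');
var buffer = Buffer.from([0x80]);
// In WINDOWS-1252, byte 0x80 is the euro sign
var str = iconv.decode(buffer, 'WINDOWS-1252');
console.log({str});
console.log(Buffer.from(str, 'utf8'));
// iconv.encode takes a string, so '€' is passed directly
var str2 = iconv.encode('€', 'WINDOWS-1252');
console.log({strEncoded: str2});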
/*
{ str: '€' }
<Buffer e2 82 ac>
{ strEncoded: <Buffer 80> }
* */

The weird part is that my database query, which uses node-firebirdlib-fbclient to communicate with my Firebird database, resolves with a symbol value that is not a printable character. As you can see below, it is the control character U+0080, which UTF-8 encodes as 0xc2 0x80.

   { idNumber: 1,
     id: 'EUR',
     taxPercentage: 1,
     isDefault: -1,
     accountNumber: null,
     dontUse: false,
     symbol: '' },
  eur: <Buffer c2 80> }

The eur: value is the output of console.log(new Buffer(result.symbol,'utf8')).

Taking the UTF-8 bytes of this symbol and decoding them as 'WINDOWS-1252' with the command iconv.decode(Buffer.from(currency.symbol, 'utf8'), 'WINDOWS-1252') returns

... "defaultCurrency": { "id": "EUR", "symbol": "€", "label": "EUR" }...
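
For comparison, here is a sketch of this kind of repair using only Node built-ins. It assumes the driver decoded the raw byte 0x80 as ISO-8859-1, so the string holds U+0080. Note that it re-encodes with 'latin1' to get the original single byte back. As far as I can tell, re-encoding with 'utf8' yields the two bytes c2 80, which Windows-1252 reads as 'Â€' rather than '€'. `CP1252_HIGH` and `repairCp1252` are illustrative names, not part of any library.

```javascript
// Windows-1252 characters for bytes 0x80..0x9F (unassigned bytes kept as C1 controls).
const CP1252_HIGH =
  '\u20ac\u0081\u201a\u0192\u201e\u2026\u2020\u2021' + // 0x80..0x87
  '\u02c6\u2030\u0160\u2039\u0152\u008d\u017d\u008f' + // 0x88..0x8F
  '\u0090\u2018\u2019\u201c\u201d\u2022\u2013\u2014' + // 0x90..0x97
  '\u02dc\u2122\u0161\u203a\u0153\u009d\u017e\u0178';  // 0x98..0x9F

// Hypothetical helper: fix a string whose Windows-1252 bytes were decoded
// as ISO-8859-1. Assumes every character in `text` is U+00FF or below.
function repairCp1252(text) {
  // 'latin1' turns each code point back into the single byte it came from.
  const bytes = Buffer.from(text, 'latin1');
  let out = '';
  for (const byte of bytes) {
    out += byte >= 0x80 && byte <= 0x9f
      ? CP1252_HIGH[byte - 0x80]
      : String.fromCharCode(byte);
  }
  return out;
}

const symbol = '\u0080'; // stands in for the symbol value returned by the query
console.log(Buffer.from(symbol, 'utf8'));        // <Buffer c2 80>
console.log(Buffer.from(symbol, 'latin1'));      // <Buffer 80>
console.log(repairCp1252(symbol) === '\u20ac');  // true
```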

source: stackoverflow.com