Character Encoding Outputs Wrong Between Local And Server

- 1 answer

I have a Laravel 5.6 installation, with config/database.php options for charset and collation set as utf8mb4 and utf8mb4_unicode_ci respectively.

What I'm outputting is a simple RSS feed (so XML). I send the character encoding as UTF-8 in the response header (as such: return response()->view('rss', $data)->header('Content-Type', "text-xml; charset=utf-8");) and use <?xml version="1.0" encoding="UTF-8" ?> in the XML file.

Locally, on my Mac running Valet and PHP 7.2, everything is fine, but when deployed to a Forge-provisioned server, the output is wrong. In case it made a difference, I checked the locales on the server: some of the generated ones use these characters, so it can't be that.

Now, years ago, I'd have jumped on utf8_encode and been done with it, but I haven't had to do this in so long that I can't believe I should be using it. I'm sure I don't have to. But I can't see where things get scrambled, so I'm open to any input here! What is going wrong?

Clarification: Here's an example of the wrong output. Locally, I get this string: L'Allongé. On the server, it outputs: L&#039;AllongÃ©. The entity output for ' in the XML string is more or less fine (though I still don't understand why it differs); the real trouble is the é, which seems to be badly encoded.

Answer

Encoding Ã© as ISO-8859-1 gives the byte sequence C3 A9. This happens to be the UTF-8 representation of é. (You can find this at https://unicode-table.com/en/00E9/)
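The byte-level mix-up can be reproduced in a few lines. This is only an illustrative sketch in Python rather than the asker's PHP, and the helper name misread_utf8_as_latin1 is made up for the example:

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# U+00E9 (é) is two bytes in UTF-8.
utf8_bytes = "é".encode("utf-8")
print(utf8_bytes.hex(" ").upper())          # C3 A9

# Each of those bytes is a separate character in ISO-8859-1.
print(misread_utf8_as_latin1("L'Allongé"))  # L'AllongÃ©
```

ASCII characters are unaffected because both encodings map them to the same single byte, which is why only the accented letter looks wrong.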

The most probable cause is that you're serving UTF-8 bytes, but the browser parses them as ISO-8859-1. Although you declare the encoding in several places, verify which encoding the browser actually uses. Chrome has hidden these settings in newer versions, but Firefox still allows you to change the encoding of a page via the Hamburger menu > More > Text encoding.
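Besides checking the browser, it can help to look at the charset the server really declares, since a web server or proxy may alter the header on the way out. The sketch below (Python, standard library only) parses a Content-Type value copied from the browser's network tab or a command-line client; declared_charset is a hypothetical helper name, not part of Laravel or Forge:

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(declared_charset("text/xml; charset=utf-8"))  # utf-8
print(declared_charset("text/xml"))                 # None
```

If this returns None or anything other than utf-8 for the header the server sends, the client is free to pick a different encoding than the one the application intended.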

Another possibility is that the data was stored incorrectly in the first place. This usually happens with data from a third party that has mixed up its encodings.
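If stored data turns out to be affected, the damage can often be undone by reversing the wrong decoding step. This is a heuristic sketch only, assuming the text was UTF-8 misread as ISO-8859-1 exactly once; repair_mojibake is a hypothetical name, and data misread as Windows-1252 would need that codec instead:

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of UTF-8 bytes having been decoded as ISO-8859-1.

    Text that does not fit this pattern is returned unchanged.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except UnicodeError:
        # Either the text contains characters outside ISO-8859-1, or the
        # resulting bytes are not valid UTF-8, so it was probably fine already.
        return text


print(repair_mojibake("AllongÃ©"))  # Allongé
print(repair_mojibake("Allongé"))   # Allongé
```

Fixing the source of the wrong encoding is preferable to repairing strings on output, since a legitimate string that happens to look like mojibake would be altered by this check.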

source: stackoverflow.com