Why Is ¿ Displayed Differently in Windows vs. Linux Even When Using UTF-8?


Why is the following displayed differently in Linux vs. Windows?

System.out.println(new String("¿".getBytes("UTF-8"), "UTF-8"));

in Windows:


in Linux:




System.out.println() encodes its output using the system default encoding, but the console interprets those bytes according to its own encoding (or "codepage") setting. On your Windows machine the two encodings seem to match, but on the Linux box the output is apparently UTF-8 while the console is decoding it as a single-byte encoding such as ISO-8859-1. Or maybe, as Jon suggested, the source file is being saved as UTF-8 and javac is reading it as something else, a problem you can avoid by writing the character as a Unicode escape (\u00BF instead of ¿).
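To see what that kind of mismatch does to ¿, here is a minimal sketch (the class and method names are hypothetical) that encodes the character as UTF-8 and then decodes the resulting bytes as ISO-8859-1, the way a console set to a single-byte encoding would. It writes the character as the Unicode escape \u00BF so the result does not depend on which encoding javac assumes for the source file, and it prints code points rather than glyphs so the output is readable on any console.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // U+00BF INVERTED QUESTION MARK, written as a Unicode escape so the
    // encoding javac uses to read this file cannot change it.
    static final String INVERTED_QUESTION = "\u00BF";

    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1,
    // imitating a console that expects a single-byte encoding.
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Render each char as U+XXXX so the result does not depend on the console.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String garbled = decodeUtf8AsLatin1(INVERTED_QUESTION);
        System.out.println(describe(INVERTED_QUESTION)); // U+00BF
        System.out.println(describe(garbled));           // U+00C2 U+00BF
    }
}
```

The single character U+00BF becomes two bytes in UTF-8 (C2 BF), and a single-byte decoder turns each byte into its own character, so the console would show an extra "Â" in front of the "¿".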

When you need to output anything other than ASCII text, your best bet is to write it to a file using an appropriate encoding, then open the file in a text editor; consoles are too limited and too system-dependent. By the way, this bit of code:

new String("¿".getBytes("UTF-8"), "UTF-8")

...has no effect on the output. All it does is encode the string's contents to a byte array and decode them again with the same charset, reproducing the original string: an expensive no-op. If you want to output text in a particular encoding, you need to use an OutputStreamWriter, like so:

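FileOutputStream fos = new FileOutputStream("out.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
osw.write("\u00BF"); // inverted question mark, written as the UTF-8 bytes C2 BF
osw.close(); // flushes the writer and closes fos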
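A fuller, self-contained version of the OutputStreamWriter snippet, as a sketch: it writes ¿ to out.txt (the file name from the snippet) with an explicit UTF-8 charset, reads the raw bytes back to confirm they are C2 BF, and checks that the getBytes/new String round trip from the question returns an equal string. The class and method names are hypothetical, and try-with-resources (Java 7+) is used so the writer is always closed, which also flushes it.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WriteUtf8File {
    // Write text to the given file encoded as UTF-8, regardless of the
    // platform default encoding.
    static void writeUtf8(String path, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8)) {
            out.write(text);
        } // close() flushes the writer and closes the FileOutputStream
    }

    // Format bytes as uppercase hex pairs, e.g. "C2 BF".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String s = "\u00BF"; // inverted question mark, as a Unicode escape

        // The question's round trip: encode, then decode with the same charset.
        String roundTrip = new String(s.getBytes("UTF-8"), "UTF-8");
        System.out.println(roundTrip.equals(s)); // true: nothing changed

        writeUtf8("out.txt", s);
        byte[] onDisk = Files.readAllBytes(Paths.get("out.txt"));
        System.out.println(hex(onDisk)); // C2 BF
    }
}
```

Opening out.txt in an editor set to UTF-8 should then show ¿, whatever the console's codepage happens to be.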