
Best Practice Usage Of I18n Language Code To Identify Language And Region?

- 1 answer

I plan to use language/region codes such as 'en-US' or 'de-CH' in my current web project to identify language and region. Is it valid to use a code like 'en-IN' to identify content for the India region with English text?


Answer

The following is about specifying the language in the actual HTML document. There might be different best practices for server-side coding that may depend on the actual programming technique used.

According to the W3C's guidance on declaring a language in (X)HTML documents [1], one should choose the language code based on Best Current Practice 47, "Tags for Identifying Languages" (RFC 5646) [2].

It suggests the following format:

langtag   = language
           ["-" script]
           ["-" region]
           *("-" variant)
           *("-" extension)
           ["-" privateuse]

where among others the parts are defined as

language      = 2*3ALPHA            ; shortest ISO 639 code
region        = 2ALPHA              ; ISO 3166-1 code
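
As a rough illustration, the simplified grammar above can be turned into a shape check. This is only a sketch under stated assumptions: the pattern name `SIMPLE_LANGTAG` and the function `parse_simple_langtag` are made up for this example, the pattern covers only language, optional script and optional two-letter region, and it does not verify that the subtags are actually registered codes.

```python
import re

# Simplified pattern: language, optional script, optional two-letter region.
# Variants, extensions, private use and other forms allowed by the full
# RFC 5646 grammar are deliberately left out of this sketch.
SIMPLE_LANGTAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}))?"
)


def parse_simple_langtag(tag):
    """Return a dict of subtags, or None if the tag does not fit the simplified shape."""
    match = SIMPLE_LANGTAG.fullmatch(tag)
    if match is None:
        return None
    return match.groupdict()


print(parse_simple_langtag("en-IN"))
print(parse_simple_langtag("english-IN"))
```

A tag such as 'en-IN' yields a language of 'en' and a region of 'IN', while something like 'english-IN' is rejected because the first subtag is too long. Whether 'en' and 'IN' are real codes still has to be checked against the ISO lists.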

Regarding the usage of the region part, it says:

Region subtags are used to indicate linguistic variations associated with or appropriate to a specific country, territory, or region. Typically, a region subtag is used to indicate variations such as regional dialects or usage, or region-specific spelling conventions. It can also be used to indicate that content is expressed in a way that is appropriate for use throughout a region, for instance, Spanish content tailored to be useful throughout Latin America.

The following rules apply to the region subtags:

  1. Region subtags MUST follow any primary language, extended language, or script subtags and MUST precede any other type of subtag.

[...]

  1. There MUST be at most one region subtag in a language tag and the region subtag MAY be omitted, as when it adds no distinguishing value to the tag.

Nothing is said about the language having to be an official language of the country, or even a commonly spoken one. You may omit the region when it adds no distinguishing value, but you don't have to.

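Note that, by convention, the language subtag is written in lowercase and the region subtag in uppercase. The tags themselves are case-insensitive, so this is a formatting recommendation rather than a requirement.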
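
If tags arrive in mixed case (from user input or URLs, say), they can be brought into the conventional form before being written into the document. The following is a minimal sketch, assuming the usual formatting convention of lowercase subtags except uppercase two-letter and title-case four-letter subtags that are neither first nor after a single-character subtag; the function name `format_langtag_case` is hypothetical, and the function only adjusts case without validating the tag.

```python
def format_langtag_case(tag):
    """Apply conventional casing to a language tag without validating it."""
    formatted = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        if index == 0 or after_singleton:
            # First subtag and anything after a single-character subtag: lowercase.
            formatted.append(subtag.lower())
        elif len(subtag) == 2 and subtag.isalpha():
            # Two-letter subtag in a later position: treated as a region.
            formatted.append(subtag.upper())
        elif len(subtag) == 4 and subtag.isalpha():
            # Four-letter subtag in a later position: treated as a script.
            formatted.append(subtag.capitalize())
        else:
            formatted.append(subtag.lower())
        if len(subtag) == 1:
            after_singleton = True
    return "-".join(formatted)


print(format_langtag_case("EN-in"))
```

Comparisons between tags should still be done case-insensitively, since 'en-in' and 'en-IN' identify the same thing.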

tl;dr As long as you're using valid ISO codes to represent language and region, you conform to the best practice and the standard when using e.g. en-IN.

The list of language codes in ISO 639-2 may be found at https://www.loc.gov/standards/iso639-2/php/English_list.php, while region codes are listed on Wikipedia, or you may use the search on the official ISO website.

source: stackoverflow.com