Localization Work (v.0.1.4)

May 18
Changes

  • Support for internationalized domain names (IDNs): Punycode-encoded xn--* domains now display decoded (see notes)
  • Exported CSVs now support Unicode and include a byte order mark (BOM) so that characters display correctly in Excel
  • Translate function added to all pages using the Google Translate widget
  • Translate tool added to domain lists (see notes)
  • New shortcut URL, TLD.CX (see notes)
  • Yet more updates to the interface: default columns at different screen sizes and a change of font
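
For anyone curious about the CSV change, here is a minimal sketch of the idea in Python. It is not the site's actual export code; the function name `csv_bytes_with_bom` is made up for illustration. Python's "utf-8-sig" codec prepends the BOM, which is the hint Excel uses to read the file as UTF-8.

```python
import csv
import io


def csv_bytes_with_bom(rows):
    """Serialize rows to CSV and return UTF-8 bytes that start with a BOM.

    Hypothetical helper for illustration only.
    """
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # "utf-8-sig" writes the byte order mark before the encoded text.
    return buffer.getvalue().encode("utf-8-sig")


if __name__ == "__main__":
    data = csv_bytes_with_bom([["domain"], ["пример.рф"]])
    print(data[:3])  # the three BOM bytes
```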

Release Notes

Internationalized domain names (IDNs)

All domains that are Punycode encoded now display in their intended characters. To see this in action, view the .COM Alexa 1 Million list and enter 'xn--' (the ASCII Compatible Encoding (ACE) prefix) in the search box to show all encoded domains.

Domain name labels are stored in the DNS as ASCII. Internationalized domain names contain characters outside ASCII, so they have to be encoded first. The Punycode algorithm encodes the international-character label into a string of ASCII characters. This string is then given the ACE prefix 'xn--', which marks it as encoded, and stored in the Domain Name System (DNS). Browsers and other software and sites (such as this one) query the DNS, receive the labels in ASCII, and decode any label prefixed with 'xn--' back to its intended characters so that it displays correctly.
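
The round trip described above can be sketched with Python's built-in "punycode" codec. This is only an illustration under simplifying assumptions, not the code this site runs: the helper names are hypothetical, and a full IDNA implementation also normalises and validates labels, which this sketch skips.

```python
ACE_PREFIX = "xn--"


def encode_label(label: str) -> str:
    """Encode one label: ASCII passes through, anything else gets xn-- + Punycode."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def decode_label(label: str) -> str:
    """Decode one label if it carries the ACE prefix, otherwise return it unchanged."""
    lowered = label.lower()
    if lowered.startswith(ACE_PREFIX):
        return lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def decode_domain(domain: str) -> str:
    """Decode every label of a dotted domain name."""
    return ".".join(decode_label(part) for part in domain.split("."))


if __name__ == "__main__":
    print(encode_label("рф"))
    print(decode_domain("example.xn--p1ai"))
```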

Incidentally, entering the 'xn--' prefix in the home page search box filters the list to top-level domains that are encoded and contain international characters. You can achieve a similar effect with the IDN category, but that shows all IDN domains, which is not necessarily as useful. If the TLD category chosen is 'Country TLDs' 'In Root Zone', 303 results are returned at the time of writing, i.e. 303 country-code top-level domains. Entering 'xn--' into the search box filters this list to IDN country-code domains only, of which there are 56. The top of this list by volume of registrations is .рф (.xn--p1ai), the Cyrillic country-code top-level domain for the Russian Federation. The next two are IDNs for Taiwan, both containing exactly the same number of domains (37,045) because their registrations are mirrored: .台灣 (.xn--kpry57d, Taiwan in traditional Chinese characters) and .台湾 (.xn--kprw13d, Taiwan in simplified Chinese characters).
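
The 'xn--' filter is really just a prefix match. A rough sketch, assuming the TLDs are available as a plain list of ASCII strings (the sample list and the function name `idn_tlds` are made up for illustration; Python's built-in "idna" codec does the decoding):

```python
ACE_PREFIX = "xn--"


def idn_tlds(tlds):
    """Return (encoded, decoded) pairs for the TLDs that carry the ACE prefix."""
    return [
        (tld, tld.encode("ascii").decode("idna"))
        for tld in tlds
        if tld.lower().startswith(ACE_PREFIX)
    ]


if __name__ == "__main__":
    # Illustrative sample only, not the real root zone list.
    sample = ["ru", "xn--p1ai", "tw", "xn--kpry57d", "xn--kprw13d", "cx"]
    for encoded, decoded in idn_tlds(sample):
        print(f".{decoded} (.{encoded})")
```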

I don't know about anyone else, but I find IDNs and their uses fascinating (I really should get out more).

Translate tool & Google translate

Aside from Punycode decoding, I've also added a link to Google Translate, which appears for any domain that has international characters. This link uses Google's language detection to translate second-level domain labels into English, the default language of the site. I've also added the Google widget for localization of DomainNameStats.com (but not the blog yet) into any number of different languages. Whilst not perfect, this seemed like a good halfway house before full site localization at a later date. By default the domain translate link translates to English. However, if a different site language is chosen (using the widget), the domain names are translated into that language instead (a neat little hack which I was quite proud of, for all of a minute).
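
For illustration, a translate link of this kind could be built roughly as follows. This is a sketch, not the site's actual code: the function names are hypothetical, and the query parameters ('sl=auto' for source language detection, 'tl' for the target language) are my assumption about the public Google Translate URL format.

```python
from urllib.parse import urlencode


def needs_translate_link(label: str) -> bool:
    """Show the link only for labels that contain non-ASCII characters."""
    return not label.isascii()


def translate_link(label: str, site_language: str = "en") -> str:
    """Build a translate URL that auto-detects the source language.

    site_language defaults to English and would follow whatever language
    the visitor picked in the widget.
    """
    query = urlencode(
        {"sl": "auto", "tl": site_language, "text": label, "op": "translate"}
    )
    return "https://translate.google.com/?" + query


if __name__ == "__main__":
    label = "пример"
    if needs_translate_link(label):
        print(translate_link(label))        # translate into English
        print(translate_link(label, "fr"))  # site language switched to French
```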

But thinking about it, all domain names (not just those with an xn-- prefix) may need to be translated from any language to any other, depending on the audience. I will look at implementing this in a later release.

New shortcut URL, TLD.CX

I've had this domain for some time but have never got around to using it properly. .CX is the country code Top Level Domain (ccTLD) for Christmas Island, which is located in the Indian Ocean just south of Java and Sumatra and had a population of 1,843 according to a census in 2016. We don't have a great deal of domain data on .CX (hopefully this will change soon) other than some pricing data and its Alexa footprint. I like the extension though: short, with a trailing 'X', and the domain as a whole has a nice rhyming rhythm to it (the D.C part). The trailing 'X' (to me anyway) implies the concept of exchange, so this kind of fits with the mission of DomainNameStats.com as a centre of information and ideas about domain names.

I should point out, however, that I'm no domaining expert. I have particular ideas about what makes a good domain name, and experts may roll their eyes at some of them (like the above). But that contention, I think, is what makes working in this area fun. I'll explore some of these ideas in more depth, along with why I got into domains and the DNS, at a later date (whether you like it or not).

To use the new shortcut, just replace DomainNameStats.com with TLD.CX in the URL of any page. So TLD.CX/com (top page for .COM detail), TLD.CX/com/new-domains (new domains just registered today), etc. Interested to see if anyone uses it :)
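
If you would rather script the swap, here is a small sketch. The function name `to_shortcut` is made up, and I am assuming the path is identical on both hosts, as in the examples above.

```python
from urllib.parse import urlsplit, urlunsplit

LONG_HOST = "domainnamestats.com"
SHORT_HOST = "tld.cx"


def to_shortcut(url: str) -> str:
    """Swap the long host for the short one, keeping scheme, path and query."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host != LONG_HOST:
        return url  # not a DomainNameStats.com URL, leave it alone
    return urlunsplit(parts._replace(netloc=SHORT_HOST))


if __name__ == "__main__":
    print(to_shortcut("https://DomainNameStats.com/com/new-domains"))
```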