This tool suggests which language a text was most likely written in. It supports about 200 of the most common languages, covering the native languages of the majority of the world's population.
The suggestions are presented as a ranked list of languages, with the most probable languages at the top. A long green bar indicates a likely match. In general, longer texts give more reliable suggestions.
There are several text properties that can be used to decide what language a text is written in. One way is to calculate the letter frequencies (i.e. the share of the text made up by each letter) and compare them with precomputed averages for each language. It is also possible to look at groups of two or three consecutive letters (bigrams and trigrams) in the same way. These groups capture more detail about the language and make the result more reliable, so this is the approach the tool on this page uses.
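The Python sketch below illustrates the frequency-comparison idea. It is not the tool's actual code: cosine similarity is an assumed comparison measure (the page does not say which one the tool uses), the reference profiles are toy examples built from a single sentence each, and all function names are hypothetical. Calling `ngram_profile` with `n=1` gives plain letter frequencies, while `n=2` or `n=3` gives the letter pairs and triples.

```python
import math
from collections import Counter


def ngram_profile(text: str, n: int) -> dict[str, float]:
    """Relative frequency of each run of n consecutive letters in text.

    Non-letters act as word boundaries, so n-grams never span spaces or
    punctuation. n=1 gives plain letter frequencies.
    """
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    counts = Counter()
    for word in cleaned.split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse frequency vectors (0 to 1)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_languages(text: str, profiles: dict[str, dict[str, float]], n: int = 3):
    """Return (language, score) pairs, most probable language first."""
    sample = ngram_profile(text, n)
    scores = {lang: cosine_similarity(sample, ref) for lang, ref in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy reference profiles built from one sentence each (hypothetical data;
# a real tool would precompute them from large text corpora).
references = {
    "english": ngram_profile(
        "the quick brown fox jumps over the lazy dog and then it thinks about the weather", 3),
    "german": ngram_profile(
        "der schnelle braune fuchs springt über den faulen hund und dann denkt er über das wetter nach", 3),
}

print(rank_languages("the dog is thinking", references)[0][0])  # english
```

Sorting the scores in descending order mirrors the ranked list the tool displays; with realistic reference profiles, longer input texts produce more stable scores.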
Another possibility would have been to look at whole words. The distribution of word lengths could have served as a complementary metric for estimating the likelihood of each language. A more reliable method would have been to compare each word against a dictionary for each language. This would, however, require a lot of storage space and computational power, so it is out of scope for a simple tool like this. Not all writing systems separate words with spaces (Chinese, Japanese and Thai, for example), so a word-based method would still have had to be combined with one of the other methods.
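For comparison, here is a minimal Python sketch of the two word-based metrics mentioned above, which the tool does not use. It assumes that words are runs of letters separated by anything else, and `english_words` is a hypothetical toy word list; a real per-language dictionary would be far larger, which is where the storage cost comes from. All function names are hypothetical.

```python
from collections import Counter


def split_words(text: str) -> list[str]:
    """Lower-cased runs of letters; anything else separates words."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


def word_length_distribution(text: str, max_len: int = 15) -> list[float]:
    """Share of words with length 1..max_len; longer words share the last bucket."""
    words = split_words(text)
    if not words:
        return [0.0] * max_len
    counts = Counter(min(len(w), max_len) for w in words)
    return [counts[k] / len(words) for k in range(1, max_len + 1)]


def dictionary_hit_rate(text: str, vocabulary: set[str]) -> float:
    """Fraction of the text's words that appear in one language's word list."""
    words = split_words(text)
    if not words:
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)


# Hypothetical toy word list; a real dictionary would be far larger.
english_words = {"the", "dog", "is", "sleeping"}

print(word_length_distribution("The dog is sleeping", max_len=8))
# [0.0, 0.25, 0.5, 0.0, 0.0, 0.0, 0.0, 0.25]
print(dictionary_hit_rate("The dog is sleeping!", english_words))  # 1.0
```

Because the split relies on spaces and punctuation, a Chinese sentence such as 我喜欢猫 comes out as a single four-character "word", which shows why these metrics cannot stand on their own.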