A gift for language
Bog-standard PC software has an unsuspected talent

YOUR PC has a hidden flair
for recognising foreign languages. WinZip, the software routinely used to
compress PC files, can help you figure out the language a document is written
in, say Italian scientists. It could even be used to guess a document's
author.

The technique, which can identify the language of a segment of text as little as 20 characters long, will be described in a future edition of the journal Physical Review Letters. Vittorio Loreto, a physicist at Rome's La Sapienza University, and his colleagues Dario Benedetto and Emanuele Caglioti have applied for a patent on their idea.

Anybody with WinZip software on their PC can use the technique. Take a page of text written in an unknown language X. Compress a large piece of text in a known language, say a whole e-book in French, and then repeat the task with the page written in X added to the end of the e-book. Note the difference in the sizes of the two compressed versions. Next, take an e-book in another language, say German or Spanish, and do it all again. Then repeat the procedure with as many more languages as you can. Finally, find which e-book gives you the smallest difference between the two compressed files, says Loreto. The e-book and the text written in X will be in the same language (provided one of the languages you've used is the right one, of course).

But how come? The method works because WinZip's compression routine records the occurrence and positions of duplicated words and
phrases. As it analyses text, WinZip does an increasingly
efficient packing job - because it encounters fewer and fewer words it
hasn't already seen. But if, when it reaches the added page, it runs into
text that's in a different language, nearly all the extra words will be
unfamiliar ones. Since these cannot be recorded as incidences of words
WinZip has already seen, the compressed version will be larger than if
language X contained only familiar words.
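
The procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' own code: Python's `zlib` module (a DEFLATE compressor) stands in for WinZip, the function names are made up for this example, and the two short sample passages are hypothetical stand-ins for whole e-books. One caveat worth knowing is that DEFLATE only looks back over the most recent 32 KB of input, so with a very long reference text it is mainly the tail end that influences the result.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of the data after zlib (DEFLATE) compression at level 9."""
    return len(zlib.compress(data, 9))


def extra_bytes(reference: str, unknown: str) -> int:
    """Growth of the compressed reference when the unknown text is appended."""
    ref = reference.encode("utf-8")
    return compressed_size(ref + unknown.encode("utf-8")) - compressed_size(ref)


def guess_language(references: Mapping[str, str], unknown: str) -> str:
    """Label of the reference text whose compressed size grows the least."""
    return min(references, key=lambda label: extra_bytes(references[label], unknown))


if __name__ == "__main__":
    # Hypothetical stand-ins for the e-books; real use needs far longer texts.
    samples = {
        "English": "the cat sat on the mat and the dog slept by the door of the house",
        "French": "le chat dort sur le tapis et le chien attend devant la porte de la maison",
    }
    print(guess_language(samples, "the dog sat by the door"))
```

With reference texts this short the guess can easily go wrong; the article's claim of reliable results applies to large reference texts such as whole e-books.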

15 December 2001 - New Scientist - www.newscientist.com