Kevin Kelly recently posted an explanation of how science can work without theory, driven strictly by the massive amounts of data that today's large computing ecosystems are able to cull. When talking about data collected from a broad expanse of humanity, it is only appropriate to use Google as the model, which Kevin does.
Among his examples is an interesting explanation of how Google uses duplicate documents that have been translated into multiple languages to learn how to present sites in other languages. The system compares numerous examples and finds where correlating terms occur. As you add more and more data, you arrive at a better representation of what the true translation is. No dictionaries are involved. It is utterly fascinating how superior this model is to straight word-for-word translations derived from dictionaries (this is how Engrish happens). Straight word-for-word translations leave out verb tenses, conjugations, plurals, idioms and colloquialisms. In that model, “I love you” translates to “Je Adorer Toi” instead of the grammatically correct and culturally more accepted “Je t’aime”. The word-for-word model is poor at best, and yet I am always surprised at how often this technique is used (particularly in business situations, where it would seem unforgivable and very risky). In the Google model, you get to see language as it is actually used. More and more, the machine would see “Je t’aime” in the French document wherever “I love you” appeared in the English, and the culturally correct version would become Google's default translation.
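To make the counting idea concrete, here is a minimal Python sketch of a dictionary-free phrase table. It assumes the matching English and French passages have already been paired up. The class PhraseTable and its observe and translate methods are hypothetical names for illustration only. Google's actual system relied on far more elaborate statistical models, so treat this as a toy picture of "the most frequently observed rendering wins," not as how Google does it.

```python
from collections import Counter, defaultdict


class PhraseTable:
    """Toy phrase table learned purely by counting aligned examples.

    Hypothetical illustration: each observation is an (English, French)
    pair taken from the same passage in two parallel documents. No
    dictionary is consulted; the most frequently observed rendering wins.
    """

    def __init__(self):
        # English phrase -> Counter of French renderings seen alongside it
        self._counts = defaultdict(Counter)

    def observe(self, english, french):
        # Normalise case and surrounding whitespace so trivial variants merge
        self._counts[english.strip().lower()][french.strip().lower()] += 1

    def translate(self, english):
        # .get() avoids creating an empty entry for unseen phrases
        renderings = self._counts.get(english.strip().lower())
        if not renderings:
            return None  # never seen: a purely data-driven system has no answer
        return renderings.most_common(1)[0][0]


table = PhraseTable()
table.observe("I love you", "Je adorer toi")  # one dictionary-style rendering
print(table.translate("I love you"))  # je adorer toi

for _ in range(3):  # more real-world pages pair the phrase with this rendering
    table.observe("I love you", "Je t'aime")
print(table.translate("I love you"))  # je t'aime
```

The design is deliberately naive: a single bad dictionary-style translation wins only until real usage outnumbers it. A phrase the system has never seen gets no answer at all, which is why this approach depends so heavily on how much text exists in a given language.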
What’s interesting is that the ramp-up for this should be rapid and is purely data driven. Growth of the internet in non-industrialized nations will suddenly produce a deluge of data in those languages, and machine translation into them will become possible. It would seem that eventually documents, web sites, and the like will be released simultaneously, with virtually perfect fidelity, in every spoken language and dialect. The impact of this could be significant, as the distribution of information would no longer be blocked by the artificial barrier of language. Perhaps more significant is that this could even serve as a fragmenting technology, since there would be less benefit to learning additional languages. It seems bizarre that one could have a more complete understanding of another culture and yet a weaker grasp of its language, and yet that could be the result.