A Network of Excellence forging the
Multilingual Europe Technology Alliance

The Icelandic Language in the Digital Age — Executive Summary

Information technology changes our everyday lives. We typically use computers for writing, editing, calculating, and information searching, and increasingly for reading, listening to music, viewing photos and watching movies. We carry small computers in our pockets and use them to make phone calls, write emails, get information and entertain ourselves, wherever we are. How does this massive digitisation of information, knowledge and everyday communication affect our language? Will our language change or even disappear? What are the Icelandic language’s chances of survival?

Many of the world’s 6,000 languages will not survive in a globalised digital information society. It is estimated that at least 2,000 languages are doomed to extinction in the decades ahead. Others will continue to play a role in families and neighbourhoods, but not in the wider business and academic world. The status of a language depends not only on the number of speakers or books, films and TV stations that use it, but also on the presence of the language in the digital information space and software applications.

In this context, Icelandic is not very well off. At the end of the 20th century, Icelandic language technology was virtually non-existent. There was a relatively good spell checker, a not-so-good speech synthesiser, and that was all. There were no programs or even individual courses on language technology or computational linguistics at any Icelandic university or college, there was no ongoing research in these areas, and no Icelandic software companies were working on language technology.

Things started to change after a specially appointed Expert Group delivered a white paper on Language Technology to the Minister of Education, Science and Culture in 1999. In this white paper, several actions to establish Icelandic language technology were proposed. In 2000, the Government launched a special Language Technology Programme, with the aim of supporting institutions and companies in creating basic resources for Icelandic language technology work. This initiative resulted in a number of projects which have laid the groundwork for Icelandic language technology.

After the Language Technology Programme ended in 2004, researchers from three institutes (University of Iceland, Reykjavik University, and the Árni Magnússon Institute for Icelandic Studies), who had been involved in most of the projects funded by the programme, decided to join forces in a consortium called the Icelandic Centre for Language Technology (ICLT), in order to follow up on the tasks of the programme. Since 2005, the ICLT researchers have initiated several new projects which have been partly supported by the Icelandic Research Fund and the Icelandic Technical Development Fund.

The present report reveals that despite considerable achievements in the last decade, it is only with respect to the most basic tools and resources such as tokenisers, part-of-speech taggers, morphological analysers/generators, syntactic parsers, reference corpora, and syntax corpora, that the situation for Icelandic is reasonably good. When it comes to advanced fields like sentence and text semantics, advanced discourse processing, information retrieval, language generation, summarisation, dialogue management, semantics and discourse corpora, ontological resources, etc., no tools and resources exist for Icelandic. Thus, it is clear that we still have a long way to go to ensure the future of Icelandic as a full-fledged player in the modern – and future – European information society.

Information and communication technology is now preparing for the next revolution. After personal computers, networks, miniaturisation, multimedia, mobile devices and cloud computing, the next generation of technology will feature software that understands not just spoken or written letters and sounds but entire words and sentences, and supports users far better because it speaks, knows and understands their language. Forerunners of such developments are the free online service Google Translate, which translates between 57 languages; IBM’s supercomputer Watson, which defeated the US champion in the game of “Jeopardy”; and Apple’s mobile assistant Siri for the iPhone, which can react to voice commands and answer questions in English, German, French and Japanese.

The next generation of information technology will master human language to such an extent that human users will be able to communicate using the technology in their own language. Devices will be able to automatically find the most important news and information from the world’s digital knowledge store in reaction to easy-to-use voice commands. Language-enabled technology will be able to translate automatically or assist interpreters; summarise conversations and documents; and support users in learning scenarios. For example, it will help immigrants to learn the Icelandic language and integrate more fully into the country’s culture. The next generation of information and communication technologies will enable industrial and service robots (currently under development in research laboratories) to faithfully understand what their users want them to do and then proudly report on their achievements. This level of performance means going way beyond simple character sets and lexicons, spell checkers and pronunciation rules. The technology must move on from simplistic approaches and start modelling language in an all-encompassing way, taking syntax as well as semantics into account to understand the drift of questions and generate rich and relevant answers.

Not all European languages are equally well prepared for this future. This report presents an evaluation of the status of language technology support for 30 European languages, based on four key areas: machine translation, speech processing, text analysis, and the basic resources needed for building language technology applications. The languages were grouped into five clusters. Unsurprisingly, Icelandic is in the bottom cluster for all of the tools and resources listed. Its level of support is comparable to that of other languages with a small number of speakers, such as Irish, Latvian, Lithuanian, and Maltese. These languages lag far behind large languages like German and French. But even the language technology resources and tools for those languages clearly do not yet reach the quality and coverage of comparable resources and tools for English, which is in the lead in almost all language technology areas.

What needs to be done in order to ensure the future of the Icelandic language in the information society? In 1999, the Language Technology Expert Group estimated that it would cost around one billion Icelandic krónur (which then amounted to about ten million euros) to make Icelandic language technology self-sustaining. After that, the free market should be able to take over, since it would have access to public resources that would have been created by the government-funded Language Technology Programme, and that would be made available on an equal basis to everyone who wished to use them in commercial products.

Even though the Language Technology Programme was successful and had a great impact on the development of Icelandic language technology, the fact remains that its total budget for 2000–2004 was only around one eighth of the sum that the expert group estimated would be needed. It should therefore come as no surprise that Icelandic language technology is still in its infancy. 330,000 speakers are simply too few to sustain costly development of new products. At present, almost no companies are working in the language technology area because they do not see it as profitable. Continued public support for Icelandic language technology is necessary in order to guarantee exploitation of the tools already developed and of the knowledge and experience that researchers and companies have already accrued.

The Icelandic language is not in imminent danger, even from the prowess of English language computing. However, the whole situation could change dramatically when a new generation of technologies really starts to master human languages effectively. Through improvements in machine translation, language technology will help in overcoming language barriers, but it will only be able to operate between those languages that have managed to survive in the digital world. If there is adequate language technology available, then it will be able to ensure the survival of languages with very small populations of speakers. If not, even ‘larger’ languages will come under severe pressure. If Icelandic is to survive as a viable national language in the developed world, it must be able to meet IT demands. Consequently, investment in language technology must form an essential part of its language preservation policy.

META-NET’s vision is high-quality language technology for all languages that supports political and economic unity through cultural diversity. This technology will help tear down existing barriers and build bridges between Europe’s languages. This requires all stakeholders – in politics, research, business, and society – to unite their efforts for the future.

This white paper series complements the other strategic actions taken by META-NET. Up-to-date information such as the current version of the META-NET vision paper or the Strategic Research Agenda (SRA) can be found on the META-NET web site: