A Network of Excellence forging the
Multilingual Europe Technology Alliance

The Norwegian Language in the Digital Age (Nynorsk Version) — Executive Summary

Information technology changes our everyday lives. We typically use computers for writing, editing, calculating and information searching, and increasingly for reading, listening to music, viewing photos and watching movies. We carry small computers in our pockets and use them to make phone calls, write emails, get information and entertain ourselves, wherever we are. How does this massive digitisation of information, knowledge and everyday communication affect our language? Will our language change or even disappear? What are the Norwegian language’s chances of survival?

Many of the world’s 6,000 languages will not survive in a globalised digital information society. It is estimated that at least 2,000 languages are doomed to extinction in the decades ahead. Others will continue to play a role in families and neighbourhoods, but not in the wider business and academic world. The status of a language depends not only on the number of speakers or books, films and TV stations that use it, but also on the presence of the language in the digital information space and software applications.

In this context, Norwegian is still having growing pains. At the beginning of the 21st century, Norwegian language technology existed on a very small scale. A quite satisfactory translation system from Bokmål to Nynorsk existed, there was spell checking, and there was a small question answering system, but people laughed at the poor performance of the first speech recognition applications. An ambitious language industry initiative at Voss failed. There were higher education programmes on language technology and computational linguistics and there was ongoing research in these areas, but there was a shortage of language resources and tools.

Things started to change when the Research Council of Norway took the initiative for a language technology research programme in 2002, with the aim of developing knowledge and tools. This programme resulted in a number of projects which created new competence and laid the groundwork for Norwegian language technology. The largest projects in this programme delivered a text-to-speech system and a demonstrator of quality translation from Norwegian to English.

More recently, a government White Paper from 2008 and its acceptance in Parliament led to the establishment of the Language Technology Resource Collection for Norwegian – Språkbanken in 2010. This unit has begun to build up and distribute language data that has long been lacking in R&D. If these efforts are sustained, they will be an invaluable investment in the future of Norwegian.

However, this report reveals that despite considerable achievements in the last decade, the situation is only acceptable with respect to the most basic tools and resources for Norwegian. When it comes to advanced applications, few tools and resources exist for Norwegian. It is clear that we still have a long way to go to ensure the future of Norwegian as a full-fledged player in the modern – and future – European information society.

Information and communication technology is now preparing for the next revolution. After personal computers, networks, miniaturisation, multimedia, mobile devices and cloud computing, the next generation of technology will feature software that understands not just spoken or written letters and sounds but entire words and sentences, and supports users far better because it speaks, knows and understands their language. Forerunners of such developments are IBM’s supercomputer Watson, which was able to defeat the US champion in the game of “Jeopardy”, and Apple’s mobile assistant Siri for the iPhone, which can react to voice commands and answer questions in English, German, French and Japanese. A Norwegian speech dictation system for the iPhone has also become available, but it is still less reliable than the English version.

Human users are starting to communicate using the technology in their own language. Devices will be able to automatically find the most important news and information from the world’s digital knowledge store in reaction to easy-to-use voice commands. Language-enabled technology will be able to translate automatically or assist interpreters, summarise conversations and documents, and support users in learning scenarios. For example, it may help immigrants to learn the Norwegian language and integrate more fully into our society. Information and communication technologies will enable industrial and service robots (currently under development in research laboratories) to faithfully understand what their users want them to do and then proudly report on their achievements. This level of performance means going way beyond simple character sets and lexicons, spell checkers and pronunciation rules. The technology must move on from simplistic approaches and start modelling language in an all-encompassing way, taking syntax as well as semantics into account to understand the drift of questions and generate rich and relevant answers.

Not all European languages are equally well prepared for this future. This report presents an evaluation of the status of language technology support for 30 European languages, based on four key areas: machine translation, speech processing, text analysis, and the basic resources needed for building language technology applications. The languages were grouped into five clusters. Unsurprisingly, Norwegian is in the bottom cluster or only one cluster above it for all of the tools and resources listed. It lags far behind large languages like German and French, for instance. But even language technology resources and tools for those languages clearly do not yet reach the quality and coverage of comparable resources and tools for the English language, which is in the lead in almost all language technology areas.

In the government White Paper no. 48 it is asserted that language technology will be one of the most crucial areas in the battle to preserve our language. What needs to be done, then, in order to ensure the future of the Norwegian language in the information society? In 2002, an expert group established by the government estimated that it would require an investment of 20 million NOK per year during the first five years. Even though Språkbanken is now established, the fact remains that the yearly investment so far has been only a small fraction of the estimated required effort. It should therefore come as no surprise that Norwegian language technology is still in its infancy. Five million speakers are too few to sustain costly development of new products. Norwegian IT industries, and especially SMEs, cannot by themselves bear the cost of building up large language resources and tools for Norwegian. Continued public support for Norwegian language technology is necessary in order to guarantee the exploitation of the tools already developed and of the knowledge and experience that researchers and companies have already accrued.

The Norwegian language is not in imminent danger from the prowess of English language computing. However, the whole situation could change dramatically when a new generation of technologies really starts to master human languages effectively. Through improvements in machine translation, language technology will help in overcoming language barriers, but it will only be able to operate between those languages that have managed to survive in the digital world. If adequate language technology is available, it will be able to ensure the survival of languages with relatively small populations of speakers. Consequently, continued investment in language technology must form an essential part of Norway's language preservation policy.

META-NET’s vision is high-quality language technology for all languages that supports political and economic unity through cultural diversity. This technology will help tear down existing barriers and build bridges between Europe’s languages. This requires all stakeholders – in politics, research, business, and society – to unite their efforts for the future.

This white paper series complements the other strategic actions taken by META-NET. Up-to-date information such as the current version of the META-NET vision paper or the Strategic Research Agenda (SRA) can be found on the META-NET web site: