A Network of Excellence forging the
Multilingual Europe Technology Alliance

The Danish Language in the Digital Age — Executive Summary

Information technology changes our everyday lives. We typically use computers for writing, editing, calculating, and information searching, and increasingly for reading, listening to music, viewing photos and watching movies. We carry small computers in our pockets and use them to make phone calls, write emails, get information and entertain ourselves, wherever we are. How does this massive digitisation of information, knowledge and everyday communication affect our language? Will our language change or even disappear?

All our computers are linked together into an increasingly dense and powerful global network. The girl in Ipanema, the customs officer in Padborg and the engineer in Kathmandu can all chat with their friends on Facebook, but they are unlikely ever to meet one another in online communities and forums. If they are worried about how to treat earache, they will all check Wikipedia to find out all about it, but even then they won’t read the same article. When Europe’s netizens discuss the effects of the Fukushima nuclear accident on European energy policy in forums and chat rooms, they do so in cleanly separated language communities. What the internet connects is still divided by the languages of its users. Will it always be like this?

Many of the world’s 6,000 languages will not survive in a globalised digital information society. It is estimated that at least 2,000 languages are doomed to extinction in the decades ahead. Others will continue to play a role in families and neighbourhoods, but not in the wider business and academic world. What are the Danish language’s chances of survival?

With approximately 5 million native speakers, Danish must be considered a relatively small language, at least when compared to several of the other EU languages. As in other small industrialised countries, people’s daily lives are greatly influenced by the English language: English movies and TV series are usually not dubbed, but shown with subtitles; big international companies increasingly use English as a “corporate language”; and English is also becoming the lingua franca in higher education, as it has long been in science and technology.

There are plenty of complaints about the ever-increasing use of Anglicisms, and some even fear that the Danish language is becoming riddled with English words and expressions. But the only way to maintain Danish words and phrases is to actually use them, frequently and consciously; linguistic polemics about foreign influences and government regulations do not usually help. Our main concern should not be the gradual Anglicisation of our language, but its complete disappearance from major areas of our personal lives. We do not mean science, aviation and the global financial markets, which actually need a world-wide lingua franca. We mean the many areas of life in which it is far more important to be close to a country’s citizens than to international partners: domestic policies, for example, administrative procedures, the law, culture and shopping.

The status of a language depends not only on the number of speakers or books, films and TV stations that use it, but also on the presence of the language in the digital information space and software applications. Here the Danish language is fairly well-placed: many international software products are available in Danish versions; the Danish Wikipedia is growing, and with more than 1 million internet domains registered in 2011, Danish is well represented on the Web relative to its population.

In the field of language technology, however, the Danish language is not sufficiently equipped with products, technologies and resources to meet future demands. There are applications and tools for speech synthesis, speech recognition, spelling correction, and grammar checking, but substantial improvements are required to ensure proper functionality in all relevant contexts. There are also some applications for automatic translation, but these often fail to produce linguistically and idiomatically correct translations, which can partly be explained by the lack of training material in the form of parallel corpora that include Danish. More advanced applications such as text understanding, language generation, and dialogue management are still at a very early prototype stage; they typically require semantically rich resources on a larger scale, which are not available for Danish today.

Information and communication technology is now preparing for the next revolution. After personal computers, networks, miniaturisation, multimedia, mobile devices and cloud computing, the next generation of technology will feature software that understands not just spoken or written letters and sounds but entire words and sentences, and supports users far better because it speaks, knows and understands their language. Forerunners of such developments are the free online service Google Translate, which translates between 57 languages, IBM’s supercomputer Watson, which was able to defeat the US champion in the game of “Jeopardy”, and Apple’s mobile assistant Siri for the iPhone, which can react to voice commands and answer questions in English, German, French and Japanese.

The next generation of information technology will master human language to such an extent that human users will be able to communicate using the technology in their own language. Devices will be able to automatically find the most important news and information from the world’s digital knowledge store in reaction to easy-to-use voice commands. Language-enabled technology will be able to translate automatically or assist interpreters; summarise conversations and documents; and support users in learning scenarios. For example, it will help immigrants to learn the Danish language and integrate more fully into the country’s culture.

The next generation of information and communication technologies will enable industrial and service robots (currently under development in research laboratories) to faithfully understand what their users want them to do and then proudly report on their achievements. This level of performance means going way beyond simple character sets and lexicons, spell checkers and pronunciation rules. The technology must move on from simplistic approaches and start modelling language in an all-encompassing way, taking syntax as well as semantics into account to understand the drift of questions and generate rich and relevant answers.

However, there is a yawning technological gap between English and Danish, and it is currently getting wider. International technology competitions tend to show that results for the automatic analysis of English are far better than those for less-resourced languages such as Danish, even though (or precisely because) the methods of analysis are similar, if not identical. This holds true for extracting information from texts, grammar checking, machine translation and a whole range of other applications. Many researchers reckon that this gap is due to the fact that, for fifty years now, the methods and algorithms of computational linguistics and language technology application research have first and foremost focused on English. However, other researchers believe that English is inherently better suited to computer processing. In any case, there is no doubt that we need a dedicated, consistent, and sustainable research effort if we want to be able to use the next generation of information and communication technology in those areas of our private and work life where we live, speak and write Danish.

After a relatively successful research record with several national and Nordic initiatives in the area of language technology in the period from 1985 to 2001, Danish is currently beginning to lag behind, also in the Nordic landscape. During the last decade, no substantial funding has been given to drive Danish language technology forward, and the educational situation in the field is equally critical. As the present report will show, we cannot afford this stagnation. Denmark is ranked low on the European list when it comes to the availability and development of language technology, and there is an urgent need for invigorating programmes focusing on research and on resource and technology development in the field. Otherwise we will fail to keep up when a new generation of technologies really starts to master human languages effectively. Through improvements in machine translation, language technology will help in overcoming language barriers, but it will only be able to operate between those languages that have managed to survive in the digital world. If there is adequate language technology available, then it will be able to ensure the survival of languages with very small populations of speakers. If not, even ‘larger’ languages will come under severe pressure.