The Dutch Language in the Digital Age — Executive Summary
Information technology changes our everyday lives. We typically use computers for writing, editing, calculating, and information searching, and increasingly for reading, listening to music, viewing photos and watching movies. We carry small computers in our pockets and use them to make phone calls, write emails, get information and entertain ourselves, wherever we are. How does this massive digitisation of information, knowledge and everyday communication affect our language? Will our language change or even disappear?
All our computers are linked together into an increasingly dense and powerful global network. The girl in Ipanema, the customs officer in Venlo, and the engineer in Kathmandu can all chat with their friends on Facebook, but they are unlikely ever to meet one another in online communities and forums. If they are worried about how to treat earache, they will all check Wikipedia to find out all about it, but even then they won’t read the same article. When Europe’s netizens discuss the effects of the Fukushima nuclear accident on European energy policy in forums and chat rooms, they do so in cleanly separated language communities. What the internet connects is still divided by the languages of its users. Will it always be like this?
Many of the world’s 6,000 languages will not survive in a globalised digital information society. It is estimated that at least 2,000 languages are doomed to extinction in the decades ahead. Others will continue to play a role in families and neighbourhoods, but not in the wider business and academic world. What are the survival chances of the Dutch language?
With about 23 million native speakers, Dutch is the 8th most widely spoken native language in the EU. It is just a ‘small’ language in comparison to its neighbouring languages English, German, and French. The influence of English on language use, especially among younger people, is significant. Business, even if confined to the Low Countries (the Netherlands and Flanders), is often conducted in English, especially in transnational companies. The language of communication in science is English. Higher education is increasingly taught in English instead of Dutch. Books, films, and TV and radio programmes in Dutch exist, of course, but the market for them is rather small. Within the European Union, Dutch is an official language, but it is hardly used in European Union business. The Dutch language will surely not disappear completely, but there is a real danger that its use will disappear from major areas of our personal lives, for example from domestic policies, administrative procedures, the law, culture and shopping.
The status of a language depends not only on the number of speakers or books, films and TV stations that use it, but also on the presence of the language in the digital information space and software applications. The Dutch Wikipedia is the ninth largest in the world. With about 1.24 million Internet domains, the Netherlands’ top-level country domain .nl ranks 11th among country extensions. Though this is not bad for a small region, and the figure is growing, the amount of Dutch language data available on the web is of course minor compared to the English language data and the data for several other bigger languages such as German and French. Thanks to the STEVIN programme, which had the consolidation of the Dutch language in the modern communication and information society as one of its explicit goals, Dutch is also not doing too badly in terms of software for the language and the language resources needed to develop such software. It plays in the same league as German and French, but it is still far behind English.
Information and communication technology is now preparing for the next revolution. After personal computers, networks, miniaturisation, multimedia, mobile devices and cloud computing, the next generation of technology will feature software that understands not just spoken or written letters and sounds but entire words and sentences, and supports users far better because it speaks, knows and understands their language. Forerunners of such developments are the free online service Google Translate, which translates between 57 languages; IBM’s supercomputer Watson, which was able to defeat the US champion in the game of "Jeopardy"; and Apple’s mobile assistant Siri for the iPhone, which can react to voice commands and answer questions in English, German, French and Japanese.
The next generation of information technology will master human language to such an extent that human users will be able to communicate using the technology in their own language. Devices will be able to automatically find the most important news and information from the world’s digital knowledge store in reaction to easy-to-use voice commands. Language-enabled technology will be able to translate automatically or assist interpreters; summarise conversations and documents; and support users in learning scenarios. For example, it will help immigrants – as required by the governments of the Low Countries – to learn the Dutch language and integrate more fully into the country’s culture.
The next generation of information and communication technologies will enable industrial and service robots (currently under development in research laboratories) to faithfully understand what their users want them to do and then ‘proudly’ report on their achievements.
This level of performance means going well beyond simple character sets and lexicons, spell checkers and pronunciation rules. The technology must move on from simplistic approaches and start modelling language in an all-encompassing way, taking syntax as well as semantics into account to understand the drift of questions and generate rich and relevant answers.
However, there is a yawning technological gap between English and other languages, including Dutch, and it is currently getting wider. Commercial companies investigate, develop, sell and use language technology initially for the (American) English language, simply because the most interesting markets are in (American) English-speaking countries. The technological forerunners mentioned above will in some cases come only much later for the Dutch language, and in many cases not at all. Partly as a result of this, most academic research is also done on the (American) English language. The Dutch language is hardly anywhere in sight in these developments.
International technology competitions tend to show that results for the automatic analysis of English are far better than those for Dutch, even though (or precisely because) the methods of analysis are similar, if not identical. This holds true for extracting information from texts, grammar checking, machine translation and a whole range of other applications.
Many researchers reckon that these setbacks are due to the fact that, for fifty years now, the methods and algorithms of computational linguistics and language technology application research have first and foremost focused on English. In a selection of leading conferences and scientific journals published between 2008 and 2010, the number of publications on language technology for English was an order of magnitude larger than the number of publications on language technology for any other European language.
However, other researchers believe that the currently used methods in natural language processing are more suited to the English language than to, e.g., German or Dutch (because of linguistic properties of these languages). This means that we need a dedicated, consistent, and sustainable research effort if we want to be users of the next generation of information and communication technology in those areas of our private and work life where we live, speak and write Dutch.
Only through dedicated programmes such as the STEVIN programme has it been possible to create the language resources and basic tools needed to carry out research on language technology for the Dutch language, and to make it more attractive for companies to develop and offer products and services in the Dutch language. There surely is a very high research potential on this side of the Atlantic. Apart from internationally renowned research centres and universities, there are a number of innovative small and medium-sized language technology companies that manage to survive through sheer creativity and immense efforts, despite the lack of venture capital or sustained public funding.
Summing up, the Dutch language will surely not disappear as a whole, even in the face of the prowess of English-language computing. But, with the increasing expansion of the digital information society, it may disappear in selected domains such as policy discussions and decisions, culture, education, administrative procedures, the law and shopping. We can prevent this by ensuring that the Dutch language survives in the digital world. This requires sustainable support for research into and development of language technology for the Dutch language. Through improvements in machine translation, language technology will help in overcoming language barriers, but it will only be able to operate between those languages that have managed to survive in the digital world. If there is adequate language technology available for a language, then it will be able to survive in the digital world even if it has a very small speaker population. If not, the language will come under severe pressure.