A Network of Excellence forging the
Multilingual Europe Technology Alliance

The Croatian Language in the Digital Age — Executive Summary

Information technology changes our everyday lives. We typically use computers for writing, editing, calculating, and information searching, and increasingly for reading, listening to music, viewing photos and watching movies. We carry small computers in our pockets and use them to make phone calls, write emails, get information and entertain ourselves, wherever we are. How does this massive digitisation of information, knowledge and everyday communication affect our language? Will our language change or even disappear? What are the Croatian language’s chances of survival?

Many of the world’s 6,000 languages will not survive in a globalised digital information society. It is estimated that at least 2,000 languages are doomed to extinction in the decades ahead. Others will continue to play a role in families and neighbourhoods, but not in the wider business and academic world. The status of a language depends not only on the number of speakers or books, films and TV stations that use it, but also on the presence of the language in the digital information space and software applications.

In today's information society, access to information in one's mother tongue is considered a civilisational standard necessary for overcoming the digital divide. Linguistic communities without developed language technologies for their language will remain on the other side of the digital divide. For the Croatian language and its language technologies, the issue is not only to ensure that Croatian can participate on equal terms with other languages in our globalised information society, but even more the imminent change in its sociolinguistic conditions. It is projected that in mid-2013 Croatian will become the 24th official language of the European Union. From that moment on, a whole range of language resources, tools and services will be expected to be available for Croatian, similar to those that already exist and are being developed further for other EU languages. Search engines providing full-text search over all the word forms in which Croatian words can appear, dictation systems (i.e., speech-to-text systems) for Croatian and, perhaps most importantly, machine translation systems to and from Croatian are just some examples of important language technologies. These systems are expected not only as research prototypes but also as useful commercial products. We cannot expect researchers working on English, French, German, Czech, Slovenian or Serbian to develop them for Croatian; we have to develop these language resources, tools and services on our own. However, this will be easier to achieve if we harmonise and coordinate our efforts with similar efforts for other EU languages. That is exactly what the initiative described in this publication is about.

This white paper for the Croatian language demonstrates that a basic language research environment exists in Croatia, although the language industry is not really developed. A small number of technologies and resources for Croatian do exist, but fewer have been developed for Croatian than for other Slavic languages, e.g., Czech, and far fewer than for the major EU languages, such as English, German or French.

Croatia has a half-century-long tradition of research in computational linguistics, natural language processing and corpus linguistics, which has produced such important language resources as the Croatian Frequency Dictionary, the Croatian National Corpus, the Croatian-English Parallel Corpus, the Croatian Morphological Lexicon and the Croatian Dependency Treebank, among others. Even so, the current status of language technologies cannot be considered satisfactory. Besides the nationally funded projects, of which there are unfortunately still only a few, more substantial funding has been available since 2008 through five EC projects: CLARIN, ACCURAT, LetsMT!, ATLAS and XLike. These, however, are mainly oriented towards solving individual problems or providing technological solutions, and rarely towards advancing the overall situation of language technologies for Croatian. For the Croatian language, a sixth project, CESAR, takes on exactly this role within the wider META-NET initiative by producing this white paper.

According to the assessment detailed in this report, focused action must be taken to bring Croatian language resources and tools up to the level of quality and quantity of those that already exist for other European languages.

META-NET’s vision is high-quality language technology for all languages that supports political and economic unity through cultural diversity. This technology will help tear down existing barriers and build bridges between Europe’s languages. This requires all stakeholders – in politics, research, business, and society – to unite their efforts for the future.

This white paper series complements the other strategic actions taken by META-NET. Up-to-date information such as the current version of the META-NET vision paper or the Strategic Research Agenda (SRA) can be found on the META-NET web site: