The Portuguese Language in the Digital Age — Executive Summary

Human language is a gateway to the world around us. Through its daily use we communicate, learn, share information, plan our future, coordinate with one another so as to act together more effectively, and take pleasure in a story or a poem.

However, in the digital age and in a globalized world, human language is also one of the greatest barriers to communication that we face. The new information and communication technologies allow us to reach people all over the world with whom we could communicate, and make available an endless repository of information that we could access. Nevertheless, for every one of us, most of this new universe remains inaccessible and closed, locked behind the invisible barriers of the languages that divide it.

Europe is perhaps one of the clearest examples of the impact of linguistic barriers. During the last 60 years, it has become a distinct political and economic structure. Culturally and linguistically, it is rich and diverse. However, from Portuguese to Polish and Italian to Icelandic, everyday communication between Europe’s citizens, in business and among politicians is inevitably confronted with language barriers. The European Union's institutions, in turn, spend about a billion euros a year on maintaining their policy of multilingualism, i.e., translating texts and interpreting spoken communication.

Multilingualism is one of the most precious heritages of mankind. A digital world in which a single language took a dominant position and ended up replacing all other languages would mean losing this huge intangible wealth, which makes the world in general, and Europe in particular, a privileged space for cultural exchange.

It is a fact, however, and one we gain nothing by ignoring, that linguistic diversity hampers communication in daily life. It represents an insurmountable obstacle for citizens, hampers political debate and delays economic and scientific progress.

Language technology and linguistic research can make a significant contribution to removing these linguistic borders. Combined with intelligent devices and applications, language technology will help people to talk and do business together even if they do not speak a common language. While preserving multilingualism, it will make it possible to tear down the linguistic barriers that are blocking access to knowledge, thus helping to unleash the full potential of the information society.

To achieve this goal, and to preserve the cultural and linguistic diversity of Europe and the world, it is first necessary to carry out a systematic analysis of the linguistic particularities of different languages and of the current state of language technology support for them. That is the goal of the present book with regard to the Portuguese language.

The language technology and speech processing tools and applications currently available on the market (ranging from question answering systems to natural language interfaces, and including computational grammars and summarization tools, among many others) still fall short of this ambitious goal, however. This is especially true of automated translation, a particularly relevant technology for supporting multilingualism in the digital age. As early as the late 1970s, the European Union realised the profound relevance of language technology as a driver of European unity, and began funding its first research projects, such as EUROTRA. At the same time, national projects were set up that generated valuable results but never led to concerted European action. In contrast to this highly selective funding effort, other multilingual societies such as India (22 official languages) or South Africa (11 official languages) have recently set up long-term national programmes for language research and technology development.

In this field, the dominant actors are primarily privately owned, for-profit enterprises based in North America. These companies today rely on imprecise statistical approaches that do not make use of deeper linguistic methods and knowledge. For example, sentences are automatically translated by comparing a new sentence against thousands of sentences previously translated by humans. The quality of the output largely depends on the amount and quality of the available sample corpus. While the automatic translation of simple sentences in languages with sufficient amounts of available text material can achieve useful results, such shallow statistical methods are doomed to fail in the case of languages with a much smaller body of sample material, or in the case of sentences with even slightly more complex structures.

This book provides a detailed analysis of this and other applications and solutions supported by language technology. As expected, and as authoritatively substantiated by the volumes in this White Paper series, there are dramatic differences among countries and their languages with respect to the solutions available and the progress of research in language technology.

Portuguese is the fifth most widely spoken language in the world, with around 220 million speakers on four continents: Africa, America, Asia and Europe. Among European languages, it is the third most widely spoken in the world. Considering the new challenges raised by the information society in a globalized world, there is an urgent need to direct substantially more effort both towards the creation of language resources and towards the research and development of tools and applications for the computational processing of Portuguese.

The present volume provides a detailed account of the challenges, opportunities and needs of the Portuguese language in the digital age. One of the major conclusions drawn from the analysis undertaken in this book is that the development of language technology for Portuguese is urgent and of the utmost importance for consolidating Portuguese as a language of international communication with global reach.