A Network of Excellence forging the Multilingual Europe Technology Alliance

The French Language in the Digital Age — Executive Summary

Multilingualism is an essential component of the European project. Every European citizen must have the right to use his or her native language, and every Member State must be able to preserve its culture. It is equally essential to facilitate communication among citizens and to help them cross language barriers within the European information and commercial space. The same need exists worldwide.

Is it acceptable simply to watch European languages disappear, together with the cultures they belong to? Because of language barriers, must we resign ourselves to weak growth in the European market, to being cut off from the cultural richness of other countries, and to not knowing the genuine sources of information that shape Europe?

Multilingualism carries a substantial cost. As a result, minority languages are progressively disappearing to the benefit of majority languages. Of the roughly 6,500 languages in the world, it is estimated that half will have disappeared by the end of this century. Many European languages have already disappeared, and others came close to disappearing and were saved only through political will.

How can we process the 48 hours of video uploaded to YouTube every minute in hundreds of languages? How can we ensure that European patents are accessible to companies that use languages other than English, French or German? How can we allow a professor to teach students who don't speak his or her language? How can we avoid forcing researchers to give up their own language and write their articles in a single one? How can we ensure that a language keeps enriching itself with new terms as fast as knowledge grows? How can we avoid having to switch from our mother tongue to another language when we walk from a café into a university lecture hall?

Digital technologies, especially language technologies, make a difference. The Web makes it easier for all users to produce and access information and knowledge. Wikipedia exists in about 300 languages. Social networks must operate in their users' languages: Facebook is available in 80 languages, and Twitter in 20. Scientific progress has made language technologies, such as search engines, speech recognition and synthesis systems, and automatic translation of text and speech, available for an increasing number of languages. Google Translate covers about 60 languages, including 20 with spoken interaction; Apple's Siri is available in four languages; Jibbigo, a stand-alone speech translation system, covers a dozen. However, these technologies exist for only about 60 languages, roughly 1% of the world's languages, and their quality, and therefore their usability, varies from one language to another. New systems offer even more advanced functionality, such as IBM's Watson question-answering system, which won the US TV game show Jeopardy! in 2011 but works only in English. Yet human knowledge cannot be reduced to what has been encoded in a single language and according to a single culture.

By reducing the cost of multilingualism, these technologies make it feasible; indeed, they are the only way to make it feasible. Some of them, such as automatic subtitling with translation or spell checkers, also support language learning.

Can we accept that, in the best case, these technologies are provided free of charge by US companies, at the cost of our independence and sovereignty? How can we explain that a community of countries, willing to share the richness of their cultures but facing the language barrier as an obstacle to mutual exchange, does not invest and join forces to showcase this richness and remove this obstacle, unless these countries are failing to address the basic questions that are crucial to their union?

Convincing decision makers that these technologies must be developed is difficult. No large industrial group ranks multilingualism among its top priorities, whether in the automotive, aerospace, telecommunications, consumer electronics, computing, medical or audiovisual sectors. Yet each of those sectors needs multilingualism for its own purposes, and the sum of these small priorities is huge. Who will add them up? Who will explain the result? Who will bring the various stakeholders together to support it? Only a strong political will at the level of the European Union could achieve this and demonstrate that language technologies are not just one research and development topic among others, and that language resources are not just data lost among other data, but that they are an essential component of the European project, shared by most sectors of the European Commission and by all Member States.

META-NET is a Network of Excellence supported by the European Commission. It currently brings together more than 50 of the best research laboratories in language sciences and technologies, from more than 30 European countries. META-NET has written a White Paper on each language it covers, each in that language as well as in English.

French is an important international language, with about 220 million speakers around the world and about 100 million learners. It is an official language of the European Union, of more than 30 countries, and of major international organizations. For a long time it was regarded as the preferred language of diplomacy and culture, but English has progressively replaced it in all uses. French is well placed on the Internet, ranking 8th among the languages used for Web search queries, behind not only English but also Spanish, Portuguese and German. As a sign of its capacity to express human knowledge, French ranks 3rd in Wikipedia, behind English and German. More than 60 languages, including regional languages, are also spoken in metropolitan France and its overseas territories.

Language technologies exist for the automatic processing of written, spoken and signed French. They include spell checkers, search engines, speech recognition, synthesis and dialog systems, and text and speech translation engines, as well as tools for speaker verification, language identification, information retrieval and automatic summarization.

French research has benefited from programs in this field, such as the Language Industries Francophone Program (FRANCIL) of the Francophone Universities Association (AUF), and the TechnoLangue program supported by several French ministries. Today, the large French-German Quaero program on the automatic processing of multilingual and multimedia documents brings together about 30 industrial and academic partners to develop eight application projects and more than 30 technologies for processing written and spoken language, images, video and music. The program is entirely structured around the systematic evaluation of technological progress and the production of the data needed to develop and test those technologies.

These programs made it possible to invest in producing the data needed to develop technologies for French. This puts French in an excellent position among the European languages that benefit from these technologies, alongside German, Spanish, Italian and Dutch, yet far behind English. Moreover, no language currently benefits from the full set of language technologies at a sufficient level of quality, nor from all the data needed to develop them. International evaluation campaigns show objectively and quantitatively that French research laboratories and the technologies they develop are among the best in the world.

However, French companies in this field, like European ones in general, are almost all SMEs that struggle to compete with large US companies such as Google, Apple, IBM, Microsoft or Nuance, all of which have invested massively in these technologies. Ironically, many of these US companies' researchers were trained in European research laboratories.

The situation appears similar in the other industrialized countries where French is widely used: Belgium, Switzerland and Canada.

Funding for research and innovation in language technologies lacks continuity: it consists of short-term coordinated programs interrupted by periods of low or sporadic funding. It also lacks coordination with programs in other EU Member States or at the European Commission, even though this research topic seems ideally suited to a shared transnational effort. The situation is similar within the European Commission itself, where the priority assigned to this field varies over time. At times it receives specific attention through a dedicated Commissioner, Unit and action line in the Framework Program; at other times it is merged into a heterogeneous conglomerate of topics, even though its unique role in building Europe is clearly recognized.

A European directive, similar to the one on access to information for people with disabilities, that stressed the importance of removing language barriers and stated that every European citizen, whatever his or her language, should be able to access any information produced in the European Union (a book, a newspaper, a TV or radio broadcast, a movie, etc.), whatever the language in which it was produced, would give this sector a major boost.

A large, coordinated program on language technologies within the next European framework program for research and innovation would help make multilingualism possible, thereby preserving French in all its dimensions, along with other national and regional languages, and facilitating cultural and commercial exchanges in Europe and beyond.