The diverse, heterogeneous landscape of large collections of digital and digitized resources (publications, datasets, multimedia files, processing tools, services and applications) has drastically transformed the requirements for their publication, archiving, discovery and long-term maintenance. Digital repositories provide the infrastructure for describing, documenting, storing and preserving this information, and for making it publicly available in an open, user-friendly and trusted way. Repositories represent an evolution of the digital library paradigm towards open access, advanced search capabilities and large-scale distributed architectures.
META-SHARE aims to provide such an open, distributed, secure and interoperable infrastructure for the Language Technology (LT) domain. It is open, since the infrastructure is conceived as an ever-evolving, scalable resource base including free and for-a-fee resources and services; distributed, because it will consist of networked repositories/data centers accessible through common interfaces; interoperable, because the resource base will be standards-compliant and will try to overcome format, terminological and semantic differences; and secure, since it will guarantee legally sound governance, legal compliance and secure access to licensable resources.
META-SHARE builds a multi-layer infrastructure that will:
- make quality, documented language resources (LRs) and related metadata available over the network,
- ensure that such LRs and metadata are properly managed, preserved and maintained,
- provide a set of services to all META-SHARE members and users,
- promote the use of widely accepted standards for language resource building, ensuring the maximum possible interoperability of LRs,
- allow associated third parties to export their LRs over the META-SHARE network,
- allow potential users to acquire the LRs they need for their own purposes easily and on a legally sound basis.
The targeted resources and technologies of META-SHARE, in order of priority, include:
- language data, such as written and spoken corpora,
- language-related data, including and/or associated with other media and modalities where written and spoken natural language play an important role,
- language processing and annotation tools and technologies,
- services built on language processing tools and technologies,
- evaluation tools, metrics and protocols, and services addressing assessment and evaluation,
- service workflows created by combining and orchestrating interoperable services.
META-SHARE aims to become a useful infrastructure for providers and users of language resources and technologies, as well as for LT integrators/vendors, language professionals (translators, interpreters, localization experts), national and international data centers and repositories of LRs and technologies, national and international LT policy makers, and other LR and LT funders and sponsors.
META-SHARE will be a freely available facility, supported by a large user and developer community, based on distributed networked repositories accessible through common interfaces. Users (consumers, providers or aggregators) will have single sign-on accounts and will be able to access everything within the repository network.
Language resources and their metadata will reside at the members’ repositories. Only the metadata are exported, so that they can be harvested to populate the network’s inventory, which will include metadata-based descriptions of all LRs in the network. Software for building one’s own repository will be made available by META-SHARE itself, free of charge.
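The split between locally held resources and a network-wide metadata inventory can be illustrated with a minimal sketch. This is not META-SHARE's software or API: the class names (`Resource`, `MemberRepository`), the `export_metadata` and `harvest` functions, and the sample records are all hypothetical, chosen only to show that metadata travel to the inventory while the resources themselves stay at the member repository.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A language resource held by a member repository (hypothetical model)."""
    resource_id: str
    metadata: dict   # descriptive record, e.g. name and language
    payload: bytes   # the resource itself; it never leaves the repository here


@dataclass
class MemberRepository:
    """A member node that stores both resources and their metadata."""
    name: str
    resources: list = field(default_factory=list)

    def export_metadata(self):
        """Return metadata records only; payloads stay local."""
        return [
            {**r.metadata, "id": r.resource_id, "repository": self.name}
            for r in self.resources
        ]


def harvest(repositories):
    """Build an inventory keyed by (repository name, resource id)."""
    inventory = {}
    for repo in repositories:
        for record in repo.export_metadata():
            inventory[(record["repository"], record["id"])] = record
    return inventory


# Two illustrative member repositories with made-up contents.
repo_a = MemberRepository("repo-a", [
    Resource("corpus-1", {"name": "Example spoken corpus", "language": "el"}, b"..."),
])
repo_b = MemberRepository("repo-b", [
    Resource("tool-7", {"name": "Example tokenizer", "language": "de"}, b"..."),
])

inventory = harvest([repo_a, repo_b])
for key, record in sorted(inventory.items()):
    print(key, record["name"])
```

In this sketch the inventory only ever sees the dictionaries returned by `export_metadata`, so a user browsing it would still have to go to the owning repository to obtain the resource itself.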
In META-SHARE the requested LRs are just a few clicks away.