In the context of META-SHARE, the term metadata refers to descriptions of Language Resources (LRs). These encompass both data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) and the tools, technologies and services used to process them.
The META-SHARE metadata descriptions will constitute the means by which LR users identify the resources they seek within META-SHARE. The metadata model therefore forms an integral part of the search and retrieval mechanism, with a subset of its elements serving as access points to the LR catalogue. For this reason, the model is designed to be as informative and flexible as possible. It allows for multi-faceted search and viewing of the catalogue, as well as its dynamic restructuring, so that LR consumers can quickly and easily spot the resources they are looking for among a large number of resources. Although META-SHARE targets an informed community (HLT specialists), this should by no means be taken as licence to create an overly complex schema; a user-friendly search interface depends on a well-motivated, easy-to-understand schema.
In this effort, we build upon previous initiatives so that the model is easily adopted by the target community. The aim is not to create yet another metadata model, but rather to adapt existing resource description models into a unified proposal that caters to the specific requirements of the community.
META-SHARE promotes the use of widely accepted standards for language resource building, ensuring the maximum possible interoperability of language resources, as documented in "The Standards' Landscape Towards an Interoperability Framework".
The general framework adopted is a component-based mechanism, which groups together semantically coherent elements and relations as well as other components. More specifically, elements encode specific descriptive features of the LRs, while relations link resources included in the META-SHARE repository to one another (e.g. original and derived, raw and annotated resources, a language resource and the tool used to create it, etc.), as well as resources to related entities (e.g. documentation manuals, publications, standards used, licences, etc.).
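The component-based mechanism described above can be pictured as a small recursive data structure. The following is a minimal Python sketch under stated assumptions: the class names `Element`, `Relation` and `Component`, the helper `find_elements`, and all component, element and relation labels (`resourceInfo`, `languageInfo`, `isAnnotatedVersionOf`, `documentedIn`, etc.) are hypothetical and chosen for illustration only; they are not the actual META-SHARE schema definitions.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical classes illustrating the component idea; names and labels
# are assumptions, not taken from the META-SHARE schema itself.

@dataclass
class Element:
    """A single descriptive feature of a resource (name/value pair)."""
    name: str
    value: str

@dataclass
class Relation:
    """A typed link from the described resource to another resource or entity."""
    relation_type: str  # hypothetical label, e.g. "isAnnotatedVersionOf"
    target: str         # identifier of the related resource or entity

@dataclass
class Component:
    """A semantically coherent group of elements, relations and sub-components."""
    name: str
    elements: List[Element] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)
    components: List["Component"] = field(default_factory=list)

    def find_elements(self, name: str) -> List[Element]:
        """Collect all elements with the given name, searching nested components too."""
        found = [e for e in self.elements if e.name == name]
        for child in self.components:
            found.extend(child.find_elements(name))
        return found

# Usage: an annotated corpus linked to its raw version (another resource)
# and to an annotation manual (a related entity).
corpus = Component(
    name="resourceInfo",
    elements=[Element("resourceName", "Example Annotated Corpus")],
    relations=[
        Relation("isAnnotatedVersionOf", "example-raw-corpus"),
        Relation("documentedIn", "example-annotation-manual"),
    ],
    components=[Component("languageInfo", elements=[Element("languageId", "el")])],
)
print([e.value for e in corpus.find_elements("languageId")])  # ['el']
```

Because components may contain other components, the same building blocks can be reused across resource descriptions, and a search over a nested element only needs a simple recursive traversal.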
To provide flexibility, the elements are organised into two basic levels of description:
- an initial level providing the basic elements for the description of a resource (minimal schema), and
- a second level with a higher degree of granularity (maximal schema), providing more detailed information on each resource.
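The two levels of description can be illustrated with a simple completeness check. This is a minimal Python sketch, assuming a description is a flat mapping from element names to values; the element names in `MINIMAL_ELEMENTS` and `SECOND_LEVEL_ELEMENTS`, the functions `missing_minimal` and `description_level`, and the labels `"incomplete"`, `"minimal"` and `"extended"` are all hypothetical. The actual element sets of the minimal and maximal schema are defined by the META-SHARE schema, not by this sketch.

```python
from typing import Dict, List, Set

# Hypothetical element sets; the real minimal/maximal element lists
# are defined by the META-SHARE schema.
MINIMAL_ELEMENTS: Set[str] = {"resourceName", "description", "resourceType", "licence"}
SECOND_LEVEL_ELEMENTS: Set[str] = {"size", "annotationType", "creationTool"}

def missing_minimal(description: Dict[str, str]) -> List[str]:
    """Return, sorted, the minimal-level elements the description does not provide."""
    return sorted(MINIMAL_ELEMENTS - set(description))

def description_level(description: Dict[str, str]) -> str:
    """Classify a description as 'incomplete', 'minimal' or 'extended'."""
    if missing_minimal(description):
        return "incomplete"  # not even the minimal schema is satisfied
    if SECOND_LEVEL_ELEMENTS & set(description):
        return "extended"    # uses some of the more granular, second-level elements
    return "minimal"

# Usage
d = {"resourceName": "X", "description": "Y", "resourceType": "corpus", "licence": "CC-BY"}
print(description_level(d))                     # minimal
d["size"] = "1M words"
print(description_level(d))                     # extended
print(missing_minimal({"resourceName": "X"}))   # ['description', 'licence', 'resourceType']
```

Keeping the minimal level as a small required set lets providers publish a findable description quickly and enrich it later with second-level detail.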
The maximal META-SHARE metadata model comprises all elements and relations that assist the description of LRs, grouped into components. Elements will be linked to existing ISOcat DCR data categories; elements with no counterpart there will be added with appropriate definitions. Specific profiles will be built for distinct LR types (and subtypes) using the various components, together with exemplary instantiations (e.g. for wordnet-type resources, parallel corpora, treebanks, etc.) as guidance for LR metadata providers.
This document presents the metadata schema developed for META-SHARE. It is meant to act as a user manual, explaining the model contents to LR providers and curators who wish to describe their resources in accordance with it.
The document will be updated in line with the META-SHARE platform, reflecting progress as well as any changes that are requested or become necessary due to developments in the field.