This document aims to give advice on representing lexicographic data as linked data on the Web. Lexicographic linked data can either be migrated from earlier, non-Semantic Web data sources or built from scratch using linked data mechanisms. We do not discuss the technical aspects of the data creation or conversion process here; instead, we focus on modelling issues related to the lexicographic nature of the data. To that end, we adopt OntoLex-lemon and its lexicog module as modelling choices, giving advice on how to use them with real data examples. TBC

There are a number of ways that one may participate in the development of this report:

------------------- UNDER CONSTRUCTION -------------------

Types of resources

The main types of language resources covered in this document are dictionaries and other lexicographic resources whose data exists in a machine-processable format, whether it is stored locally or accessible on the Web (e.g., for download). We assume that the data is represented in a structured or semi-structured way (e.g., relational database, XML, CSV). We will illustrate our discussion with real examples from the LiLa project [LILA]. DESCRIBE LILA BRIEFLY

Selected Vocabularies

TBC Here we will mention OntoLex and lexicog

Table 1: Namespaces of the relevant vocabularies
Prefix     Namespace
owl        <http://www.w3.org/2002/07/owl#>
rdfs       <http://www.w3.org/2000/01/rdf-schema#>
ontolex    <http://www.w3.org/ns/lemon/ontolex#>
lime       <http://www.w3.org/ns/lemon/lime#>
vartrans   <http://www.w3.org/ns/lemon/vartrans#>
lexicog    <http://www.w3.org/ns/lemon/lexicog#>
lexinfo    <http://www.lexinfo.net/ontology/3.0/lexinfo#>
lila       <http://lila-erc.eu/data/>
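
As an illustration of how the prefixes in Table 1 might be used in a conversion script, the following minimal Python sketch turns the table into Turtle @prefix declarations and expands prefixed names to full IRIs. The function names turtle_prefixes and expand are hypothetical helpers introduced only for this example. The vartrans and lexicog namespaces are written with a trailing "#", as in the published OntoLex modules.

```python
# Prefix-to-namespace mapping mirroring Table 1.
NAMESPACES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "lime": "http://www.w3.org/ns/lemon/lime#",
    "vartrans": "http://www.w3.org/ns/lemon/vartrans#",
    "lexicog": "http://www.w3.org/ns/lemon/lexicog#",
    "lexinfo": "http://www.lexinfo.net/ontology/3.0/lexinfo#",
    "lila": "http://lila-erc.eu/data/",
}


def turtle_prefixes(namespaces):
    """Return one Turtle @prefix declaration per entry, in insertion order."""
    return "\n".join(
        f"@prefix {prefix}: <{iri}> ." for prefix, iri in namespaces.items()
    )


def expand(name, namespaces=NAMESPACES):
    """Expand a prefixed name such as 'ontolex:LexicalEntry' to a full IRI."""
    prefix, _, local = name.partition(":")
    if prefix not in namespaces:
        raise KeyError(f"unknown prefix: {prefix}")
    return namespaces[prefix] + local


if __name__ == "__main__":
    print(turtle_prefixes(NAMESPACES))
    print(expand("lexicog:Entry"))
```

Keeping the mapping in a single place makes it less likely that a namespace is typed inconsistently across the generated files.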

Lexicog in a nutshell

TBC

Guidelines/Best practices

TBC

When to choose lexicog

[extracted from the lexicog specification:]

  1. As long as the entities in OntoLex and the other lemon modules, together with those of catalogues of linguistic categories (e.g., LexInfo), suffice to represent the information encoded in the lexicographic resource (e.g., lexical entry, part of speech, translation, ...), the OntoLex lexicography module need not be applied.
  2. In the case of lexicographic information that cannot be modelled by using either OntoLex or any of the other lemon modules (e.g., to denote sense ordering), the OntoLex lexicography module should be used, at the same time avoiding redundancies and keeping additional information to the minimum.
The reason behind this is that the module adds some complexity by providing additional description capabilities on top of the purely lexical description accounted for by OntoLex. If this information is not needed for a specific conversion, i.e., if the lexicographic view is not essential, reusing lemon keeps the representation simpler yet sufficient.
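
The two rules above can be sketched as a small Python function that emits Turtle for one dictionary entry. This is only an illustrative sketch under stated assumptions, not a definitive conversion: the function name entry_to_turtle, the identifiers (bank-n, bank-sense1, ...) and the default ":" base prefix are hypothetical, and the ordering uses the rdf:_1, rdf:_2, ... membership properties, so the rdf prefix would need to be declared in addition to those in Table 1. When sense order is irrelevant, only OntoLex and LexInfo terms are produced; when it must be preserved, a lexicog:Entry with ordered lexicog:LexicographicComponent members is added on top.

```python
def entry_to_turtle(entry_id, pos, sense_ids, keep_sense_order=False):
    """Serialize one entry as Turtle (prefix declarations omitted).

    entry_id, pos and sense_ids are hypothetical local names in the
    default ':' namespace; pos must be a LexInfo value such as 'noun'.
    """
    if not sense_ids:
        raise ValueError("at least one sense is required")

    # Rule 1: plain OntoLex + LexInfo is enough for the lexical content.
    senses = " , ".join(f":{s}" for s in sense_ids)
    lines = [
        f":{entry_id}-lex a ontolex:LexicalEntry ;",
        f"    lexinfo:partOfSpeech lexinfo:{pos} ;",
        f"    ontolex:sense {senses} .",
    ]
    if not keep_sense_order:
        return "\n".join(lines)

    # Rule 2: sense order is lexicographic information, so add a lexicog
    # layer that describes the same lexical entry and orders its senses.
    lines.append("")
    lines.append(f":{entry_id} a lexicog:Entry ;")
    lines.append(f"    lexicog:describes :{entry_id}-lex ;")
    members = [
        f"    rdf:_{i} :{s}-comp" for i, s in enumerate(sense_ids, start=1)
    ]
    lines.append(" ;\n".join(members) + " .")
    for s in sense_ids:
        lines.append(f":{s}-comp a lexicog:LexicographicComponent ;")
        lines.append(f"    lexicog:describes :{s} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(entry_to_turtle("bank-n", "noun", ["bank-sense1", "bank-sense2"],
                          keep_sense_order=True))
```

Note that the lexicog layer is purely additive in this sketch: the OntoLex triples are identical in both cases, which keeps redundancy low and lets consumers that ignore lexicog still read the lexical data.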

When to go beyond lexicog

TBC

How to choose proper metadata for dictionaries

TBC

Some modelling examples

TBC

Different POS in the same entry

TBC

Modelling usage examples

TBC

Collocation dictionary

TBC

Acknowledgements

The authors would like to thank the BPMLOD community group members for their valuable feedback.

References

[AP_PAPER]
M. Forcada, M. Ginestí-Rosell, J. Nordfalk, J. O'Regan, S. Ortiz-Rojas, J. Pérez-Ortiz, F. Sánchez-Martínez, G. Ramírez-Sánchez, and F. Tyers, Apertium: a free/open-source platform for rule-based machine translation. Machine Translation, vol. 25, no. 2, pp. 127-144, 2011.
[AP_RDF]
RDF version of the Apertium bilingual dictionaries. URL: http://linguistic.linkeddata.es/apertium/
[DC]
DCMI Metadata Terms. URL: http://purl.org/dc/elements/1.1/
[DCAT]
F. Maali, J. Erickson (Eds.). Data Catalog Vocabulary (DCAT). W3C Recommendation. January 2014. URL: http://www.w3.org/TR/vocab-dcat/
[GUIDE_MLD]
A. Gómez-Pérez, D. Vila-Suero, E. Montiel-Ponsoda, J. Gracia, and G. Aguado-de Cea, Guidelines for multilingual linked data, in Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics (WIMS'13). New York, NY, USA: ACM, Jun. 2013.
[ISA_URIS]
P. Archer, S. Goedertier, and N. Loutas, Study on persistent URIs. Tech. Rep., ISA, Dec. 2012.
[LEMON_PAPER]
J. McCrae, G. Aguado-de Cea, P. Buitelaar, P. Cimiano, T. Declerck, A. Gómez-Pérez, J. Gracia, L. Hollink, E. Montiel-Ponsoda, D. Spohr, and T. Wunner, Interchanging lexical resources on the Semantic Web. Language Resources and Evaluation, vol. 46, 2012.
[LEMON]
The lemon model. URL: http://lemon-model.net/
[LEXINFO]
Lexinfo. URL: https://lexinfo.net/index.html
[LILA]
M. C. Passarotti, F. M. Cecchini, G. Franzini, E. Litta, F. Mambrini, and P. Ruffolo, The LiLa Knowledge Base of Linguistic Resources and NLP Tools for Latin. LDK-PS 2019.
[LMF]
Lexical Markup Framework (LMF). URL: http://www.lexicalmarkupframework.org/
[ONTOLEX]
Lexicon Model for Ontologies. URL: https://www.w3.org/2016/05/ontolex/
[ONTOLEX_PAPER]
J. P. McCrae, J. Bosque-Gil, J. Gracia, P. Buitelaar, and P. Cimiano, The OntoLex-Lemon model: development and applications. Proceedings of eLex 2017 conference.
[TR]
Translation Module. URL: http://purl.org/net/translation
[TR_PAPER]
J. Gracia, E. Montiel-Ponsoda, D. Vila-Suero, and G. Aguado-de Cea, Enabling language resources to expose translations as linked data on the web, in Proc. of 9th Language Resources and Evaluation Conference (LREC'14), Reykjavik (Iceland), May 2014.
[TRCAT]
OEG Translation Categories. URL: http://purl.org/net/translation-categories
[VOID]
K. Alexander, R. Cyganiak, M. Hausenblas, J. Zhao, Describing Linked Datasets with the VoID Vocabulary. W3C Interest Group Note. March 2011. URL: http://www.w3.org/TR/void/