Lexical resource

Digital database of words

In digital lexicography, natural language processing, and digital humanities, a lexical resource is a language resource consisting of data about the lexemes of the lexicon of one or more languages, for example in the form of a database.[1]

Characteristics

Several standards exist for encoding lexical resources in machine-readable form. Two examples are the Lexical Markup Framework (LMF), an ISO standard comprising an abstract data model and an XML serialization,[2] and OntoLex-Lemon, an RDF vocabulary for publishing lexical resources as knowledge graphs on the web, e.g., as Linguistic Linked Open Data.[3]
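As a rough illustration of what an XML serialization of lexical entries makes possible, the following Python sketch reads a small fragment with the standard library. The fragment is only loosely modeled on LMF: the element names and the `feat att/val` pattern are assumptions made for this example and have not been checked against the standard's DTD, and the helper names `feature` and `read_entries` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative fragment; NOT validated against the LMF DTD.
LMF_SAMPLE = """
<LexicalResource>
  <Lexicon>
    <feat att="language" val="eng"/>
    <LexicalEntry>
      <feat att="partOfSpeech" val="noun"/>
      <Lemma>
        <feat att="writtenForm" val="dictionary"/>
      </Lemma>
    </LexicalEntry>
    <LexicalEntry>
      <feat att="partOfSpeech" val="verb"/>
      <Lemma>
        <feat att="writtenForm" val="define"/>
      </Lemma>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def feature(element, name):
    """Return the value of the direct child <feat att=name>, or None."""
    for feat in element.findall("feat"):
        if feat.get("att") == name:
            return feat.get("val")
    return None


def read_entries(xml_text):
    """Extract (language, written form, part of speech) for each entry."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for lexicon in root.findall("Lexicon"):
        language = feature(lexicon, "language")
        for entry in lexicon.findall("LexicalEntry"):
            lemma = entry.find("Lemma")
            written = feature(lemma, "writtenForm") if lemma is not None else None
            entries.append((language, written, feature(entry, "partOfSpeech")))
    return entries


for language, written, pos in read_entries(LMF_SAMPLE):
    print(language, written, pos)
```

The point of the sketch is only that a shared serialization lets generic tools extract lemma and part-of-speech data without knowing how a particular dictionary was compiled.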

Depending on the number of languages covered, a lexical resource may be described as monolingual, bilingual or multilingual. In bilingual and multilingual lexical resources, the words of one language may or may not be connected to those of another. When they are connected, equivalence between languages is expressed either through a bilingual link (in bilingual lexical resources, e.g., using the relation vartrans:translatableAs in OntoLex-Lemon) or through multilingual notations (in multilingual lexical resources, e.g., by reference to the same ontolex:LexicalConcept in OntoLex-Lemon).[4]
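The two linking strategies can be sketched in Python with plain (subject, predicate, object) tuples standing in for RDF triples. This is a minimal sketch under stated assumptions: the entry and concept identifiers are invented, `ontolex:evokes` is assumed here to be the property connecting an entry to its concept, and the translation link is treated as bidirectional for simplicity.

```python
# Triples are (subject, predicate, object) strings; all identifiers are made up.
TRIPLES = [
    # Bilingual strategy: an explicit link between two entries.
    ("en:cat", "vartrans:translatableAs", "fr:chat"),
    # Multilingual strategy: entries in several languages point to one concept.
    ("en:cat", "ontolex:evokes", "concept:feline"),
    ("fr:chat", "ontolex:evokes", "concept:feline"),
    ("de:Katze", "ontolex:evokes", "concept:feline"),
]


def direct_translations(entry, triples):
    """Follow explicit translation links, in either direction (an assumption)."""
    found = set()
    for subject, predicate, obj in triples:
        if predicate != "vartrans:translatableAs":
            continue
        if subject == entry:
            found.add(obj)
        elif obj == entry:
            found.add(subject)
    return found


def concept_translations(entry, triples):
    """Find other entries that refer to a concept this entry also refers to."""
    concepts = {
        obj for subject, predicate, obj in triples
        if subject == entry and predicate == "ontolex:evokes"
    }
    return {
        subject for subject, predicate, obj in triples
        if predicate == "ontolex:evokes" and obj in concepts and subject != entry
    }


print(sorted(direct_translations("en:cat", TRIPLES)))
print(sorted(concept_translations("en:cat", TRIPLES)))
```

With pairwise links, every language pair needs its own connections; with a shared concept, adding a language only requires linking its entries to existing concepts, which is why the second approach suits multilingual resources.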

It is also possible to build and manage a lexical resource consisting of several lexicons of the same language, for instance one dictionary for general vocabulary and one or more dictionaries for specialized domains.
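One way to picture such a resource is as a container of named lexicons that are searched together. The following Python sketch is purely illustrative: the class names `Lexicon` and `LexicalResource`, the `lookup` method, and the sample definitions are all hypothetical and do not correspond to any particular standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class Lexicon:
    name: str          # e.g. "general" or a domain name
    language: str
    entries: dict = field(default_factory=dict)  # lemma -> definition


@dataclass
class LexicalResource:
    lexicons: list = field(default_factory=list)

    def lookup(self, lemma, language, domains=()):
        """Return (lexicon name, definition) pairs, preferred domains first."""
        hits = [
            (lexicon.name, lexicon.entries[lemma])
            for lexicon in self.lexicons
            if lexicon.language == language and lemma in lexicon.entries
        ]
        # Stable sort: lexicons named in `domains` move to the front,
        # the remaining hits keep their original order.
        hits.sort(key=lambda hit: hit[0] not in domains)
        return hits


resource = LexicalResource([
    Lexicon("general", "en", {"cell": "a small room"}),
    Lexicon("biology", "en", {"cell": "the basic structural unit of an organism"}),
])

print(resource.lookup("cell", "en"))
print(resource.lookup("cell", "en", domains=("biology",)))
```

Keeping the domain lexicons separate lets a caller decide which sense inventory takes priority for a given text, rather than merging everything into one undifferentiated dictionary.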

Machine-readable dictionary vs. NLP dictionary

In digital lexicography, lexical resources are often referred to as machine-readable dictionaries (MRDs): dictionaries stored as machine (computer) data instead of being printed on paper. An MRD is both an electronic dictionary and a lexical database. The term MRD is often contrasted with NLP dictionary: although both are used by programs, an MRD is the electronic form of a dictionary that was previously printed on paper, whereas the term NLP dictionary is preferred when the dictionary was built from scratch with NLP in mind.[5]

Lexical database

A lexical database is a lexical resource with an associated software environment that permits access to its contents. The database may be custom-designed for the lexical information or a general-purpose database into which lexical information has been entered.

Information typically stored in a lexical database includes spelling, lexical category and synonyms of words, as well as semantic and phonological relations between different words or sets of words.
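The general-purpose case can be sketched with Python's built-in `sqlite3` module. The schema below is an assumption made for illustration (the table names `entry` and `relation`, their columns, and the sample words are hypothetical); real lexical databases are typically much richer.

```python
import sqlite3


def build_lexical_db():
    """Create an in-memory database holding entries and relations between them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entry (
            id       INTEGER PRIMARY KEY,
            spelling TEXT NOT NULL,
            category TEXT NOT NULL      -- lexical category (part of speech)
        );
        CREATE TABLE relation (
            source_id INTEGER NOT NULL REFERENCES entry(id),
            target_id INTEGER NOT NULL REFERENCES entry(id),
            kind      TEXT NOT NULL     -- e.g. 'synonym' or 'rhyme'
        );
    """)
    conn.executemany(
        "INSERT INTO entry (id, spelling, category) VALUES (?, ?, ?)",
        [(1, "big", "adjective"), (2, "large", "adjective"), (3, "pig", "noun")],
    )
    conn.executemany(
        "INSERT INTO relation (source_id, target_id, kind) VALUES (?, ?, ?)",
        [(1, 2, "synonym"), (1, 3, "rhyme")],
    )
    return conn


def related(conn, spelling, kind):
    """Return spellings linked from `spelling` by a relation of the given kind.

    Relations are stored in one direction only in this sketch.
    """
    rows = conn.execute(
        """
        SELECT t.spelling
        FROM entry AS s
        JOIN relation AS r ON r.source_id = s.id
        JOIN entry AS t ON t.id = r.target_id
        WHERE s.spelling = ? AND r.kind = ?
        ORDER BY t.spelling
        """,
        (spelling, kind),
    )
    return [row[0] for row in rows]


conn = build_lexical_db()
print(related(conn, "big", "synonym"))
print(related(conn, "big", "rhyme"))
```

Storing relations in a separate table with a `kind` column keeps semantic relations (such as synonymy) and phonological ones (such as rhyme) in the same structure, so new relation types need no schema change.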

References

  1. Sarma, Shikhar Kr.; et al. (2012). "Building multilingual lexical resources using wordnets: Structure, design and implementation". Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon. pp. 161–170.
  2. Francopoulo, Gil; Bel, Nuria; George, Monte; Calzolari, Nicoletta; Monachini, Monica; Pet, Mandy; Soria, Claudia (2009-03-01). "Multilingual resources for NLP in the lexical markup framework (LMF)". Language Resources and Evaluation. 43 (1): 57–70. doi:10.1007/s10579-008-9077-5. ISSN 1574-0218. S2CID 7697316.
  3. Cimiano, Philipp; Chiarcos, Christian; McCrae, John P.; Gracia, Jorge (2020). Linguistic Linked Data: Representation, Generation and Applications. Springer International Publishing. pp. 45–59. doi:10.1007/978-3-030-30225-2_4. ISBN 978-3-030-30225-2. S2CID 214148590.
  4. Cimiano, Philipp; McCrae, John P.; Buitelaar, Paul. "Lexicon Model for Ontologies: Community Report, 10 May 2016". Final Community Group Report. W3C. Retrieved 6 December 2019.
  5. Francopoulo, Gil, ed. (2013). LMF Lexical Markup Framework. ISTE / Wiley. ISBN 978-1-84821-430-9.

External links

  • The WordNet Home Page
  • Lexicographic Search Engine