Database schema

This page describes the general schema of the Ġabra database. Since the database stores JSON documents, this is not a schema in the traditional sense; rather, it is a set of guidelines for which fields each collection can contain.

Collection lexemes

Field Type Description Example / allowed values
lemma string Main lemma (required) "bahrad", "kiteb", "ħarġa"
alternatives array List of spelling alternatives ["bahraġ"]
pos string Part of speech "ADJ" / "ADP" / "ADV" / "AUX" / "CONJ" / "DET" / "INTJ" / "NOUN" / "NUM" / "PART" / "PRON" / "PROPN" / "PUNCT" / "SCONJ" / "SYM" / "VERB" / "X"
sources array Source keys ["Spagnol2011","Falzon2013"]
glosses array English glosses, with examples
root object Root of entry {"radicals":"k-t-b"}, {"radicals":"b-ħ-b-ħ","variant":2}
headword object Headword for entry {"lemma":"abbozz","pos":"NOUN"}
form string General form "mimated" / "comparative" / "verbalnoun" / "diminutive" / "participle" / "accretive"
derived_form integer Derived form of verb (1–10)
gender string "m" / "f"
transitive boolean
intransitive boolean
ditransitive boolean
hypothetical boolean
archaic boolean
multiword boolean
pending boolean Flagged as incorrect or new suggestion
phonetic string Phonetic description of lemma "'skrɛjjɛn"
apertium_paradigm string Name of paradigm in Apertium lexicon "epi/ku__adj"
onomastic_type string Onomastic type (proper nouns) "toponym" / "organisation" / "anthroponym" / "cognomen" / "other"
comment string General comment

Source: lexeme.json
Last updated 2020-10-27T19:07:17.847Z
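Because the schema is a set of guidelines rather than something the database enforces, it can help to check documents in application code. The following Python sketch validates a lexeme document against the required field and the enumerated values listed above. It is not part of Ġabra; the function name `check_lexeme` is hypothetical, and free-text fields (glosses, comment, phonetic, and so on) are deliberately not checked.

```python
from typing import Any

# Allowed values copied from the lexemes table above.
POS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}
LEXEME_FORMS = {
    "mimated", "comparative", "verbalnoun", "diminutive", "participle", "accretive",
}
BOOLEAN_FLAGS = {
    "transitive", "intransitive", "ditransitive",
    "hypothetical", "archaic", "multiword", "pending",
}


def check_lexeme(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems in a lexeme document (empty if it looks valid).

    Hypothetical helper: only the required field and the fields with
    enumerated values or fixed types are checked.
    """
    problems = []
    lemma = doc.get("lemma")
    if not isinstance(lemma, str) or not lemma:
        problems.append("lemma: required non-empty string")
    if "pos" in doc and doc["pos"] not in POS_TAGS:
        problems.append(f"pos: unknown tag {doc['pos']!r}")
    if "form" in doc and doc["form"] not in LEXEME_FORMS:
        problems.append(f"form: unknown value {doc['form']!r}")
    if "gender" in doc and doc["gender"] not in {"m", "f"}:
        problems.append(f"gender: unknown value {doc['gender']!r}")
    if "derived_form" in doc:
        df = doc["derived_form"]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(df, bool) or not isinstance(df, int) or not 1 <= df <= 10:
            problems.append("derived_form: expected an integer from 1 to 10")
    for flag in sorted(BOOLEAN_FLAGS.intersection(doc)):
        if not isinstance(doc[flag], bool):
            problems.append(f"{flag}: expected boolean")
    return problems


kiteb = {"lemma": "kiteb", "pos": "VERB", "root": {"radicals": "k-t-b"},
         "derived_form": 1, "transitive": True}
print(check_lexeme(kiteb))  # → []
print(check_lexeme({"pos": "NOUN", "gender": "n"}))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a document at once, for example when reviewing entries flagged as pending.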

Collection wordforms

Field Type Description Example / allowed values
lexeme_id object Should be a valid ID in the lexemes collection (required)
surface_form string Surface form (required) "skrejjen"
alternatives array List of spelling alternatives ["doxxa","duxxa"]
gloss string English gloss
sources array Source keys ["Spagnol2011","Falzon2013"]
gender string m (masculine), f (feminine), mf (both masculine and feminine) "m" / "f" / "mf"
number string sg (singular), dl (dual), sgv (singulative), coll (collective), sp (both sg and pl), pl (plural), pl_ind (indeterminate plural - probably not needed), pl_det (determinate plural), pl_pl (plural of plural) "sg" / "dl" / "pl" / "sgv" / "coll" / "sp" / "pl_ind" / "pl_det" / "pl_pl"
plural_form string Plural type "counted"
subject null,object Subject agreement (verbs) {"person":"p3","number":"sg","gender":"m"}
dir_obj null,object Direct object agreement {"person":"p3","number":"pl"}
ind_obj null,object Indirect object agreement {"person":"p1","number":"pl"}
possessor null,object Possessor agreement, for nouns that inflect for possession {"person":"p3","number":"sg","gender":"m"}
form string General morphological form "comparative" / "superlative" / "diminutive" / "interrogative" / "mimated" / "verbalnoun"
aspect string Aspect (verbs) "perf" / "impf" / "imp" / "pastpart" / "prespart"
polarity string "pos" / "neg"
stem string
phonetic string Phonetic transcription "'skrɛjjɛn"
pattern string Vowel-consonant pattern "CCVVCVC"
hypothetical boolean
archaic boolean
generated boolean
pending boolean Flagged as incorrect or new suggestion

Source: wordform.json
Last updated 2020-01-27T08:24:49.637Z
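A wordform points to its lexeme through `lexeme_id`, and its agreement fields (`subject`, `dir_obj`, `ind_obj`, `possessor`) are either null or small objects. The Python sketch below checks both. It uses plain strings in place of MongoDB ObjectIds and takes the set of known lexeme IDs from the caller; the helper name and the restriction of agreement keys to `person`, `number` and `gender` (the keys that appear in the examples above) are assumptions.

```python
from typing import Any

# Allowed values copied from the wordforms table above.
ENUMS = {
    "gender": {"m", "f", "mf"},
    "number": {"sg", "dl", "pl", "sgv", "coll", "sp", "pl_ind", "pl_det", "pl_pl"},
    "aspect": {"perf", "impf", "imp", "pastpart", "prespart"},
    "polarity": {"pos", "neg"},
}
AGREEMENT_FIELDS = ("subject", "dir_obj", "ind_obj", "possessor")
# Assumption: agreement objects only use the keys seen in the examples.
AGREEMENT_KEYS = {"person", "number", "gender"}


def check_wordform(doc: dict[str, Any], lexeme_ids: set[str]) -> list[str]:
    """Return problems in a wordform document (hypothetical helper)."""
    problems = []
    if doc.get("lexeme_id") not in lexeme_ids:
        problems.append("lexeme_id: not found in lexemes")
    surface = doc.get("surface_form")
    if not isinstance(surface, str) or not surface:
        problems.append("surface_form: required non-empty string")
    for field, allowed in ENUMS.items():
        if field in doc and doc[field] not in allowed:
            problems.append(f"{field}: unknown value {doc[field]!r}")
    for field in AGREEMENT_FIELDS:
        value = doc.get(field)
        if value is None:
            continue  # field absent or explicitly null: no agreement
        if not isinstance(value, dict):
            problems.append(f"{field}: expected null or object")
        elif not set(value) <= AGREEMENT_KEYS:
            extra = sorted(set(value) - AGREEMENT_KEYS)
            problems.append(f"{field}: unexpected keys {extra}")
    return problems


kitbu = {"lexeme_id": "lex-kiteb", "surface_form": "kitbu", "aspect": "perf",
         "polarity": "pos", "subject": {"person": "p3", "number": "pl"},
         "dir_obj": None}
print(check_wordform(kitbu, {"lexeme_id" and "lex-kiteb"}))  # → []
```

Treating a missing agreement field and an explicit null the same way matches the `null,object` type given in the table; a stricter check could insist that the key be present.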

Collection roots

Field Type Description Example / allowed values
radicals string Radicals separated with hyphens (required) "k-t-b", "ċ-p-ċ-p"
variant integer For distinguishing different roots with same radicals
alternatives string Alternative roots or cross-reference "b-h-r-d", "see h-ż-ż"
type string Root class (required) "strong" / "geminated" / "weak-initial" / "weak-medial" / "weak-final" / "irregular"
sources array Source keys (required; all roots come from Spagnol2011) ["Spagnol2011"]

Source: root.json
Last updated 2019-06-20T17:55:50.022Z
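Since a root is identified by its radicals together with the optional `variant`, a lexeme's `root` object (for example `{"radicals":"b-ħ-b-ħ","variant":2}`) can be resolved by matching on that pair. The Python sketch below assumes that a missing `variant` on both sides denotes the same root; the helper names are hypothetical. Splitting on hyphens rather than on individual characters keeps Maltese digraphs such as "għ" together as a single radical.

```python
from typing import Any, Optional


def parse_radicals(radicals: str) -> list[str]:
    """Split a hyphen-separated radicals string into individual radicals."""
    return radicals.split("-")


def root_key(doc: dict[str, Any]) -> tuple[str, Optional[int]]:
    # Assumption: a root without "variant" only matches a reference without one.
    return (doc["radicals"], doc.get("variant"))


def find_root(ref: dict[str, Any],
              roots: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the root document matching a lexeme's root reference, if any."""
    index = {root_key(root): root for root in roots}
    return index.get(root_key(ref))


# Minimal root documents; other fields such as type and sources are omitted.
roots = [{"radicals": "k-t-b"}, {"radicals": "b-ħ-b-ħ", "variant": 2}]
print(parse_radicals("għ-r-f"))  # → ['għ', 'r', 'f']
print(find_root({"radicals": "b-ħ-b-ħ", "variant": 2}, roots))
```

For repeated lookups the index would be built once rather than on every call; against the live database the same match is a query on `radicals` and `variant`.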

Collection sources

Field Type Description Example / allowed values
key string Source key, as used in the sources arrays of other collections (required) "Spagnol2011"
author string Full author name "Michael Spagnol"
title string Title of resource "A Tale of Two Morphologies. Verb structure and argument alternations in Maltese"
year integer Year of release 2011
note string General note "Germany: University of Konstanz dissertation"

Source: source.json
Last updated 2019-06-20T17:55:50.022Z
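The `sources` arrays in the lexemes, wordforms and roots collections hold values of this `key` field. A simple referential-integrity check, sketched below in Python with in-memory lists standing in for the collections (the helper name is hypothetical), reports keys that are cited but have no matching source document.

```python
from typing import Any, Iterable


def unknown_source_keys(docs: Iterable[dict[str, Any]],
                        sources: Iterable[dict[str, Any]]) -> set[str]:
    """Return source keys cited in docs that have no entry in sources."""
    known = {source["key"] for source in sources}
    cited = {key for doc in docs for key in doc.get("sources", [])}
    return cited - known


sources = [{
    "key": "Spagnol2011",
    "author": "Michael Spagnol",
    "title": "A Tale of Two Morphologies. Verb structure and argument alternations in Maltese",
    "year": 2011,
    "note": "Germany: University of Konstanz dissertation",
}]
lexemes = [
    {"lemma": "kiteb", "sources": ["Spagnol2011"]},
    {"lemma": "ħarġa", "sources": ["Spagnol2011", "Falzon2013"]},
    {"lemma": "bahrad"},  # no sources field at all
]
# Falzon2013 is reported only because it is missing from this toy list.
print(unknown_source_keys(lexemes, sources))  # → {'Falzon2013'}
```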

Collection logs

Field Type Description Example / allowed values
collection string Collection containing the edited document (required) "lexemes" / "wordforms" / "roots"
object_id ObjectId ID of the edited document; must be a valid ID in that collection (required)
date ISODate Date/time of edit (required)
action string Type of edit "created" / "modified" / "deleted"
username string Username of user making edit (required) "john.camilleri"
ip string IP address of user making edit "192.168.0.1"
new_value object New document, if available (required)

Source: log.json
Last updated 2019-06-20T17:55:50.022Z

Note: Because of an earlier bug, the collection field of some log entries may erroneously read 'lexemes' instead of 'wordforms', particularly for deletions.
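The following Python sketch shows one way to assemble a log entry with the required fields, plus a heuristic for the bug described in the note: guessing the real collection from the shape of `new_value` (wordforms have `lexeme_id` and `surface_form`, lexemes have `lemma`, roots have `radicals`). Both functions are hypothetical. The heuristic cannot help when `new_value` is empty, which may well be the case for deletions, and storing `None` when no new document is available is an assumption about how "if available" and "required" combine.

```python
from datetime import datetime, timezone
from typing import Any, Optional

COLLECTIONS = {"lexemes", "wordforms", "roots"}
ACTIONS = {"created", "modified", "deleted"}


def make_log_entry(collection: str, object_id: Any, action: str, username: str,
                   new_value: Optional[dict[str, Any]] = None,
                   ip: Optional[str] = None) -> dict[str, Any]:
    """Build a log document (hypothetical helper)."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection {collection!r}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    entry = {
        "collection": collection,
        "object_id": object_id,
        # Timezone-aware datetime; drivers such as PyMongo store it as a BSON date.
        "date": datetime.now(timezone.utc),
        "action": action,
        "username": username,
        "new_value": new_value,  # assumption: None when no document is available
    }
    if ip is not None:
        entry["ip"] = ip
    return entry


def guess_collection(entry: dict[str, Any]) -> Optional[str]:
    """Infer the collection from the fields of new_value, or None if unknown."""
    doc = entry.get("new_value")
    if not isinstance(doc, dict):
        return None
    if "lexeme_id" in doc and "surface_form" in doc:
        return "wordforms"
    if "lemma" in doc:
        return "lexemes"
    if "radicals" in doc:
        return "roots"
    return None


entry = make_log_entry("lexemes", "wf-kitbu", "modified", "john.camilleri",
                       new_value={"lexeme_id": "lex-kiteb", "surface_form": "kitbu"})
print(guess_collection(entry))  # → wordforms
```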

Universal POS tag set

See: https://universaldependencies.org/u/pos/

Tag Description
ADJ adjective
ADP adposition
ADV adverb
AUX auxiliary verb
CONJ coordinating conjunction
DET determiner
INTJ interjection
NOUN noun
NUM numeral
PART particle
PRON pronoun
PROPN proper noun
PUNCT punctuation
SCONJ subordinating conjunction
SYM symbol
VERB verb
X other
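For lookups in code, the tag table above maps directly onto a dictionary. The Universal Dependencies page linked above describes version 2 of the tag set, in which CONJ was renamed CCONJ; the other sixteen tags are unchanged. The `to_ud2` helper below is a hypothetical convenience for exporting to current UD tooling, not something Ġabra provides.

```python
# Tag descriptions as listed in the table above.
UPOS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary verb",
    "CONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe_pos(tag: str) -> str:
    """Return the description of a POS tag; raises KeyError if unknown."""
    return UPOS[tag]


def to_ud2(tag: str) -> str:
    """Map a Ġabra POS tag to its UD v2 equivalent (only CONJ differs)."""
    if tag not in UPOS:
        raise KeyError(tag)
    return "CCONJ" if tag == "CONJ" else tag


print(describe_pos("PROPN"))  # → proper noun
print(to_ud2("CONJ"))  # → CCONJ
```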