Details
Original language | English |
---|---|
Title of host publication | Proceedings of the Fifth Workshop on Resources for African Indigenous Languages |
Subtitle of host publication | @ LREC-COLING 2024 |
Editors | Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen |
Pages | 94-106 |
Number of pages | 13 |
ISBN (electronic) | 9782493814401 |
Publication status | Published - 2024 |
Event | 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 - Torino, Italy. Duration: 25 May 2024 → 25 May 2024 |
Abstract
Ge’ez is an ancient Semitic language renowned for its unique alphabet. Its script is used to write numerous languages, including Tigrinya and Amharic, and the language played a pivotal role in Ethiopia’s cultural and religious development during the Aksumite kingdom era. Ge’ez remains significant as a liturgical language in Ethiopia and Eritrea, with much of the national identity documentation recorded in Ge’ez. These written materials are invaluable primary sources for studying Ethiopian and Eritrean philosophy, creativity, knowledge, and civilization. Ge’ez has a complex morphological structure with rich inflectional and derivational morphology, and no usable NLP tools have been developed and published until now, owing to the scarcity of annotated linguistic data, corpora, labeled datasets, and lexicons. We therefore propose a rule-based morphological synthesizer for Ge’ez, built using TLM, that automatically generates surface words from root words according to the morphological structures of the language. We used 1,102 sample verbs, representing all verb morphological structures, to test and evaluate the system, which achieved a performance of 97.4%. This result outperforms the baseline model, and we suggest that other scholars build a comprehensive system that considers the morphological variations of the language.
Keywords
- Ge’ez, morphological synthesizer, morphology, NLP, rule-based
ASJC Scopus subject areas
- Social Sciences(all)
- Linguistics and Language
- Arts and Humanities(all)
- Language and Linguistics
- Social Sciences(all)
- Education
- Social Sciences(all)
- Library and Information Sciences
Cite this
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024. ed. / Rooweither Mabuya; Muzi Matfunjwa; Mmasibidi Setaka; Menno van Zaanen. 2024. p. 94-106.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Morphological Synthesizer for Ge’ez Language
T2 - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024
AU - Gebremariam, Gebrearegawi
AU - Teklehaymanot, Hailay
AU - Mezgebe, Gebregewergs
N1 - Publisher Copyright: © 2024 ELRA Language Resource Association.
PY - 2024
Y1 - 2024
KW - Ge’ez
KW - morphological synthesizer
KW - morphology
KW - NLP
KW - rule-based
UR - http://www.scopus.com/inward/record.url?scp=85195217813&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85195217813
SP - 94
EP - 106
BT - Proceedings of the Fifth Workshop on Resources for African Indigenous Languages
A2 - Mabuya, Rooweither
A2 - Matfunjwa, Muzi
A2 - Setaka, Mmasibidi
A2 - van Zaanen, Menno
Y2 - 25 May 2024 through 25 May 2024
ER -