Morphological Synthesizer for Ge’ez Language: Addressing Morphological Complexity and Resource Limitations

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Gebrearegawi Gebremariam
  • Hailay Teklehaymanot
  • Gebregewergs Mezgebe

Research Organisations

External Research Organisations

  • Aksum University (AKU)

Details

Original language: English
Title of host publication: Proceedings of the Fifth Workshop on Resources for African Indigenous Languages
Subtitle of host publication: @ LREC-COLING 2024
Editors: Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Pages: 94-106
Number of pages: 13
ISBN (electronic): 9782493814401
Publication status: Published - 2024
Event: 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 - Torino, Italy
Duration: 25 May 2024

Abstract

Ge’ez is an ancient Semitic language renowned for its unique alphabet. Its script is used to write numerous languages, including Tigrinya and Amharic, and the language played a pivotal role in Ethiopia’s cultural and religious development during the Aksumite kingdom era. Ge’ez remains significant as a liturgical language in Ethiopia and Eritrea, with much of the national identity documentation recorded in Ge’ez. These written materials are invaluable primary sources for studying Ethiopian and Eritrean philosophy, creativity, knowledge, and civilization. Ge’ez has a complex morphological structure with rich inflectional and derivational morphology, and no usable NLP tool has been developed and published until now because of the scarcity of annotated linguistic data, corpora, labeled datasets, and lexicons. Therefore, we propose a rule-based Ge’ez morphological synthesizer, built using TLM, that automatically generates surface words from root words according to the morphological structures of the language. We used 1,102 sample verbs, representing all verb morphological structures, to test and evaluate the system, and obtained a performance of 97.4%. This result outperforms the baseline model, and we suggest that other scholars build a comprehensive system that considers the morphological variations of the language.

Keywords

    Ge’ez, morphological synthesizer, morphology, NLP, rule-based

Cite this

Morphological Synthesizer for Ge’ez Language: Addressing Morphological Complexity and Resource Limitations. / Gebremariam, Gebrearegawi; Teklehaymanot, Hailay; Mezgebe, Gebregewergs.
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024. ed. / Rooweither Mabuya; Muzi Matfunjwa; Mmasibidi Setaka; Menno van Zaanen. 2024. p. 94-106.

Gebremariam, G, Teklehaymanot, H & Mezgebe, G 2024, Morphological Synthesizer for Ge’ez Language: Addressing Morphological Complexity and Resource Limitations. in R Mabuya, M Matfunjwa, M Setaka & M van Zaanen (eds), Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024. pp. 94-106, 5th Workshop on Resources for African Indigenous Languages, RAIL 2024, Torino, Italy, 25 May 2024. <https://aclanthology.org/2024.rail-1.11>
Gebremariam, G., Teklehaymanot, H., & Mezgebe, G. (2024). Morphological Synthesizer for Ge’ez Language: Addressing Morphological Complexity and Resource Limitations. In R. Mabuya, M. Matfunjwa, M. Setaka, & M. van Zaanen (Eds.), Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024 (pp. 94-106) https://aclanthology.org/2024.rail-1.11
Gebremariam G, Teklehaymanot H, Mezgebe G. Morphological Synthesizer for Ge’ez Language: Addressing Morphological Complexity and Resource Limitations. In Mabuya R, Matfunjwa M, Setaka M, van Zaanen M, editors, Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024. 2024. p. 94-106
Gebremariam, Gebrearegawi ; Teklehaymanot, Hailay ; Mezgebe, Gebregewergs. / Morphological Synthesizer for Ge’ez Language : Addressing Morphological Complexity and Resource Limitations. Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024. editor / Rooweither Mabuya ; Muzi Matfunjwa ; Mmasibidi Setaka ; Menno van Zaanen. 2024. pp. 94-106
@inproceedings{e1c3f4f7bdca4d6eb6077694324d801b,
title = "Morphological Synthesizer for Ge{\textquoteright}ez Language: Addressing Morphological Complexity and Resource Limitations",
abstract = "Ge{\textquoteright}ez is an ancient Semitic language renowned for its unique alphabet. It serves as the script for numerous languages, including Tigrinya and Amharic, and played a pivotal role in Ethiopia{\textquoteright}s cultural and religious development during the Aksumite kingdom era. Ge{\textquoteright}ez remains significant as a liturgical language in Ethiopia and Eritrea, with much of the national identity documentation recorded in Ge{\textquoteright}ez. These written materials are invaluable primary sources for studying Ethiopian and Eritrean philosophy, creativity, knowledge, and civilization. Ge{\textquoteright}ez is a complex morphological structure with rich inflectional and derivational morphology, and no usable NLP has been developed and published until now due to the scarcity of annotated linguistic data, corpora, labeled datasets, and lexicons. Therefore, we proposed a rule-based Ge{\textquoteright}ez morphological synthesis to generate surface words from root words according to the morphological structures of the language. Consequently, we proposed an automatic morphological synthesizer for Ge{\textquoteright}ez using TLM. We used 1,102 sample verbs, representing all verb morphological structures, to test and evaluate the system. Finally, we get a performance of 97.4%. This result outperforms the baseline model, suggesting that other scholars build a comprehensive system considering morphological variations of the language.",
keywords = "Ge{\textquoteright}ez, morphological synthesizer, morphology, NLP, rule-based",
author = "Gebrearegawi Gebremariam and Hailay Teklehaymanot and Gebregewergs Mezgebe",
note = "Publisher Copyright: {\textcopyright} 2024 ELRA Language Resource Association.; 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 ; Conference date: 25-05-2024 Through 25-05-2024",
year = "2024",
language = "English",
pages = "94--106",
editor = "Rooweither Mabuya and Muzi Matfunjwa and Mmasibidi Setaka and {van Zaanen}, Menno",
booktitle = "Proceedings of the Fifth Workshop on Resources for African Indigenous Languages",

}

TY - GEN

T1 - Morphological Synthesizer for Ge’ez Language

T2 - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024

AU - Gebremariam, Gebrearegawi

AU - Teklehaymanot, Hailay

AU - Mezgebe, Gebregewergs

N1 - Publisher Copyright: © 2024 ELRA Language Resource Association.

PY - 2024

Y1 - 2024

AB - Ge’ez is an ancient Semitic language renowned for its unique alphabet. It serves as the script for numerous languages, including Tigrinya and Amharic, and played a pivotal role in Ethiopia’s cultural and religious development during the Aksumite kingdom era. Ge’ez remains significant as a liturgical language in Ethiopia and Eritrea, with much of the national identity documentation recorded in Ge’ez. These written materials are invaluable primary sources for studying Ethiopian and Eritrean philosophy, creativity, knowledge, and civilization. Ge’ez is a complex morphological structure with rich inflectional and derivational morphology, and no usable NLP has been developed and published until now due to the scarcity of annotated linguistic data, corpora, labeled datasets, and lexicons. Therefore, we proposed a rule-based Ge’ez morphological synthesis to generate surface words from root words according to the morphological structures of the language. Consequently, we proposed an automatic morphological synthesizer for Ge’ez using TLM. We used 1,102 sample verbs, representing all verb morphological structures, to test and evaluate the system. Finally, we get a performance of 97.4%. This result outperforms the baseline model, suggesting that other scholars build a comprehensive system considering morphological variations of the language.

KW - Ge’ez

KW - morphological synthesizer

KW - morphology

KW - NLP

KW - rule-based

UR - http://www.scopus.com/inward/record.url?scp=85195217813&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85195217813

SP - 94

EP - 106

BT - Proceedings of the Fifth Workshop on Resources for African Indigenous Languages

A2 - Mabuya, Rooweither

A2 - Matfunjwa, Muzi

A2 - Setaka, Mmasibidi

A2 - van Zaanen, Menno

Y2 - 25 May 2024 through 25 May 2024

ER -