Morphological Synthesizer for Ge’ez Language: Addressing Morphological Complexity and Resource Limitations

Publication: Contribution to book/report/anthology/conference proceedings > Conference paper > Research > Peer-reviewed

Authors

  • Gebrearegawi Gebremariam
  • Hailay Teklehaymanot
  • Gebregewergs Mezgebe

Organisational units

External organisations

  • Aksum University (AKU)

Details

Original language: English
Title of host publication: Proceedings of the Fifth Workshop on Resources for African Indigenous Languages
Subtitle: @ LREC-COLING 2024
Editors: Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Pages: 94-106
Number of pages: 13
ISBN (electronic): 9782493814401
Publication status: Published - 2024
Event: 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 - Torino, Italy
Duration: 25 May 2024 to 25 May 2024

Abstract

Ge’ez is an ancient Semitic language renowned for its unique alphabet. Its script is used to write numerous languages, including Tigrinya and Amharic, and the language played a pivotal role in Ethiopia’s cultural and religious development during the Aksumite kingdom era. Ge’ez remains significant as a liturgical language in Ethiopia and Eritrea, and much of the documentation of national identity is recorded in it. These written materials are invaluable primary sources for studying Ethiopian and Eritrean philosophy, creativity, knowledge, and civilization. Ge’ez has a complex morphological structure with rich inflectional and derivational morphology, and no usable NLP tools have been developed and published so far because of the scarcity of annotated linguistic data, corpora, labeled datasets, and lexicons. We therefore propose an automatic rule-based morphological synthesizer for Ge’ez, built using TLM, that generates surface words from root words according to the morphological structures of the language. We used 1,102 sample verbs, representing all verb morphological structures, to test and evaluate the system, which achieved a performance of 97.4%. This result outperforms the baseline model, and we suggest that other scholars build a comprehensive system that considers the morphological variations of the language.
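
As a rough illustration of what generating surface words from a root can look like in a rule-based setting, the minimal sketch below fills the consonant slots of vowel templates. The function name `synthesize`, the `TEMPLATES` table, the feature labels, and the transliterated patterns are all assumptions made for this example, loosely modeled on textbook Semitic root-and-pattern paradigms. They are not the authors' rule set, and the sketch does not implement TLM.

```python
# Illustrative sketch only: the templates are simplified, transliterated
# placeholders, not the rules used in the paper.

# Each template interleaves root consonant slots (C1, C2, C3) with fixed
# affixes and vowels. A repeated slot (C2C2) stands for a doubled consonant.
TEMPLATES = {
    "perfect_3ms": "C1aC2aC3a",
    "imperfect_3ms": "yəC1aC2C2əC3",
    "jussive_3ms": "yəC1C2əC3",
}


def synthesize(root: str, feature: str) -> str:
    """Fill a template with the consonants of a hyphen-separated triliteral root."""
    consonants = root.split("-")
    if len(consonants) != 3:
        raise ValueError(f"expected a triliteral root like 'q-t-l', got {root!r}")
    if feature not in TEMPLATES:
        raise KeyError(f"no template for feature {feature!r}")
    surface = TEMPLATES[feature]
    # Substitute every occurrence of each slot with the matching consonant.
    for slot, consonant in zip(("C1", "C2", "C3"), consonants):
        surface = surface.replace(slot, consonant)
    return surface


if __name__ == "__main__":
    for feature in TEMPLATES:
        print(feature, synthesize("q-t-l", feature))
    # perfect_3ms qatala
    # imperfect_3ms yəqattəl
    # jussive_3ms yəqtəl
```

A real synthesizer would need many more rules than a lookup table like this, for example for verb classes, subject and object affixes, and roots whose consonants trigger sound changes; the table-driven form is used here only to keep the example short.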

Cite

Morphological Synthesizer for Ge’ez Language: Addressing Morphological Complexity and Resource Limitations. / Gebremariam, Gebrearegawi; Teklehaymanot, Hailay; Mezgebe, Gebregewergs.
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024. Ed. / Rooweither Mabuya; Muzi Matfunjwa; Mmasibidi Setaka; Menno van Zaanen. 2024. pp. 94-106.

Gebremariam, G, Teklehaymanot, H & Mezgebe, G 2024, Morphological Synthesizer for Ge’ez Language: Addressing Morphological Complexity and Resource Limitations. in R Mabuya, M Matfunjwa, M Setaka & M van Zaanen (eds), Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024. pp. 94-106, 5th Workshop on Resources for African Indigenous Languages, RAIL 2024, Torino, Italy, 25 May 2024. <https://aclanthology.org/2024.rail-1.11>
Gebremariam, G., Teklehaymanot, H., & Mezgebe, G. (2024). Morphological Synthesizer for Ge’ez Language: Addressing Morphological Complexity and Resource Limitations. In R. Mabuya, M. Matfunjwa, M. Setaka, & M. van Zaanen (Eds.), Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024 (pp. 94-106). https://aclanthology.org/2024.rail-1.11
Gebremariam G, Teklehaymanot H, Mezgebe G. Morphological Synthesizer for Ge’ez Language: Addressing Morphological Complexity and Resource Limitations. In Mabuya R, Matfunjwa M, Setaka M, van Zaanen M, editors, Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024. 2024. p. 94-106
Gebremariam, Gebrearegawi ; Teklehaymanot, Hailay ; Mezgebe, Gebregewergs. / Morphological Synthesizer for Ge’ez Language : Addressing Morphological Complexity and Resource Limitations. Proceedings of the Fifth Workshop on Resources for African Indigenous Languages: @ LREC-COLING 2024. Ed. / Rooweither Mabuya ; Muzi Matfunjwa ; Mmasibidi Setaka ; Menno van Zaanen. 2024. pp. 94-106
@inproceedings{e1c3f4f7bdca4d6eb6077694324d801b,
title = "Morphological Synthesizer for Ge{\textquoteright}ez Language: Addressing Morphological Complexity and Resource Limitations",
keywords = "Ge{\textquoteright}ez, morphological synthesizer, morphology, NLP, rule-based",
author = "Gebrearegawi Gebremariam and Hailay Teklehaymanot and Gebregewergs Mezgebe",
note = "Publisher Copyright: {\textcopyright} 2024 ELRA Language Resource Association.; 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 ; Conference date: 25-05-2024 Through 25-05-2024",
year = "2024",
language = "English",
pages = "94--106",
editor = "Rooweither Mabuya and Muzi Matfunjwa and Mmasibidi Setaka and {van Zaanen}, Menno",
booktitle = "Proceedings of the Fifth Workshop on Resources for African Indigenous Languages",

}

TY - GEN

T1 - Morphological Synthesizer for Ge’ez Language

T2 - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024

AU - Gebremariam, Gebrearegawi

AU - Teklehaymanot, Hailay

AU - Mezgebe, Gebregewergs

N1 - Publisher Copyright: © 2024 ELRA Language Resource Association.

PY - 2024

Y1 - 2024

KW - Ge’ez

KW - morphological synthesizer

KW - morphology

KW - NLP

KW - rule-based

UR - http://www.scopus.com/inward/record.url?scp=85195217813&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85195217813

SP - 94

EP - 106

BT - Proceedings of the Fifth Workshop on Resources for African Indigenous Languages

A2 - Mabuya, Rooweither

A2 - Matfunjwa, Muzi

A2 - Setaka, Mmasibidi

A2 - van Zaanen, Menno

Y2 - 25 May 2024 through 25 May 2024

ER -