dc.contributor.advisor: Amalia
dc.contributor.advisor: Lubis, Tasnim
dc.contributor.author: Husein, Yanda Aziz
dc.date.accessioned: 2025-02-04T02:58:37Z
dc.date.available: 2025-02-04T02:58:37Z
dc.date.issued: 2024
dc.identifier.uri: https://repositori.usu.ac.id/handle/123456789/100808
dc.description.abstract: Leukon, a language spoken by approximately 1,200 people on Simeulue Island, Aceh, is critically endangered because of the dominance of other languages and ongoing social change. Digital resources such as machine translation systems are considered a viable way to document the language and teach it to future generations. This effort is hindered, however, by the scarcity of the parallel corpora needed to train translation models. This study aims to construct a Leukon-Indonesian parallel corpus as a linguistic resource to support machine translation development. The corpus was developed in several stages: transcription extraction, corpus normalization, and sentence structure refinement using deletion, insertion, and replacement techniques. Spelling was corrected by building a word dictionary as a reference and applying fuzzy matching based on the Levenshtein algorithm to detect and correct errors. The corpus was further optimized by removing duplicates and applying Concatenation Augmentation to increase data diversity. The resulting parallel corpus was evaluated by training a machine translation model that uses a Bidirectional Long Short-Term Memory (BiLSTM) architecture with an attention mechanism. Cosine Similarity and Quadratic Weighted Kappa (QWK) were used as performance metrics. The corpus comprises 1,111 lines and achieved a Cosine Similarity score of 0.666 and a QWK score of 0.975. These findings underscore the potential of the constructed corpus to support the preservation of the Leukon language through effective and sustainable machine translation systems. [en_US]
dc.language.iso: id [en_US]
dc.publisher: Universitas Sumatera Utara [en_US]
dc.subject: Leukon Language [en_US]
dc.subject: Parallel Corpus [en_US]
dc.subject: Digital Resource Linguistics [en_US]
dc.subject: Spelling Correction [en_US]
dc.subject: Neural Machine Translation [en_US]
dc.subject: Cosine Similarity [en_US]
dc.subject: Quadratic Weighted Kappa [en_US]
dc.title: Resource Linguistic Generation Bahasa Leukon untuk Pengembangan Mesin Translasi Bahasa Indonesia – Leukon [en_US]
dc.title.alternative: Resource Linguistic Generation Leukon Language for the Development of Indonesian to Leukon Language Translation Machine [en_US]
dc.type: Thesis [en_US]
dc.identifier.nim: NIM191401103
dc.identifier.nidn: NIDN0121127801
dc.identifier.nidn: NIDN0121037701
dc.identifier.kodeprodi: KODEPRODI55201#Ilmu Komputer
dc.description.pages: 115 Pages [en_US]
dc.description.type: Skripsi Sarjana [en_US]
dc.subject.sdgs: SDGs 9. Industry Innovation And Infrastructure [en_US]
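
Illustrative note: the spelling-correction step described in the abstract (a reference word dictionary combined with Levenshtein-based fuzzy matching) could look roughly like the sketch below. This is not the thesis code. The function names, the `max_distance` threshold, and the sample dictionary are assumptions made for illustration, and the sample words are Indonesian placeholders rather than Leukon vocabulary.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def correct_word(word: str, dictionary: set, max_distance: int = 1) -> str:
    """Return the closest dictionary entry, or the word itself if none is close.

    max_distance is a hypothetical threshold; the record does not state
    which threshold the thesis used.
    """
    if not dictionary or word in dictionary:
        return word
    # sorted() makes tie-breaking deterministic (alphabetical order).
    best = min(sorted(dictionary), key=lambda entry: levenshtein(word, entry))
    return best if levenshtein(word, best) <= max_distance else word


def correct_sentence(sentence: str, dictionary: set, max_distance: int = 1) -> str:
    """Apply correct_word to every whitespace-separated token."""
    return " ".join(correct_word(w, dictionary, max_distance) for w in sentence.split())


# Placeholder dictionary (Indonesian words, for illustration only).
SAMPLE_DICTIONARY = {"saya", "makan", "nasi"}

if __name__ == "__main__":
    print(correct_sentence("sya makn nasi", SAMPLE_DICTIONARY))
```

A small threshold keeps the corrector conservative: tokens that are far from every dictionary entry are left untouched instead of being forced onto an unrelated word.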

