Resource Linguistic Generation Bahasa Leukon untuk Pengembangan Mesin Translasi Bahasa Indonesia – Leukon

Husein, Yanda Aziz

dc.contributor.advisor	Amalia
dc.contributor.advisor	Lubis, Tasnim
dc.contributor.author	Husein, Yanda Aziz
dc.date.accessioned	2025-02-04T02:58:37Z
dc.date.available	2025-02-04T02:58:37Z
dc.date.issued	2024
dc.identifier.uri	https://repositori.usu.ac.id/handle/123456789/100808
dc.description.abstract	Leukon, a language spoken by approximately 1,200 individuals on Simeulue Island, Aceh, is critically endangered due to the dominance of other languages and social changes. To preserve this language, the development of digital resources, such as machine translation systems, is considered a viable solution for documentation and educational purposes for future generations. However, this endeavor is hindered by the limited availability of parallel corpora essential for training translation models. This study aims to construct a Leukon-Indonesian parallel corpus as a linguistic resource to support machine translation development. The corpus development involved several stages, including transcription extraction, corpus normalization, and sentence structure refinement using deletion, insertion, and replacement techniques. Spelling corrections were performed by building a word dictionary as a reference and applying fuzzy matching techniques based on the Levenshtein algorithm to detect and correct errors. Optimization was further achieved by removing duplicates and employing Concatenation Augmentation techniques to enhance data diversity. The resulting parallel corpus was evaluated by training a machine translation model using a Bidirectional Long Short-Term Memory (BiLSTM) architecture with Attention mechanisms. Performance metrics, including Cosine Similarity and Quadratic Weighted Kappa (QWK), were used for evaluation. The corpus comprises 1,111 lines, achieving a Cosine Similarity score of 0.666 and a QWK score of 0.975. These findings underscore the potential of the constructed corpus to support the preservation of the Leukon language through the development of effective and sustainable machine translation systems.	en_US
dc.language.iso	id	en_US
dc.publisher	Universitas Sumatera Utara	en_US
dc.subject	Leukon Language	en_US
dc.subject	Parallel Corpus	en_US
dc.subject	Digital Resource Linguistics	en_US
dc.subject	Spelling Correction	en_US
dc.subject	Neural Machine Translation	en_US
dc.subject	Cosine Similarity	en_US
dc.subject	Quadratic Weighted Kappa	en_US
dc.title	Resource Linguistic Generation Bahasa Leukon untuk Pengembangan Mesin Translasi Bahasa Indonesia – Leukon	en_US
dc.title.alternative	Resource Linguistic Generation Leukon Language for the Development of Indonesian to Leukon Language Translation Machine	en_US
dc.type	Thesis	en_US
dc.identifier.nim	NIM191401103
dc.identifier.nidn	NIDN0121127801
dc.identifier.nidn	NIDN0121037701
dc.identifier.kodeprodi	KODEPRODI55201#Ilmu Komputer
dc.description.pages	115 Pages	en_US
dc.description.type	Skripsi Sarjana	en_US
dc.subject.sdgs	SDGs 9. Industry Innovation And Infrastructure	en_US
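
The abstract mentions spelling correction by fuzzy matching against a reference word dictionary using the Levenshtein algorithm. The sketch below illustrates that general idea only; it is not taken from the thesis. The function names, the `max_distance` threshold, and the sample dictionary (ordinary Indonesian words standing in as placeholders) are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, or substitution costs 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def correct_word(word: str, dictionary: set[str], max_distance: int = 1) -> str:
    """Return the closest dictionary entry, or the word unchanged if none is near.

    max_distance is a hypothetical threshold, not a value from the thesis.
    """
    if word in dictionary:
        return word
    # sorted() makes tie-breaking deterministic (alphabetical order)
    best = min(sorted(dictionary), key=lambda entry: levenshtein(word, entry))
    return best if levenshtein(word, best) <= max_distance else word


# Placeholder dictionary: Indonesian words used purely for illustration.
reference_words = {"makan", "minum", "rumah"}
corrected = [correct_word(w, reference_words) for w in ["makn", "rumah", "xyz"]]
```

In this sketch a word is replaced only when a dictionary entry lies within the chosen distance, so tokens with no close match are left untouched rather than forced onto an unrelated entry.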

Files in this item

Name: Resource Linguistic Generation ...
Size: 528.0Kb
Format: PDF
Description: Cover

Name: Yanda Aziz Husein_Resource ...
Size: 6.221Mb
Format: PDF
Description: Fulltext

This item appears in the following Collection(s)

Undergraduate Theses [1253]
Skripsi Sarjana