
    Resource Linguistic Generation Bahasa Leukon untuk Pengembangan Mesin Translasi Bahasa Indonesia – Leukon

    Resource Linguistic Generation Leukon Language for the Development of Indonesian to Leukon Language Translation Machine

    Date
    2024
    Author
    Husein, Yanda Aziz
    Advisor(s)
    Amalia
    Lubis, Tasnim
    Abstract
    Leukon, a language spoken by approximately 1,200 individuals on Simeulue Island, Aceh, is critically endangered due to the dominance of other languages and social changes. To preserve this language, the development of digital resources, such as machine translation systems, is considered a viable solution for documentation and educational purposes for future generations. However, this endeavor is hindered by the limited availability of parallel corpora essential for training translation models. This study aims to construct a Leukon-Indonesian parallel corpus as a linguistic resource to support machine translation development.

    The corpus development involved several stages, including transcription extraction, corpus normalization, and sentence structure refinement using deletion, insertion, and replacement techniques. Spelling corrections were performed by building a word dictionary as a reference and applying fuzzy matching techniques based on the Levenshtein algorithm to detect and correct errors. Optimization was further achieved by removing duplicates and employing Concatenation Augmentation techniques to enhance data diversity.

    The resulting parallel corpus was evaluated by training a machine translation model using a Bidirectional Long Short-Term Memory (BiLSTM) architecture with Attention mechanisms. Performance metrics, including Cosine Similarity and Quadratic Weighted Kappa (QWK), were used for evaluation. The corpus comprises 1,111 lines, achieving a Cosine Similarity score of 0.666 and a QWK score of 0.975. These findings underscore the potential of the constructed corpus to support the preservation of the Leukon language through the development of effective and sustainable machine translation systems.
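
    The dictionary-based spelling correction mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the `max_distance` threshold of 1, and the sample words are assumptions chosen for illustration, and the sample words are placeholder tokens, not entries from the thesis corpus.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def correct_word(word: str, dictionary: set, max_distance: int = 1) -> str:
    """Return the closest dictionary entry if it is within max_distance (assumed threshold)."""
    if not dictionary or word in dictionary:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    best = min(dictionary, key=lambda entry: (levenshtein(word, entry), entry))
    return best if levenshtein(word, best) <= max_distance else word


def correct_line(line: str, dictionary: set, max_distance: int = 1) -> str:
    """Apply word-level correction to a whitespace-tokenized line."""
    return " ".join(correct_word(tok, dictionary, max_distance) for tok in line.split())


# Hypothetical reference dictionary with placeholder words.
reference_words = {"buku", "batu"}
corrected = correct_line("bukku batu xyz", reference_words)
```

    Words with no dictionary entry within the threshold are left unchanged, which avoids overwriting valid words that the reference dictionary simply does not contain yet.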
    URI
    https://repositori.usu.ac.id/handle/123456789/100808
    Collections
    • Undergraduate Theses [1181]

    Repositori Institusi Universitas Sumatera Utara (RI-USU)
    Universitas Sumatera Utara | Perpustakaan | Resource Guide | Katalog Perpustakaan
    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
