
    Implementasi Arsitektur EfficientNetV2-Transformer pada Aplikasi Image Captioning Bahasa Indonesia

    Implementation of EfficientNetV2-Transformer Architecture for Indonesian Image Captioning Application

    View/Open
    Cover (679.6Kb)
    Fulltext (4.201Mb)
    Date
    2024
    Author
    Sinulingga, Muhammad Teguh
    Advisor(s)
    Amalia
    Jaya, Ivan
    Abstract
    Image captioning is a task that combines computer vision, natural language processing (NLP), and machine learning. The model must not only recognize the objects or scenes in an image but also describe the relationships between them. Image captioning has various use cases, such as adding titles to news images, creating descriptions for medical images, supporting text-based image search, providing image information for visually impaired users, and facilitating interaction between humans and robots. Research on image captioning in Bahasa Indonesia using combined CNN-Transformer architectures is still limited. Recent research shows that EfficientNetV2, a CNN family developed from EfficientNet, performs well at image feature extraction. In addition, the Transformer architecture has been widely used in NLP tasks such as machine translation. However, no study to date has developed an image captioning system in Bahasa Indonesia that combines these two architectures. This research aims to develop an image captioning system that generates image descriptions in Bahasa Indonesia. Test results show that the best scores achieved by the developed model are 0.6028 (BLEU-1), 0.3547 (BLEU-2), 0.2247 (BLEU-3), and 0.1572 (BLEU-4). The study also found that small-scale and medium-scale EfficientNetV2 produced different image descriptions and different evaluation scores.
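    The BLEU-1 to BLEU-4 scores reported above can be illustrated with a minimal sketch. The abstract does not say how the scores were computed, so everything below is an assumption: whitespace tokenization, cumulative n-gram weights, no smoothing, and scoring of a single sentence instead of a whole test set. The function names and the Indonesian example captions are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate against its references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n):
    """Cumulative sentence-level BLEU-max_n, uniform weights, no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # Without smoothing, one zero precision zeroes the score.
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical generated caption and reference caption (assumed examples).
candidate = "seekor kucing duduk di atas sofa".split()
references = ["seekor kucing sedang duduk di atas sofa".split()]

for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(candidate, references, n):.4f}")
```

    Because each higher-order score multiplies in a stricter n-gram precision, BLEU-4 is never higher than BLEU-1 in this cumulative form, which matches the decreasing pattern of the reported scores.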
    URI
    https://repositori.usu.ac.id/handle/123456789/95953
    Collections
    • Undergraduate Theses [1254]

    Repositori Institusi Universitas Sumatera Utara - 2025

    Universitas Sumatera Utara

    Perpustakaan

    Resource Guide

    Katalog Perpustakaan

    Journal Elektronik Berlangganan

    Buku Elektronik Berlangganan

    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
