Implementasi Arsitektur EfficientNetV2-Transformer pada Aplikasi Image Captioning Bahasa Indonesia
Implementation of EfficientNetV2-Transformer Architecture for Indonesian Image Captioning Application

Date
2024
Author
Sinulingga, Muhammad Teguh
Advisor(s)
Amalia
Jaya, Ivan
Abstract
Image captioning is a task that combines computer vision, natural language processing (NLP), and machine learning. In this task, the model must not only recognize the objects or scenes in an image but also describe the relationships between them. Image captioning has various use cases, such as adding captions to news images, creating descriptions for medical images, supporting text-based image search, providing image information for visually impaired users, and facilitating interaction between humans and robots. Research on image captioning in Bahasa Indonesia using a combined CNN-Transformer architecture is still limited. Recent work shows that EfficientNetV2, a member of the CNN family developed from EfficientNet, performs well in image feature extraction. In addition, the Transformer architecture has been widely used in NLP tasks such as machine translation. However, no study to date has developed an Indonesian image captioning system that combines these two architectures. This research aims to develop an image captioning system that can generate image descriptions in Bahasa Indonesia. The test results show that the developed model achieves best BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.6028, 0.3547, 0.2247, and 0.1572, respectively. The study also found that using EfficientNetV2 at the small (S) and medium (M) scales produced different image descriptions and varied evaluation scores.
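The architecture described in the abstract pairs a convolutional image encoder with a Transformer decoder. Below is a minimal sketch of such an encoder-decoder captioning model, assuming TensorFlow/Keras; the vocabulary size, caption length, embedding dimension, and attention settings are illustrative placeholders, not the thesis's actual configuration.

```python
# Sketch of an EfficientNetV2 + Transformer-decoder captioning model (illustrative only).
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10000   # assumed Indonesian vocabulary size
SEQ_LEN    = 30      # assumed maximum caption length
EMBED_DIM  = 512
NUM_HEADS  = 8

# Encoder: EfficientNetV2-S as a frozen feature extractor (the study compares S and M scales).
cnn = tf.keras.applications.EfficientNetV2S(
    include_top=False, weights="imagenet", input_shape=(384, 384, 3))
cnn.trainable = False

image_in = layers.Input(shape=(384, 384, 3))
features = cnn(image_in)                                         # spatial feature map
features = layers.Reshape((-1, features.shape[-1]))(features)    # flatten H*W into a sequence
features = layers.Dense(EMBED_DIM)(features)                     # project to decoder dimension

# Decoder: one Transformer block with masked self-attention and cross-attention to the image.
tokens_in = layers.Input(shape=(SEQ_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens_in)
self_attn = layers.MultiHeadAttention(NUM_HEADS, EMBED_DIM // NUM_HEADS)
x = layers.LayerNormalization()(x + self_attn(x, x, use_causal_mask=True))
cross_attn = layers.MultiHeadAttention(NUM_HEADS, EMBED_DIM // NUM_HEADS)
x = layers.LayerNormalization()(x + cross_attn(x, features))
x = layers.LayerNormalization()(x + layers.Dense(EMBED_DIM, activation="relu")(x))
logits = layers.Dense(VOCAB_SIZE)(x)                             # next-token logits per position

model = tf.keras.Model([image_in, tokens_in], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```

In a typical setup the decoder is trained with teacher forcing: the token input is the reference caption shifted right, and the loss targets are the same caption shifted left. At inference time the caption is generated token by token, feeding each prediction back into the decoder.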
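The BLEU-1 through BLEU-4 scores reported in the abstract measure 1- to 4-gram precision between generated and reference captions. A small illustration of how these scores can be computed with NLTK's corpus_bleu is shown below; the tokenized reference and hypothesis are invented for demonstration only.

```python
# Computing BLEU-1..BLEU-4 with NLTK (illustrative example, not the thesis's evaluation data).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [[["seorang", "anak", "bermain", "bola", "di", "lapangan"]]]  # one reference per image
hypotheses = [["anak", "bermain", "bola", "di", "lapangan"]]               # model output

smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))   # uniform weights over 1..n-grams
    score = corpus_bleu(references, hypotheses, weights=weights,
                        smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.4f}")
```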
Collections
- Undergraduate Theses
