Sentiment Analysis of Public Opinion on the Work Programs of the City of Medan Using LSTM (Long Short-Term Memory) with Twitter Social Media Data

Metta, Roshan Ram


Date

2023

Advisor(s)

Sharif, Amer

Amalia

Abstract

This thesis analyzes the sentiment of public opinion about the work programs of the City of Medan under the Mayor and Deputy Mayor in office from 2020 to the present. The research uses the LSTM (Long Short-Term Memory) algorithm together with a weighting method based on BERT (Bidirectional Encoder Representations from Transformers). The dataset was collected from Twitter using hashtags such as #MedanBerkah, #MedanBersih, and #MedanKolaborasi, and keywords such as Medan City, Medan Maju, and Medan Sejahtera. The tweets were scraped and then pre-processed through case folding, tokenizing, stopword removal, punctuation removal, and lemmatization; the output of this stage is referred to as clean data. The clean data was then labeled with sentiment automatically using a lexicon-based method with a WordPiece approach. The labeled dataset contains 3642 tweets: 1723 positive, 1665 negative, and 256 neutral. Before training, the researcher ran a hyperparameter search with the Optuna library, which evaluates combinations drawn from a predefined parameter space and selects the combination that scores best. Using these hyperparameters, the model was trained with the LSTM algorithm and BERT-based weighting. Across training runs, accuracy ranged from a low of 75% to a high of 76%. These results indicate that the model can classify public opinion about the Medan City work programs reasonably well, although there is still room for improvement, particularly in the data labeling process.
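The pre-processing and lexicon-based labeling steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: the stopword list, lemma table, and sentiment lexicon (`STOPWORDS`, `LEMMAS`, `LEXICON`) are hypothetical toy resources; punctuation is removed before splitting so that stopwords followed by punctuation are still caught; lemmatization is a plain dictionary lookup; and the WordPiece step mentioned in the abstract is omitted. A tweet is labeled by the sign of the summed lexicon scores of its tokens.

```python
import re
import string

# Hypothetical toy resources for illustration only; the thesis's actual
# stopword list, lemma table, and sentiment lexicon are not given in the abstract.
STOPWORDS = {"yang", "di", "dan", "ini", "itu", "ke"}
LEMMAS = {"kebersihan": "bersih", "membersihkan": "bersih"}
LEXICON = {"bersih": 1, "maju": 1, "berkah": 1, "macet": -1, "banjir": -1, "kotor": -1}


def preprocess(tweet: str) -> list[str]:
    """Turn a raw tweet into a list of clean tokens."""
    text = tweet.lower()                                   # case folding
    text = re.sub(r"https?://\S+|@\w+", " ", text)         # drop URLs and @mentions
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation removal
    tokens = text.split()                                  # tokenizing on whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]     # stopword removal
    return [LEMMAS.get(t, t) for t in tokens]              # dictionary-based lemmatization


def label(tokens: list[str]) -> str:
    """Lexicon-based labeling: the sign of the summed word scores decides the class."""
    score = sum(LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweet = "Jalan di Medan masih macet dan banjir, tapi taman kota bersih! #MedanBersih"
    tokens = preprocess(tweet)
    print(tokens)
    print(label(tokens))  # macet (-1) + banjir (-1) + bersih (+1) = -1
```

In this sketch a score of zero, including a tweet that contains no lexicon words at all, falls into the neutral class, so the quality of the labels depends directly on how well the lexicon covers the vocabulary of the tweets.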
Limitations of this research include the quality of the labels produced by the lexicon-based method, the model's inability to interpret some slang words in the dataset, and the need for a larger dataset to support a more complex model. Additional pre-processing steps, such as data normalization, are also needed to improve the quality of the sentiment analysis.
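The data normalization suggested as future work could take the form of a dictionary-based slang replacement. The sketch below is a hypothetical illustration, not part of the thesis: the `SLANG` table is a small assumed sample of informal Indonesian spellings, and the rule that collapses runs of three or more identical letters (for example `parahhh` to `parah`) is only a heuristic that can also shorten a stretched word with a legitimate double letter.

```python
import re

# Hypothetical slang dictionary for illustration; a real one would be much larger.
SLANG = {"gk": "tidak", "ga": "tidak", "gak": "tidak", "tdk": "tidak",
         "yg": "yang", "dgn": "dengan", "bgt": "sekali"}


def normalize(tokens: list[str]) -> list[str]:
    """Collapse elongated letters, then map slang tokens to standard words."""
    out = []
    for t in tokens:
        t = re.sub(r"(.)\1{2,}", r"\1", t)  # heuristic: 3+ repeated chars -> one char
        out.append(SLANG.get(t, t))         # replace known slang, keep other tokens
    return out


if __name__ == "__main__":
    print(normalize(["jalan", "yg", "rusak", "gk", "diperbaiki", "parahhh"]))
```

A step like this would normally run before stopword removal, because normalized forms such as `yang` may themselves be stopwords.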

URI

https://repositori.usu.ac.id/handle/123456789/90180

Collections

Undergraduate Theses