Analisis Sentimen terhadap Opini Masyarakat Mengenai Program Kerja Kota Medan Menggunakan LSTM (Long Short Term Memory) dengan Media Sosial Twitter
Abstract
This thesis analyzes the sentiment of public opinion regarding the work program of the City
of Medan under the Mayor and Deputy Mayor serving from 2020 to the present. The research
uses the LSTM (Long Short-Term Memory) algorithm together with a weighting method based
on BERT (Bidirectional Encoder Representations from Transformers). The dataset was obtained from
Twitter using hashtags such as #MedanBerkah, #MedanBersih, #MedanKolaborasi, and
keywords such as Medan City, Medan Maju, and Medan Sejahtera. The tweets are collected by
scraping and then pre-processed with case folding, tokenization, stopword removal,
punctuation removal, and lemmatization. The cleaned data is then labeled with sentiment
automatically using a lexicon-based method with a WordPiece approach. The total
labeled data consists of 3642 tweets, with 1723 having positive sentiment, 1665 negative
sentiment, and 256 neutral sentiment. The labeled data is then used to train the model.
Before training, a hyperparameter search is carried out with the Optuna library. Optuna tries
combinations of values from a prepared search space and selects the best-performing
combination based on the results obtained. With the best hyperparameters in hand,
the model is trained using the LSTM algorithm and weighting using
the BERT algorithm. The training results show accuracy ranging from a low of 75% to a high
of 76%. These results show that the model can classify public opinion about the
Medan City work program well, although there is still room for further development, especially
in the data labeling process. Limitations of this research include the quality of the labels
produced by the lexicon-based method, the model's failure to understand some slang words in
the dataset, and the need for a larger dataset to support a more complex model. Additional
pre-processing steps, such as data normalization, are also needed to improve the quality of
the sentiment analysis.
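
The pre-processing and lexicon-based labeling steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the stopword list, lemma table, and sentiment lexicon below are tiny hypothetical stand-ins, the function names `preprocess` and `label` are invented for illustration, and whole-word matching is used in place of the WordPiece approach the abstract mentions. Punctuation is also stripped before tokenizing here, purely to keep the example short.

```python
import string

# Hypothetical resources for illustration only; the abstract does not
# list the stopwords, lemma rules, or lexicon actually used.
STOPWORDS = {"yang", "di", "dan", "ini", "itu", "sangat"}
LEMMAS = {"jalannya": "jalan", "pelayanannya": "pelayanan"}
POSITIVE = {"bagus", "bersih", "maju"}
NEGATIVE = {"rusak", "macet", "kotor"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(tweet):
    """Return a list of cleaned tokens for one tweet."""
    text = tweet.lower()                                # case folding
    text = text.translate(_PUNCT_TABLE)                 # punctuation removal
    tokens = text.split()                               # tokenizing on whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [LEMMAS.get(t, t) for t in tokens]           # dictionary-based lemmatization


def label(tokens):
    """Assign a sentiment label by counting lexicon hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["Jalannya RUSAK dan macet!", "#MedanBersih sangat bagus"]:
        tokens = preprocess(tweet)
        print(tokens, label(tokens))
```

A scoring rule this simple labels any tweet without lexicon hits as neutral and cannot handle slang or negation, which is consistent with the labeling-quality limitation noted in the abstract.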