    •   USU-IR Home
    • Faculty of Computer Science and Information Technology
    • Department of Computer Science
    • Undergraduate Theses
    • View Item

    Analisis Sentimen pada Aplikasi Halodoc Berbasis Clustering dengan Word2vec (Clustering-Based Sentiment Analysis of the Halodoc Application with Word2vec)

    View/Open
    Fulltext (2.804Mb)
    Date
    2022
    Author
    Sinaga, Aisyah Fajarini
    Advisor(s)
    Amalia
    Harumy, T. Henny Febriana
    Abstract
    This study conducts a sentiment analysis of Halodoc application reviews on Google Play. Manually grouping user reviews by sentiment (positive, negative, or neutral) is difficult because of the large number of reviews. Classifying reviews automatically by sentiment requires a machine learning approach, either supervised or unsupervised. The supervised approach is time-consuming because its training data must first be labeled, so it is less effective when applied to large amounts of data. For this reason, this study takes an unsupervised approach: clustering with K-Means, using the K-Means++ algorithm to initialize the centroids, in order to predict the sentiment of each unlabeled review. The review dataset goes through preprocessing steps consisting of case folding, filtering, tokenizing, and normalization. Features are extracted into vector form with Word2Vec, which captures semantic relationships between words by learning patterns from unlabeled data. Clustering-based sentiment analysis with Word2Vec was tested on an imbalanced dataset and on balanced datasets. On the imbalanced dataset of 81,501 reviews, the evaluation yielded a precision of 95%, a recall of 91%, and an F1-score of 92% (weighted average). On the balanced datasets of 900, 2,100, and 3,000 reviews, the average accuracy was 62.4%.
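    To make the clustering step concrete, here is a minimal pure-Python sketch of K-Means with K-Means++ seeding. It is not the thesis implementation: the 2-D `points` are hypothetical stand-ins for Word2Vec-derived review vectors, `k = 3` is an assumption based on the three sentiment classes named in the abstract, and the function names are illustrative only. In practice a library such as scikit-learn (`KMeans(init="k-means++")`) would more likely be used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """Choose k initial centroids with K-Means++ seeding."""
    # The first centroid is drawn uniformly at random.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Squared distance from every point to its nearest chosen centroid.
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        total = sum(d2)
        if total == 0:
            # Every point coincides with a centroid; fall back to uniform choice.
            centroids.append(list(rng.choice(points)))
            continue
        # Draw the next centroid with probability proportional to d2.
        threshold = rng.random() * total
        cumulative = 0.0
        for p, d in zip(points, d2):
            cumulative += d
            if cumulative > threshold:
                centroids.append(list(p))
                break
        else:
            # Guard against floating-point round-off in the running sum.
            centroids.append(list(points[-1]))
    return centroids


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm started from K-Means++ centroids."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy vectors standing in for review embeddings (assumption).
points = [
    (0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
    (10.0, 10.1), (10.1, 10.0), (10.0, 10.0),
    (0.0, 10.0), (0.1, 10.1), (0.1, 10.0),
]
labels, centroids = kmeans(points, k=3, seed=0)
print(labels)
```

    The cluster indices returned here carry no sentiment meaning by themselves. How clusters are mapped to positive, negative, or neutral is not described in the abstract, so that step is left out of the sketch.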
     
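    The weighted-average precision, recall, and F1-score mentioned in the abstract can be illustrated with a short sketch. Each per-class score is weighted by that class's share of the true labels, so larger classes contribute more to the final figure. The function name `weighted_scores` and the toy labels below are assumptions made for illustration and are not taken from the thesis; scikit-learn's `precision_recall_fscore_support(average="weighted")` computes the same kind of average.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted precision, recall, and F1 over all true classes."""
    support = Counter(y_true)          # number of true samples per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        # True positives and number of predictions for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / n
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        # Weight each class by its share of the true labels.
        weight = n / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Hypothetical imbalanced toy labels: 8 positive, 2 negative (assumption).
y_true = ["pos"] * 8 + ["neg"] * 2
y_pred = ["pos"] * 7 + ["neg"] + ["neg"] + ["pos"]
precision, recall, f1 = weighted_scores(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.8 0.8
```

    In this toy case the majority class scores 0.875 and the minority class 0.5 on every metric, yet the weighted averages come out at 0.8. This shows how strongly a weighted average follows the largest class, which is worth keeping in mind when comparing results on imbalanced and balanced datasets.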

    URI
    https://repositori.usu.ac.id/handle/123456789/48512
    Collections
    • Undergraduate Theses [1180]

    Repositori Institusi Universitas Sumatera Utara (RI-USU)
    Universitas Sumatera Utara | Perpustakaan | Resource Guide | Katalog Perpustakaan
    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
