Sentiment Analysis pada Teks Bahasa Indonesia Menggunakan Support Vector Machine (SVM) dan K-Nearest Neighbor (K-NN)

Lidya, Syahfitri Kartika

View/Open

Fulltext (1.788Mb)

Date

2014

Author

Lidya, Syahfitri Kartika

Advisor(s)

Sitompul, Opim Salim

Efendi, Syahril

Metadata

Show full item record

Abstract

Sentiment analysis is the process of analyzing, understanding, and classifying opinions, evaluation, assessment, attitudes, and emotions to an entity such as products, services, organizations, individuals, events, topics, automatically to obtain the information. This study uses Indonesian text contained in the website in the form of news articles, then the K-Nearest Neighbor method will classify directly to the learning data in order to determine the model that will be established by the Support Vector Machine method for determining the category of the new data to be determined categories of textual, the class of sentiment is positive, negative and neutral. Based on the test results, that influence the value of k in the k-fold cross validation is too small resulting in low accuracy, while too large values of k produce great accuracy value, then the value of k on the Influence of K-NN to accuracy, if n has an accuracy low when the value of k is small. This is because, the incoming data on the k nearest neighbor too little and can not represent a class on test data.

Analisis Sentimen adalah proses menganalisis, memahami, dan mengklasifikasi pendapat, evaluasi, penilaian, sikap, dan emosi terhadap suatu entitas seperti produk, jasa, organisasi, individu, peristiwa, topik, secara otomatis untuk mendapatkan informasi. Penelitian ini menggunakan teks Bahasa Indonesia yang terdapat di website berupa artikel berita, kemudian metode K-Nearest Neighbor akan mengklasifikasi secara langsung pada data pembelajaran agar dapat menentukan model yang akan dibentuk oleh metode Support Vector Machine untuk menentukan kategori dari data baru yang ingin ditentukan kategori tekstual, yaitu kelas sentimen positif, negatif dan netral. Berdasarkan seluruh hasil pengujian, bahwa pengaruh nilai k pada k-fold cross validation yang terlalu kecil menghasilkan akurasi yang rendah, sedangkan nilai k yang terlalu besar menghasilkan nilai akurasi yang besar, kemudian Pengaruh nilai k pada K-NN terhadap akurasi, jika n memiliki akurasi rendah pada saat nilai k kecil. Hal ini dikarenakan, data yang masuk pada k tetangga terdekat terlalu sedikit dan belum bisa merepresentasikan kelas pada data uji.

URI

http://repositori.usu.ac.id/handle/123456789/42877

Collections

Master Theses [627]