dc.contributor.advisor | Mawengkang, Herman | |
dc.contributor.advisor | Nababan, Erna Budhiarti | |
dc.contributor.author | Huliman | |
dc.date.accessioned | 2021-09-16T06:46:04Z | |
dc.date.available | 2021-09-16T06:46:04Z | |
dc.date.issued | 2013 | |
dc.identifier.uri | http://repositori.usu.ac.id/handle/123456789/43496 | |
dc.description.abstract | Modern database technology has made large storage capacities available,
and this has motivated the development of data mining applications. One of the main
functions of data mining is classification, which predicts the class of an instance and
generates information from historical data. Many classification algorithms can be used
to process the input into the desired output, so it is important to observe and measure
the performance of each algorithm. The purpose of this research is to analyze and
compare the performance of the decision tree (C4.5) and k-Nearest Neighbor (k-NN)
algorithms in terms of accuracy. The data sets are taken from the UCI data sets, namely
BreastCancer, Car, Diabetes, Ionosphere, and Iris. Both algorithms are evaluated with
10-fold cross validation. The evaluation result for each algorithm is a confusion matrix
used to measure precision, recall, F-measure, and success rate. The comparative analysis
shows that the decision tree algorithm is more accurate than the k-NN algorithm, by a
margin of 2.28% to 2.5%, across the 5 research data sets. | en_US |
dc.description.abstract | The development of modern database technology has made large storage space
possible, and this forms the background to the development of the data mining concept.
One of the main functions of data mining is classification, which is used to predict a
class and to produce information based on historical data. For classification, many
algorithms can be used to process the input into the desired output, so the performance
of each of these algorithms must be taken into account. The aim of this research is to
analyze and compare the performance of the decision tree (C4.5) and k-Nearest Neighbor
(k-NN) classification algorithms from the point of view of accuracy. The research data
sets come from the UCI data sets, namely BreastCancer, Car, Diabetes, Ionosphere, and
Iris. The evaluation method used for both algorithms is 10-fold cross validation. The
evaluation result is a confusion matrix for assessing precision, recall, F-measure, and
success rate. The comparative accuracy analysis shows that the accuracy of the decision
tree algorithm is better than that of the k-NN algorithm, by a margin of 2.28% to 2.5%,
when applied to the 5 research data sets. | en_US |
dc.language.iso | id | en_US |
dc.publisher | Universitas Sumatera Utara | en_US |
dc.subject | Classification | en_US |
dc.subject | Decision Tree | en_US |
dc.subject | k-NN | en_US |
dc.subject | 10-fold Cross Validation | en_US |
dc.subject | Confusion Matrix | en_US |
dc.subject | Accuracy | en_US |
dc.title | Analisis Akurasi Algoritma Pohon Keputusan dan K-Nearest Neighbor (K-NN) | en_US |
dc.type | Thesis | en_US |
dc.identifier.nim | NIM117038025 | |
dc.description.pages | 125 Pages | en_US |
dc.description.type | Master's Thesis | en_US |
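
The abstract reports precision, recall, F-measure, and success rate derived from a confusion matrix. As a rough illustration of how those four figures relate to one another, the following is a minimal sketch in plain Python. The function names and the sample labels are hypothetical and chosen only for illustration; they are not taken from the thesis, and the thesis's own tooling is not stated in this record.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def class_metrics(matrix, label):
    """Precision, recall, and F-measure with `label` treated as the positive class."""
    tp = matrix[(label, label)]
    fp = sum(n for (a, p), n in matrix.items() if p == label and a != label)
    fn = sum(n for (a, p), n in matrix.items() if a == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


def success_rate(matrix):
    """Fraction of instances classified correctly (overall accuracy)."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total if total else 0.0


# Made-up labels for illustration only; these are NOT results from the thesis.
actual = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
predicted = ["pos", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]

matrix = confusion_matrix(actual, predicted)
precision, recall, f_measure = class_metrics(matrix, "pos")
accuracy = success_rate(matrix)
```

In a 10-fold cross validation, one plausible approach is to pool the predictions from the ten held-out folds and build a single confusion matrix per algorithm from them; whether the thesis pools or averages per fold is not stated here.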