Analisis Optimasi Klasterisasi Dokumen Berbahasa Indonesia dengan Metode Pembobotan Term Frequency-Inverse Document Frequency dan Latent Semantic Analysis serta Klastering dengan Algoritma K-Means, K-Means++, dan Agglomerative

Huda, Muhammad Miftahul

dc.contributor.advisor	Lydia, Maya Silvi
dc.contributor.advisor	Amalia
dc.contributor.author	Huda, Muhammad Miftahul
dc.date.accessioned	2018-04-20T01:59:02Z
dc.date.available	2018-04-20T01:59:02Z
dc.date.issued	2017
dc.identifier.uri	http://repositori.usu.ac.id/handle/123456789/2228
dc.description.abstract	Advances in technology are turning manual processes into digital ones. Paper documents, for example, are converted to digital form to save space, make them easy to carry and access, and speed up data storage. As more digital documents accumulate in storage, searching them takes longer. One way to address this is to group (cluster) documents that are related or share the same subject. Clustering falls into two types: flat clustering and hierarchical clustering. The K-Means and K-Means++ algorithms are flat clustering methods, while the Agglomerative algorithm is a hierarchical clustering method. The Agglomerative algorithm can measure the distance between clusters in three ways: Single Link, Complete Link, and Average Link. Clustering results are also influenced by the steps applied before the documents are clustered, namely preprocessing and weighting. Preprocessing here consists of converting text to lowercase, reducing words to their root forms, and unifying synonymous words. Weighting assigns weights to the document vectors using Term Frequency-Inverse Document Frequency and Latent Semantic Analysis. This study analyzes which type of clustering performs better and how preprocessing and weighting affect the clustering results. The implementation uses the Java programming language. The results show that preprocessing and weighting improve the clustering results, and that K-Means++ clusters the data better than K-Means and Agglomerative. The complexity of the K-Means, K-Means++, and Agglomerative algorithms is Θ(n²), meaning that processing time grows quadratically with the number of documents.	en_US
dc.description.abstract	As technology continues to develop, everything that was once manual is gradually shifting to digital form. Documents that used to exist on paper, for instance, become digital so that they save space, are easy to carry and access, and can be stored quickly. The growing number of digital documents kept in a storage space can make searching for data take longer. One way to handle this is to group (cluster) documents that are related to one another or that cover the same topic. Clustering is divided into two types, flat clustering and hierarchical clustering. The K-Means and K-Means++ algorithms belong to flat clustering, and the Agglomerative algorithm belongs to hierarchical clustering. For the Agglomerative algorithm there are three methods for computing the distance between clusters: Single Link, Complete Link, and Average Link. Clustering results are further influenced by the processes applied before the documents are clustered, namely preprocessing and weighting. Preprocessing here means converting text to lowercase, converting words to their root forms, and unifying words that are synonyms. Weighting means weighting the document vectors with Term Frequency-Inverse Document Frequency and Latent Semantic Analysis. Based on this problem, the study analyzes which type of clustering is better and what effect preprocessing and weighting of the documents have on the clustering results. The implementation was carried out in the Java programming language. The results of the study show that preprocessing and weighting make the clustering results better, and that the K-Means++ algorithm is better than K-Means and Agglomerative at clustering the data. The complexity of the K-Means, K-Means++, and Agglomerative algorithms is Θ(n²). This means that processing time grows quadratically with the number of documents used.	en_US
dc.language.iso	id	en_US
dc.subject	Clustering	en_US
dc.subject	Latent Semantic Analysis	en_US
dc.subject	Term Frequency-Inverse Document Frequency	en_US
dc.subject	K-Means	en_US
dc.subject	K-Means++	en_US
dc.subject	Agglomerative	en_US
dc.title	Analisis Optimasi Klasterisasi Dokumen Berbahasa Indonesia dengan Metode Pembobotan Term Frequency-Inverse Document Frequency dan Latent Semantic Analysis serta Klastering dengan Algoritma K-Means, K-Means++, dan Agglomerative	en_US
dc.type	Thesis	en_US
dc.identifier.nim	NIM121401046	en_US
dc.identifier.submitter	Nurhusnah Siregar
dc.description.type	Undergraduate thesis (Skripsi Sarjana)	en_US

Files in this item

Name: 121401046.pdf
Size: 3.731Mb
Format: PDF
Description: fulltext

This item appears in the following Collection(s)

Undergraduate Theses [1253]
Skripsi Sarjana