Accuracy Analysis of Random Forest Using Principal Component Analysis (PCA)

Diba, Farah

Date

2023

Author

Diba, Farah

Advisor(s)

Lydia, Maya Silvi

Sihombing, Poltak

Abstract

High-dimensional data calls for machine learning methods that can classify quickly and effectively. Random Forest is one algorithm that can handle such complex data: it builds many decision trees, each trained on a random sample of the data and of the features, and combines their predictions by majority vote; the trees can also serve as a basis for feature selection. However, high-dimensional data requires more storage space and therefore longer computation times. Principal Component Analysis (PCA) is a widely used dimensionality reduction method for representing high-dimensional data more compactly: it forms a small number of principal components, linear combinations of the original features that retain most of the important information (variance) in the original data. This study uses three datasets from the Kaggle repository, each representing a different data type: a water quality dataset (continuous), a stroke disease dataset (nominal), and an airline satisfaction dataset (ordinal). The best result was obtained by Random Forest with n_estimators = 9 and no dimensionality reduction, which reached an accuracy of 95.86% on the Airline Satisfaction dataset. At n_estimators = 3, 5, 7, and 9, accuracy decreased when the data was reduced with PCA. It can therefore be concluded that Random Forest already achieves its best accuracy without reducing the dimensionality of the data, using 9 trees (n_estimators = 9). This suggests that, within the range tested (3 to 9 trees), building more trees on high-dimensional data yields better accuracy.
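The comparison described in the abstract (Random Forest at several values of n_estimators, with and without PCA) can be sketched roughly as below. This is a minimal, dependency-free Python illustration under stated assumptions, not the thesis's code. The parameter name `n_estimators` matches scikit-learn's `RandomForestClassifier`, but here both pieces are hand-written: the forest uses bootstrap samples, about sqrt(d) randomly chosen features per split, Gini splits, depth-3 trees and majority voting, and PCA is computed by power iteration on the covariance matrix. The helper names (`pca_fit`, `fit_forest`, etc.), the synthetic dataset, `max_depth=3`, the decile split thresholds and keeping k = 3 components are all hypothetical choices; the sketch does not reproduce the thesis's datasets or results, and whether PCA helps or hurts depends on the data.

```python
import math
import random
from collections import Counter

def pca_fit(X, k, iters=300):
    """Approximate top-k principal components via power iteration with deflation."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    total_var = sum(cov[i][i] for i in range(d))  # trace = total variance
    rng = random.Random(0)
    comps, ratios = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]  # arbitrary positive start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        ratios.append(lam / total_var)  # explained-variance ratio of this component
        # Deflate so the next power iteration converges to the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, comps, ratios

def pca_transform(X, mean, comps):
    """Center each row with the training mean and project it onto the components."""
    return [[sum((r[j] - m) * c[j] for j, m in enumerate(mean)) for c in comps] for r in X]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a small Gini tree; each split only considers a random feature subset."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority  # leaf: predicted class label
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        vals = sorted(row[f] for row in X)
        for q in range(1, 10):  # decile thresholds keep the sketch fast
            t = vals[q * len(vals) // 10]
            ly = [lab for row, lab in zip(X, y) if row[f] <= t]
            ry = [lab for row, lab in zip(X, y) if row[f] > t]
            if ly and ry:
                score = (len(ly) * gini(ly) + len(ry) * gini(ry)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feats, rng)
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left), grow(right))  # internal node

def predict_tree(node, x):
    while isinstance(node, tuple):  # descend until a leaf label is reached
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_estimators, max_depth=3, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, int(math.sqrt(len(X[0]))))  # about sqrt(d) features per split
    trees = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, n_feats, rng))
    return trees

def accuracy(trees, X, y):
    # Majority vote; an odd number of trees cannot tie on a two-class problem.
    votes = [Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0] for x in X]
    return sum(p == lab for p, lab in zip(votes, y)) / len(y)

# Hypothetical synthetic data: 2 informative features followed by 6 noise features.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(300)]
X = [[rng.gauss(1.5 * lab, 1.0) for _ in range(2)] + [rng.gauss(0.0, 1.0) for _ in range(6)]
     for lab in y]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
mean, comps, ratios = pca_fit(X_train, k=3)  # fit PCA on the training split only
Z_train = pca_transform(X_train, mean, comps)
Z_test = pca_transform(X_test, mean, comps)
results = {}
for n in (3, 5, 7, 9):
    full = accuracy(fit_forest(X_train, y_train, n), X_test, y_test)
    reduced = accuracy(fit_forest(Z_train, y_train, n), Z_test, y_test)
    results[n] = (full, reduced)
    print(f"n_estimators={n}: without PCA {full:.3f}, with PCA (k=3) {reduced:.3f}")
```

Fitting PCA on the training split only, then applying the same mean and components to the test split, keeps test information out of the reduction step. In scikit-learn the same idea is usually expressed by chaining `PCA` and `RandomForestClassifier` in a `Pipeline`.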

URI

https://repositori.usu.ac.id/handle/123456789/86407

Collections

Master Theses