Accuracy Analysis of Random Forest Using Principal Component Analysis (PCA)
Date
2023
Author
Diba, Farah
Advisor(s)
Lydia, Maya Silvi
Sihombing, Poltak
Abstract
High-dimensional data calls for machine learning methods that classify quickly and effectively. One algorithm that can handle complex data is Random Forest, which works by building several decision trees, each using randomly selected features, and combining their votes. However, high-dimensional data requires more storage space and leads to longer computation times. Principal Component Analysis (PCA) is a reliable dimensionality reduction method for representing such data: it forms several principal components that retain the important information in the original data. The datasets used in this study come from Kaggle and are of three types: a water quality dataset (continuous), a stroke disease dataset (nominal), and an airline satisfaction dataset (ordinal). The results show that Random Forest with n_estimators = 9 and no dimensionality reduction achieves the best accuracy, 95.86%, on the Airline Satisfaction dataset. At n_estimators = 3, 5, 7, and 9, accuracy decreases when the data is reduced with PCA. It can therefore be concluded that Random Forest without dimensionality reduction already gives the best accuracy when 9 trees are built (n_estimators = 9). This suggests that, for high-dimensional data, building more trees yields better accuracy.
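The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual code. It assumes scikit-learn, a synthetic dataset from `make_classification` in place of the Kaggle datasets, an 80/20 train/test split, standardization before PCA, and an arbitrary `n_components=5`. None of these settings come from the abstract; only the `n_estimators` values 3, 5, 7, and 9 do. The function name `compare_rf_with_and_without_pca` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_rf_with_and_without_pca(X, y, tree_counts=(3, 5, 7, 9),
                                    n_components=5, seed=0):
    """Return {n_estimators: (accuracy_without_pca, accuracy_with_pca)}.

    The split ratio, scaling step, and n_components are illustrative
    assumptions; the abstract does not state them.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    results = {}
    for n_trees in tree_counts:
        # Baseline: Random Forest trained on the original features.
        baseline = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        # Reduced: standardize, project onto principal components, then classify.
        reduced = make_pipeline(
            StandardScaler(),
            PCA(n_components=n_components),
            RandomForestClassifier(n_estimators=n_trees, random_state=seed),
        )
        baseline.fit(X_train, y_train)
        reduced.fit(X_train, y_train)
        results[n_trees] = (
            accuracy_score(y_test, baseline.predict(X_test)),
            accuracy_score(y_test, reduced.predict(X_test)),
        )
    return results


if __name__ == "__main__":
    # Synthetic stand-in data; the study itself used three Kaggle datasets.
    X, y = make_classification(n_samples=1000, n_features=20,
                               n_informative=8, random_state=0)
    for n_trees, (acc_raw, acc_pca) in compare_rf_with_and_without_pca(X, y).items():
        print(f"n_estimators={n_trees}: no PCA={acc_raw:.4f}, PCA={acc_pca:.4f}")
```

Wrapping PCA in a pipeline means the components are fitted on the training split only, so no information from the test split leaks into the projection. The accuracies from synthetic data will not match the figures reported in the abstract.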
Collections
- Master Theses