2Department of Industrial Engineering, Faculty of Engineering, Adana Alparslan Türkeş Science and Technology University, Adana, 01250, Türkiye
Abstract
This study examines the effect of Principal Component Analysis (PCA) on hierarchical clustering techniques as a means of dimension reduction in high-dimensional data sets. PCA and hierarchical clustering were applied to three data sets with 22, 38, and 46 variables, built from 2020 data on the United Nations data platform. Dataset1 comprises the general information (GI) and economic indicator (EI) variables of African countries. Dataset2 comprises the GI, EI, and social indicator (SI) variables of European countries. Dataset3 comprises the GI, EI, SI, and environmental and infrastructural indicator (EII) variables of Asian countries. For dataset1, the mean absolute correlation is 0.2426, and PCA reduced the 22 variables to 8. For dataset2, the mean absolute correlation is 0.2346, and PCA reduced the 38 variables to 10. For dataset3, the mean absolute correlation is 0.2265, and PCA reduced the 46 variables to 11. The results of the analyses were compared and interpreted using tanglegrams and several similarity coefficients. They show that PCA had positive effects on the hierarchical clustering results and dendrograms despite low correlation and outliers. The study thus investigates the facilitating role of PCA in cluster analysis in the presence of the outlier and noise problems typical of high-dimensional data sets.
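As a rough illustration of the workflow summarized above, the following Python sketch computes a mean absolute correlation, reduces the data with PCA, and compares the dendrograms obtained before and after the reduction. It is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic, the 80% cumulative-variance rule for choosing the number of components, the Ward linkage, and the use of the correlation between cophenetic distances as the similarity coefficient are all choices made here for illustration, and the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet


def standardize(X):
    """Center each variable and scale it to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def mean_abs_correlation(X):
    """Mean absolute value of the off-diagonal Pearson correlations."""
    corr = np.corrcoef(X, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)  # each variable pair once
    return float(np.mean(np.abs(corr[upper])))


def pca_scores(Z, var_threshold=0.80):
    """Project standardized data Z onto the leading principal components.

    Components are kept until their cumulative explained variance reaches
    var_threshold (an assumed retention rule, not taken from the study).
    """
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))
    return Z @ Vt[:k].T, k


def dendrogram_similarity(A, B, method="ward"):
    """Correlation between the cophenetic distances of two dendrograms."""
    coph_a = cophenet(linkage(A, method=method))
    coph_b = cophenet(linkage(B, method=method))
    return float(np.corrcoef(coph_a, coph_b)[0, 1])


# Synthetic stand-in: 40 objects (countries) described by 22 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 22))

Z = standardize(X)
mac = mean_abs_correlation(X)
scores, k = pca_scores(Z)
sim = dendrogram_similarity(Z, scores)

print(f"mean absolute correlation: {mac:.4f}")
print(f"components retained: {k} of {X.shape[1]}")
print(f"cophenetic correlation between dendrograms: {sim:.4f}")
```

A value of the last quantity close to 1 would indicate that the dendrogram built on the PCA scores preserves most of the structure of the dendrogram built on the full set of variables; a tanglegram gives the corresponding visual comparison.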