Examining dimensionality reduction effect of principal component analysis via hierarchical clustering techniques

ŞAN, Yağmur; GÖÇKEN, Tolunay

doi:10.14744/sigma.2025.00154

Yağmur ŞAN ¹, Tolunay GÖÇKEN ²

¹Department of Industrial Engineering, Faculty of Engineering, Erciyes University, Kayseri, 38039, Türkiye
²Department of Industrial Engineering, Faculty of Engineering, Adana Alparslan Türkeş Science and Technology University, Adana, 01250, Türkiye

Sigma J Eng Nat Sci 2025; 43(5): 1607-1627 DOI: 10.14744/sigma.2025.00154

Abstract

This study aims to examine the effect of Principal Component Analysis (PCA) on Hierarchical Clustering techniques in terms of dimension reduction in high-dimensional data sets. PCA and Hierarchical Clustering were applied to three data sets with 22, 38, and 46 variables, built from 2020 data on the United Nations data platform. In Dataset 1, the variables are the general information (GI) and economic indicator (EI) variables of the countries, and the objects are African countries. In Dataset 2, the variables are the GI, EI, and social indicator (SI) variables, and the objects are European countries. In Dataset 3, the variables are the GI, EI, SI, and environmental and infrastructural indicator (EII) variables, and the objects are Asian countries. For Dataset 1, the mean absolute correlation is 0.2426, and PCA reduced the dimension from 22 variables to 8. For Dataset 2, the mean absolute correlation is 0.2346, and PCA reduced the dimension from 38 variables to 10. For Dataset 3, the mean absolute correlation is 0.2265, and PCA reduced the dimension from 46 variables to 11. The clustering results obtained with and without PCA were compared and interpreted using tanglegrams and several similarity coefficients. The results showed that PCA had positive effects on the hierarchical clustering results and dendrograms despite low correlation and outliers. Overall, the study investigates the facilitating role of PCA in cluster analysis despite the outlier and noise problems of high-dimensional data sets.

Keywords: Baker’s Gamma Correlation Coefficient; Cophenetic Correlation Coefficient; Dimension Reduction; F_M Index; Hierarchical Cluster Analysis; Principal Component Analysis
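
The workflow summarized in the abstract (mean absolute correlation, PCA-based dimension reduction, then hierarchical clustering assessed with a cophenetic correlation coefficient) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the random data stands in for the United Nations data sets, and the 80% cumulative-variance retention rule, the Ward linkage, the Euclidean distance, and all function names are assumptions made only for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def mean_abs_correlation(X):
    """Mean absolute correlation over all distinct pairs of variables."""
    corr = np.corrcoef(X, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)  # off-diagonal pairs only
    return float(np.abs(corr[upper]).mean())


def pca_scores(X, var_threshold=0.80):
    """Project standardized data onto the leading principal components.

    var_threshold is an assumed retention rule (cumulative explained
    variance); the abstract does not state which rule was used.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))  # guard against floating-point shortfall
    return Z @ Vt[:k].T, k


def cluster_with_cophenetic(data, method="ward"):
    """Hierarchical clustering plus its cophenetic correlation coefficient.

    Ward linkage on Euclidean distances is an assumption for this sketch.
    """
    distances = pdist(data)  # condensed Euclidean distance matrix
    tree = linkage(data, method=method)
    coph_corr, _ = cophenet(tree, distances)
    return tree, float(coph_corr)


# Synthetic stand-in: 40 objects (countries) described by 22 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 22))

mac = mean_abs_correlation(X)
scores, n_components = pca_scores(X)

X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
tree_full, coph_full = cluster_with_cophenetic(X_std)
tree_pca, coph_pca = cluster_with_cophenetic(scores)

print(f"mean absolute correlation: {mac:.4f}")
print(f"retained components: {n_components} of {X.shape[1]}")
print(f"cophenetic correlation (all variables): {coph_full:.4f}")
print(f"cophenetic correlation (PCA scores):    {coph_pca:.4f}")
```

Comparing the two cophenetic correlations gives one simple view of how the dendrogram changes after dimension reduction. The tanglegram, Baker's Gamma, and F_M index comparisons mentioned in the abstract are not reproduced in this sketch.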