Abstract
In this study, the effects of reducing the number of features or classes, and of creating new features, on model success criteria were examined. For this purpose, the parameters that maximize model success were investigated for models built on the selected dataset with the k-nearest neighbors, naive Bayes, support vector machine, random forest, boosting, decision tree, and artificial neural network algorithms, and the best-performing model was determined. To determine the best model, the Wilcoxon test was applied to the results of all algorithms evaluated on the same folds. To provide a more balanced evaluation, a weighted classification report was generated, and model accuracy values were additionally obtained after applying SMOTE. All of the results were evaluated together, and the best-performing model was determined. The data were visualized using the two features with the highest impact on the model. First, the effects on model performance of supervised and unsupervised dimensionality reduction techniques, such as principal component analysis, linear discriminant analysis, neighbourhood components analysis, and non-negative matrix factorization, were examined; then a method was proposed that calculates the fuzzy set number and membership degree of each feature and adds them to the dataset, and the effects of using this method together with the dimensionality reduction methods on model success were examined. The analysis showed that applying the proposed technique to the dataset before reducing it to two dimensions with principal component analysis improved the performance of all algorithms.
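The fold-wise Wilcoxon comparison described above can be sketched as follows. This is a minimal illustration, not the study's actual setup: the iris dataset stands in for the unspecified dataset, and k-nearest neighbors versus random forest stand in for any pair of the compared algorithms. Both models are scored on the same cross-validation folds, and the paired Wilcoxon signed-rank test is applied to the per-fold accuracies.

```python
from scipy.stats import wilcoxon
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Fix the folds so every algorithm is evaluated on identical splits
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

acc_knn = cross_val_score(KNeighborsClassifier(), X, y, cv=cv)
acc_rf = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)

# Paired Wilcoxon signed-rank test on the per-fold accuracies;
# zero_method="zsplit" keeps folds where the two models tie
stat, p = wilcoxon(acc_knn, acc_rf, zero_method="zsplit")
print(p)
```

A small p-value would indicate a statistically significant difference between the two algorithms' fold-wise accuracies; repeating the test over all algorithm pairs supports the selection of the best model.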
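The proposed feature-augmentation step can be illustrated with the following sketch. The abstract does not specify the fuzzification scheme, so this version assumes three evenly spaced triangular membership functions per feature and the iris dataset as a placeholder: for each feature it appends the index of the winning fuzzy set (the "fuzzy set number") and the corresponding membership degree as two new columns, then reduces the augmented data to two dimensions with principal component analysis.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-12),
                                 (c - x) / (c - b + 1e-12)), 0.0)

def fuzzify(X, n_sets=3):
    """Append, per feature, the winning fuzzy-set index and its
    membership degree as two new columns (assumed scheme)."""
    extra = []
    for j in range(X.shape[1]):
        col = X[:, j]
        lo, hi = col.min(), col.max()
        centers = np.linspace(lo, hi, n_sets)
        width = (hi - lo) / (n_sets - 1)
        mu = np.stack([triangular(col, c - width, c, c + width)
                       for c in centers], axis=1)
        extra.append(mu.argmax(axis=1).astype(float))  # fuzzy set number
        extra.append(mu.max(axis=1))                   # membership degree
    return np.column_stack([X] + extra)

X, y = load_iris(return_X_y=True)
X_aug = fuzzify(X)                        # 4 original + 8 fuzzy columns
X_2d = PCA(n_components=2).fit_transform(X_aug)
print(X_aug.shape, X_2d.shape)            # (150, 12) (150, 2)
```

Augmenting first and reducing afterwards lets the fuzzy columns contribute to the principal components, which is the ordering the abstract reports as improving every algorithm's performance.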