²Department of Mathematical Engineering, Istanbul Technical University, Istanbul, 34469, Türkiye
³Department of Mathematics, Yildiz Technical University, Istanbul, 34220, Türkiye
Abstract
Training datasets are not the only elements that affect a prediction system; the parameters of the data mining methods also influence the implementation and must be taken into account. The purpose of this research is to investigate how the main parameters of the most widely used data mining approaches influence anemia prediction. For the K-Nearest Neighbour (K-NN) approach, it is critical to define the k-value, which specifies how many of the nearest training points, ranked by distance, are used to decide the class of a new instance. The Locally Weighted Learning (LWL) method has a kernel value that specifies the width of the neighbourhood used to generate the LWL weighting function. The Sequential Minimal Optimization (SMO) algorithm determines an n-tuple of alpha values (Lagrange multipliers) from the training data so that they satisfy the Karush-Kuhn-Tucker (KKT) conditions, which speeds up the learning process. When these parameters are chosen well for each method, the methods are shown to produce high-performance predictions. We also observe that the number of features and the dataset size affect the performance of these methods. In this study, feature selection methods and data mining methods are compared in terms of appropriate parameter selection and dependency on dataset information. The methods proposed here predict anemia more accurately than the previous versions of each method. For the applied dataset, the number of features is reduced from 11 to 8. Combined with this feature reduction and parameter selection, a well-performing method, K-NN, achieves an increase of about 3.8% in prediction performance under the proposed model.
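The role of the k-value described above can be illustrated with a minimal K-NN sketch. It assumes Euclidean distance, majority voting, and leave-one-out accuracy as the criterion for choosing k. These choices, the helper names (`knn_predict`, `loo_accuracy`, `select_k`), and the two-cluster toy data are hypothetical; they are not the authors' implementation or the anemia dataset.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On tied counts, most_common keeps insertion order, so the label
    # whose closest neighbour appears first wins the tie.
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: each point is predicted from all the other points."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)

def select_k(X, y, candidates):
    """Return the candidate k with the best leave-one-out accuracy (smallest k on ties)."""
    return max(candidates, key=lambda k: (loo_accuracy(X, y, k), -k))

# Hypothetical toy data: two well-separated clusters, labels 0 and 1.
TOY_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
TOY_Y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    for k in (1, 3, 5):
        print(k, loo_accuracy(TOY_X, TOY_Y, k))
    print("selected k:", select_k(TOY_X, TOY_Y, [1, 3, 5]))
```

On this toy data, k = 1 and k = 3 classify every held-out point correctly, while k = 5 misclassifies all of them because each held-out point then has more neighbours from the opposite cluster. This illustrates why the k-value has to be tuned to the dataset rather than fixed in advance.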