² Dr. D. Y. Patil Institute of Technology, Pimpri, Pune, 411018, India
Abstract
Social media networking sites pose specific difficulties for researchers dealing with high-dimensional data, particularly when trying to identify atypical users. Irrelevant or redundant features can significantly reduce classifier accuracy and increase prediction time, which in turn diminishes overall model effectiveness. Feature selection techniques are therefore commonly applied to mitigate these obstacles: removing irrelevant features increases computational efficiency, improves accuracy, and yields simpler models. However, traditional feature selection methods, which are generally filter- or wrapper-based, are computationally expensive and error-prone, and they often reduce classification accuracy because redundant features may not be removed. Hybrid approaches are more efficient, but they sometimes fail to capture the complex interactions between features. We identify the shortcomings of these existing approaches and propose an optimal hybrid approach that integrates GOA with Majority Voting. The approach has two steps. First, a filter based on an information-theoretic measure (an existing approach) selects 24 of the 79 features of the Phishing dataset in order to capture the co-evolutionary behavior. Second, our hybrid GOA-based selection picks the top 10 features from these 24, taking into account both maximum relevance and minimum redundancy. The strength of this hybrid approach lies in its versatility: it has been applied successfully across different datasets. We achieved an accuracy of 99.7% on the Phishing dataset, outperforming ten benchmark feature selection methods. On the KDD, WSN-DS, and CICIDS2017 datasets, the resulting accuracy also exceeded some of the results reported for state-of-the-art classifiers.
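The two-step pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the information-theoretic measure is assumed to be discrete mutual information, and because the abstract does not specify the GOA search or how Majority Voting is combined with it, step 2 uses a plain greedy mRMR (max-relevance, min-redundancy) selection as a swapped-in stand-in for GOA. The function names `filter_step` and `mrmr_step` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """Discrete mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def filter_step(columns, labels, k):
    """Step 1 (filter): keep the k features with the highest MI with the label.

    `columns` maps a feature name to its list of discrete values.
    """
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def mrmr_step(columns, labels, candidates, k):
    """Step 2: greedy mRMR, used here as a stand-in for the GOA search (assumption).

    Each round adds the candidate maximizing relevance I(f; y) minus its
    mean mutual information with the features already selected.
    """
    relevance = {f: mutual_information(columns[f], labels) for f in candidates}
    selected, remaining = [], list(candidates)

    def score(f):
        # With nothing selected yet, only relevance matters.
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(columns[f], columns[s]) for s in selected)
        return relevance[f] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: the label encodes two independent binary features.
    x1 = [0, 0, 0, 0, 1, 1, 1, 1]
    x2 = [0, 0, 1, 1, 0, 0, 1, 1]
    y = [2 * a + b for a, b in zip(x1, x2)]
    cols = {"x1": x1, "x1_copy": list(x1), "x2": x2, "noise": [0] * 8}
    top = filter_step(cols, y, 3)      # the constant "noise" feature is dropped
    print(mrmr_step(cols, y, top, 2))  # prints ['x1', 'x2']
```

In the toy run, `x1_copy` is as relevant as `x1` and `x2`, so the filter alone keeps it. The redundancy term then rejects it in step 2 because it duplicates `x1`. This is the relevance/redundancy trade-off the second step is described as balancing.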
These results indicate that the optimized feature set not only improves accuracy but also significantly reduces prediction time, making it efficient for real-world applications in social media analytics and beyond. This work advances the state of the art in feature selection and lays the groundwork for further research on optimizing classifier performance in various domains. The advantages of the hybrid approach come from both accuracy gains and computational efficiency, making it a practical and powerful tool for high-dimensional data analysis.