Optimizing textual sentiment recognition through LASSO-based feature selection and ensemble voting technique

NISHA; KUMAR, Rakesh

doi:10.14744/sigma.2025.00136

NISHA¹, Rakesh KUMAR¹

¹Department of Computer Science & Applications, Kurukshetra University, Haryana, 136119, India

Sigma J Eng Nat Sci 2025; 43(6): 1915-1929 DOI: 10.14744/sigma.2025.00136

Abstract

The nuances of opinion mining across diverse datasets demand robust, generic models that can efficiently handle the varied emotions expressed in text. This work addresses that problem by proposing a novel ensemble-based model aimed at improving both accuracy and interpretability in sentiment analysis tasks, which is crucial for applications such as customer feedback analysis, public opinion monitoring, and review systems. The model uses ensemble soft voting with Support Vector Machine and Naive Bayes as base classifiers, combined with state-of-the-art feature selection approaches such as grid-search-optimized LASSO and Chi-square. These strategies were chosen for their proven capacity to handle high-dimensional textual data while reducing variability. The proposed method was independently evaluated on three publicly available datasets, Sentiment140, US Airlines, and Internet Movie Database, achieving accuracies of 81.75%, 93.25%, and 93.2%, respectively. The results demonstrate the model's adaptability to both balanced and imbalanced datasets and its ability to identify meaningful features, with consistent performance throughout. The novelty of this work lies in fusing an ensemble method with grid-search-optimized LASSO for feature selection, which outperforms existing individual and ensemble models. This work lays the groundwork for future progress, including extension to larger datasets and the integration of multiple emotions.

Keywords: Grid Search; LASSO; Multidomain Textual Sentiment Recognition; Preprocessing; Random Forest; Soft Voting; SVM
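The abstract names the components (soft voting over SVM and Naive Bayes, Chi-square and grid-search-optimized LASSO feature selection) but not how they are wired together. The following is a minimal scikit-learn sketch of one plausible arrangement, not the authors' implementation: TF-IDF features, a chi-square filter, a LASSO-based selector whose penalty `alpha` is tuned by `GridSearchCV`, and a soft-voting ensemble of a linear SVM and Multinomial Naive Bayes. The stage order, the values `k_chi2=10` and `k_lasso=5`, the alpha grid, the use of a `Lasso` regressor on 0/1 labels, the helper name `build_search`, and the toy reviews are all assumptions made for illustration.

```python
# Hypothetical sketch: TF-IDF -> chi-square -> LASSO selection -> soft voting (SVM + NB).
# All settings below are illustrative assumptions, not values from the paper.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectFromModel, SelectKBest, chi2
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_search(k_chi2=10, k_lasso=5):
    """Return a GridSearchCV that tunes the LASSO penalty of the pipeline."""
    pipe = Pipeline([
        # Raw text -> non-negative TF-IDF weights (required by chi2 and MultinomialNB).
        ("tfidf", TfidfVectorizer(lowercase=True)),
        # Stage 1: keep the k_chi2 terms with the highest chi-square score vs. the label.
        ("chi2", SelectKBest(chi2, k=k_chi2)),
        # Stage 2: fit a LASSO on the 0/1 labels and keep the k_lasso terms with the
        # largest |coefficient|; threshold=-np.inf makes max_features the only
        # criterion, so the selected set is never empty on tiny data.
        ("lasso", SelectFromModel(Lasso(alpha=0.01, max_iter=10_000),
                                  threshold=-np.inf, max_features=k_lasso)),
        # Soft voting: average the two models' class probabilities, pick the argmax.
        ("vote", VotingClassifier(
            estimators=[
                ("svm", SVC(kernel="linear", probability=True, random_state=0)),
                ("nb", MultinomialNB()),
            ],
            voting="soft",
        )),
    ])
    # Grid search over the LASSO penalty strength (hypothetical grid).
    grid = {"lasso__estimator__alpha": [0.001, 0.01, 0.1]}
    return GridSearchCV(pipe, grid, cv=3, scoring="accuracy")


# Toy corpus (invented for illustration): 1 = positive, 0 = negative.
texts = [
    "great flight and friendly crew",
    "loved the movie, wonderful acting",
    "excellent service, very friendly staff",
    "amazing story and great performances",
    "smooth flight, helpful and polite crew",
    "wonderful film, I loved every scene",
    "terrible flight, rude crew",
    "boring movie with awful acting",
    "lost my bags, horrible service",
    "delayed again and the staff were rude",
    "worst film, dull and boring story",
    "awful experience, flight delayed for hours",
]
labels = [1] * 6 + [0] * 6

search = build_search().fit(texts, labels)
print(search.best_params_)
print(search.predict(["great friendly crew", "rude staff and lost bags"]))
```

Soft voting averages the probability vectors of the base models, which is why the SVM is created with `probability=True`. On real corpora such as Sentiment140 the selection sizes and the alpha grid would need to be much larger; these toy settings only show the mechanics.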