Abstract
Sentiment analysis across heterogeneous datasets requires robust, generalizable models that can handle varied expressions of sentiment. This paper addresses this challenge by introducing an ensemble-based framework that aims to improve both accuracy and interpretability in sentiment analysis, which is important for applications such as customer feedback analysis, public opinion monitoring, and review systems. The framework uses soft voting to combine Support Vector Machine (SVM) and Naive Bayes classifiers, with feature selection performed by grid-search-optimized LASSO and the Chi-square test. These components were chosen to exploit the strengths of the individual classifiers while mitigating the variability and high dimensionality of textual features. The framework was evaluated on three benchmark datasets, Sentiment140, US Airlines, and Internet Movie Database (IMDb), attaining classification accuracies of 81.75%, 93.25%, and 93.2%, respectively. The results demonstrate the framework's adaptability to both balanced and imbalanced datasets and its capacity to extract informative features, supporting reliable performance. The novelty of this work lies in its hybrid ensemble method with grid-search-optimized LASSO for feature selection, which surpasses existing individual and ensemble models. This research lays the groundwork for future work, including extension to larger datasets and the incorporation of emotion detection.
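To make the soft-voting step concrete, the following is a minimal sketch of how class probabilities from two classifiers can be averaged into a single prediction. It is illustrative only and is not the implementation used in this work: the function names `soft_vote` and `predict_label`, the equal default weights, and the probability values for the example review are all assumptions introduced here.

```python
from typing import Dict, List, Optional

Probs = Dict[str, float]  # maps class label -> predicted probability


def soft_vote(predictions: List[Probs],
              weights: Optional[List[float]] = None) -> Probs:
    """Return the weighted average of per-class probabilities.

    Hypothetical helper: each entry in `predictions` is the probability
    output of one classifier for the same input text.
    """
    if not predictions:
        raise ValueError("need at least one classifier output")
    if weights is None:
        # Assumption: classifiers are weighted equally by default.
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    classes = predictions[0].keys()
    return {
        c: sum(w * p[c] for w, p in zip(weights, predictions)) / total
        for c in classes
    }


def predict_label(avg_probs: Probs) -> str:
    """Pick the class with the highest averaged probability."""
    return max(avg_probs, key=avg_probs.get)


# Made-up probability outputs for a single review (not results from the paper).
svm_probs = {"negative": 0.40, "positive": 0.60}  # SVM leans positive
nb_probs = {"negative": 0.75, "positive": 0.25}   # Naive Bayes leans negative

combined = soft_vote([svm_probs, nb_probs])
label = predict_label(combined)
print(label, combined)
```

In this example the two classifiers disagree, so a majority (hard) vote would tie. Soft voting resolves the disagreement in favor of the more confident classifier, which is one reason it may suit a two-model ensemble such as the one described here.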