Abstract
Business news is an important tool that helps business leaders and marketers make decisions and build strategic plans. The large volume of news articles spread across different platforms, however, makes it difficult to find important information efficiently. This research presents an approach that uses unsupervised learning to cluster business news articles and so reduce information overload. The system combines K-Means and Agglomerative clustering with Bag-of-Words, TF-IDF, and BERT embeddings to group articles with similar content. Clustering performance is evaluated with the Elbow method, Silhouette Score, Calinski-Harabasz Index, and Davies-Bouldin Index to determine which combination works best. BERT combined with K-Means outperforms TF-IDF, reaching an accuracy of 84.46%. Pre-processing consisted of tokenization, stop word removal, and stemming to reduce noise and discard uninformative terms. The results indicate that BERT yields better clusters because it captures deeper semantic meaning in text. The study also considers the scalability and practical deployment of the system, and proposes future work on real-time adaptability and on ethical issues arising from data bias. The proposed solution improves users’ ability to access relevant business news and thereby supports better decision-making.
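To make the pipeline concrete, the following is a minimal sketch of the TF-IDF and K-Means variant together with three of the evaluation indices, assuming scikit-learn is available. The function name `cluster_and_score`, the toy `DOCS` list, and all parameter values are illustrative assumptions and are not taken from the study; stemming and the BERT embeddings are omitted for brevity.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Hypothetical toy corpus standing in for business news articles.
DOCS = [
    "Central bank raises interest rates to curb inflation",
    "Inflation fears push the central bank toward higher interest rates",
    "Interest rates climb as the bank fights inflation",
    "Tech startup launches new smartphone with faster chip",
    "Smartphone maker unveils chip for its new flagship phone",
    "New smartphone chip boosts startup sales",
]


def cluster_and_score(docs, k, seed=0):
    """Cluster documents with TF-IDF + K-Means and compute internal indices."""
    # Tokenization and English stop word removal are handled by the vectorizer.
    vectorizer = TfidfVectorizer(stop_words="english")
    # Dense array: the Calinski-Harabasz and Davies-Bouldin scorers
    # do not accept sparse input.
    features = vectorizer.fit_transform(docs).toarray()

    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(features)

    scores = {
        "silhouette": silhouette_score(features, labels),  # higher is better
        "calinski_harabasz": calinski_harabasz_score(features, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(features, labels),  # lower is better
    }
    return labels, scores


if __name__ == "__main__":
    labels, scores = cluster_and_score(DOCS, k=2)
    print("cluster labels:", labels.tolist())
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Swapping the TF-IDF features for sentence embeddings from a BERT model would leave the clustering and scoring steps unchanged, which is what makes a like-for-like comparison of the representations possible.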
