Abstract
This study addresses the critical challenge of handling missing data in time series analysis, which is essential for maintaining the accuracy and reliability of financial forecasting and other predictive models. It assesses the performance of various imputation techniques and estimation methods. Imputing missing values is intended to make time series analyses of incomplete datasets more robust and accurate. Eight imputation methods were compared to identify the most effective approach: last observation carried forward, next observation carried backward, mean imputation, linear interpolation, seasonal decomposition, moving average, regression imputation, and Kalman filtering. We also compared the performance of the Transformer model, the Autoregressive Integrated Moving Average (ARIMA) model, and the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model on both complete and imputed datasets. To assess these methods, Monte Carlo simulations on generated data and an application to real data were conducted with different proportions of missing data. The findings suggest that imputation techniques such as mean imputation, a conventional method, and Kalman filtering can significantly enhance the accuracy of time series models, particularly when combined with modern models such as the Transformer. By contrast, last observation carried forward, seasonal decomposition, and moving average did not outperform the other methods in any scenario.
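Six of the eight imputation rules are simple enough to state in a few lines each. The following is a minimal sketch, not the study's implementation. It makes three assumptions:

- The series is univariate and held in a Python list, with `None` marking missing values.
- The moving average uses a symmetric window of `k` observed points on each side.
- Regression imputation regresses on the time index alone.

All function names are hypothetical. Seasonal decomposition and Kalman filtering are omitted here because they require a model of the series.

```python
from statistics import mean

# Hypothetical helpers illustrating six imputation rules; not the study's code.
# Missing values are represented as None.

def locf(x):
    """Last observation carried forward; leading gaps stay None."""
    out, last = [], None
    for v in x:
        if v is not None:
            last = v
        out.append(last)
    return out

def nocb(x):
    """Next observation carried backward: LOCF applied to the reversed series."""
    return locf(x[::-1])[::-1]

def mean_impute(x):
    """Replace every gap with the mean of the observed values."""
    m = mean(v for v in x if v is not None)
    return [m if v is None else v for v in x]

def linear_interp(x):
    """Straight line between observed neighbours; edge gaps use LOCF/NOCB."""
    observed = [t for t, v in enumerate(x) if v is not None]
    out = list(x)
    for a, b in zip(observed, observed[1:]):
        for t in range(a + 1, b):
            w = (t - a) / (b - a)
            out[t] = (1 - w) * x[a] + w * x[b]
    return nocb(locf(out))  # fill any leading or trailing gaps

def moving_average_impute(x, k=2):
    """Assumed rule: fill gap t with the mean of observed values within k steps."""
    out = list(x)
    for t, v in enumerate(x):
        if v is None:
            window = [x[j] for j in range(max(0, t - k), min(len(x), t + k + 1))
                      if x[j] is not None]
            out[t] = mean(window) if window else None
    return out

def regression_impute(x):
    """Assumed rule: fit y = a + b*t by least squares on observed points."""
    pts = [(t, v) for t, v in enumerate(x) if v is not None]
    t_bar = mean(t for t, _ in pts)
    y_bar = mean(v for _, v in pts)
    s_tt = sum((t - t_bar) ** 2 for t, _ in pts)
    b = sum((t - t_bar) * (v - y_bar) for t, v in pts) / s_tt
    a = y_bar - b * t_bar
    return [a + b * t if v is None else v for t, v in enumerate(x)]

x = [1.0, None, 3.0, None, None, 6.0]
print(locf(x))                        # [1.0, 1.0, 3.0, 3.0, 3.0, 6.0]
print(nocb(x))                        # [1.0, 3.0, 3.0, 6.0, 6.0, 6.0]
print(moving_average_impute(x, k=1))  # [1.0, 2.0, 3.0, 3.0, 6.0, 6.0]
print(linear_interp(x))
print(regression_impute(x))
```

On this toy series, linear interpolation and the time-index regression happen to give the same fill, because the observed points lie on the line y = 1 + t.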
The simulated and real data alike showed that the Transformer model outperformed the traditional methods. This held on the complete (original) datasets and on datasets completed by imputation at different missing-data rates. The results of the real-data application support the simulation findings and extend them: mean imputation performs well when the proportion of missing data is low, whereas Kalman filtering is more successful when a high proportion of the data must be imputed. This work goes beyond previous studies by systematically comparing a wide range of imputation methods within a unified framework that incorporates both traditional and modern time series models. It presents a comprehensive evaluation of estimation techniques and imputation strategies for time series analysis and explores which combinations of estimation method and imputation technique are appropriate.
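A small Monte Carlo loop can illustrate the comparison of mean imputation and Kalman filtering across missing-data rates. The sketch below is hypothetical and does not reproduce the study's design. It assumes:

- data from an AR(1) generator;
- values missing completely at random;
- RMSE on the masked points as the score;
- a local-level state-space model for Kalman imputation, with fixed noise variances `q` and `r`, followed by a Rauch-Tung-Striebel smoother.

All names and parameter values are illustrative, and the numbers it prints say nothing about the data analysed in this study.

```python
import math
import random
from statistics import mean

def kalman_impute(x, q=1.0, r=0.1):
    """Assumed local-level model: y_t = mu_t + eps_t (var r), mu_t = mu_{t-1} + eta_t (var q).
    Forward Kalman filter, then an RTS smoother; gaps get the smoothed level.
    Requires at least one observed (non-None) value."""
    a = next(v for v in x if v is not None)  # start the level at the first observation
    p = 1e6                                   # large prior variance (roughly diffuse)
    a_pred, p_pred, a_filt, p_filt = [], [], [], []
    for v in x:
        ap, pp = a, p + q                     # predict: the level follows a random walk
        if v is None:                         # missing: keep the prediction
            a, p = ap, pp
        else:                                 # observed: Kalman update
            k = pp / (pp + r)
            a, p = ap + k * (v - ap), (1 - k) * pp
        a_pred.append(ap)
        p_pred.append(pp)
        a_filt.append(a)
        p_filt.append(p)
    smooth = a_filt[:]
    for t in range(len(x) - 2, -1, -1):       # backward (smoothing) pass
        c = p_filt[t] / p_pred[t + 1]
        smooth[t] = a_filt[t] + c * (smooth[t + 1] - a_pred[t + 1])
    return [smooth[t] if v is None else v for t, v in enumerate(x)]

def mean_impute(x):
    """Replace every gap with the mean of the observed values."""
    m = mean(v for v in x if v is not None)
    return [m if v is None else v for v in x]

def simulate_ar1(n=200, phi=0.9, seed=0):
    """Hypothetical data generator: y_t = phi * y_{t-1} + e_t, e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y

def monte_carlo(rate, reps=50, seed=1):
    """Average RMSE on the masked points for each imputation method."""
    rng = random.Random(seed)
    methods = {"mean": mean_impute, "kalman": kalman_impute}
    errors = {name: [] for name in methods}
    for _ in range(reps):
        y = simulate_ar1(seed=rng.randrange(10**9))
        gaps = set(rng.sample(range(len(y)), int(rate * len(y))))  # missing at random
        x = [None if t in gaps else v for t, v in enumerate(y)]
        for name, impute in methods.items():
            filled = impute(x)
            mse = mean((filled[t] - y[t]) ** 2 for t in gaps)
            errors[name].append(math.sqrt(mse))
    return {name: mean(vals) for name, vals in errors.items()}

for rate in (0.1, 0.3, 0.5):
    print(rate, monte_carlo(rate))
```

Because `q` and `r` are fixed rather than estimated, this is a simplification. In practice, the state-space variances would usually be estimated from the observed data by maximum likelihood, and the same loop could be extended to the other imputation rules and to the forecasting models.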