Recognition of audio source recording device using Mel frequency cepstral coefficients and recurrent neural networks

NARLA, Venkata Lalitha; SURESH, Gulivindala; GANGWAR, D. P.; PRASAD, KRKV; BHATTACHARJEE, Rita Rani

doi:10.14744/sigma.2026.2054

Venkata Lalitha NARLA ¹, Gulivindala SURESH ¹, D. P. GANGWAR ², KRKV PRASAD ³, Rita Rani BHATTACHARJEE ⁴

¹Department of Electronics and Communication Engineering, Aditya University, Surampalem, A.P, India
²Central Forensic Science Laboratory, Chandigarh, Punjab, India
³Department of Computer Science Engineering (AI & ML), Aditya University, Surampalem, A.P, India
⁴Centre for VIT Happiness and Well Being, Vellore Institute of Technology, Vellore, 632014, India

Sigma J Eng Nat Sci 2026; 44(3): 1572-1586 DOI: 10.14744/sigma.2026.2054

Abstract

Accurate identification of audio source recording devices is essential in digital forensic investigations, with applications in copyright protection, tamper detection, and audio source forensics. This work presents a novel method for learning feature representations from temporal audio characteristics, namely Mel Frequency Cepstral Coefficients (MFCC) and the Constant-Q Transform (CQT), obtained from segmented acoustic features. It then builds a structured representation learning model that combines Long Short-Term Memory (LSTM) networks with Recurrent Neural Networks (RNN). By pairing temporal modelling with a time-frequency representation, the model efficiently condenses spatial information and yields accurate recognition. Audio samples were collected from four widely used mobile devices (iPhone, Realme, Vivo, and Poco), each contributing 70 speech recordings of 10 seconds duration, for a total of 280 samples. Recordings were captured in semi-controlled indoor environments using standardized speech content to simulate real-world conditions. The experiments show an accuracy of 96% in classifying the four types of audio source recording devices. The method promises improved efficacy across a variety of forensic scenarios and represents a substantial development in audio forensic analysis. The performance metrics of source device recognition using CQT-RNN and MFCC-RNN are compared with each other and with state-of-the-art methods. A user interface has been developed to facilitate recognition of the source device for test audio signals using the proposed method. Overall, the study offers a powerful, precise, and easy-to-use solution to the problem of identifying audio source recording devices and highlights its potential for extensive use in forensic practice.

Keywords: Constant-Q Transform; Digital Forensics; Mel Frequency Cepstral Coefficients; Recurrent Neural Networks; Long Short-Term Memory Networks
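
To make the feature pipeline in the abstract more concrete, the following minimal sketch shows how a 10-second recording could be turned into a frame sequence suitable for an LSTM, and how Mel-spaced filter centers are derived. The abstract does not state the sampling rate, frame length, hop length, or number of coefficients, so `SAMPLE_RATE`, `FRAME_LEN`, `HOP_LEN`, `N_MFCC`, and `N_FILTERS` below are illustrative assumptions, and the helper names are hypothetical rather than taken from the authors' implementation. Only the dataset size (4 devices, 70 recordings each) and the 10-second duration come from the abstract.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the common formula m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def num_frames(duration_s, sample_rate, frame_len, hop_len):
    """Count the full frames that fit in the signal (no padding)."""
    n_samples = int(round(duration_s * sample_rate))
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop_len


def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of triangular filters spaced evenly on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_filters)]


# Assumed analysis settings (NOT stated in the abstract).
SAMPLE_RATE = 16000   # samples per second
FRAME_LEN = 400       # 25 ms window at 16 kHz
HOP_LEN = 160         # 10 ms hop at 16 kHz
N_MFCC = 13           # coefficients kept per frame
N_FILTERS = 26        # mel filters before the cepstral step

# From the abstract: 4 devices, 70 recordings each, 10 seconds per recording.
DEVICES = ["iPhone", "Realme", "Vivo", "Poco"]
RECORDINGS_PER_DEVICE = 70
DURATION_S = 10.0

frames = num_frames(DURATION_S, SAMPLE_RATE, FRAME_LEN, HOP_LEN)
sequence_shape = (frames, N_MFCC)  # one (time, feature) matrix per recording
dataset_size = len(DEVICES) * RECORDINGS_PER_DEVICE
centers = mel_center_frequencies(N_FILTERS, 0.0, SAMPLE_RATE / 2)

print("frames per recording:", frames)
print("LSTM input shape per recording:", sequence_shape)
print("total recordings:", dataset_size)
```

Under these assumed settings, each recording becomes a sequence of MFCC vectors that a recurrent model reads one frame at a time, with the final state feeding a four-way classifier over the device labels. A different sampling rate or hop length changes only the sequence length, not the structure.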