Bangla Speech Recognition: Power Spectral Analysis, LPC & MFCC as Feature Extraction Techniques in Deep Learning

Md. Shafiul Alam Chowdhury; Md. Farukuzzaman Khan; Mohammed Sowket Ali; Shahriar Ahmed; Md. Abdul Mannan; Md. Amanat Ullah

doi:https://doi.org/10.14445/22315381/IJETT-V73I5P121

Research Article | Open Access

Volume 73 | Issue 5 | Year 2025 | Article Id. IJETT-V73I5P121

Received: 19 Nov 2024 | Revised: 12 May 2025 | Accepted: 26 May 2025 | Published: 31 May 2025

Citation:

Md. Shafiul Alam Chowdhury, Md. Farukuzzaman Khan, Mohammed Sowket Ali, Shahriar Ahmed, Md. Abdul Mannan, Md. Amanat Ullah, "Bangla Speech Recognition: Power Spectral Analysis, LPC & MFCC as Feature Extraction Techniques in Deep Learning," International Journal of Engineering Trends and Technology (IJETT), vol. 73, no. 5, pp. 241-255, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I5P121

Abstract

Speech recognition technology has already become part of everyday life, and AI robots can now converse with people, particularly in English. Most existing work targets English because it is an international language, and much remains for researchers to do in other languages. This study addresses speech recognition in Bangla (Bengali). To determine the highest feasible recognition accuracy for Bangla, several pattern recognition and deep learning methods were employed. Native speakers of Bangla provided the core dataset, and extensive experiments were carried out on Bangla phonemes, isolated words, commands, and sentences. Features were extracted from the speech samples using MFCC, alongside LPC and FFT-based power spectral analysis. A multilayer feedforward deep neural network model was used with the maximum-likelihood approach, and a random dataset was used to assess the model's recognition accuracy. Feature extraction with MFCC combined with the deep neural network outperformed both power spectral analysis and linear predictor coefficient analysis in recognition results. The investigation also found that recognition accuracy was affected by the number of speech samples and by the inclusion of speech samples from the opposite gender.
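
The abstract names FFT-based power spectral analysis and LPC as two of the feature extractors. As a rough illustration only, and not the authors' implementation, the sketch below shows how these two feature types might be computed per frame with NumPy. The 16 kHz sampling rate, the 25 ms frame and 10 ms hop, the 512-point FFT, the LPC order of 12, and all function names are assumptions chosen for the example; the abstract specifies none of them.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (assumed 25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def power_spectrum(frames, n_fft=512):
    """Periodogram of Hamming-windowed frames: |FFT|^2 / n_fft."""
    windowed = frames * np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(windowed, n=n_fft, axis=1)) ** 2 / n_fft

def lpc(frame, order=12):
    """LPC by the autocorrelation method: solve the Toeplitz system R a = r."""
    w = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order (zero lag sits at index len(w) - 1).
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Predictor coefficients: x[n] is approximated by sum_k a[k] * x[n - 1 - k].
    return np.linalg.solve(R, r[1 : order + 1])

# Synthetic stand-in for a speech sample: a 1 kHz tone plus a little noise.
fs = 16000  # assumed sampling rate
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.01 * rng.standard_normal(fs)

frames = frame_signal(x)
spec = power_spectrum(frames)   # one power spectrum per frame
coeffs = lpc(frames[0])         # LPC coefficients for the first frame
peak_bin = int(np.argmax(spec[0]))
```

With these assumed settings each FFT bin spans 16000 / 512 = 31.25 Hz, so the 1 kHz test tone peaks at bin 32. In a recognition pipeline, per-frame vectors such as these would be the inputs to the classifier; MFCC adds a mel filterbank, a logarithm, and a discrete cosine transform on top of the power spectrum.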

Keywords

Automatic Voice Recognition (AVR), Deep learning, Linear Predictor Coefficient analysis (LPC), Mel Frequency Cepstral Coefficient (MFCC), Power spectral analysis (FFT), FFNN, Zero Crossing Rate (ZCR).
