Dissecting Instrumental Acoustics by Comparing Traditional and Avant-Garde Techniques
© 2024 by IJETT Journal
Volume-72 Issue-10
Year of Publication : 2024
Author : S.P. Sakthidevi, C. Divya, V. Kowsalya
DOI : 10.14445/22315381/IJETT-V72I10P127
How to Cite?
S.P. Sakthidevi, C. Divya, V. Kowsalya, "Dissecting Instrumental Acoustics by Comparing Traditional and Avant-Garde Techniques," International Journal of Engineering Trends and Technology, vol. 72, no. 10, pp. 282-305, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I10P127
Abstract
This paper presents an in-depth study of audio separation, examining both avant-garde and conventional methodologies for isolating musical tones and extracting the individual components of a sound mixture. By comparing traditional and advanced approaches, the study seeks to offer insights beneficial to researchers, educators, musicians, and composers. It first reviews conventional approaches such as Non-Negative Matrix Factorization (NMF), Independent Deeply Learned Matrix Analysis (IDLMA), Independent Low-Rank Matrix Analysis (ILRMA), Independent Component Analysis (ICA), and Principal Component Analysis (PCA), detailing their working principles and advantages, and then analyses the main families of machine learning techniques. It next explores how deep learning models such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are applied to dissect instrumental acoustics. Additionally, coupled deep learning frameworks, including High-Resolution Long Short-Term Memory (HR-LSTM), Dense-U-Net, Wave-U-Net, Conv-TasNet, Res-U-Net, and the Long-term Recurrent Convolutional Network (LRCN), are analyzed. DenseLSTM and the Audio Spectrogram Transformer are also evaluated, since such combined architectures are more efficient than their individual components. By bridging avant-garde and conventional audio separation methodologies, the paper offers valuable insights for these stakeholders and points towards enhanced practical applications in audio separation.
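As a concrete illustration of the conventional spectrogram-factorization approach surveyed above, the following minimal sketch applies NMF to a single-channel mixture. It is not taken from the paper; it assumes the librosa and scikit-learn packages, and the file name mixture.wav is a placeholder.

    import numpy as np
    import librosa
    from sklearn.decomposition import NMF

    def nmf_separate(path, n_sources=2, n_fft=2048, hop_length=512):
        # Load a mono mixture and compute its complex STFT.
        y, sr = librosa.load(path, sr=None, mono=True)
        stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
        magnitude, phase = np.abs(stft), np.angle(stft)

        # Factorize the magnitude spectrogram: |X| ~ W @ H, where the columns
        # of W are spectral templates and the rows of H are their activations.
        model = NMF(n_components=n_sources, init="nndsvd", max_iter=400)
        W = model.fit_transform(magnitude)
        H = model.components_

        estimates = []
        for k in range(n_sources):
            component = np.outer(W[:, k], H[k])
            # Wiener-style soft mask so the component estimates sum to the mixture.
            mask = component / (W @ H + 1e-10)
            masked = mask * magnitude * np.exp(1j * phase)
            estimates.append(librosa.istft(masked, hop_length=hop_length))
        return estimates, sr

    # Example usage (placeholder file name):
    # sources, sr = nmf_separate("mixture.wav", n_sources=2)

Each estimated source is reconstructed by masking the mixture spectrogram and inverting with the mixture phase, the usual practice for magnitude-domain NMF separation.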
Keywords
Conventional approaches, Machine learning, Deep learning, Coupled deep learning framework, Avant-Garde
References
[1] Mikkel N. Schmidt, and Morten Mørup, “Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation,” Independent Component Analysis and Blind Signal Separation, Lecture Notes in Computer Science, vol. 3889, pp. 700-707, 2006.
[CrossRef] [Google Scholar] [Publisher Link]
[2] Daniel Lee, and H. Sebastian Seung, “Algorithms for Non-Negative Matrix Factorization,” Advances in Neural Information Processing Systems, vol. 13, pp. 1-7, 2000.
[Google Scholar] [Publisher Link]
[3] Paris Smaragdis, “Non-Negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs,” Independent Component Analysis and Blind Signal Separation, Lecture Notes in Computer Science, vol. 3195, pp. 494-499, 2004.
[CrossRef] [Google Scholar] [Publisher Link]
[4] Alexey Ozerov, and Cédric Fevotte, “Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 550-563, 2010.
[CrossRef] [Google Scholar] [Publisher Link]
[5] Seokjin Lee, Sang Ha Park, and Koeng-Mo Sung, “Beamspace-Domain Multichannel Nonnegative Matrix Factorization for Audio Source Separation,” IEEE Signal Processing Letters, vol. 19, no. 1, pp. 43-46, 2012.
[CrossRef] [Google Scholar] [Publisher Link]
[6] Jianyu Wang, and Shanzheng Guan, “Multichannel Blind Speech Source Separation with a Disjoint Constraint Source Model,” Arxiv, pp. 1-5, 2024.
[CrossRef] [Google Scholar] [Publisher Link]
[7] Naoki Makishima et al., “Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 10, pp. 1601-1615, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[8] Takuya Hasumi et al., “Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models,” 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Tokyo, Japan, pp. 1226-1233, 2021.
[Google Scholar] [Publisher Link]
[9] Takuya Hasumi et al., “PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2680-2694, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[10] Shinichi Mogami et al., “Independent Low-Rank Matrix Analysis Based on Complex Student's t-Distribution for Blind Audio Source Separation,” 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing, Tokyo, Japan, pp. 1-6, 2017.
[CrossRef] [Google Scholar] [Publisher Link]
[11] Daichi Kitamura, and Kohei Yatabe, “Consistent Independent Low-Rank Matrix Analysis for Determined Blind Source Separation,” EURASIP Journal on Advances in Signal Processing, vol. 2020, pp. 1-35, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[12] Zahoor Uddin, Aamir Qamar, and Farooq Alam, “ICA Based Sensors Fault Diagnosis: An Audio Separation Application,” Wireless Personal Communications, vol. 118, pp. 3369-3384, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[13] M.R. Ezilarasan et al., “Blind Source Separation in the Presence of AWGN Using ICA-FFT Algorithms: A Machine Learning Process,” Recent Trends in Computational Intelligence and its Application, 1st ed., CRC Press, pp. 1-9, 2023.
[Google Scholar] [Publisher Link]
[14] Zaineb H. Ibrahemm, and Ammar I. Shihab, “Voice Separation and Recognition Using Machine Learning and Deep Learning: A Review Paper,” Journal of Al-Qadisiyah for Computer Science and Mathematics, vol. 15, no. 3, pp. 11-34, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[15] Harshada Burute, and P.B. Mane, “Separation of Singing Voice from Music Background,” International Journal of Computer Applications, vol. 129, no. 4, pp. 22-26, 2015.
[CrossRef] [Google Scholar] [Publisher Link]
[16] Tomohiro Watanabe, Takanori Fujisawa, and Masaaki Ikehara, “Vocal Separation Using Improved Robust Principal Component Analysis and Post-Processing,” 2016 IEEE 59th International Midwest Symposium on Circuits and Systems, Abu Dhabi, United Arab Emirates, pp. 1-4, 2016.
[CrossRef] [Google Scholar] [Publisher Link]
[17] Feng Li, Yujun Hu, and Lingling Wang, “Unsupervised Single-Channel Singing Voice Separation with Weighted Robust Principal Component Analysis Based on Gammatone Auditory Filterbank and Vocal Activity Detection,” Sensors, vol. 23, no. 6, pp. 1-17, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[18] Shrikant Venkataramani, Efthymios Tzinis, and Paris Smaragdis, “End-To-End Non-Negative Autoencoders for Sound Source Separation,” 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, pp. 116-120, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[19] Pankaj Ramakant Kunekar et al., “Audio Feature Extraction: Foreground and Background Audio Separation Using KNN Algorithm,” International Journal of Science and Research Archive, vol. 9, no. 1, pp. 269-276, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[20] Kilian Schulze-Forster et al., “Unsupervised Music Source Separation Using Differentiable Parametric Source Models,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1276-1289, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[21] Gaël Richard, Pierre Chouteau, and Bernardo Torres, “A Fully Differentiable Model for Unsupervised Singing Voice Separation,” 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Korea, pp. 946-950, 2024.
[CrossRef] [Google Scholar] [Publisher Link]
[22] Sangeun Kum et al., “Semi-Supervised Learning Using Teacher-Student Models for Vocal Melody Extraction,” Arxiv, pp. 1-8, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[23] Zhepei Wang et al., “Semi-Supervised Singing Voice Separation With Noisy Self-Training,” 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, ON, Canada, pp. 31-35, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[24] Hazem Toutounji et al., “Learning the Sound Inventory of a Complex Vocal Skill via an Intrinsic Reward,” Science Advances, vol. 10, no. 13, pp. 1-16, 2024.
[CrossRef] [Google Scholar] [Publisher Link]
[25] Yu Wang et al., “Few-Shot Musical Source Separation,” 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, pp. 121-125, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[26] Aakanksha Desai et al., “Targeted Voice Separation,” International Journal of Innovative Science and Research Technology, vol. 7, no. 10, pp. 947-950, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[27] Samiul Basir et al., “U-NET: A Supervised Approach for Monaural Source Separation,” Arabian Journal for Science and Engineering, vol. 49, pp. 12679-12691, 2024.
[CrossRef] [Google Scholar] [Publisher Link]
[28] Tom Le Paine et al., “Fast Wavenet Generation Algorithm,” Arxiv, pp. 1-6, 2016.
[CrossRef] [Google Scholar] [Publisher Link]
[29] Dario Rethage, Jordi Pons, and Xavier Serra, “A Wavenet for Speech Denoising,” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Canada, pp. 5069-5073, 2018.
[CrossRef] [Google Scholar] [Publisher Link]
[30] Gao Huang et al., “Densely Connected Convolutional Networks,” 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, pp. 2261-2269, 2017.
[CrossRef] [Google Scholar] [Publisher Link]
[31] Naoya Takahashi, and Yuki Mitsufuji, “Multi-Scale Multi-Band Densenets for Audio Source Separation,” 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, pp. 21-25, 2017.
[CrossRef] [Google Scholar] [Publisher Link]
[32] Abhimanyu Sahai, Romann Weber, and Brian McWilliams, “Spectrogram Feature Losses for Music Source Separation,” 2019 27th European Signal Processing Conference, A Coruna, Spain, pp. 1-5, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[33] Woon-Haeng Heo, Hyemi Kim, and Oh-Wook Kwon, “Source Separation Using Dilated Time-Frequency DenseNet for Music Identification in Broadcast Contents,” Applied Sciences, vol. 10, no. 5, pp. 1-18, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[34] Guoqing Li et al., “Efficient Densely Connected Convolutional Neural Networks,” Pattern Recognition, vol. 109, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[35] Cem Subakan et al., “Attention is All You Need in Speech Separation,” 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, ON, Canada, pp. 21-25, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[36] Simon Rouard, Francisco Massa, and Alexandre Défossez, “Hybrid Transformers for Music Source Separation,” 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodes Island, Greece, pp. 1-5, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[37] Jiale Qian et al., “Stripe-Transformer: Deep Stripe Feature Learning for Music Source Separation,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2023, pp. 1-13, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[38] Yongwei Gao, Xulong Zhang, and Wei Li, “Vocal Melody Extraction via HRNet-Based Singing Voice Separation and Encoder-Decoder-Based F0 Estimation,” Electronics, vol. 10, no. 3, pp. 1-14, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[39] Jingdong Wang et al., “Deep High-Resolution Representation Learning for Visual Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3349-3364, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[40] Sasha Targ, Diogo Almeida, and Kevin Lyman, “Resnet in Resnet: Generalizing Residual Architectures,” Arxiv, pp. 1-7, 2016.
[CrossRef] [Google Scholar] [Publisher Link]
[41] Gino Brunner et al., “Monaural Music Source Separation Using a ResNet Latent Separator Network,” 2019 IEEE 31st International Conference on Tools with Artificial Intelligence, Portland, OR, USA, pp. 1124-1131, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[42] Tsubasa Ochiai et al., “Beam-TasNet: Time-domain Audio Separation Network Meets Frequency-Domain Beamformer,” 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, pp. 6384-6388, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[43] Alfian Wijayakusuma et al., “Implementation of Real-Time Speech Separation Model Using Time-Domain Audio Separation Network (TasNet) and Dual-Path Recurrent Neural Network (DPRNN),” Procedia Computer Science, vol. 179, pp. 762-772, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[44] Satvik Venkatesh et al., “Real-Time Low-Latency Music Source Separation Using Hybrid Spectrogram-TasNet,” 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Korea, pp. 611-615, 2024.
[CrossRef] [Google Scholar] [Publisher Link]
[45] Vanshaj Agrawal, and Sunil Karamchandani, “Audio Source Separation as Applied to Vocals-Accompaniment Extraction,” e-Prime - Advances in Electrical Engineering, Electronics and Energy, vol. 5, pp. 1-8, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[46] Rawad Melhem, Assef Jafar, and Riad Hamadeh, “Improving Deep Attractor Network by BGRU and GMM for Speech Separation,” Journal of Harbin Institute of Technology, vol. 28, no. 3, pp. 90-96, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[47] Bhuwan Bhattarai et al., “High-Resolution Representation Learning and Recurrent Neural Network for Singing Voice Separation,” Circuits, Systems, and Signal Processing, vol. 42, pp. 1083-1104, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[48] Yi Luo, and Rongzhi Gu, “Improving Music Source Separation with Simo Stereo Band-Split Rnn,” 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Korea, pp. 426-430, 2024.
[CrossRef] [Google Scholar] [Publisher Link]
[49] Xulong Zhang et al., “Research on Singing Voice Detection Based on a Long-Term Recurrent Convolutional Network with Vocal Separation and Temporal Smoothing,” Electronics, vol. 9, no. 9, pp. 1-23, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[50] Daniel Stoller, Sebastian Ewert, and Simon Dixon, “Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation,” Arxiv, pp. 1-7, 2018.
[CrossRef] [Google Scholar] [Publisher Link]
[51] Alice Cohen-Hadria, Axel Roebel, and Geoffroy Peeters, “Improving Singing Voice Separation Using Deep U-Net and Wave-U-Net with Data Augmentation,” 2019 27th European Signal Processing Conference, A Coruna, Spain, pp. 1-5, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[52] Jung-Hee Kim, and Joon-Hyuk Chang, “Attention Wave-U-Net for Acoustic Echo Cancellation,” Interspeech, Shanghai, China, pp. 3969-3973, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[53] Tomohiko Nakamura, and Hiroshi Saruwatari, “Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform,” 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, pp. 386-390, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[54] Vasiliy Kuzmin et al., “Real-time Streaming Wave-U-Net with Temporal Convolutions for Multichannel Speech Enhancement,” Arxiv, pp. 1-5, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[55] Yuzhou Liu et al., “Voice and Accompaniment Separation in Music Using Self-Attention Convolutional Neural Network,” Arxiv, pp. 1-5, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[56] Shun Takeda, and Shuichi Arai, “Music Source Separation Using Deform-Conv Dense U-Net,” 2021 3rd International Conference on Cybernetics and Intelligent System, Makasar, Indonesia, pp. 1-5, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[57] Bahareh Tolooshams et al., “Channel-Attention Dense U-Net for Multichannel Speech Enhancement,” 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, pp. 836-840, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[58] Thomas Sgouros, Angelos Bousis, and Nikolaos Mitianoudis, “An Efficient Short-Time Discrete Cosine Transform and Attentive MultiResUNet Framework for Music Source Separation,” IEEE Access, vol. 10, pp. 119448-119459, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[59] DaDong Wang, Jie Wang, and MingChen Sun, “3 Directional Inception-ResUNet: Deep Spatial Feature Learning for Multichannel Singing Voice Separation with Distortion,” Plos One, vol. 19, no. 1, pp. 1-17, 2024.
[CrossRef] [Google Scholar] [Publisher Link]
[60] Naoya Takahashi, Nabarun Goswami, and Yuki Mitsufuji, “Mmdenselstm: An Efficient Combination of Convolutional and Recurrent Neural Networks for Audio Source Separation,” 2018 16th International Workshop on Acoustic Signal Enhancement, Tokyo, Japan, pp. 106-110, 2018.
[CrossRef] [Google Scholar] [Publisher Link]
[61] Woon-Haeng Heo, Hyemi Kim, and Oh-Wook Kwon, “Integrating Dilated Convolution into DenseLSTM for Audio Source Separation,” Applied Sciences, vol. 11, no. 2, pp. 1-19, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[62] Yi Luo, and Nima Mesgarani, “Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 8, pp. 1256-1266, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[63] Berkan Kadıoğlu et al., “An Empirical Study of Conv-Tasnet,” 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, pp. 7264-7268, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[64] Alexandre Défossez et al., “Music Source Separation in the Waveform Domain,” Arxiv, pp. 1-16, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[65] Xiaoman Qiao et al., “VAT-SNet: A Convolutional Music-Separation Network Based on Vocal and Accompaniment Time-Domain Features,” Electronics, vol. 11, no. 24, pp. 1-17, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[66] Umut Isik et al., “PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss,” Arxiv, pp. 1-5, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[67] Florian Strub et al., “FiLM: Visual Reasoning with a General Conditioning Layer,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, pp. 3942-3951, 2018.
[CrossRef] [Google Scholar] [Publisher Link]
[68] Andreas Jansson et al., “Singing Voice Separation with Deep U-Net Convolutional Networks,” 18th International Society for Music Information Retrieval Conference, Suzhou, China, pp. 745-751, 2017.
[Google Scholar] [Publisher Link]
[69] Aäron van den Oord et al., “WaveNet: A Generative Model for Raw Audio,” Arxiv, pp. 1-15, 2016.
[CrossRef] [Google Scholar] [Publisher Link]
[70] Karol J. Piczak, “ESC: Dataset for Environmental Sound Classification,” Proceedings of the 23rd ACM International Conference on Multimedia, New York, United States, pp. 1015-1018, 2015.
[CrossRef] [Google Scholar] [Publisher Link]
[71] Andrew Varga, and Herman J.M. Steeneken, “Assessment for Automatic Speech Recognition: II. NOISEX-92: A Database and an Experiment to Study the Effect of Additive Noise on Speech Recognition Systems,” Speech Communication, vol. 12, no. 3, pp. 247-251, 1993.
[CrossRef] [Google Scholar] [Publisher Link]
[72] Jiaqi Gu et al., “Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, pp. 12084-12093, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[73] Jie Hu, Li Shen, and Gang Sun, “Squeeze-and-Excitation Networks,” IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp. 7132-7141, 2018.
[CrossRef] [Google Scholar] [Publisher Link]
[74] Yi Luo et al., “Rethinking The Separation Layers In Speech Separation Networks,” 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, ON, Canada, pp. 1-5, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[75] Kaiming He et al., “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, pp. 770-778, 2016.
[CrossRef] [Google Scholar] [Publisher Link]
[76] Mehrez Souden, Jacob Benesty, and Sofiène Affes, “On Optimal Frequency-Domain Multichannel Linear Filtering for Noise Reduction,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 2, pp. 260-276, 2010.
[CrossRef] [Google Scholar] [Publisher Link]
[77] Yuan Gong, Yu-An Chung, and James Glass, “AST: Audio Spectrogram Transformer,” Arxiv, pp. 1-5, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[78] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs),” Arxiv, pp. 1-14, 2015.
[CrossRef] [Google Scholar] [Publisher Link]
[79] Kaiming He et al., “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” 2015 IEEE International Conference on Computer Vision, Santiago, Chile, pp. 1026-1034, 2015.
[CrossRef] [Google Scholar] [Publisher Link]
[80] Ke Tan, Jitong Chen, and DeLiang Wang, “Gated Residual Networks with Dilated Convolutions for Supervised Speech Separation,” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, AB, Canada, pp. 21-25, 2018.
[CrossRef] [Google Scholar] [Publisher Link]
[81] Siddique Latif et al., “A Survey on Deep Reinforcement Learning for Audio-Based Applications,” Artificial Intelligence Review, vol. 56, pp. 2193-2240, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[82] Yong Xu et al., “Multi-Objective Learning and Mask-Based Post-Processing for Deep Neural Network Based Speech Enhancement,” Arxiv, pp. 1-5, 2017.
[CrossRef] [Google Scholar] [Publisher Link]