Speaker Recognition Using Convolutional Neural Networks and Neutrosophic Theory

Article Type: Electrical Engineering Paper

Authors

1 Assistant Professor, Faculty of Engineering, Yasouj University.

2 Assistant Professor, Faculty of Engineering, Lorestan University.

Abstract

Speaker recognition is the process of identifying individuals based on their voice, and it is used in many applications. Although considerable research has been conducted on speaker recognition, some challenges remain unsolved. In this paper, neutrosophic (NS) theory and convolutional neural networks are employed to improve the accuracy of speaker recognition systems. In the proposed method, the spectrogram of the speech signal is first computed and then transferred to the neutrosophic domain. Next, beta-enhancement operators are applied to the neutrosophic sets, and this operation is repeated until the entropy of the neutrosophic sets becomes constant. Finally, a convolutional neural network model is proposed to classify the spectrograms. Two datasets, Aurora2 and TIMIT, are used to evaluate and analyze the proposed method. The proposed method achieves an accuracy of 93.79% on Aurora2 and 95.24% on TIMIT, outperforming competing methods.
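As a rough illustration of the first two steps described above (forming the spectrogram and transferring it to the neutrosophic domain), the sketch below builds (T, I, F) components from a spectrogram array using the local-mean construction common in neutrosophic image processing. The window size and normalization details are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def local_mean(a, w=3):
    # Box-filter local mean with edge padding (pure NumPy, no SciPy needed).
    p = w // 2
    ap = np.pad(a, p, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for i in range(w):
        for j in range(w):
            out += ap[i:i + a.shape[0], j:j + a.shape[1]]
    return out / (w * w)

def to_neutrosophic(spec, w=3):
    # Map a spectrogram into neutrosophic (T, I, F) components, each in [0, 1].
    g = local_mean(spec, w)
    T = (g - g.min()) / (g.max() - g.min() + 1e-12)      # truth: normalized local mean
    d = np.abs(spec - g)                                  # deviation from local mean
    I = (d - d.min()) / (d.max() - d.min() + 1e-12)      # indeterminacy
    F = 1.0 - T                                           # falsity, complement of truth
    return T, I, F
```

Here T and F are complementary by construction, while I captures local deviation (how "uncertain" each time-frequency bin is); the three maps together form the neutrosophic representation that is later enhanced and classified.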

Article Title [English]

Speaker Recognition Using Convolutional Neural Network and Neutrosophic

Authors [English]

  • Sadegh Fadaei 1
  • Abdolreza Rashno 2
  • Abdolsamad Hamidi 2
1 Department of Electrical Engineering, Faculty of Engineering, Yasouj University, Yasouj, Iran
2 Department of Computer Engineering, Engineering Faculty, Lorestan University, Khorramabad, Iran
Abstract [English]

Speaker recognition is the process of recognizing persons based on their voice and is widely used in many applications. Although much research has been performed in this domain, some challenges have not been addressed yet. In this work, neutrosophic (NS) theory and convolutional neural networks (CNNs) are used to improve the accuracy of speaker recognition systems. To do this, the spectrogram is first created from the speech signal and then transferred to the NS domain. In the next step, the alpha correction operator is applied repeatedly until the entropy remains constant in subsequent iterations. Finally, a convolutional neural network architecture is proposed to classify spectrograms in the NS domain. Two datasets, TIMIT and Aurora2, are used to evaluate the effectiveness of the proposed method. The accuracies of the proposed method on the TIMIT and Aurora2 datasets are 93.79% and 95.24%, respectively, demonstrating that the proposed model outperforms competing models.
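The iterate-until-constant-entropy step can be sketched as follows. The S-shaped intensification used here is the standard enhancement operator from the neutrosophic image-processing literature, standing in for the paper's correction operator; the histogram-based entropy estimate and the stopping tolerance are assumptions for illustration.

```python
import numpy as np

def beta_enhance(x):
    # S-shaped intensification: pushes values away from 0.5 toward 0 or 1.
    return np.where(x <= 0.5, 2.0 * x**2, 1.0 - 2.0 * (1.0 - x)**2)

def entropy(x, bins=64):
    # Shannon entropy of the value histogram over [0, 1].
    h, _ = np.histogram(x, bins=bins, range=(0.0, 1.0))
    p = h / h.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def enhance_until_stable(T, F, tol=1e-3, max_iter=50):
    # Repeat the enhancement until the combined entropy stops changing
    # (the stopping criterion described in the abstract).
    prev = entropy(T) + entropy(F)
    for _ in range(max_iter):
        T, F = beta_enhance(T), beta_enhance(F)
        cur = entropy(T) + entropy(F)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return T, F
```

Because the operator has fixed points only at 0, 0.5, and 1, repeated application binarizes the maps and the entropy quickly stabilizes, at which point the enhanced spectrogram components are handed to the CNN classifier.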

Keywords [English]

  • Spectrogram
  • Speaker recognition
  • Neutrosophic
  • Convolutional neural networks