Article type: Research article
Authors
1 Faculty of Engineering, Lorestan University, Khorramabad, Iran
2 Department of Electrical Engineering, Lorestan University, Khorramabad, Lorestan
Abstract
Keywords
Subjects
Article title [English]
Authors [English]
Speech Emotion Recognition (SER) is a significant field in speech signal processing and artificial intelligence, with broad applications in human-computer interaction, intelligent customer services, and emotional state detection. However, challenges such as the scarcity of diverse training data and the complexity of extracting effective features limit the performance of SER systems. This paper presents a hybrid method that combines data augmentation, a Bidirectional Long Short-Term Memory (BiLSTM) neural network, and the Random Forest algorithm to enhance the accuracy and reliability of the system. First, data augmentation techniques such as speed variation, noise addition, and pitch shifting are employed to generate synthetic samples. Next, time-frequency features are extracted by the BiLSTM and passed to the Random Forest algorithm for final classification. The results demonstrate that combining data augmentation with deep and traditional models can be a powerful approach to improving the accuracy and efficiency of SER systems. Evaluated on an augmented version of the well-established EMODB database, the proposed method achieves an accuracy of 85.11%.
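The augmentation step named in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the 20 dB signal-to-noise ratio, and the speed rates of 0.9 and 1.1 are assumptions chosen for the example. Pitch shifting is omitted because it usually relies on a dedicated audio library, and the naive resampling used here for speed variation also changes pitch.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip.

    Note: this simple resampling changes pitch along with duration.
    """
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


def augment(signal, rng, snr_db=20.0, rates=(0.9, 1.1)):
    """Return one noisy copy plus one speed-changed copy per rate (assumed values)."""
    augmented = [add_noise(signal, snr_db, rng)]
    augmented += [change_speed(signal, r) for r in rates]
    return augmented


# Example: augment one second of a 220 Hz tone sampled at 16 kHz.
rng = np.random.default_rng(0)
sample_rate = 16000
tone = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)
copies = augment(tone, rng)
```

Each synthetic copy would keep the emotion label of the original recording, so the training set grows without new annotation effort.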
Keywords [English]