Improving Speech Emotion Recognition via Gender Classification

Document Type: Applied

Authors

Abstract

Speech emotion recognition is a relatively new field of research that can play an important role in human-machine interaction. In this paper we use two new spectral features for the automatic recognition of human affective information from speech. These features are extracted from the spectrogram of the speech signal by image processing techniques. We also study the effect of gender information on speech emotion recognition. Hierarchical SVM-based classifiers are designed to classify speech signals according to their emotional states. The classifiers are optimized using the Fisher Discriminant Ratio (FDR) so that the most separable classes are classified at the upper nodes, which reduces the classification error. The proposed algorithm was tested on the well-known Berlin database, for male and female speakers separately and in combination. An overall recognition rate of 43.4% is obtained for the combined (mixed-gender) speaker set, and the results show a 39.46% improvement when gender information is used.
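To make the FDR-driven design concrete: for two classes with means μ1, μ2 and variances σ1², σ2² along a feature dimension, the Fisher Discriminant Ratio is FDR = (μ1 − μ2)² / (σ1² + σ2²); the larger the ratio, the easier the two classes are to separate. The following Python sketch (not the authors' code) ranks emotion-class pairs by average FDR so that the most separable pair can be placed at the top of the hierarchy, and trains one emotion SVM per gender. The function names, the RBF kernel, and the assumption that a separate gender classifier routes test utterances are illustrative choices, not details taken from the paper.

import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def fdr(x1, x2):
    # One-dimensional Fisher Discriminant Ratio between two samples.
    return (x1.mean() - x2.mean()) ** 2 / (x1.var() + x2.var())

def rank_class_pairs(features, labels):
    # Average FDR over all feature dimensions for every pair of emotion
    # classes; pairs are returned from most to least separable, so the
    # top pair is a natural choice for the root node of the hierarchy.
    scores = {}
    for a, b in combinations(np.unique(labels), 2):
        fa, fb = features[labels == a], features[labels == b]
        scores[(a, b)] = np.mean([fdr(fa[:, i], fb[:, i])
                                  for i in range(features.shape[1])])
    return sorted(scores.items(), key=lambda kv: -kv[1])

def train_gender_dependent(features, emotions, genders):
    # One SVM per gender; at test time an utterance would first be
    # routed by a gender classifier (assumed, not shown here).
    models = {}
    for g in np.unique(genders):
        model = SVC(kernel='rbf')
        model.fit(features[genders == g], emotions[genders == g])
        models[g] = model
    return models

Placing the most separable classes at the upper nodes matters because an error made near the root of a hierarchical classifier propagates to every decision below it; resolving the easiest distinctions first keeps that compounded error small.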

Keywords

References


 
[1] ElAyadi, M., Kamel, M.S., Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition 44, pp 572–587.
 
[2] Yang, B., Lugger, M. (2010). Emotion recognition from speech signals using new harmony features. Signal Processing 90, pp 1415–1423.
 
[3] Monti, G., Meletti, S. (2015). Emotion recognition in temporal lobe epilepsy: A systematic review. Neuroscience and Biobehavioral Reviews 55, pp 280–293.
 
[4] Wu, S., Falk, T.H., Chan, W.Y. (2011). Automatic speech emotion recognition using modulation spectral features. Speech Communication 53, pp 768–785.
 
[5] Harimi, A., AhmadyFard, A.R., Shahzadi, A., Yaghmaie, K. (2015). Anger or Joy? Emotion Recognition Using Nonlinear Dynamics of Speech. Applied Artificial Intelligence 29, pp 675–696.
 
[6] Milton, A., Tamil Selvi, S. (2014). Class-specific multiple classifiers scheme to recognize emotions from speech signals. Computer Speech and Language 28, pp 727–742.
 
[7] Bitouk, D., Verma, R., Nenkova, A. (2010). Class-level spectral features for emotion recognition. Speech Communication 52, pp 613–625.
 
[8] Albornoz, E.M., Milone, D.H., Rufiner, H.L. (2011). Spoken emotion recognition using hierarchical classifiers. Computer Speech and Language 25, pp 556–570.
 
[9] Clavel, C., Vasilescu, I., Devillers, L., Richard, G., Ehrette, T. (2008). Fear-type emotion recognition for future audio-based surveillance systems. Speech Communication 50, pp 487–503.
 
[10] Polzehl, T., Schmitt, A., Metze, F., Wagner, M. (2011). Anger recognition in speech using acoustic and linguistic cues. Speech Communication 53, pp 1198–1209.
 
[11] Pérez-Espinosa, H., Reyes-García, C.A., Villasenor-Pineda, L. (2011). Acoustic feature selection and classification of emotions in speech using a 3D continuous emotion model. Biomedical Signal Processing and Control.
 
[12] Kockmann, M., Burget, L., Cernocky, J. (2011). Application of speaker- and language identification state-of-the-art techniques for emotion recognition. Speech Communication 53, pp 1172–1185.
 
[13] Lee, C.C., Mower, E., Busso, C., Lee, S., Narayanan, S. (2011). Emotion recognition using a hierarchical binary decision tree approach. Speech Communication 53, pp 1162–1171.
 
[14] Laukka, P., Neiberg, D., Forsell, M., Karlsson, I., Elenius, K. (2011). Expression of affect in spontaneous speech: Acoustic correlates and automatic detection of irritation and resignation. Computer Speech and Language 25, pp 84–104.
 
[15] Bozkurt, E., Erzin, E., Erdem, C.E., Erdem, A.T. (2011). Formant position based weighted spectral features for emotion recognition. Speech Communication 53, pp 1186–1197.
 
[16] Vayrynen, E., Toivanen, J., Seppanen, T. (2011). Classification of emotion in spoken Finnish using vowel-length segments: Increasing reliability with a fusion technique. Speech Communication 53, pp 269–282.
 
[17] Ooi, C.S., Seng, K.P., Ang, L.M., Chew, L.W. (2014). A new approach of audio emotion recognition. Expert Systems with Applications 41, pp 5858–5869.
 
[18] Ververidis, D., Kotropoulos, C. (2008). Fast and accurate sequential floating forward feature selection with the Bayes classifier applied to speech emotion recognition. Signal Processing 88, pp 2956–2970.
 
[19] Buck, R. (1988). Human motivation and emotion. Wiley, New York.
 
[20] Darwin, C. (1955). The expression of the emotions in man and animals. Philosophical Library Edition, London (reproduction of 1872 publication).
 
[21] Ekman, P., Friesen, W.V. (1975). Unmasking the face. Prentice Hall, Englewood Cliffs.
 
[22] Izard, C.E. (1977). Human emotions. Plenum Press, New York.
 
[23] Panksepp, J. (1998). Affective neuroscience: The foundations of human and animal emotion. Oxford University Press, New York.
 
[24] Pell, M.D., Monetta, L., Paulmann, S., Kotz, S.A. (2009). Recognizing emotions in a foreign language. Journal of Nonverbal Behavior 33, pp 107–120.
 
[25] Ross, E.D., Prodan, C.I., Monnot, M. (2007). Human facial expressions are organized functionally across the upper-lower facial axis. Neuroscientist 13, pp 433–446.
 
[26] Ross, E.D., Monnot, M. (2011). Affective prosody: What do comprehension errors tell us about hemispheric lateralization of emotions, sex and aging effects, and the role of cognitive appraisal? Neuropsychologia 49, pp 866–877.
 
[27] Lewis, W., Michalson, L. (1983). Children’s emotions and moods: Developmental theory and measurement. Plenum Press, New York.
 
[28] Malatesta, C.Z., Kalnok, D. (1984). Emotional experience in younger and older adults. Journal of Gerontology 39, pp 301–308.
 
[29] Engel, B. (2006). Healing Your Emotional Self. John Wiley & Sons, New Jersey.
 
[30] Engel, B. (2008). The Nice Girl Syndrome. John Wiley & Sons, New Jersey.
 
[31] Sanchez, F. (2006). A Thousand Moments of Solitude. Library of Congress Control Number: 2005911228, United States of America.
 
[32] Sherwood, L. (2010). Human Physiology: From Cells to Systems, eighth edition. Cengage Learning. Library of Congress Control Number: 2011939366.
 
[33] Whittle, S., Yücel, M., Yap, M.B.H., Allen, N.B. (2011). Sex differences in the neural correlates of emotion: Evidence from neuroimaging. Biological Psychology 87, pp 319–333.
 
[34] Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., Weiss, B. (2005). A database of German emotional speech. Interspeech. pp 1517–1520.
 
[35] Shahzadi, A., Ahmadyfard, A.R., Yaghmaie, K., Harimi, A. (2013). Recognition of Emotion in Speech Using Spectral Patterns. Malaysian Journal of Computer Science 26(2), pp 140–158.
 

[36] Sreenivasa Rao, K., Ramu Reddy, V., Maity, S. (2015). Language Identification Using Spectral and Prosodic Features. Springer, New York.

 
 
[38] Kotti, M., Kotropoulos, C. (2008). Gender classification in two emotional speech databases. 19th International Conference on Pattern Recognition (ICPR).
 
[39] Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., et al. (2001). Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine 18, pp 32–80.