Audio-Visual Emotion Recognition Using K-Means Clustering and Spatio-Temporal CNN
Poster Presentation
Authors
1Department of Biomedical Engineering, Faculty of Engineering, University of Isfahan, Isfahan, Iran
2Department of Biomedical Engineering, Faculty of Engineering, University of Isfahan, Isfahan, Iran
3Department of Biomedical Engineering, Faculty of Engineering, University of Isfahan, Isfahan, Iran
Abstract
Emotion recognition is a challenging task due to the emotional gap between subjective feelings and low-level audio-visual characteristics. A feasible approach to high-performance emotion recognition could therefore enhance human-computer interaction. Deep learning methods have improved the performance of emotion recognition systems compared with other current methods. In this paper, a multimodal deep model combining a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network is proposed, which fuses audio and visual cues in a single deep architecture. Spatial and temporal features extracted from the video frames are fused with features derived from the short-time Fourier transform (STFT) of the audio signal. Finally, a Softmax classifier assigns each input to one of seven classes: anger, disgust, fear, happiness, sadness, surprise, and neutral. The proposed model is evaluated on the Surrey Audio-Visual Expressed Emotion (SAVEE) database, achieving an accuracy of 95.48%.
Our experiments show that the proposed method outperforms existing algorithms for emotion recognition on this dataset.
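For illustration, the sketch below outlines in PyTorch the kind of fusion architecture the abstract describes: a 3D CNN followed by a BiLSTM for the spatio-temporal visual branch, a 2D CNN over the STFT magnitude spectrogram for the audio branch, and a Softmax classifier over the seven emotion classes applied to the fused embeddings. All layer sizes, frame counts, and STFT settings are illustrative assumptions rather than the authors' configuration, and the K-means frame-selection step mentioned in the title is not shown.

```python
# Minimal sketch of an audio-visual fusion network (assumed configuration).
import torch
import torch.nn as nn


class AudioVisualEmotionNet(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        # Visual branch: a small 3D CNN extracts spatio-temporal features
        # from a clip of face frames shaped (N, 3, T, H, W).
        self.video_cnn = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep the temporal axis
        )
        # BiLSTM models the temporal dynamics of the per-frame CNN features.
        self.bilstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        # Audio branch: a 2D CNN over the STFT magnitude spectrogram (N, 1, F, T).
        self.audio_cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Late fusion: concatenate the two embeddings and map to class logits;
        # the Softmax is applied implicitly by CrossEntropyLoss during training.
        self.classifier = nn.Linear(2 * 64 + 32, num_classes)

    def forward(self, frames: torch.Tensor, spectrogram: torch.Tensor) -> torch.Tensor:
        v = self.video_cnn(frames)                  # (N, 32, T, 1, 1)
        v = v.flatten(2).transpose(1, 2)            # (N, T, 32)
        _, (h, _) = self.bilstm(v)                  # h: (2, N, 64)
        v = torch.cat([h[0], h[1]], dim=1)          # (N, 128) bidirectional summary
        a = self.audio_cnn(spectrogram).flatten(1)  # (N, 32)
        return self.classifier(torch.cat([v, a], dim=1))


if __name__ == "__main__":
    # Example: two clips of 16 RGB frames (64x64) with matching STFT
    # magnitude spectrograms computed from 1-second audio at 16 kHz.
    frames = torch.randn(2, 3, 16, 64, 64)
    spec = torch.stft(torch.randn(2, 16000), n_fft=254, return_complex=True).abs()
    spec = spec.unsqueeze(1)                        # (2, 1, F, T)
    logits = AudioVisualEmotionNet()(frames, spec)
    print(logits.shape)                             # torch.Size([2, 7])
```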
Keywords
bidirectional long short-term memory, 3D convolutional neural network, deep learning, emotion recognition, short-time Fourier transform