Summary: | Human behavior is influenced by emotion, and humans express their affective state through numerous channels of non-verbal communication, namely facial expressions, gestures, eye gaze, and body posture, as well as through verbal communication. Verbal communication itself carries a great deal of underlying information, transmitted through acoustic features and the semantic meaning of the words and sentences used. Despite the evident complexity of such interaction, a listener can still correctly perceive the emotion conveyed by the interlocutor. This is due to the human cognitive ability to dissect and infer the information with high accuracy and then react with appropriate behavioral responses and feedback. Hence, this research work introduces a novel technique for discriminating emotions to facilitate the understanding of a speaker's affective state, based on the hypothesis that emotion is propagated through speech and can be quantified.
Speech emotion recognition is a growing multi-disciplinary research field that is gaining momentum due to the increased need to improve the quality of human-computer interaction. Numerous researchers have applied various feature extraction methods coupled with classifiers to achieve acceptable accuracy. Nonetheless, the performance of such systems is bound to cultural influence, which results in poor outcomes once speech influenced by an unfamiliar culture is introduced. Culture is often regarded as a trivial and inconsequential parameter that needs minimal consideration in speech emotion recognition. Hence, in this work, the intricate relationship of cultural influence, in terms of intra-cultural and inter-cultural effects, is studied in detail. Two speech emotion datasets, NTU_American and NTU_Asian, representing the American and Asian cultural influences on speech emotion respectively, were collected and used together with the standard Berlin speech emotion dataset to understand the speech emotion recognition system and its cultural bias.
The work is then extended to investigate speaker affective state profiling using the Valence-Arousal (VA) analysis approach, which enables a visualization tool to be used for intra-cultural and inter-cultural assessments. The strength of this VA approach is that it facilitates the observation of new findings and supports dynamic, data-driven generation of an affective space model that can empirically verify the affective space model agreed upon by psychologists. The proposed approach is developed to complement discrete-class classification systems, which are rigid and lack explainable components. The results show considerable potential for future practical applications of such an analysis system, which enables researchers, engineers, scientists, psychologists, medical practitioners, and intelligent system developers to visualize emotions from a common viewpoint.
|