Multimodal recognition with deep learning: audio, image, and text
Abstract
Emotion detection is essential in many domains, including affective computing, psychological assessment, and human-computer interaction (HCI). This study compares emotion detection across text, image, and speech modalities to evaluate state-of-the-art approaches in each area and identify their strengths and shortcomings. We examined current methods, datasets, and evaluation criteria through a comprehensive literature review. Our pipeline collects data, cleans it, extracts features, and then applies deep learning (DL) models. In our experiments, we performed text-based emotion identification using a long short-term memory (LSTM) network with a term frequency-inverse document frequency (TF-IDF) vectorizer, and image-based emotion recognition using a convolutional neural network (CNN). Contributing to the body of knowledge in emotion recognition, our results shed light on the inner workings of the different modalities. Experimental findings validate the efficacy of the proposed method while also highlighting areas for improvement.
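As a minimal sketch of the TF-IDF weighting step mentioned above for text-based emotion recognition: each document's terms are scored by term frequency scaled by inverse document frequency, so words shared by every document are down-weighted toward zero. The toy corpus and the unsmoothed IDF formula here are illustrative assumptions, not the paper's actual data or exact configuration.

```python
# Illustrative TF-IDF sketch (stdlib only); the corpus is a made-up example.
import math
from collections import Counter

def tfidf(corpus):
    """Return one {term: tf-idf weight} dict per document in `corpus`."""
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

corpus = ["i am happy today", "today is a sad day"]
vecs = tfidf(corpus)
# "today" appears in both documents, so its IDF (and weight) is zero,
# while emotion-bearing words unique to one document get positive weight.
```

In practice a library vectorizer with smoothing and normalization would replace this hand-rolled version, and the resulting vectors would feed the LSTM-based classifier described in the abstract.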
DOI: http://doi.org/10.11591/ijres.v14.i1.pp254-264
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Reconfigurable and Embedded Systems (IJRES)
p-ISSN 2089-4864, e-ISSN 2722-2608
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).