Please use this identifier to cite or link to this item: http://studentrepo.iium.edu.my/handle/123456789/10766
Title: Image and video based emotion recognition using deep learning
Authors: Arselan Ashraf
Supervisor: Teddy Surya Gunawan, Ph.D
Farah Diyana Abdul Rahman, Ph.D
Subject: Deep learning (Machine learning)
Emotion recognition -- Computer simulation
Year: 2021
Publisher: Kuala Lumpur : Kulliyyah of Engineering, International Islamic University Malaysia, 2021
Abstract in English: Emotion recognition using images, videos, or speech as input has been an intriguing research problem for several years. The introduction of deep learning techniques such as Convolutional Neural Networks (CNN) has enabled emotion recognition to achieve promising results. Since human facial expressions are considered vital in understanding one’s feelings, many studies have been carried out in this field. However, visual-based emotion recognition still lacks models with good accuracy, and uncertainty remains in determining the influencing features, the type and number of emotions under consideration, and the algorithms to use. This research develops an image- and video-based emotion recognition model using a CNN for automatic feature extraction and classification. The optimum CNN configuration was found to have three convolutional layers, each followed by max-pooling. The third convolutional layer was followed by a batch normalization layer connected to two fully connected layers. This configuration was selected because it minimized the risk of overfitting while producing a normalized output. Five emotions are considered for recognition: angry, happy, neutral, sad, and surprised, to allow comparison with previous algorithms. The emotion recognition model is built on two datasets: an image dataset, the “Warsaw Set of Emotional Facial Expression Pictures (WSEFEP)”, and a video dataset, the “Amsterdam Dynamic Facial Expression Set – Bath Intensity Variations (ADFES-BIV)”. Several pre-processing steps were applied to the data samples, followed by the popular and efficient Viola-Jones algorithm for face detection. The CNN was used for feature extraction and classification. Evaluation using the confusion matrix, accuracy, F1-score, precision, and recall shows that the video-based dataset obtained more promising results than the image-based dataset. The recognition accuracy, F1-score, precision, and recall were 99.38%, 99.22%, 99.4%, and 99.38% for the video dataset, and 83.33%, 79.1%, 84.46%, and 80% for the image dataset, respectively. The proposed algorithm was benchmarked against two other CNN-based algorithms; its accuracy is higher by about 5.33% and 3.33%, respectively, on the image dataset, and by 4.38% on the video dataset. The outcome of this research demonstrates the effectiveness and usability of the proposed system for visual-based emotion recognition.
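The abstract describes the selected CNN configuration (three convolutional layers, each with max-pooling, batch normalization after the third convolutional layer, and two fully connected layers for the five emotion classes). A minimal Keras sketch of such a configuration is given below; the filter counts, kernel sizes, and the 48x48 grayscale input resolution are illustrative assumptions not specified in this record.

```python
# Minimal sketch of the CNN configuration described in the abstract.
# Filter counts, kernel sizes, and the 48x48 grayscale input are assumptions;
# only the layer ordering (3x conv + max-pooling, batch norm after the third
# conv block, two fully connected layers, 5 classes) follows the record.
from tensorflow.keras import layers, models

def build_emotion_cnn(input_shape=(48, 48, 1), num_classes=5):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),           # batch normalization after the third conv block
        layers.Flatten(),
        layers.Dense(128, activation="relu"),  # first fully connected layer
        layers.Dense(num_classes, activation="softmax"),  # angry, happy, neutral, sad, surprised
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```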
Call Number: t Q 325.73 A781I 2021
Kulliyyah: Kulliyyah of Engineering
Programme: Master of Science (Computer and Information Engineering)
URI: http://studentrepo.iium.edu.my/handle/123456789/10766
Appears in Collections:KOE Thesis

Files in This Item:
File: t11100392662ArselanAshraf_24.pdf
Description: 24 pages file
Size: 421.3 kB
Format: Adobe PDF

File: t11100392662ArselanAshraf_SEC.pdf (Restricted Access)
Description: Full text secured file
Size: 2.54 MB
Format: Adobe PDF

Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated. Please give due acknowledgement and credits to the original authors and IIUM where applicable. No items shall be used for commercialization purposes except with written consent from the author.