Please use this identifier to cite or link to this item:
http://studentrepo.iium.edu.my/handle/123456789/10140
Title: Speech emotion recognition using deep neural networks
Authors: Qadri, Syed Asif Ahmad
Supervisors: Gunawan, Teddy Surya, PhD; Hasmah Mansor, PhD
Subjects: Speech processing systems; Signal processing -- Digital techniques; Neural networks (Computer science)
Year: Aug-2020
Publisher: Kuala Lumpur : Kulliyyah of Engineering, International Islamic University Malaysia, 2020

Abstract in English: With the ever-increasing interest of the research community in studying human-computer and human-human interaction, systems that deduce and identify the emotional content of a speech signal have emerged as a hot research topic. Speech Emotion Recognition (SER) has made the automated, intelligent analysis of human utterances a reality. Typically, an SER system extracts features from the speech signal, such as pitch frequency, formant features, and energy-related and spectral features, and follows this with a classification stage to identify the underlying emotion. However, considerable uncertainty remains, arising from factors such as determining the most influential features, the development of hybrid algorithms, and the type and number of emotions and languages under consideration. The key issue pivotal to a successful SER system is the proper selection of emotional feature extraction techniques. In this research, the Mel-Frequency Cepstral Coefficient (MFCC) and the Teager Energy Operator (TEO), along with a new fusion of MFCC and TEO referred to as Teager-MFCC (TMFCC), are examined over a multilingual database consisting of English, German, and Hindi speech. The datasets were retrieved from authentic and widely adopted sources: the German corpus is the well-known Berlin Emo-DB, the Hindi corpus is the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC), and the English corpus is the Toronto Emotional Speech Set (TESS). Deep neural networks were used to classify the four emotions considered: happy, sad, angry, and neutral. Evaluation results show that MFCC, with a recognition rate of 87.8%, outperforms TEO and TMFCC, whose recognition rates were 77.4% and 82.1% respectively. However, for energy-based emotions, contrasting results were obtained: TEO, with a recognition rate of 90.5%, outperforms MFCC and TMFCC, whose recognition rates were 83.7% and 86.7% respectively. The outcome of this research should inform practical SER implementations through a wiser selection of the underlying feature extraction techniques.

Call Number: t TK 7882 S65 Q1S 2020
Kulliyyah: Kulliyyah of Engineering
Programme: Master of Science (Computer and Information Engineering)
URI: http://studentrepo.iium.edu.my/handle/123456789/10140
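The Teager Energy Operator referred to in the abstract has a standard discrete-time definition, psi[x(n)] = x(n)^2 - x(n-1)*x(n+1), which tracks the instantaneous energy of a signal. A minimal NumPy sketch of that operator follows; the function name and the test sinusoid are illustrative, not taken from the thesis, and the thesis's TMFCC fusion is not reproduced here since the abstract does not specify its exact formulation:

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The output is two samples shorter than the input because the
    operator needs one neighbour on each side of every sample.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure sinusoid A*cos(w*n), the operator yields the constant
# A^2 * sin(w)^2, i.e. it jointly reflects amplitude and frequency.
A, w = 0.5, 0.3
n = np.arange(1000)
x = A * np.cos(w * n)
psi = teager_energy(x)
```

In TEO-based front ends, an operator like this is typically applied to the speech samples (or to band-passed sub-bands) before further feature extraction, which is one plausible reading of how a Teager-MFCC fusion could be built.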
Appears in Collections: ISTAC Thesis
Files in This Item:
File | Description | Size | Format | Access
---|---|---|---|---
t11100418323SyedAsifAhmadQadri_24.pdf | 24 pages file | 411.39 kB | Adobe PDF | View/Open
t11100418323SyedAsifAhmadQadri_SEC.pdf | Full text secured file (Restricted Access) | 1.72 MB | Adobe PDF | View/Open / Request a copy
Page view(s): 134 (checked on May 17, 2021)
Download(s): 82 (checked on May 17, 2021)
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated. Please give due acknowledgement and credits to the original authors and IIUM where applicable. No items shall be used for commercialization purposes except with written consent from the author.