Speech emotion recognition using deep neural networks

Qadri, Syed Asif Ahmad

Please use this identifier to cite or link to this item: http://studentrepo.iium.edu.my/handle/123456789/10140

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Gunawan, Teddy Surya, PhD	en_US
dc.contributor.advisor	Hasmah Mansor, PhD	en_US
dc.contributor.author	Qadri, Syed Asif Ahmad	en_US
dc.date.accessioned	2020-12-22T02:07:58Z	-
dc.date.available	2020-12-22T02:07:58Z	-
dc.date.issued	2020-08	-
dc.identifier.uri	http://studentrepo.iium.edu.my/handle/123456789/10140	-
dc.description.abstract	With the ever-increasing interest of the research community in studying human-computer and human-human interactions, systems that deduce and identify the emotional aspects of a speech signal have emerged as a hot research topic. Speech Emotion Recognition (SER) has brought automated and intelligent analysis of human utterances to reality. Typically, a SER system extracts features from speech signals, such as pitch frequency, formant features, and energy-related and spectral features, and follows this with a classification stage to identify the underlying emotion. However, considerable uncertainty still exists, arising from factors such as determining the influential features, developing hybrid algorithms, and the type and number of emotions and languages under consideration. The key issue for a successful SER system is the proper selection of emotional feature extraction techniques. In this research, Mel-frequency cepstral coefficients (MFCC) and the Teager Energy Operator (TEO), along with a novel fusion of MFCC and TEO referred to as Teager-MFCC (TMFCC), are examined over a multilingual database consisting of English, German, and Hindi. These datasets have been retrieved from authentic and widely adopted sources: the German corpus is the well-known Berlin Emo-DB, the Hindi corpus is the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC), and the English corpus is the Toronto emotional speech set (TESS). Deep neural networks have been used to classify the emotions considered, viz. happy, sad, angry, and neutral. Evaluation results show that MFCC, with a recognition rate of 87.8%, outperforms TEO and TMFCC, whose recognition rates were 77.4% and 82.1% respectively. However, contrasting results were obtained when considering energy-based emotions: TEO, with a recognition rate of 90.5%, outperforms MFCC and TMFCC, whose recognition rates were 83.7% and 86.7% respectively. The outcome of this research would assist in the formation of a pragmatic speech emotion recognition implementation driven by a wiser selection of the underlying feature extraction techniques.	en_US
dc.language.iso	en	en_US
dc.publisher	Kuala Lumpur : Kulliyyah of Engineering, International Islamic University Malaysia, 2020	en_US
dc.subject.lcsh	Speech processing systems	en_US
dc.subject.lcsh	Signal processing -- Digital techniques	en_US
dc.subject.lcsh	Neural networks (Computer science)	en_US
dc.title	Speech emotion recognition using deep neural networks	en_US
dc.type	Master Thesis	en_US
dc.description.identity	t11100418323SyedAsifAhmadQadri	en_US
dc.description.identifier	Thesis : Speech emotion recognition using deep neural networks / by Syed Asif Ahmad Qadri	en_US
dc.description.kulliyah	Kulliyyah of Engineering	en_US
dc.description.programme	Master of Science (Computer and Information Engineering)	en_US
dc.description.abstractarabic	With the growing interest of the research community in studying human-computer and human-human interactions, systems that infer and identify the emotional aspects of a speech signal have emerged as a hot research topic. Speech Emotion Recognition (SER) has made automated, intelligent analysis of human utterances a reality. A SER system typically extracts features from speech signals, such as pitch frequency, formant features, and energy-related spectral features, and follows this with a classification step to understand the underlying emotion. However, considerable uncertainty still remains, arising from factors such as identifying the influential features, developing hybrid algorithms, and the type and number of emotions and languages under consideration. The key issues for a successful SER system center on the correct selection of suitable emotional feature extraction techniques. In this research, Mel-frequency cepstral coefficients (MFCC) and the Teager Energy Operator (TEO), together with a novel fusion of MFCC and TEO referred to as Teager-MFCC (TMFCC), are examined on a multilingual database consisting of English, German, and Hindi. These datasets were compiled from authentic and widely adopted sources. The German dataset is the well-known Berlin Emo-DB, the Hindi dataset is the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC), and the English corpus is the Toronto emotional speech set (TESS). Deep neural networks were used to classify the emotions considered, namely happy, sad, angry, and neutral. Evaluation results show that MFCC, with a recognition rate of 87.8%, outperforms TEO and TMFCC. With the TEO and TMFCC configurations, the recognition rates were found to be 77.4% and 82.1% respectively. However, when considering energy-based emotions, contrasting results were obtained. TEO, with a recognition rate of 90.5%, outperforms MFCC and TMFCC. With the MFCC and TMFCC configurations, the recognition rates were found to be 83.7% and 86.7% respectively. The results of this research will help in building practical speech emotion recognition implementations through a wiser selection of the underlying feature extraction techniques for speech signals.	en_US
dc.description.callnumber	t TK 7882 S65 Q1S 2020	en_US
dc.description.notes	Thesis (MSCIE)--International Islamic University Malaysia, 2020.	en_US
dc.description.physicaldescription	xiv, 112 leaves : colour illustrations ; 30cm.	en_US
item.openairetype	Master Thesis	-
item.grantfulltext	open	-
item.fulltext	With Fulltext	-
item.languageiso639-1	en	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.cerifentitytype	Publications	-
Appears in Collections:	ISTAC Thesis
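The abstract names the Teager Energy Operator (TEO) as one of the compared features without defining it. In its standard discrete form the operator is Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), which for a pure tone A·cos(Ωn + φ) evaluates to the constant A²·sin²(Ω), so it tracks both amplitude and frequency. The Python sketch below implements that operator and a hypothetical frame-averaged feature (`frame_teager_energy`); the sampling rate, frame length, hop size, and the averaging step are illustrative assumptions, not the configuration used in the thesis, and the MFCC/TMFCC pipeline and the deep neural network classifier are not reproduced here.

```python
import math
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Defined only for interior samples, so the output has len(x) - 2 values.
    """
    if len(x) < 3:
        raise ValueError("TEO needs at least 3 samples")
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_teager_energy(x: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean Teager energy of each frame.

    Hypothetical frame-level feature for illustration only; it is not
    claimed to be the TEO feature extraction used in the thesis.
    """
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        psi = teager_energy(x[start:start + frame_len])
        feats.append(sum(psi) / len(psi))
    return feats


if __name__ == "__main__":
    fs = 16000               # assumed sampling rate in Hz
    f0, amp = 200.0, 0.5     # assumed test tone: 200 Hz, amplitude 0.5
    omega = 2 * math.pi * f0 / fs
    tone = [amp * math.cos(omega * n) for n in range(1600)]

    psi = teager_energy(tone)
    # For a pure tone, psi is constant and equals amp^2 * sin^2(omega).
    print(psi[0], amp ** 2 * math.sin(omega) ** 2)

    # Assumed 25 ms frames (400 samples) with a 10 ms hop (160 samples).
    print(frame_teager_energy(tone, frame_len=400, hop=160))
```

Averaging Ψ over a frame is only one simple way to turn the sample-wise operator into a per-frame feature vector that a classifier can consume; it is chosen here because it keeps the sketch short and makes the constant-energy property of a pure tone easy to check.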

Files in This Item:

File	Description	Size	Format
t11100418323SyedAsifAhmadQadri_24.pdf	24 pages file	411.39 kB	Adobe PDF
t11100418323SyedAsifAhmadQadri_SEC.pdf (restricted access)	Full text secured file	1.72 MB	Adobe PDF

Page view(s): 134 (checked on May 17, 2021)
Download(s): 82 (checked on May 17, 2021)

Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated. Please give due acknowledgement and credits to the original authors and IIUM where applicable. No items shall be used for commercialization purposes except with written consent from the author.
