Speech-based depression recognition for Bahasa Malaysia speakers using deep learning models

Ezzi, Mugahed Al Ezzi Ahmed

Please use this identifier to cite or link to this item: http://studentrepo.iium.edu.my/handle/123456789/11007

Title:	Speech-based depression recognition for Bahasa Malaysia speakers using deep learning models
Authors:	Ezzi, Mugahed Al Ezzi Ahmed
Supervisor:	Nik Nur Wahidah Nik Hashim, Ph.D; Hasan Firdaus Mohd Zaki, Ph.D
Subject:	Automatic speech recognition; Deep learning (Machine learning)
Year:	2021
Publisher:	Kuala Lumpur : Kulliyyah of Engineering, International Islamic University Malaysia, 2021
Abstract in English:	Depression is a highly prevalent mental disorder that negatively affects individuals, family members, society, and the economy. Traditional clinical diagnosis methods are subjective, complicated, and require extensive expert participation. Furthermore, the severe shortage of psychiatrists relative to the population in Malaysia contributes to patients’ delays in seeking treatment and poor compliance with follow-up. In addition, the social stigma of visiting psychiatric clinics prevents patients from seeking early treatment. Speech is a promising biometric for automatic depression detection because recording it is fast, convenient, and non-invasive. However, current machine learning algorithms have not yet achieved high accuracy and robust results. Moreover, existing research and approaches offer minimal support for Bahasa Malaysia. This research attempts to develop an end-to-end deep learning model to classify depression from Bahasa Malaysia speech using a dataset we collected from clinically depressed and healthy Bahasa Malaysia speakers. The dataset was collected via an online platform, with participants using their mobile phones to record read and spontaneous speech and to report their depression status. Depression status was identified using the Patient Health Questionnaire (PHQ-9), the Malay Beck Depression Inventory-II (Malay BDI-II), and subjects’ declaration of a Major Depressive Disorder diagnosis by a trained clinician. The dataset consists of 42 female and 11 male depressed participants and 68 female and 9 male healthy participants. However, this study focuses on female data only because of insufficient data for male participants. We provide a detailed implementation of the deep learning model using two approaches: raw audio input and acoustic feature input. Multiple combinations of speech types were analyzed using various deep neural network models. Additionally, an analysis of robust feature selection was carried out on the acoustic feature input before proceeding to the deep learning models. After hyperparameter tuning, raw audio input from the combination of female read and female spontaneous speech achieved an accuracy of 91% with the AttCRNN model. In comparison, robust acoustic feature input from female spontaneous speech achieved an accuracy of 85% with the RNN model. These results could be improved with a larger dataset. In addition, male and gender-independent models could be studied further.
Call Number:	t TK 7895 S65 E99S 2021
Kulliyyah:	Kulliyyah of Engineering
Programme:	Master of Science in Engineering
URI:	http://studentrepo.iium.edu.my/handle/123456789/11007
Appears in Collections:	KOE Thesis

Files in This Item:

File	Description	Size	Format
t11100437199MugahedAlEzziAhmedEzzi_24.pdf	24 pages file	436.91 kB	Adobe PDF
t11100437199MugahedAlEzziAhmedEzzi_SEC.pdf (Restricted Access)	Full text secured file	4.64 MB	Adobe PDF

Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated. Please give due acknowledgement and credits to the original authors and IIUM where applicable. No items shall be used for commercialization purposes except with written consent from the author.
