Improvement of deep reinforcement models using extreme learning machine for autonomous agents in unstructured environment

Aldahoul, Nouar

Please use this identifier to cite or link to this item: http://studentrepo.iium.edu.my/handle/123456789/10691

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Zaw Zaw Htike, Ph.D	en_US
dc.contributor.advisor	Amir Akramin Shafie, Ph.D	en_US
dc.contributor.author	Aldahoul, Nouar	en_US
dc.date.accessioned	2021-11-23T03:29:57Z	-
dc.date.available	2021-11-23T03:29:57Z	-
dc.date.issued	2021	-
dc.identifier.uri	http://studentrepo.iium.edu.my/handle/123456789/10691	-
dc.description.abstract	Creating an autonomous agent that receives real observations, such as sensory data and images, from its surrounding environment and learns optimal sequential actions is considered one of the main goals of Artificial General Intelligence (AGI). Deep (Hierarchical) Reinforcement Learning (DRL/HRL) can address this objective. Traditional deep reinforcement learning methods suffer from long learning and training times, which result from the need to fine-tune the network weights iteratively. This research addresses that problem with a random weight generation approach based on the Extreme Learning Machine (ELM). The method exploits the randomness of the input weights and a least-squares solution for the output weights to reduce training time by an order of magnitude. Hierarchical ELM (H-ELM) and Local Receptive Field ELM (LRF-ELM) are recent multilayer versions of ELM that respectively learn and extract features through a hierarchical learning scheme. They have outperformed other existing deep models in terms of learning time (speed). H-ELM’s architecture was found to be similar to a gradient-based (GB) autoencoder without weight fine-tuning; however, H-ELM learns faster than the GB autoencoder. Likewise, LRF-ELM was found to be similar to a Convolutional Neural Network (CNN) without weight fine-tuning, and it has outperformed the traditional CNN in terms of learning time. Therefore, the method proposed in this research, which combines RL with H-ELM or LRF-ELM, is an efficient way to approximate the action-value function and learn an optimal policy directly from visual data (images) in a short time. In addition, this research proposes a novel method called Convolutional H-ELM (CH-ELM), which combines a pre-trained CNN with H-ELM. This method has outperformed both the CNN and H-ELM in terms of accuracy and RMSE.
The experimental results were analyzed and evaluated in different applications, such as a target-reaching arm, 2D maze navigation, a slide puzzle game, object sorting, and a rock-paper-scissors game. Data samples were used for training and testing to investigate the robustness of the proposed systems. The proposed models were found to reduce the learning time by an order of magnitude in various tasks without degrading performance. In a few tasks there is a slight drop in accuracy compared to traditional methods, which the large improvement in learning speed can outweigh. Therefore, the proposed method balances the trade-off between learning speed and good performance. In addition, it can run on the conventional CPUs available in most low-cost embedded systems.	en_US
dc.language.iso	en	en_US
dc.publisher	Kuala Lumpur : Kulliyyah of Engineering, International Islamic University Malaysia, 2021	en_US
dc.subject.lcsh	Reinforcement learning	en_US
dc.subject.lcsh	Artificial intelligence -- Engineering applications	en_US
dc.title	Improvement of deep reinforcement models using extreme learning machine for autonomous agents in unstructured environment	en_US
dc.type	Doctoral Thesis	en_US
dc.description.identity	t11100393421NouarAldahoul	en_US
dc.description.identifier	Thesis : Improvement of deep reinforcement models using extreme learning machine for autonomous agents in unstructured environment /by Nouar Aldahoul	en_US
dc.description.kulliyah	Kulliyyah of Engineering	en_US
dc.description.programme	Doctor of Philosophy (Engineering)	en_US
dc.description.abstractarabic	تعد عملية ايجاد عميل ذاتي القرار قادر على رصد المشاهدات الحقيقية كبيانات الحساسات و الصور من البيئة المحيطة وتعلم سلسلة من الأفعال المثالية من أهم أهداف الذكاء العام الصنعي. استطاع التعلم المعزز الهرمي (العميق) تحقيق هذا الهدف. تعاني الطرق التقليدية للتعلم المعزز العميق من طول زمن التعلم والتدريب الناتج من الحاجة إلى توليف الأوزان بشكل متكرر في الشبكة. في هذا البحث تم دراسة هذه المشكلة بالاستفادة من مفهوم توليد الأوزان العشوائية القائم على خوارزمية ELM. هذه الطريقة تستفيد من عشوائية أوزان الدخل ومن الحل القائم على المربعات الصغرى في حساب أوزان الخرج لانقاص زمن التدريب عدد من المرات. تم الاستفادة من البنية الهرمية H-ELM و حقول الاستقبال المحلي ELM-LRFs وهما إصداران حديثان لشبكة ELM متعددة الطبقات ويتم فيهما تعلم الميزات أو استخراجها عن طريق التعلم الهرمي. هذه النماذج تفوقت على نماذج التعلم العميق الموجودة مسبقاً من خلال زمن التعلم (سرعة التعلم). إن بنية H-ELM تشبه المرمز الألي القائم على هبوط الانحدار (Gradient Descent based auto encoder ) ولكن بدون الحاجة إلى التوليف. ومع ذلك فإن البنية H-ELM تتمتع بسرعة تدريب أفضل مقارنة مع الأخير. كما تم استخدام حقول الاستقبال المحلي كبنية بديلة مشابهة للشبكات العصبونية الالتفافية CNN ولكن بدون الحاجة إلى توليف الأوزان وضبطها. وقد تم إثبات تفوقها على CNN من حيث سرعة التعلم. لذا فإن الطريقة المقترحة في هذا البحث والتي تعتمد على دمج التعلم المعزز مع H-ELM أو ELM-LRFs هي حل فعال لتقريب تابع قيم الأفعال (Action Value Function) و تعلم الاستراتيجية المثالية بشكل مباشر من المعطيات المرئية (الصور) كمدخل للنظام خلال زمن قصير. بالإضافة لما سبق تم في هذا البحث اقتراح طريقة جديدة تدعى CH-ELM و الذي تم فيه دمج الشبكة الالتفافية المدربة مسبقا مع الشبكة الهرمية العشوائية H-ELM و قد اثبت هذا النموذج تفوقه على كل من الشبكة الالتفافية CNN و شبكة H-ELM من حيث الدقة وجذر متوسط مربع الخطأ. تم في هذا البحث تحليل وتقييم النتائج التجريبية في تطبيقات مختلفة كتطبيق ذراع روبوتية يبحث عن الهدف وعميل في متاهة ثنائية البعد ولعبة البازل المتزحلق وفرز الأغراض المختلفة ولعبة حجر ورق مقص. 
تم تدريب وفحص عينات من البيانات للتأكد من متانة النظام المقترح. وجد أن النموذج المقترح قادر على انقاص زمن التعلم عدد من المرات في مهام مختلفة دون تراجع مستوى الأداء. إن التحسن الكبير في سرعة التعلم في الطريقة المقترحة سمح بإهمال التراجع الطفيف في الدقة في بعض المهام بالمقارنة مع الطرق التقليدية. لذلك فإن الطريقة المقترحة تستطيع موازنة المقايضة بين سرعة التعلم والاداء الجيد. بالإضافة الى أنها قابلة للتنفيذ على المعالجات التقليدية المتوفرة في معظم الأنظمة المضمنة منخفضة التكلفة.	en_US
dc.description.callnumber	t Q 325.6 A357I 2021	en_US
dc.description.notes	Thesis (Ph.D)--International Islamic University Malaysia, 2021.	en_US
dc.description.physicaldescription	xxii, 267 leaves : colour illustrations ; 30cm.	en_US
item.openairetype	Doctoral Thesis	-
item.grantfulltext	open	-
item.fulltext	With Fulltext	-
item.languageiso639-1	en	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.cerifentitytype	Publications	-
Appears in Collections:	KOE Thesis
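
The abstract describes the core ELM idea: input weights are drawn at random and left untouched, and only the output weights are computed, in closed form, by least squares. The following is a minimal sketch of that idea for a single hidden layer, assuming NumPy, a tanh activation, and a small ridge term for numerical stability. The function names `train_elm` and `predict_elm`, the hyperparameters, and the toy sine-regression data are illustrative assumptions and are not taken from the thesis; the H-ELM, LRF-ELM, and CH-ELM architectures are not reproduced here.

```python
import numpy as np


def train_elm(X, T, n_hidden=50, ridge=1e-3, seed=0):
    """Fit a single-hidden-layer ELM (illustrative sketch, not the thesis code)."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are random and never fine-tuned.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations
    # Output weights from ridge-regularized least squares:
    # beta = (H^T H + ridge * I)^{-1} H^T T
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta


# Toy regression problem (assumed data): approximate sin(x) on [-3, 3].
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = train_elm(X, T)
rmse = float(np.sqrt(np.mean((predict_elm(X, W, b, beta) - T) ** 2)))
print(f"training RMSE: {rmse:.4f}")
```

Training here is a single linear solve rather than many gradient steps, which is the source of the speed advantage the abstract claims. In a reinforcement learning setting, the targets `T` would be action-value estimates rather than a fixed function, a detail this sketch leaves out.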

Files in This Item:

File	Description	Size	Format
t11100393421NouarAldahoul_24.pdf	24 pages file	501.05 kB	Adobe PDF
t11100393421NouarAldahoul_SEC.pdf (Restricted Access)	Full text secured file	4.28 MB	Adobe PDF

Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated. Please give due acknowledgement and credits to the original authors and IIUM where applicable. No items shall be used for commercialization purposes except with written consent from the author.