Please use this identifier to cite or link to this item: http://studentrepo.iium.edu.my/handle/123456789/11438
Title: Arabic text classification based on artificial bee colony algorithm and semantic relations
Authors: Hijazi, Musab Mustafa
Supervisor: Akram M. Zeki M Khedher, Ph.D
Amelia Ritahani Ismail, Ph.D
Subject: Semantic computing
Algorithm
Arabic language -- Semantics
Year: 2022
Publisher: Kuala Lumpur : Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, 2022
Abstract in English: Documents contain a tremendous quantity of important human information. The substantial increase in the volume of machine-readable documents, for public or private access, makes automatic text classification necessary. Text classification is the process of categorizing or organizing documents into a predetermined set of classes. Western languages, English in particular, have received a lot of attention, whereas Arabic has received far less. Arabic text categorization methods emerged naturally as a result of the vast volume of diverse Arabic textual material available on the internet. Feature selection is an essential step in text categorization. It is an important preprocessing approach for effective data analysis, in which only a subset of the original features is retained after eliminating noisy, unnecessary, or duplicated features. The Bag of Words (BoW) representation is considered the simplest representation of texts. Most researchers working on Arabic have sought accurate Arabic text classification based on the traditional BoW representation, which does not consider the semantic relationships between words, such as synonymy and hypernymy. This research aims to build a model for Arabic text classification using the Artificial Bee Colony (ABC) algorithm as a feature selection method and Arabic WordNet (AWN) as a lexical and semantic resource to exploit the semantic relationships between words. The results showed that the proposed Chi-square – Binary Artificial Bee Colony (chi-BABC) feature selection method reduced the dimensionality of the feature set while improving text classification performance. It reduced the original feature list size by approximately 89% when the Naïve Bayes classifier was used as the fitness function.
By comparison, the proposed feature selection (FS) method reduced the original feature list size by around 94% when a Support Vector Machine (SVM) was used as the fitness function. The proposed FS method was evaluated using SVM, C4.5 decision tree, and Naïve Bayes (NB) classifiers. Experiments showed that it improved the performance of Arabic text classification, with the best result for SVM at 86.9%, compared with 84.5% and 77.3% for NB and C4.5 respectively. Furthermore, the proposed FS method was compared with PSO, ACO, and GA. The experimental results showed that the proposed method outperformed the others, achieving 86.9% compared with 84.7%, 83.4%, and 82.7% for PSO, ACO, and GA respectively. Finally, utilizing concepts and the semantic relations between them enriches the text representation with more semantic meaning, improving text classification performance. Text classification performance based on grouping methods improved by 2% for the category term relation, and by 2% and 3% for the related-to and has holo member relations respectively. The best classification performance was obtained when the has holo member relation was part of the combined relations. The best result was 81.2 for the combination of the related-to and has holo member relations, while the lowest was 78.6 for the combination of the has hyponym and category term relations.
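The two-stage idea named in the abstract, a chi-square filter followed by a binary bee colony search over feature subsets, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `chi_square` and `binary_abc`, the single bit-flip neighbourhood, and the parameters `n_sources`, `limit`, and `cycles` are assumptions made for illustration, and the `fitness` callable is a stand-in for the classifier accuracy (Naïve Bayes or SVM) that the abstract says was used as the fitness function.

```python
import random


def chi_square(a, b, c, d):
    """Chi-square statistic for one term/category 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def binary_abc(n_features, fitness, n_sources=10, limit=5, cycles=50, seed=0):
    """Simplified binary ABC search over feature masks (lists of 0/1).

    `fitness` maps a mask to a score to maximise; it is a hypothetical
    stand-in for classifier accuracy on the selected features.
    """
    rng = random.Random(seed)

    def random_mask():
        return [rng.randint(0, 1) for _ in range(n_features)]

    sources = [random_mask() for _ in range(n_sources)]
    scores = [fitness(s) for s in sources]
    trials = [0] * n_sources
    best = {"mask": None, "score": float("-inf")}

    def remember():
        # Keep a copy of the best food source seen so far.
        top = max(scores)
        if top > best["score"]:
            best["score"] = top
            best["mask"] = list(sources[scores.index(top)])

    def explore(i):
        # Neighbour = current mask with one randomly chosen bit flipped.
        cand = list(sources[i])
        j = rng.randrange(n_features)
        cand[j] = 1 - cand[j]
        cand_score = fitness(cand)
        if cand_score > scores[i]:  # greedy selection
            sources[i], scores[i], trials[i] = cand, cand_score, 0
        else:
            trials[i] += 1

    remember()
    for _ in range(cycles):
        for i in range(n_sources):  # employed bees: one per source
            explore(i)
        # Onlooker bees: favour sources with higher fitness.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            explore(i)
        remember()
        for i in range(n_sources):  # scout bees: abandon stale sources
            if trials[i] > limit:
                sources[i] = random_mask()
                scores[i] = fitness(sources[i])
                trials[i] = 0
    remember()
    return best["mask"], best["score"]
```

In such a pipeline, terms would first be ranked by `chi_square` and only the top-scoring ones passed to `binary_abc`, whose returned mask marks the features to keep. How the thesis combines the two stages in detail is not stated in the abstract.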
Call Number: t QA 76.5913 H639A 2022
Kulliyyah: Kulliyyah of Information and Communication Technology
Programme: Doctor of Philosophy in Computer Science
URI: http://studentrepo.iium.edu.my/handle/123456789/11438
Appears in Collections: KICT Thesis

Files in This Item:
File | Description | Size | Format
t11100480366 MusabMustafaHijazi_24.pdf | 24 pages file | 534.5 kB | Adobe PDF
t11100480366 MusabMustafaHijazi_SEC.pdf (Restricted Access) | Full text secured file | 1.99 MB | Adobe PDF

Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated. Please give due acknowledgement and credits to the original authors and IIUM where applicable. No items shall be used for commercialization purposes except with written consent from the author.