Abdelwahab, Mohamed Yassin and Al Moaiad, Yazeed and Zainab Abu Bakar (2023) Arabic text summarization using pre-processing methodologies and techniques. Asia-Pacific Journal of Information Technology and Multimedia, 12 (1). pp. 70-110. ISSN 2289-2192
Official URL: https://www.ukm.my/apjitm/
Abstract
The volume of information now available on the web has increased the need for effective tools that summarize text automatically. For English and other European languages, intensive work has already produced high-performing systems, and research there is now moving toward multi-document and multilingual summarization. Arabic, however, has received comparatively little attention in this field. In this research we propose a model that automatically summarizes Arabic text by extraction. The approach involves several steps: preprocessing the text, extracting a set of features from each sentence, classifying sentences with a scoring method, ranking the sentences, and finally generating an extractive summary. The main difference between our proposed system and other Arabic summarization systems is that it considers semantics, entity objects such as names and places, and similarity factors. In recent years, text summarization has seen renewed interest, with a growing number of studies and products, especially for English; for Arabic, by contrast, work remains limited. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is adopted as the evaluation measure to examine the proposed technique and compare it with state-of-the-art methods. Finally, an experiment on the Essex Arabic Summaries Corpus (EASC) using the ROUGE-1 and ROUGE-2 metrics showed promising results in comparison with existing methods.
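As a rough illustration of the kind of pipeline the abstract outlines (split into sentences, score, rank, extract, then evaluate with ROUGE-N), the following minimal sketch may help. It is not the authors' system: the word-frequency score stands in for the semantic, entity, and similarity features the abstract mentions, the function names are hypothetical, and the tokenizer assumes undiacritized text. Only ROUGE-N recall against a single reference is computed.

```python
import re
from collections import Counter


def tokenize(text):
    # \w is Unicode-aware in Python 3, so Arabic letters are matched.
    # Assumption: text is undiacritized; combining marks would split words.
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    # Split after Latin or Arabic sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?؟])\s+", text.strip())
    return [p for p in parts if p]


def extract_summary(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    freqs = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        # Stand-in feature: mean document frequency of the sentence's tokens.
        toks = tokenize(sentence)
        return sum(freqs[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def ngrams(tokens, n):
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the reference n-gram count."""
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

With `n=1` and `n=2`, `rouge_n_recall` corresponds to the recall form of the ROUGE-1 and ROUGE-2 metrics named in the abstract; published evaluations usually rely on an established ROUGE implementation and may also report precision and F-measure.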
Item Type: | Article |
---|---|
Keywords: | Arabic text summarization; Machine learning; Natural language processing |
Journal: | Asia-Pacific Journal of Information Technology and Multimedia (Formerly Jurnal Teknologi Maklumat dan Multimedia) |
ID Code: | 22540 |
Deposited By: | Mr. Mohd Zukhairi Abdullah |
Deposited On: | 23 Nov 2023 01:24 |
Last Modified: | 23 Nov 2023 03:25 |