Text Preprocessing Approaches in CNN for Disaster Reports Dataset

Times cited: 1
Authors
Arisha, Andriansyah Oktafiandi [1]
Hazriani [1]
Wabula, Yuyun [1]
Affiliation
[1] Handayani Univ, Dept Comp Syst, Makassar, Indonesia
Source
2023 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION, ICAIIC | 2023
Keywords
Text Preprocessing; CNN; Disaster; automatic; semi-automatic
DOI
10.1109/ICAIIC57133.2023.10067109
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study compares the performance of two text-preprocessing approaches, automatic and semi-automatic, when a CNN is trained on a disaster report dataset. On the dataset of 200 records, automatic text preprocessing yields average accuracies of 0.81 and 1 for train:test splits of 80:20 and 90:10, respectively. The optimized model, which uses the semi-automatic text preprocessing approach proposed by the authors, achieves average accuracies of 0.95 and 1 for the 80:20 and 90:10 splits. These results indicate that cleaning the dataset with semi-automatic text preprocessing improves accuracy over the automatic-preprocessing model. With the 80:20 split, the proposed model converges at epoch 20 with a batch size of 5 and a random state of 34, while with the 90:10 split the best convergence is reached between epochs 20 and 30.
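The abstract does not list the individual preprocessing steps, so the following Python sketch only illustrates the general shape of the evaluation protocol it describes: clean the text, split the records at 80:20 and 90:10 with a fixed seed of 34, and score predictions by accuracy. The stopword list, the steps inside `auto_preprocess`, and the `manual_map` dictionary (a hypothetical stand-in for the human-curated part of "semi-automatic" preprocessing) are all assumptions, not the authors' method. The seeded shuffle uses Python's `random` module, so it will not reproduce the exact split of scikit-learn's `train_test_split(random_state=34)`. The CNN itself is omitted.

```python
import random
import re

# Hypothetical stopword list for illustration only; the abstract does not
# say which stopwords (if any) the authors removed.
STOPWORDS = {"a", "an", "the", "of", "in", "at", "is", "was"}


def auto_preprocess(text):
    """Fully automatic cleaning (assumed steps): lowercase, replace
    non-letters with spaces, tokenize on whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # remove digits and punctuation
    return [tok for tok in text.split() if tok not in STOPWORDS]


def semi_auto_preprocess(text, manual_map):
    """Automatic cleaning followed by a human-curated token mapping.
    `manual_map` is hypothetical; mapping a token to "" deletes it."""
    tokens = auto_preprocess(text)
    mapped = (manual_map.get(tok, tok) for tok in tokens)
    return [tok for tok in mapped if tok]


def split_dataset(records, test_ratio, seed=34):
    """Shuffle a copy with a fixed seed, then return (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    records = [f"report {i}" for i in range(200)]  # placeholder, not the real data
    for ratio, name in [(0.2, "80:20"), (0.1, "90:10")]:
        train, test = split_dataset(records, ratio)
        print(name, len(train), len(test))  # 80:20 160 40, then 90:10 180 20
    print(auto_preprocess("Flood hit 3 houses, roof dmgd!"))
    # ['flood', 'hit', 'houses', 'roof', 'dmgd']
    print(semi_auto_preprocess("Flood hit 3 houses, roof dmgd!", {"dmgd": "damaged"}))
    # ['flood', 'hit', 'houses', 'roof', 'damaged']
```

The manual mapping runs after the automatic steps so that a curator only has to correct tokens that automatic cleaning leaves behind, such as abbreviations or misspellings; this ordering is a design choice of the sketch, not something the abstract specifies.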
Pages: 216-220
Number of pages: 5