Automated Document Classification for News Article in Bahasa Indonesia based on Term Frequency Inverse Document Frequency (TF-IDF) Approach

Times cited: 0

Authors:

Hakim, An Aulia [1]

Erwin, Alva [1]

Eng, Kho I. [1]

Galinium, Maulahikmah [1]

Muliady, Wahyu [1]

Affiliation:

[1] Swiss German Univ, BSD, Fac Engn & Informat Technol, Tangerang, Indonesia

Source:

2014 6TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND ELECTRICAL ENGINEERING (ICITEE) | 2014

Keywords:

Text mining; Text classification; TF-IDF approach

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Subject classification code:

081202

Abstract:

The exponential growth of data may lead to an information explosion era, in which most data can no longer be managed easily. Text mining research is believed to help prevent the world from entering that era, and one text mining task that may do so is text classification: assigning articles to one of several predefined categories. In this research, the classifier implements the TF-IDF algorithm. TF-IDF computes the weight of a word from its frequency in a document (TF) and from the number of documents in which the word appears (IDF). Because the IDF accounts for how many documents contain a term, it controls the weight of each word: a word that appears in many documents is treated as unimportant and receives a low weight. The TF-IDF-based classifier achieved a high accuracy of 98.3% in classifying news articles in Bahasa Indonesia.
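The abstract describes TF-IDF only in prose and does not state the exact TF and IDF variants or how the weighted vectors are turned into a category decision. The sketch below is therefore an illustration under stated assumptions, not the authors' implementation: it uses raw term counts for TF, idf(t) = log(N / df(t)) for IDF, and a nearest-centroid rule with cosine similarity as the decision step. The function names (`tokenize`, `compute_idf`, `tfidf`, `cosine`, `train`, `classify`) and the four-sentence Indonesian training set are hypothetical. Note how `berita` ("news"), which occurs in every training document, receives an IDF of 0, matching the abstract's point that a word found in many files is treated as unimportant.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumed minimal preprocessing: lowercase + whitespace split.
    # Bahasa Indonesia stop-word removal and stemming are omitted.
    return text.lower().split()


def compute_idf(docs: list[list[str]]) -> dict[str, float]:
    # Assumed IDF variant: idf(t) = log(N / df(t)), where df(t) is the
    # number of documents containing t. A term in every document gets 0.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Weight = raw term count * idf; terms unseen in training get weight 0.
    tf = Counter(tokens)
    return {t: count * idf.get(t, 0.0) for t, count in tf.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity of two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled: list[tuple[str, str]]):
    # Build the IDF table from the training set, then one summed TF-IDF
    # vector per category (a hypothetical nearest-centroid classifier).
    docs = [tokenize(text) for text, _ in labeled]
    idf = compute_idf(docs)
    centroids: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for tokens, (_, label) in zip(docs, labeled):
        for t, w in tfidf(tokens, idf).items():
            centroids[label][t] += w
    return idf, centroids


def classify(text: str, idf, centroids) -> str:
    # Choose the category whose vector is most similar to the document.
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


TRAIN = [  # hypothetical toy corpus, not data from the paper
    ("berita timnas indonesia menang sepak bola", "olahraga"),
    ("berita pemain sepak bola cetak gol", "olahraga"),
    ("berita harga saham naik di bursa", "ekonomi"),
    ("berita bank indonesia naikkan suku bunga", "ekonomi"),
]

if __name__ == "__main__":
    idf, centroids = train(TRAIN)
    print(idf["berita"])  # 0.0: "berita" appears in all 4 documents
    print(classify("berita gol sepak bola", idf, centroids))  # olahraga
    print(classify("saham bank naik", idf, centroids))  # ekonomi
```

Summing rather than averaging each category's vectors does not change the result, because cosine similarity ignores vector length. A realistic pipeline for Indonesian news would add stop-word removal and stemming before weighting.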


Pages: 29-32

Page count: 4
