Automated Document Classification for News Article in Bahasa Indonesia based on Term Frequency Inverse Document Frequency (TF-IDF) Approach

Authors
Hakim, An Aulia [1 ]
Erwin, Alva [1 ]
Eng, Kho I. [1 ]
Galinium, Maulahikmah [1 ]
Muliady, Wahyu [1 ]
Affiliation
[1] Swiss German Univ, BSD, Fac Engn & Informat Technol, Tangerang, Indonesia
Source
2014 6TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND ELECTRICAL ENGINEERING (ICITEE) | 2014
Keywords
Text mining; Text Classification; TF-IDF approach;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The exponential growth of data may lead us into an era of information explosion, in which most data cannot be managed easily. Text mining is believed to help prevent that outcome. One text mining task that may help is text classification, which assigns articles to several predefined categories. In this research, the classifier implements the TF-IDF algorithm. TF-IDF computes the weight of a word from how frequently the word occurs (TF) and from the number of files in which the word can be found (IDF). Because IDF takes into account how many files contain a term, it can control the weight of each word: a word that is found in many files is considered unimportant. TF-IDF has been shown to produce a classifier that classifies news articles in Bahasa Indonesia with high accuracy: 98.3%.
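The weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula or the classification rule, so everything below is an assumption made for illustration: raw counts for TF, log(N / df) for IDF, one merged token list per category, and a classifier that picks the category whose weights sum highest over the article's words. The function names (tf_idf, classify) and the toy Indonesian text are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (not confirmed by the paper):
    tf = raw count, idf = log(N / df).
    """
    n_docs = len(docs)
    # df[term] = number of documents containing the term at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def classify(tokens, category_weights):
    """Pick the category whose TF-IDF profile gives the tokens the highest total weight."""
    scores = {
        cat: sum(w.get(t, 0.0) for t in tokens)
        for cat, w in category_weights.items()
    }
    return max(scores, key=scores.get)


# Toy, made-up training text: one token list per category.
training = {
    "olahraga": "tim menang pertandingan sepak bola dan gol".split(),
    "ekonomi": "harga saham naik dan rupiah menguat".split(),
    "politik": "presiden dan menteri membahas undang undang".split(),
}
categories = list(training)
profiles = dict(zip(categories, tf_idf([training[c] for c in categories])))

# "dan" occurs in every category, so idf = log(3 / 3) = 0 and its weight vanishes.
print(profiles["olahraga"]["dan"])
print(classify("saham dan rupiah".split(), profiles))
```

In this sketch the common word "dan" contributes nothing to any score, which mirrors the abstract's point that a word found in many files is treated as unimportant. A real system would likely add tokenization, stop-word handling, and stemming for Bahasa Indonesia, none of which is shown here.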
Pages: 29-32
Number of pages: 4