KNN with TF-IDF Based Framework for Text Categorization

Cited by: 151
Authors
Trstenjak, Bruno [1]
Mikac, Sasa [2]
Donko, Dzenana [3]
Affiliations
[1] Medimurje Univ Appl Sci Cakovec, Dept Comp Engn, Cakovec, Croatia
[2] Fac Elect Engn & Comp Sci, Dept Comp Sci, Maribor, Slovenia
[3] Fac Elect Engn, Dept Comp Sci, Sarajevo, Bosnia & Herceg
Source
24TH DAAAM INTERNATIONAL SYMPOSIUM ON INTELLIGENT MANUFACTURING AND AUTOMATION, 2013 | 2014 / Vol. 69
Keywords
text documents classification; K-Nearest Neighbor; TF-IDF; framework; machine learning;
DOI
10.1016/j.proeng.2014.03.129
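Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract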
KNN is a very popular algorithm for text classification. This paper presents the possibility of using the KNN algorithm with the TF-IDF method and a framework for text classification. The framework enables classification according to various parameters, as well as measurement and analysis of results. Evaluation of the framework focused on the speed and quality of classification. The test results showed the good and bad features of the algorithm, providing guidance for the further development of similar frameworks. (C) 2014 The Authors. Published by Elsevier Ltd.
Pages: 1356-1364
Page count: 9
Related papers
50 in total
  • [31] Continuous Speech Recognition with a TF-IDF Acoustic Model
    Zweig, Geoffrey
    Nguyen, Patrick
    Droppo, Jasha
    Acero, Alex
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2858 - 2861
  • [32] Unsupervised sentence representations as word information series: Revisiting TF-IDF
    Arroyo-Fernandez, Ignacio
    Mendez-Cruz, Carlos-Francisco
    Sierra, Gerardo
    Torres-Moreno, Juan-Manuel
    Sidorov, Grigori
    COMPUTER SPEECH AND LANGUAGE, 2019, 56 : 107 - 129
  • [33] An information-theoretic perspective of tf-idf measures
    Aizawa, A
    INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (01) : 45 - 65
  • [34] TF-IDF based binary fingerprint search with vector quantization error compensation
    Park, Jihyun
    Kim, Junghyun
    Yoo, Wonyoung
    2015 INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC), 2015, : 573 - 575
  • [35] Multi Words Quran and Hadith Searching Based on News Using TF-IDF
    Darwiyanto, Eko
    Pratama, Ganang Arief
    Widowati, Sri
    2016 4TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICOICT), 2016,
  • [36] Research on Keywords Variations in Linguistics Based on TF-IDF and N-gram
    Li Y.
    Wen X.
    Liu X.
    Journal of Computing and Information Technology, 2022, 30 (03) : 193 - 204
  • [37] Optimization of Associative Knowledge Graph using TF-IDF based Ranking Score
    Kim, Hyun-Jin
    Baek, Ji-Won
    Chung, Kyungyong
    APPLIED SCIENCES-BASEL, 2020, 10 (13):
  • [38] A Classification Method for Chinese Word Semantic Relations Based on TF-IDF and CNN
    Mao, Teng
    Peng, Yuanyuan
    Hang, Yuru
    Zhang, Yangsen
    CHINESE LEXICAL SEMANTICS, CLSW 2018, 2018, 11173 : 509 - 518
  • [39] Introduction to Text Classification: Impact of Stemming and Comparing TF-IDF and Count Vectorization as Feature Extraction Technique
    Wendland, Andre
    Zenere, Marco
    Niemann, Joerg
    SYSTEMS, SOFTWARE AND SERVICES PROCESS IMPROVEMENT, EUROSPI 2021, 2021, 1442 : 289 - 300
  • [40] A Sentiment analysis-based hotel recommendation using TF-IDF Approach
    Mishra, Ram Krishn
    Urolagin, Siddhaling
    Jothi, Angel Arul J.
    PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND KNOWLEDGE ECONOMY (ICCIKE' 2019), 2019, : 811 - 815