Patterns of Using the Z-Score for Text Classification Purposes

Times cited: 2
Author
Yatsko, V. A. [1]
Affiliation
[1] Katanov Khakas State Univ, Abakan, Russia
Funding
Russian Foundation for Basic Research
Keywords
automatic text classification; authorship attribution; genre classification; methods and algorithms; Z-score; Zipfian distribution; efficiency testing; Y-method; contrastive analysis
DOI
10.3103/S0005105522050041
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper describes procedures for using the Z-score to classify text documents. The author tested the efficiency of this approach on authorship attribution and genre classification tasks, using an analysis of the distribution of stop words. The paper finds that calculating the score from raw stop-word counts produces a negative result, whereas calculating it from the deviations of stop-word frequencies from the Zipfian distribution yields higher classification efficiency. A comparison with the previously developed Y-method showed that the Z-score is more efficient for text classification.
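The abstract does not spell out the exact procedure. What follows is only a minimal Python sketch of the general idea under stated assumptions, not the author's method. It makes these assumptions:

* Features are counts of a fixed stop-word list. `STOP_WORDS` is a hypothetical placeholder.
* The Zipfian expectation for the word at rank r is taken as f(r) = f(1)/r, which is Zipf's law with exponent 1.
* Each feature's Z-score uses the standard formula z = (x − μ)/σ, computed against a class's reference texts.
* A text is assigned to the class with the smallest mean absolute Z-score.

All function names (`stop_word_counts`, `zipf_deviations`, `z_scores`, `classify`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given in this record.
STOP_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "was"]


def stop_word_counts(text: str) -> list[int]:
    """Raw count of each stop word in a lowercased, whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in STOP_WORDS]


def zipf_deviations(counts: list[int]) -> list[float]:
    """Deviation of each count from an assumed Zipfian expectation f(r) = f(1) / r.

    Stop words are ranked by observed count (rank 1 = most frequent, ties keep
    list order); deviations are returned in the original STOP_WORDS order.
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    top = counts[order[0]]
    deviations = [0.0] * len(counts)
    for rank, i in enumerate(order, start=1):
        deviations[i] = counts[i] - top / rank
    return deviations


def z_scores(x: list[float], samples: list[list[float]]) -> list[float]:
    """Feature-wise Z-score z = (x - mean) / std of x against reference vectors.

    Uses the population standard deviation; a feature with zero spread gets z = 0.
    """
    n = len(samples)
    result = []
    for j, value in enumerate(x):
        column = [s[j] for s in samples]
        mean = sum(column) / n
        std = math.sqrt(sum((c - mean) ** 2 for c in column) / n)
        result.append(0.0 if std == 0 else (value - mean) / std)
    return result


def classify(x: list[float], classes: dict[str, list[list[float]]]) -> str:
    """Assign x to the class whose reference vectors give the smallest mean |z|."""
    def mean_abs_z(label: str) -> float:
        zs = z_scores(x, classes[label])
        return sum(abs(z) for z in zs) / len(zs)

    return min(classes, key=mean_abs_z)
```

The sketch supports two variants. Passing `stop_word_counts(t)` directly as the feature vector gives the raw-count variant. Passing `zipf_deviations(stop_word_counts(t))` gives the deviation-based variant, which the abstract reports as more efficient. The sketch does not normalize for text length, so in practice relative frequencies may be a more reasonable input.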
Pages: 245-250
Number of pages: 6