FEATURE SELECTION AND CLASSIFICATION INTEGRATED METHOD FOR IDENTIFYING CITED TEXT SPANS FOR CITANCES ON IMBALANCED DATA

Cited by: 0
Authors
Yee, Jen-Yuan [1]
Tsai, Cheng-Jung [2]
Hsu, Tien-Yu [3]
Lin, Jung-Yi [4]
Cheng, Pei-Cheng [5]
Affiliations
[1] Natl Museum Nat Sci, Visitor Serv, Dept Operat, Collect & Informat Management, Taichung 40453, Taiwan
[2] Natl Changhua Univ Educ, Grad Inst Stat & Informat Sci, Changhua 50007, Taiwan
[3] Natl Museum Nat Sci, Dept Sci Educ, Taichung 40453, Taiwan
[4] Hon Hai Precis Ind Co Ltd Foxconn, IP Affairs Div, Taipei 11492, Taiwan
[5] Chien Hsin Univ Sci & Technol, Dept Informat Management, Taoyuan 32097, Taiwan
Keywords
Citation analysis; cited text spans identification; feature selection; classification; class imbalance; performance evaluation; scientific paper summarization
DOI
10.22452/mjcs.vol34no4.3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies in scientific paper summarization have explored a new form of structured summary for a reference paper, in which all cited and citing sentences are grouped together by facet. This involves three main tasks: (1) identifying the cited text spans for citances (i.e., citing sentences), (2) classifying their discourse facets, and (3) generating a structured summary from the cited text spans and citances. This paper focuses on the first task and treats it as binary classification that distinguishes relevant pairs of citances and reference sentences from irrelevant pairs. We propose a new method that integrates feature selection and classification techniques to enhance classification performance. The proposed method investigates combinations of six feature selection methods (χ²-statistics, Information Gain, Gain Ratio, Relief-F, Significance Attribute Evaluation, and Symmetrical Uncertainty) and five classification algorithms (k-Nearest Neighbors, Decision Tree, Support Vector Machine, Naive Bayes, and Random Forest). Additionally, to address imbalanced data during training, we apply SMOTE (Synthetic Minority Over-sampling Technique), which biases the training data towards the minority class by adding synthetic minority samples. Experiments on the CL-SciSumm corpora evaluate the effect of feature selection on classification. The results show that feature selection significantly improves the F1 score and that our method is competitive with the state-of-the-art methods in the CL-SciSumm evaluations.
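The pipeline outlined in the abstract (rank features, oversample the minority class with SMOTE, train a classifier, score the relevant-pair class with F1) can be sketched in plain Python. This is a minimal, hypothetical sketch, not the authors' implementation: it uses only one of the six feature selection methods (Information Gain, assuming discrete feature values) and one of the five classifiers (k-Nearest Neighbors). The helper names (`entropy`, `information_gain`, `select_top_features`, `smote`, `knn_predict`, `f1_score`), the two toy features, and the fixed random seed are all assumptions made for illustration. The `smote` helper picks a random minority sample as the base point on each draw, a common variant of the original per-sample SMOTE loop.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature column with respect to the labels."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def select_top_features(rows, labels, m):
    """Return the sorted indices of the m features with the highest information gain."""
    scores = [information_gain([r[j] for r in rows], labels) for j in range(len(rows[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:m])


def smote(minority, n_synthetic, k=5, rng=None):
    """Create synthetic minority samples on the segment between a random minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = rng or random.Random(0)  # hypothetical fixed seed for reproducibility
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def f1_score(y_true, y_pred, positive=1):
    """F1 score of the positive (relevant-pair) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up citance/reference-sentence pairs: [term-overlap flag, same-section flag];
    # label 1 = relevant pair (minority class), 0 = irrelevant pair.
    rows = [[1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
    labels = [1, 1, 0, 0, 0, 0]
    keep = select_top_features(rows, labels, m=1)
    train_x = [[r[j] for j in keep] for r in rows]
    # Oversample the training split only, up to a 1:1 class ratio.
    minority = [x for x, y in zip(train_x, labels) if y == 1]
    extra = smote(minority, n_synthetic=labels.count(0) - len(minority))
    train_x, train_y = train_x + extra, labels + [1] * len(extra)
    test_rows, test_labels = [[1, 0], [0, 1]], [1, 0]
    preds = [knn_predict(train_x, train_y, [r[j] for j in keep]) for r in test_rows]
    print("selected features:", keep, "F1:", f1_score(test_labels, preds))
```

Oversampling is applied to the training split only; scoring on synthetic samples would inflate the F1 score. Swapping in another ranking criterion (for example χ² or Gain Ratio) or another classifier only changes `select_top_features` or `knn_predict`, which is the kind of combination the method searches over.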
Pages: 355-373
Number of pages: 19