Semi-structured document categorization with a semantic kernel

Cited by: 17
Authors
Aseervatham, Sujeevan [1]
Bennani, Younes [1]
Affiliations
[1] Univ Paris 13, LIPN, UMR 7030, CNRS, F-93430 Villetaneuse, France
Keywords
Mercer kernel; Support vector machine; Text categorization; Semantic similarity; Semi-structured data;
DOI
10.1016/j.patcog.2008.10.024
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Over the past decade, text categorization has become an active field of research in the machine learning community. Most approaches are based on term occurrence frequency. The performance of such surface-based methods can decrease when the texts are too complex, i.e., ambiguous. One alternative is to use semantic-based approaches, which process textual documents according to their meaning. Furthermore, research in text categorization has mainly focused on "flat texts", whereas many documents are now semi-structured, especially in XML format. In this paper, we propose a semantic kernel for semi-structured biomedical documents. The semantic meanings of words are extracted using the Unified Medical Language System (UMLS) framework. The kernel, with an SVM classifier, has been applied to a text categorization task on a medical corpus of free-text documents. The results show that the semantic kernel outperforms the linear kernel and the naive Bayes classifier. Moreover, this kernel ranked in the top 10 among 44 classification methods at the 2007 Computational Medicine Center (CMC) Medical NLP International Challenge. (C) 2008 Elsevier Ltd. All rights reserved.
Pages: 2067-2076
Number of pages: 10
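The abstract contrasts a term-frequency (linear) kernel with a kernel that compares documents by meaning. The sketch below is only a toy illustration of that general idea, not the authors' UMLS-based kernel: the vocabulary, the term-to-concept table `TERM_CONCEPTS`, and all function names are hypothetical. Terms are projected onto shared concepts before taking a dot product, so the result is still an inner product and therefore a valid Mercer kernel.

```python
from collections import Counter

# Toy vocabulary; everything here is invented for illustration.
VOCAB = ["fever", "pyrexia", "cough", "fracture"]

# Hypothetical term-to-concept weights, one row per vocabulary term.
# Columns: [elevated_temperature, respiratory_symptom, bone_injury]
TERM_CONCEPTS = [
    [1.0, 0.0, 0.0],  # fever
    [1.0, 0.0, 0.0],  # pyrexia (synonym of fever)
    [0.0, 1.0, 0.0],  # cough
    [0.0, 0.0, 1.0],  # fracture
]


def bag_of_words(text):
    """Return term counts over VOCAB for a whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in VOCAB]


def dot(x, y):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """Surface-based similarity: only exactly shared terms contribute."""
    return dot(x, y)


def to_concepts(x):
    """Project a term-count vector onto the concept space."""
    n_concepts = len(TERM_CONCEPTS[0])
    return [
        sum(x[i] * TERM_CONCEPTS[i][c] for i in range(len(VOCAB)))
        for c in range(n_concepts)
    ]


def semantic_kernel(x, y):
    """Dot product in concept space; synonyms now count as matches."""
    return dot(to_concepts(x), to_concepts(y))


d1 = bag_of_words("fever cough")
d2 = bag_of_words("pyrexia cough")
d3 = bag_of_words("fracture")

print(linear_kernel(d1, d2))    # only "cough" is shared
print(semantic_kernel(d1, d2))  # "fever" and "pyrexia" also match
print(semantic_kernel(d1, d3))  # no shared concept
```

In this toy setup the two documents that differ only by a synonym score higher under the concept-space kernel than under the linear one. A Gram matrix built from such a function could be passed to an SVM that accepts precomputed kernels.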