Semantic Text Segment Classification of Structured Technical Content

Times cited: 2
Authors
Hoellig, Julian [1 ]
Dufter, Philipp [2 ]
Geierhos, Michaela [1 ]
Ziegler, Wolfgang [3 ]
Schuetze, Hinrich [2 ]
Affiliations
[1] Bundeswehr Univ Munich, Res Inst CODE, Neubiberg, Germany
[2] Ludwig Maximilians Univ Munchen, Ctr Language & Informat Proc, Munich, Germany
[3] Karlsruhe Univ Appl Sci, Informat Management & Media, Karlsruhe, Germany
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2021) | 2021 / Vol. 12801
Funding
European Research Council
Keywords
Semantic text classification; Context features; Technical documentation;
DOI
10.1007/978-3-030-80599-9_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semantic tagging in technical documentation is an important but error-prone process whose objective is to produce highly structured content for automated processing and standardized information delivery. Its benefits are consistent, didactically optimized documents, supported by professional, automatic styling for multiple target media. Using machine learning to automate the validation of the tagging process is a novel approach, for which a new, high-quality dataset is provided as ready-to-use training, validation, and test sets. In a series of experiments, we classified ten different semantic text segment types using both traditional and deep learning models. The experiments show partial success: high accuracy but relatively low macro-averaged performance. This can be attributed to a combination of strong class imbalance and high semantic and linguistic similarity among certain text types. With a newly created set of context features, model performance increased significantly. Although the data was collected to serve a specific use case, it can support further research in document engineering, class imbalance reduction, and semantic text classification.
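The abstract attributes the gap between high accuracy and low macro-averaged performance partly to class imbalance. The minimal sketch below illustrates that effect. It uses made-up segment labels and counts, which are not the paper's dataset, label set, or models. A degenerate classifier that always predicts the majority segment type scores well on accuracy but poorly on macro-F1, because macro-averaging gives each segment type equal weight regardless of how often it occurs.

```python
# Illustration only: the labels and counts below are hypothetical,
# not taken from the paper's dataset.

def accuracy(y_true, y_pred):
    """Fraction of segments whose predicted type matches the gold type."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)

# Hypothetical imbalanced gold labels: one dominant segment type, two rare ones.
y_true = ["instruction"] * 90 + ["warning"] * 5 + ["prerequisite"] * 5
# A degenerate classifier that always predicts the majority type.
y_pred = ["instruction"] * 100

print(f"accuracy {accuracy(y_true, y_pred):.3f}, "
      f"macro-F1 {macro_f1(y_true, y_pred):.3f}")
# prints: accuracy 0.900, macro-F1 0.316
```

The two rare types contribute an F1 of 0 each, so they pull the macro average far below accuracy even though 90% of segments are labeled correctly.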
Pages: 165–177
Number of pages: 13
Related papers
  • [1] Academic text classification based on lexical-semantic content
    Venegas, Rene
    REVISTA SIGNOS, 2007, 40 (63): 239–271
  • [2] Semantic Text Compression for Classification
    Kutay, Emrecan
    Yener, Aylin
    2023 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS, ICC WORKSHOPS, 2023: 1368–1373
  • [3] Semantic Enrichment of Text Representation with Wikipedia for Text Classification
    Yamakawa, Hiroki
    Peng, Jing
    Feldman, Anna
    IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010), 2010
  • [4] Boosting for text classification with semantic features
    Bloehdorn, Stephan
    Hotho, Andreas
    ADVANCES IN WEB MINING AND WEB USAGE ANALYSIS, 2006, 3932: 149–166
  • [5] Semantic Conceptual Primitives Computing in Text Classification
    Zhang, Quan
    Yuan, Yi
    Wei, Xiangfeng
    Chi, Zhejie
    Cong, Peimin
    Du, Yihua
    PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2014), 2014: 215–218
  • [6] Text classification for DAG-structured categories
    Nguyen, CD
    Dung, TA
    Cao, TH
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2005, 3518: 290–300
  • [7] Text Classification Based on Title Semantic Information
    Liu, YunXiang
    Xu, Qi
    Wang, ChunYa
    2020 5TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATICS AND BIOMEDICAL SCIENCES (ICIIBMS 2020), 2020: 29–33
  • [8] Semantic text classification of emergent disease reports
    Zhang, Yi
    Liu, Bing
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2007, PROCEEDINGS, 2007, 4702: 629 ff.
  • [9] Combined syntactic and semantic kernels for text classification
    Bloehdorn, Stephan
    Moschitti, Alessandro
    ADVANCES IN INFORMATION RETRIEVAL, 2007, 4425: 307 ff.
  • [10] Semantic Clustering for a Functional Text Classification Task
    Lippincott, Thomas
    Passonneau, Rebecca
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2009, 5449: 509 ff.