Semantic Text Segment Classification of Structured Technical Content

Cited by: 2
Authors
Hoellig, Julian [1]
Dufter, Philipp [2]
Geierhos, Michaela [1]
Ziegler, Wolfgang [3]
Schuetze, Hinrich [2]
Affiliations
[1] Bundeswehr Univ Munich, Res Inst CODE, Neubiberg, Germany
[2] Ludwig Maximilians Univ Munchen, Ctr Language & Informat Proc, Munich, Germany
[3] Karlsruhe Univ Appl Sci, Informat Management & Media, Karlsruhe, Germany
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2021) | 2021 / Vol. 12801
Funding
European Research Council;
Keywords
Semantic text classification; Context features; Technical documentation;
DOI
10.1007/978-3-030-80599-9_15
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic tagging in technical documentation is an important but error-prone process, with the objective of producing highly structured content for automated processing and standardized information delivery. Its benefits are consistent and didactically optimized documents, supported by professional and automatic styling for multiple target media. Using machine learning to automate the validation of the tagging process is a novel approach, for which a new, high-quality dataset is provided in ready-to-use training, validation and test sets. In a series of experiments, we classified ten different semantic text segment types using both traditional and deep learning models. The experiments show partial success, with high accuracy but relatively low macro-average performance. This can be attributed to a combination of strong class imbalance and high semantic and linguistic similarity among certain text types. By creating a set of context features, model performance increased significantly. Although the data was collected to serve a specific use case, further valuable research can be performed in the areas of document engineering, class imbalance reduction, and semantic text classification.
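The gap between high accuracy and low macro-average performance that the abstract attributes to class imbalance can be illustrated with a small sketch. Everything below is an assumption for illustration only: the label names ("paragraph", "warning", "note"), the class counts, and the majority-class predictor are hypothetical and are not taken from the paper's dataset or models. The sketch compares plain accuracy with a macro-averaged F1 score computed by hand.

```python
def accuracy(y_true, y_pred):
    """Fraction of segments whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical, strongly imbalanced toy labels (not the paper's data).
y_true = ["paragraph"] * 90 + ["warning"] * 5 + ["note"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["paragraph"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy setting the majority class dominates accuracy, while the two rare classes each contribute an F1 of zero to the macro average, which is one way the pattern described in the abstract can arise.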
Pages: 165-177
Page count: 13