Semantic Text Segment Classification of Structured Technical Content

Cited by: 2
Authors
Hoellig, Julian [1]
Dufter, Philipp [2]
Geierhos, Michaela [1]
Ziegler, Wolfgang [3]
Schuetze, Hinrich [2]
Affiliations
[1] Bundeswehr Univ Munich, Res Inst CODE, Neubiberg, Germany
[2] Ludwig Maximilians Univ Munchen, Ctr Language & Informat Proc, Munich, Germany
[3] Karlsruhe Univ Appl Sci, Informat Management & Media, Karlsruhe, Germany
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2021) | 2021 / Vol. 12801
Funding
European Research Council
Keywords
Semantic text classification; Context features; Technical documentation
DOI
10.1007/978-3-030-80599-9_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semantic tagging in technical documentation is an important but error-prone process whose objective is to produce highly structured content for automated processing and standardized information delivery. Its benefits are consistent and didactically optimized documents, supported by professional and automatic styling for multiple target media. Using machine learning to automate the validation of the tagging process is a novel approach, for which a new, high-quality dataset is provided in ready-to-use training, validation and test sets. In a series of experiments, we classified ten different semantic text segment types using both traditional and deep learning models. The experiments show partial success, with high accuracy but relatively low macro-average performance. This can be attributed to a combination of strong class imbalance and high semantic and linguistic similarity among certain text types. After a set of context features was added, model performance increased significantly. Although the data was collected to serve a specific use case, further valuable research can be performed in the areas of document engineering, class imbalance reduction, and semantic text classification.
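The gap between high accuracy and low macro-average performance that the abstract reports can be illustrated with a minimal sketch. The labels ("paragraph", "warning"), the 90/10 split, and the always-predict-the-majority classifier below are hypothetical and are not taken from the paper's dataset or models; the metric shown is macro-averaged F1, which is one common macro-average measure and may differ from the one the authors used.

```python
def accuracy(y_true, y_pred):
    """Fraction of segments whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as frequent ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical, strongly imbalanced toy data: 90 frequent vs. 10 rare segments.
y_true = ["paragraph"] * 90 + ["warning"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["paragraph"] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro-F1 = {mf1:.3f}")
```

In this toy setup the rare class is never predicted, so its F1 is zero and pulls the macro average down even though most segments are labeled correctly.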
Pages: 165-177
Page count: 13