Semantic Text Segment Classification of Structured Technical Content

Cited by: 2
Authors
Hoellig, Julian [1]
Dufter, Philipp [2]
Geierhos, Michaela [1]
Ziegler, Wolfgang [3]
Schuetze, Hinrich [2]
Affiliations
[1] Bundeswehr Univ Munich, Res Inst CODE, Neubiberg, Germany
[2] Ludwig Maximilians Univ Munchen, Ctr Language & Informat Proc, Munich, Germany
[3] Karlsruhe Univ Appl Sci, Informat Management & Media, Karlsruhe, Germany
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2021) | 2021, Vol. 12801
Funding
European Research Council;
Keywords
Semantic text classification; Context features; Technical documentation;
DOI
10.1007/978-3-030-80599-9_15
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic tagging in technical documentation is an important but error-prone process whose objective is to produce highly structured content for automated processing and standardized information delivery. Its benefits are consistent and didactically optimized documents, supported by professional and automatic styling for multiple target media. Using machine learning to automate the validation of the tagging process is a novel approach, for which a new, high-quality dataset is provided in ready-to-use training, validation and test sets. In a series of experiments, we classified ten different semantic text segment types using both traditional and deep learning models. The experiments show partial success, with high accuracy but relatively low macro-average performance. This can be attributed to a combination of strong class imbalance and high semantic and linguistic similarity among certain text types. Adding a set of context features increased model performance significantly. Although the data was collected to serve a specific use case, further valuable research can be performed in the areas of document engineering, class imbalance reduction, and semantic text classification.
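The gap between high accuracy and low macro-average performance under class imbalance can be illustrated with a minimal sketch. It is not the authors' code or data: the three label names, the 90/5/5 class split, and the majority-class "model" are all hypothetical, chosen only to show how a classifier that ignores rare segment types can still score well on accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of segments whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, strongly imbalanced toy labels (not from the paper's dataset).
y_true = ["paragraph"] * 90 + ["warning"] * 5 + ["instruction"] * 5
# Hypothetical baseline that always predicts the majority class.
y_pred = ["paragraph"] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro-F1 = {mf1:.3f}")
```

In this toy setup the two rare classes each contribute an F1 of zero, which pulls the macro average far below the accuracy even though nine out of ten segments are labeled correctly.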
Pages: 165 - 177
Page count: 13