Span-based model for overlapping entity recognition and multi-relations classification in the food domain

Cited by: 3
Authors
Zhang, Mengqi [1, 2]
Ma, Lei [1, 2]
Ren, Yanzhao [3]
Zhang, Ganggang [4]
Liu, Xinliang [1, 2]
Affiliations
[1] Beijing Technol & Business Univ, Sch E Business & Logist, Beijing 100048, Peoples R China
[2] Beijing Technol & Business Univ, Natl Engn Lab Agriprod Qual Traceabil, Beijing 100048, Peoples R China
[3] Beijing Technol & Business Univ, Sch Comp Sci & Engn, Beijing 100048, Peoples R China
[4] Capital Normal Univ, Digital Campus Construct Ctr, Beijing 100048, Peoples R China
Keywords
information extraction; span-based approach; overlapping entity recognition; category marker; multi-relations classification; entity attributes;
DOI
10.3934/mbe.2022240
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Information extraction (IE) is an important part of the entire knowledge graph lifecycle. In the food domain, extracting information such as ingredients and cooking methods from Chinese recipes is crucial to safety risk analysis and ingredient identification. Chinese IE is much more challenging than English IE because of the language's complex structure, the richness of information carried by word combinations, and the lack of tense. The difficulty is particularly prominent in the food domain, where knowledge is dense and syntactic structure is imprecise. However, existing IE methods focus only on the features of entities within a sentence, such as context and position, and ignore the features of the entity itself and the influence of its attributes on the prediction of inter-entity relationships. To solve the problems of overlapping entity recognition and multi-relations classification in the food domain, we propose a span-based IE model called SpIE. SpIE builds a span representation for each possible candidate entity to capture span-level features, which transforms named entity recognition (NER) into a classification task. In addition, SpIE feeds extra information about the entity into the relation classification (RC) model by considering the effect of the entity's attributes (both the entity mention and the entity type) on the relationship between entity pairs. We apply SpIE to two datasets and observe that it significantly outperforms previous neural approaches because it captures the features of overlapping entities and entity attributes, and that it remains very competitive in general IE.
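The span-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' SpIE implementation: the function names, the toy English tokens, the `max_len` limit, and the type labels "Ingredient" and "Dish" are all assumptions made for illustration. The sketch only shows why enumerating every candidate span allows overlapping entities to coexist, and how an entity's mention and type could be passed along as extra input to a relation classifier.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def enumerate_spans(tokens: List[str], max_len: int) -> List[Span]:
    """List every contiguous span of at most max_len tokens.

    Each span is a separate candidate to classify (entity type or "none"),
    so two overlapping spans can both be labeled as entities.
    """
    spans: List[Span] = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


def overlaps(a: Span, b: Span) -> bool:
    """True when two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def relation_features(
    tokens: List[str], head: Span, head_type: str, tail: Span, tail_type: str
) -> Dict[str, str]:
    """Bundle mention text and entity type for an entity pair.

    Hypothetical stand-in for the "entity attributes" that the abstract
    says are fed into the relation classification step.
    """
    return {
        "head_mention": " ".join(tokens[head[0]:head[1]]),
        "head_type": head_type,
        "tail_mention": " ".join(tokens[tail[0]:tail[1]]),
        "tail_type": tail_type,
    }


# Toy example (assumed data): "beef" is nested inside "beef noodles".
tokens = ["stir", "fried", "beef", "noodles"]
spans = enumerate_spans(tokens, max_len=3)
pair = relation_features(tokens, (2, 3), "Ingredient", (2, 4), "Dish")
```

In a sequence-labeling setup each token receives a single tag, so "beef" and "beef noodles" could not both be marked as entities; treating each span as its own classification instance avoids that conflict, at the cost of a candidate set that grows with the sentence length times `max_len`.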
Pages: 5134-5152
Number of pages: 19