FoodChem: A food-chemical relation extraction model

Cited by: 5
Authors
Cenikj, Gjorgjina [1, 2]
Seljak, Barbara Korousic [2]
Eftimov, Tome [2]
Affiliations
[1] Jozef Stefan Int Postgrad Sch, Ljubljana, Slovenia
[2] Jozef Stefan Inst, Comp Syst Dept, Ljubljana, Slovenia
Source
2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021) | 2021
Keywords
relation extraction; information extraction; food-chemical relations
DOI
10.1109/SSCI50451.2021.9660161
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present FoodChem, a new Relation Extraction (RE) model for identifying chemicals present in the composition of food entities, based on textual information provided in biomedical peer-reviewed scientific literature. The RE task is treated as a binary classification problem, aimed at identifying whether the "contains" relation exists between the entities of a food-chemical pair. This is accomplished by fine-tuning the BERT, BioBERT and RoBERTa transformer models. For evaluation purposes, a novel dataset with annotated "contains" relations in food-chemical entity pairs is generated, in a golden and a silver version. The models are integrated into a voting scheme to produce the silver version of the dataset, which we use for augmenting the individual models, while the manually annotated golden version is used for their evaluation. Of the three evaluated models, BioBERT achieves the best results, with a macro-averaged F1 score of 0.902 in the unbalanced augmentation setting.
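The voting scheme and the macro-averaged F1 metric mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which voting rule is used, so simple majority voting over three binary classifiers is an assumption here. The function names and the toy label lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model binary labels into one label per example.

    `predictions` is a list of equally long label lists, one per model.
    Majority voting is an assumed rule; the paper may use a different one.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores (standard definition)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (1 = "contains" relation present); purely illustrative values.
bert = [1, 0, 1, 1]
biobert = [1, 0, 0, 1]
roberta = [0, 0, 1, 1]
gold = [1, 0, 0, 1]

silver = majority_vote([bert, biobert, roberta])
print(silver)
print(round(macro_f1(gold, silver), 3))
```

With three voters and two labels a tie cannot occur, which is why the sketch needs no tie-breaking rule.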
Pages: 8