Biomedical Named Entity Recognition Based on Self-supervised Deep Belief Network

Cited by: 1
Authors
Zhang Yajun [1]
Liu Zongtian [2]
Zhou Wen [2]
Affiliations
[1] Shanghai Inst Precis Measurement & Test, Shanghai 201109, Peoples R China
[2] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Biomedical named entity recognition; Feature selection; Feature vector mapping; Threshold judgement; Self-supervision;
DOI
10.1049/cje.2020.03.001
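Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract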
Named entity recognition is a fundamental and crucial task in biomedical data mining. To address it effectively, we propose a novel approach based on a Deep belief network (DBN). We select nine entity features and construct feature vector mapping tables according to the recognition contribution degree of their different values. Using the mapping tables, we transform words in biomedical texts into feature vectors. The DBN then identifies entities by reducing the dimensionality of the vector data. Extensive experimental results show that the approach achieves a maximum F-measure of 69.96% on the GENIA 3.02 testing corpus. We further propose a self-supervised DBN, which decides whether to add supervised fine-tuning according to the recognition performance of each layer; this overcomes the error propagation problem while limiting the complexity of the model. Test analysis shows that the new DBN improves recognition performance, raising the F-measure to 72.12%.
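The abstract does not say how the per-layer decision is made. As a rough illustration only, the sketch below assumes that each layer is scored with the balanced F-measure and that a layer scoring below a chosen threshold is flagged for supervised fine-tuning. The function names, the threshold value, and the "below threshold" rule are hypothetical and are not taken from the paper.

```python
from typing import List


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no true positives, so precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def layers_needing_fine_tuning(layer_scores: List[float], threshold: float) -> List[int]:
    """Hypothetical threshold judgement (an assumption, not the paper's rule):
    return the indices of layers whose score falls below the threshold."""
    return [i for i, score in enumerate(layer_scores) if score < threshold]


# Illustrative per-layer F-measures; these numbers are made up, not results from the paper.
scores = [0.75, 0.65, 0.72]
flagged = layers_needing_fine_tuning(scores, threshold=0.70)
```

Under this assumed rule, only layers that underperform receive supervised fine-tuning, which is one way the model's complexity could stay limited.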
Pages: 455-462
Page count: 8