Weakly-Supervised Symptom Recognition for Rare Diseases in Biomedical Text

Cited by: 1
Authors
Holat, Pierre [1]
Tomeh, Nadi [1]
Charnois, Thierry [1]
Battistelli, Delphine [2]
Jaulent, Marie-Christine [3]
Metivier, Jean-Philippe [4]
Affiliations
[1] Univ Paris 13, LIPN, Sorbonne Paris Cite, Paris, France
[2] Univ Paris Ouest Nanterre La Def, MoDyCo, Paris, France
[3] INSERM, Paris, France
[4] Univ Caen Basse Normandie, GREYC, Caen, France
Source
ADVANCES IN INTELLIGENT DATA ANALYSIS XV | 2016 / Vol. 9897
Keywords
Information extraction; Pattern mining; CRF; Symptoms recognition; Biomedical texts; EXTRACTION;
DOI
10.1007/978-3-319-46349-0_17
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we tackle the issue of symptom recognition for rare diseases in biomedical texts. Symptoms typically have a more complex and ambiguous structure than other biomedical named entities. Furthermore, existing resources are scarce and incomplete. Therefore, we propose a weakly-supervised framework based on a combination of two approaches: sequential pattern mining under constraints and sequence labeling. We use unannotated abstracts of biomedical papers, together with dictionaries of rare diseases and symptoms, to create our training data. Our experiments show that both approaches outperform simple projection of the dictionaries onto text, and that their combination is beneficial. We also introduce a novel pattern mining constraint based on semantic similarity between words inside patterns.
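As a rough illustration of the dictionary projection that the abstract uses as a baseline and as a source of weak training labels, the following is a minimal sketch. The function name `project_dictionary`, the BIO label scheme, the greedy longest-match strategy, and the toy symptom dictionary are all assumptions made for illustration; the record does not describe the authors' actual implementation.

```python
from typing import List, Set, Tuple


def project_dictionary(
    tokens: List[str],
    terms: Set[Tuple[str, ...]],
    max_len: int = 5,
) -> List[str]:
    """Project dictionary terms onto a token sequence and return BIO labels.

    Hypothetical helper: greedy longest match, case-insensitive.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in terms:
                matched = n
                break
        if matched:
            labels[i] = "B-SYMPTOM"
            for j in range(i + 1, i + matched):
                labels[j] = "I-SYMPTOM"
            i += matched
        else:
            i += 1
    return labels


# Toy dictionary and sentence (illustrative only, not from the paper).
symptom_terms = {("muscle", "weakness"), ("fever",)}
tokens = "Patients presented with muscle weakness and fever".split()
labels = project_dictionary(tokens, symptom_terms)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Token/label pairs produced this way could serve as noisy training sequences for a sequence labeler such as a CRF, which is the kind of weak supervision the abstract describes.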
Pages: 192-203
Number of pages: 12