RUBY: Natural Language Processing of French Electronic Medical Records for Breast Cancer Research

Times cited: 5
Authors
Schiappa, Renaud [1]
Contu, Sara [1]
Culie, Dorian [2]
Thamphya, Brice [1]
Chateau, Yann [1]
Gal, Jocelyn [1]
Bailleux, Caroline [3]
Haudebourg, Juliette [4]
Ferrero, Jean-Marc [4]
Barranger, Emmanuel [3]
Chamorey, Emmanuel [1]
Affiliations
[1] Univ Côte d'Azur, Dept Epidemiol Biostat & Hlth Data, Ctr Antoine Lacassagne, Nice, France
[2] Univ Côte d'Azur, Univ Inst Face & Neck, Cervico Facial Oncol Surg Dept, Nice, France
[3] Univ Côte d'Azur, Dept Med Oncol, Ctr Antoine Lacassagne, Nice, France
[4] Univ Côte d'Azur, Ctr Antoine Lacassagne, Anat & Pathol Cytol Lab, Nice, France
Source
JCO CLINICAL CANCER INFORMATICS | 2022 / Vol. 6
DOI
10.1200/CCI.21.00199
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
PURPOSE: Electronic medical records are a valuable source of information about patients' clinical status but are often free-text documents that require laborious manual review to be exploited. Techniques from computer science have been investigated, but the literature has marginally focused on non-English language texts. We developed RUBY, a tool designed in collaboration with IBM-France to automatically structure clinical information from French medical records of patients with breast cancer.
MATERIALS AND METHODS: RUBY, which exploits state-of-the-art Named Entity Recognition models combined with keyword extraction and postprocessing rules, was applied on clinical texts. We investigated the precision of RUBY in extracting the target information.
RESULTS: RUBY has an average precision of 92.8% for the Surgery report, 92.7% for the Pathology report, 98.1% for the Biopsy report, and 81.8% for the Consultation report.
CONCLUSION: These results show that the automatic approach has the potential to effectively extract clinical knowledge from an extensive set of electronic medical records, reducing the manual effort required and saving a significant amount of time. A deeper semantic analysis and further understanding of the context in the text, as well as training on a larger and more recent set of reports, including those containing highly variable entities and the use of ontologies, could further improve the results.
Pages: 10