RegEMR: a natural language processing system to automatically identify premature ovarian decline from Chinese electronic medical records

Cited by: 3
Authors
Cai, Jie [1]
Chen, Shenglin [1]
Guo, Siyun [1]
Wang, Suidong [1]
Li, Lintong [1]
Liu, Xiaotong [1]
Zheng, Keming [1]
Liu, Yudong [1]
Chen, Shiling [1]
Affiliations
[1] Southern Medical University, Nanfang Hospital, Center for Reproductive Medicine, Department of Gynecology and Obstetrics, Guangzhou 510515, People's Republic of China
Keywords
Diminished ovarian reserve; Electronic medical records; Natural language processing; Ovarian reserve; Premature ovarian failure; Premature ovarian insufficiency; Regular expressions; Text; Information; Extraction; Reserve; Insufficiency; Infertility; Management; Failure
DOI
10.1186/s12911-023-02239-8
Chinese Library Classification (CLC) number
R-058
Abstract
Background: The ovarian reserve is a reservoir of reproductive potential. In clinical practice, early detection and treatment of premature ovarian decline, characterized by abnormal ovarian reserve tests, is regarded as a critical measure to prevent infertility. However, the relevant data are typically stored in an unstructured format in a hospital's electronic medical record (EMR) system, and their retrieval requires tedious manual abstraction by domain experts. Computational tools are therefore needed to reduce the workload.
Methods: We presented RegEMR, an artificial intelligence tool composed of a rule-based natural language processing (NLP) extractor and a knowledge-based disease scoring model, to automate screening for premature ovarian decline using Chinese reproductive EMRs. We used regular expressions (REs) as a text mining method and explored whether REs automatically synthesized by the genetic programming-based online platform RegexGenerator++ could be as effective as manually formulated REs. We also investigated how the representativeness of the learning corpus affected the performance of machine-generated REs. Additionally, we translated the clinical diagnostic criteria into a programmable diagnostic model for disease scoring and risk stratification. Four hundred outpatient medical records were collected from a Chinese fertility center. Manual review served as the gold standard, and fivefold cross-validation was used for evaluation.
Results: The overall F-score of manually built REs was 0.9444 (95% CI 0.9373 to 0.9515), which did not differ significantly (paired t test, p > 0.05) from that of machine-generated REs; the performance of the latter could be affected by training set sizes and annotation portions. The extractor performed effectively in automatically tracing the dynamic changes in hormone levels (F-score 0.9518-0.9884) and ultrasonographic measures (F-score 0.9472-0.9822). When the extracted information was applied to the proposed diagnostic model, the program achieved an accuracy of 0.98 and a sensitivity of 0.93 in risk screening. For each specific disease, the automatic diagnosis was consistent with the clinical diagnosis in 76% of patients, and the kappa coefficient was 0.63.
Conclusion: A Chinese NLP system named RegEMR was developed to automatically identify patients at high risk of early ovarian aging and to diagnose related diseases from Chinese reproductive EMRs. We hope that this system can aid EMR-based data collection and clinical decision support in fertility centers.
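The rule-based extraction and F-score evaluation described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the pattern `HORMONE_RE`, the hormone list, the sample note, and the helper names `extract_hormones` and `f_score` are hypothetical and are not taken from RegEMR itself.

```python
import re

# Hypothetical pattern: a hormone name, an optional ASCII or full-width colon,
# then a numeric value. The hormone list and note layout are assumptions.
HORMONE_RE = re.compile(r"(FSH|LH|AMH|E2)\s*[:：]?\s*(\d+(?:\.\d+)?)")


def extract_hormones(note: str) -> list[tuple[str, float]]:
    """Return (hormone, value) pairs found in a free-text note."""
    return [(name, float(value)) for name, value in HORMONE_RE.findall(note)]


def f_score(predicted: set, gold: set) -> float:
    """F1 of extracted items against manually annotated gold items."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented sample note, not real patient data.
note = "月经第3天查 FSH 12.5 IU/L，LH：4.2，AMH 0.8 ng/ml"
found = extract_hormones(note)

# Invented gold annotation containing one item the pattern cannot see.
gold = {("FSH", 12.5), ("LH", 4.2), ("AMH", 0.8), ("E2", 35.0)}
score = f_score(set(found), gold)
```

In this toy case the extractor finds three of the four gold items with no false positives, so precision is 1.0 and recall is 0.75. A real evaluation, as in the abstract, would compare extractions with manual review across cross-validation folds.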
Pages: 13