Web-based Real-Time Case Finding for the Population Health Management of Patients With Diabetes Mellitus: A Prospective Validation of the Natural Language Processing-Based Algorithm With Statewide Electronic Medical Records

Cited by: 31
Authors
Zheng, Le [1, 2]
Wang, Yue [2, 3]
Hao, Shiying [2 ]
Shin, Andrew Y. [2 ]
Jin, Bo [4 ]
Ngo, Anh D. [4 ]
Jackson-Browne, Medina S. [4 ]
Feller, Daniel J. [4 ]
Fu, Tianyun [4 ]
Zhang, Karena [2 ]
Zhou, Xin [5 ]
Zhu, Chunqing [4 ]
Dai, Dorothy [4 ]
Yu, Yunxian [6 ]
Zheng, Gang [3 ]
Li, Yu-Ming [5 ]
McElhinney, Doff B. [2 ]
Culver, Devore S. [7 ]
Alfreds, Shaun T. [7 ]
Stearns, Frank [4 ]
Sylvester, Karl G. [2 ]
Widen, Eric [4 ]
Ling, Xuefeng Bruce [2, 6]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Stanford Univ, S370 Grant Bldg, Stanford, CA 94305 USA
[3] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[4] HBI Solut Inc, Palo Alto, CA USA
[5] Pingjin Hosp Heart Ctr, Tianjin Key Lab Cardiovasc Remodeling & Target Or, Tianjin, Peoples R China
[6] Zhejiang Univ, Sch Med, Hangzhou, Zhejiang, Peoples R China
[7] HealthInfoNet, Portland, ME USA
Keywords
electronic medical record; natural language processing; diabetes mellitus; data mining; RISK SCORE; HYPERTENSION; DISEASE; OBESITY;
DOI
10.2196/medinform.6328
Chinese Library Classification number
R-058
Subject classification number
Abstract
Background: Diabetes case finding based on structured medical records does not fully identify diabetic patients whose medical histories related to diabetes are available in the form of free text. Manual chart reviews have been used but involve high labor costs and long latency.
Objective: This study developed and tested a Web-based diabetes case finding algorithm using both structured and unstructured electronic medical records (EMRs).
Methods: This study was based on the health information exchange (HIE) EMR database that covers almost all health facilities in the state of Maine, United States. Using narrative clinical notes, a Web-based natural language processing (NLP) case finding algorithm was retrospectively (July 1, 2012, to June 30, 2013) developed with a random subset of HIE-associated facilities, which was then blind tested with the remaining facilities. The NLP-based algorithm was subsequently integrated into the HIE database and validated prospectively (July 1, 2013, to June 30, 2014).
Results: Of the 935,891 patients in the prospective cohort, 64,168 diabetes cases were identified using diagnosis codes alone. Our NLP-based case finding algorithm prospectively found an additional 5756 uncodified cases (5756/64,168, 8.97% increase) with a positive predictive value of 0.90. Of the 21,720 diabetic patients identified by both methods, 6616 patients (6616/21,720, 30.46%) were identified by the NLP-based algorithm before a diabetes diagnosis was noted in the structured EMR (mean time difference = 48 days).
Conclusions: The online NLP algorithm was effective in identifying uncodified diabetes cases in real time, leading to a significant improvement in diabetes case finding. The successful integration of the NLP-based case finding algorithm into the Maine HIE database indicates a strong potential for application of this novel method to achieve a more complete ascertainment of diagnoses of diabetes mellitus.
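The two proportions reported in the Results can be reproduced from the stated counts. The following is a minimal sketch for checking that arithmetic only; the variable names and the `percent` helper are illustrative assumptions and are not taken from the paper.

```python
# Counts copied from the Results of the abstract.
coded_cases = 64_168     # diabetes cases identified using diagnosis codes alone
nlp_only_cases = 5_756   # additional uncodified cases found by the NLP algorithm
found_by_both = 21_720   # patients identified by both methods
nlp_first = 6_616        # of those, identified by NLP before the structured diagnosis


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


# Relative increase in identified cases contributed by the NLP algorithm.
increase_pct = percent(nlp_only_cases, coded_cases)

# Share of jointly identified patients that the NLP algorithm flagged first.
earlier_pct = percent(nlp_first, found_by_both)

print(f"Increase over coded cases: {increase_pct}%")
print(f"Identified earlier by NLP: {earlier_pct}%")
```

Both values agree with the percentages quoted in the abstract (8.97% and 30.46%).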
Pages: 38-50
Page count: 13
Related papers
47 in total
  • [1] A risk score for predicting incident diabetes in the Thai population
    Aekplakorn, Wichai
    Bunnag, Pongamorn
    Woodward, Mark
    Sritara, Piyamitr
    Cheepudomwit, Sayan
    Yamwong, Sukit
    Yipintsoi, Tada
    Rajatanavin, Rajata
    [J]. DIABETES CARE, 2006, 29 (08) : 1872 - 1877
  • [2] [Anonymous], 2016, Diabetes Care, V39, pS13, DOI 10.2337/DC16-ER09
  • [3] Optimum BMI Cut Points to Screen Asian Americans for Type 2 Diabetes
    Araneta, Maria Rosario G.
    Kanaya, Alka M.
    Hsu, William C.
    Chang, Healani K.
    Grandinetti, Andrew
    Boyko, Edward J.
    Hayashi, Tomoshige
    Kahn, Steven E.
    Leonetti, Donna L.
    McNeely, Marguerite J.
    Onishi, Yukiko
    Sato, Kyoko K.
    Fujimoto, Wilfred Y.
    [J]. DIABETES CARE, 2015, 38 (05) : 814 - 820
  • [4] Predicting Diabetes: Clinical, Biological, and Genetic Approaches Data from the Epidemiological Study on the Insulin Resistance Syndrome (DESIR)
    Balkau, Beverley
    Lange, Celine
    Fezeu, Leopold
    Tichet, Jean
    De Lauzon-Guillain, Blandine
    Cernichow, Sebastien
    Fumeron, Frederic
    Froguel, Philippe
    Vaxillaire, Martine
    Cauchi, Stephane
    Ducimetiere, Pierre
    Eschwege, Eveline
    [J]. DIABETES CARE, 2008, 31 (10) : 2056 - 2061
  • [5] Appropriate body-mass index for Asian populations and its implications for policy and intervention strategies
    Barba, C
    Cavalli-Sforza, T
    Cutter, J
    Darnton-Hill, I
    Deurenberg, P
    Deurenberg-Yap, M
    Gill, T
    James, P
    Ko, G
    Miu, AH
    Kosulwat, V
    Kumanyika, S
    Kurpad, A
    Mascie-Taylor, N
    Moon, HK
    Nishida, C
    Noor, MI
    Reddy, KS
    Rush, E
    Schultz, JT
    Seidell, J
    Stevens, J
    Swinburn, B
    Tan, K
    Weisell, R
    Wu, ZS
    Yajnik, CS
    Yoshiike, N
    Zimmet, P
    [J]. LANCET, 2004, 363 (9403) : 157 - 163
  • [6] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [7] Using Natural Language Processing to Improve Efficiency of Manual Chart Abstraction in Research: The Case of Breast Cancer Recurrence
    Carrell, David S.
    Halgrim, Scott
    Tran, Diem-Thy
    Buist, Diana S. M.
    Chubak, Jessica
    Chapman, Wendy W.
    Savova, Guergana
    [J]. AMERICAN JOURNAL OF EPIDEMIOLOGY, 2014, 179 (06) : 749 - 758
  • [8] Centers for Disease Control and Prevention, 2011, NAT DIAB FACT SHEET
  • [9] Centers for Disease Control and Prevention (CDC), 2014, NAT DIAB STAT REP ET
  • [10] Extending the NegEx Lexicon for Multiple Languages
    Chapman, Wendy W.
    Hillert, Dieter
    Velupillai, Sumithra
    Kvist, Maria
    Skeppstedt, Maria
    Chapman, Brian E.
    Conway, Mike
    Tharp, Melissa
    Mowery, Danielle L.
    Deleger, Louise
    [J]. MEDINFO 2013: PROCEEDINGS OF THE 14TH WORLD CONGRESS ON MEDICAL AND HEALTH INFORMATICS, PTS 1 AND 2, 2013, 192 : 677 - 681