Biomedical heterogeneous data categorization and schema mapping toward data integration

Cited by: 1
Authors
Deshpande, Priya [1]
Rasin, Alexander [2]
Tchoua, Roselyne [2]
Furst, Jacob [2]
Raicu, Daniela [2]
Schinkel, Michiel [3]
Trivedi, Hari [4]
Antani, Sameer [5]
Affiliations
[1] Marquette Univ, Milwaukee, WI 53233 USA
[2] DePaul Univ, Chicago, IL USA
[3] Univ Amsterdam, Ctr Expt & Mol Med CEMM, Amsterdam, Netherlands
[4] Emory Univ, Atlanta, GA USA
[5] Natl Lib Med, NIH, Bethesda, MD USA
Source
FRONTIERS IN BIG DATA | 2023 / Volume 6
Funding
US National Institutes of Health
Keywords
data categorization; data integration; datasets; heterogeneous data; schema mapping; semantic similarity; unstructured data; CLUSTERING METHOD;
DOI
10.3389/fdata.2023.1173038
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data integration is a well-motivated problem in the clinical data science domain. The availability of patient data, reference clinical cases, and datasets for research has the potential to advance the healthcare industry. However, the unstructured (text, audio, or video data) and heterogeneous nature of the data, the variety of data standards and formats, and patient privacy constraints make data interoperability and integration a challenge. The clinical text is further categorized into different semantic groups and may be stored in different files and formats. Even the same organization may store cases in different data structures, making data integration more challenging. With such inherent complexity, domain experts and domain knowledge are often necessary to perform data integration. However, expert human labor is time- and cost-prohibitive. To overcome the variability in the structure, format, and content of the different data sources, we map the text into common categories and compute similarity within those. In this paper, we present a method to categorize and merge clinical data by considering the underlying semantics behind the cases and by using reference information about the cases to perform data integration. Evaluation shows that we were able to merge 88% of clinical data from five different sources.
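The idea of mapping text into common categories and comparing within them can be illustrated with a minimal Python sketch. This is not the authors' implementation: the field names, the `FIELD_TO_CATEGORY` mapping, and the use of Jaccard token overlap as the similarity measure are all assumptions chosen for illustration only.

```python
import re

# Hypothetical mapping from source-specific field names to common categories.
# A real system would derive this from schema mapping, not a hand-written dict.
FIELD_TO_CATEGORY = {
    "hx": "history",
    "clinical_history": "history",
    "findings": "findings",
    "imaging_findings": "findings",
    "dx": "diagnosis",
    "diagnosis": "diagnosis",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def categorize(record):
    """Collect a record's tokens under common categories; unmapped fields are skipped."""
    categories = {}
    for field, text in record.items():
        category = FIELD_TO_CATEGORY.get(field)
        if category is not None:
            categories.setdefault(category, set()).update(tokenize(text))
    return categories


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def category_similarity(record_a, record_b):
    """Compare two records category by category, only where both have content."""
    cat_a, cat_b = categorize(record_a), categorize(record_b)
    shared = cat_a.keys() & cat_b.keys()
    return {c: jaccard(cat_a[c], cat_b[c]) for c in shared}


# Two made-up cases from sources that name their fields differently.
case_a = {"hx": "Chronic cough and fever", "dx": "Pneumonia"}
case_b = {
    "clinical_history": "Fever with chronic cough",
    "diagnosis": "Bacterial pneumonia",
}
scores = category_similarity(case_a, case_b)
```

Comparing within a category keeps a history section from being matched against a diagnosis section, which is the point of categorizing before computing similarity. A semantic measure (for example, one based on ontology concepts) could replace the token overlap used here.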
Pages: 13