An Information Retrieval Approach for Text Mining of Medical Records Based on Graph Descriptor

Cited by: 1
Authors
Dudko, Alexander [1]
Endrjukaite, Tatiana [2]
Kiyoki, Yasushi [1]
Affiliations
[1] Keio Univ, Grad Sch Media & Governance, Tokyo, Japan
[2] Transport & Telecommun Inst, Res Dept, Riga, Latvia
Source
INFORMATION MODELLING AND KNOWLEDGE BASES XXX | 2019 / Vol. 312
Keywords
information retrieval; graph descriptor; text mining; summary generation; ambiguity resolution
DOI
10.3233/978-1-61499-933-1-334
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a new method of data retrieval from free-text documents in the medical domain. The proposed approach creates a document summary and highlights the most important keywords in the text. To achieve this, we process the document's natural-language text and build a descriptor as an internal representation of the document. This descriptor is a graph of concepts and the relations between them, with concept points serving as a metric of relevance. Using the points in the descriptor, the approach performs ambiguity resolution, selects the most relevant concepts to display in the summary, and votes on which keywords to highlight in the text. Besides directly representing the identified information in the summary, this work proposes a way to provide an extended summary by using additional knowledge about relations between medications, procedures, diseases, and anatomy. The described approach helps speed up analysis and decision making by providing an aggregated summary for a document and highlighting the most meaningful parts of its text. Experimental results demonstrate that the proposed approach can perform automatic summary generation and keyword highlighting with meaningful and highly relevant results.
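As a rough illustration of the idea in the abstract (a graph of concepts, relations, and relevance points), the following is a minimal sketch and not the authors' implementation. The `Descriptor` class, its method names, the 0.5 neighbour weight, and the example concepts are all hypothetical; the abstract does not specify how points are assigned or combined.

```python
class Descriptor:
    """Toy graph descriptor: concepts, relations, and per-concept points.

    Hypothetical structure; the paper's actual descriptor is not specified
    in the abstract.
    """

    def __init__(self):
        self.points = {}     # concept -> accumulated relevance points
        self.relations = {}  # concept -> set of related concepts

    def add_mention(self, concept, weight=1.0):
        # Each mention of a concept in the text adds points to it.
        self.points[concept] = self.points.get(concept, 0.0) + weight

    def add_relation(self, a, b):
        # Relations are stored as undirected edges.
        self.relations.setdefault(a, set()).add(b)
        self.relations.setdefault(b, set()).add(a)

    def score(self, concept):
        # Assumed scoring rule: own points plus half of the neighbours' points.
        own = self.points.get(concept, 0.0)
        neighbours = self.relations.get(concept, ())
        return own + 0.5 * sum(self.points.get(n, 0.0) for n in neighbours)

    def resolve(self, candidates):
        # Ambiguity resolution: pick the candidate sense with the highest score.
        return max(candidates, key=self.score)

    def summary(self, k=3):
        # Summary: the k highest-scoring concepts, ties broken alphabetically.
        ranked = sorted(self.points, key=lambda c: (-self.score(c), c))
        return ranked[:k]


# Example: the word "cold" is ambiguous; related mentions decide its sense.
d = Descriptor()
d.add_mention("cough", 2.0)
d.add_mention("fever")
d.add_relation("common cold", "cough")
d.add_relation("common cold", "fever")
sense = d.resolve(["common cold", "cold temperature"])
d.add_mention(sense)
top = d.summary(k=2)
```

In this toy example the disease sense wins because it is related to concepts that already carry points, which is one plausible reading of how points could drive both disambiguation and summary selection.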
Pages: 334-352
Number of pages: 19