Sentence Similarity Measures Revisited: Ranking Sentences in PubMed Documents

Times cited: 7
Authors
Chen, Qingyu [1 ]
Kim, Sun [1 ]
Wilbur, W. John [1 ]
Lu, Zhiyong [1 ]
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, 8600 Rockville Pike, Bethesda, MD 20892 USA
Source
ACM-BCB'18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS | 2018
Keywords
Natural language processing; biomedicine; textual similarity;
DOI
10.1145/3233547.3233640
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
While various measures are available for computing sentence similarity, few studies have examined their performance in the biomedical domain. Motivated by BIOSSES, an earlier study of biomedical sentence similarity, we explore the effectiveness of multiple similarity measures via sentence ranking in PubMed abstracts. Ranking sentences is a crucial component of text summarization and biocuration evidence attribution. Applied to the "natural language processing" and "computational biology" datasets, our experimental results show that off-the-shelf measures for sentence similarity may not be effective for ranking sentences. Neither lexical nor semantic measures achieved an NDCG score above 0.60 at the top 1 ranked document. This necessitates the development of a large-scale benchmark set and more effective measures.
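For readers unfamiliar with the NDCG metric named in the abstract, the following is a minimal sketch of NDCG at a cutoff k. It assumes the commonly used log2(rank + 1) discount, which may differ from the exact formulation used in the paper; the function names and the relevance grades in the example are hypothetical and are not taken from the paper's data.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items.

    Assumes the common log2(rank + 1) discount (an assumption, not
    necessarily the variant used in the paper).
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG at k divided by the DCG at k of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of candidate sentences, listed in the
# order a similarity measure ranked them (higher grade = more relevant).
ranked_relevance = [1.0, 2.0, 0.0, 2.0]
ndcg_top1 = ndcg_at_k(ranked_relevance, 1)
```

Under this formulation, NDCG at the top 1 position compares the relevance grade of the first-ranked item with the best grade available among all candidates, so a score below 1.0 means the measure did not place a best-graded item first.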
Pages: 531-532
Page count: 2
Related papers
13 in total
[1]  
Cer D., 2017, 11 INT WORKSH SEM EV, P1
[2]  
Chandu K., 2017, BIONLP 2017, P58, DOI 10.18653/v1/W17-2307
[3]  
Chen Qingyu., 2017, Emu, V6, P52
[4]   Text mining for the biocuration workflow [J].
Hirschman, Lynette ;
Burns, Gully A. P. C. ;
Krallinger, Martin ;
Arighi, Cecilia ;
Cohen, K. Bretonnel ;
Valencia, Alfonso ;
Wu, Cathy H. ;
Chatr-Aryamontri, Andrew ;
Dowell, Karen G. ;
Huala, Eva ;
Lourenco, Analia ;
Nash, Robert ;
Veuthey, Anne-Lise ;
Wiegers, Thomas ;
Winter, Andrew G. .
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2012,
[5]   Cumulated gain-based evaluation of IR techniques [J].
Järvelin, K ;
Kekäläinen, J .
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2002, 20 (04) :422-446
[6]  
Kim S, 2012, PROC INT SYMP POWER, P185, DOI 10.1109/ISPSD.2012.6229054
[7]  
Le Q., 2014, DISTRIBUTED REPRESEN, DOI 10.1145/2740908.2742760
[8]  
Nomoto T., 2016, P JOINT WORKSH BIBL, P168
[9]   BELTracker: evidence sentence retrieval for BEL statements [J].
Rastegar-Mojarad, Majid ;
Elayavilli, Ravikumar Komandur ;
Liu, Hongfang .
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2016,
[10]   A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering [J].
Sarrouti, Mourad ;
Ouatik El Alaoui, Said .
JOURNAL OF BIOMEDICAL INFORMATICS, 2017, 68 :96-103