Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

Cited by: 5
Authors
Wang, Yu [1]
Li, Jinchao [1]
Naumann, Tristan [1]
Xiong, Chenyan [1]
Cheng, Hao [1]
Tinn, Robert [1]
Wong, Cliff [1]
Usuyama, Naoto [1]
Rogahn, Richard [1]
Shen, Zhihong [1]
Qin, Yang [1]
Horvitz, Eric [1]
Bennett, Paul N. [1]
Gao, Jianfeng [1]
Poon, Hoifung [1]
Affiliations
[1] Microsoft Research, Redmond, WA 98052, USA
Keywords
Domain-specific pretraining; Search; Biomedical; NLP; COVID-19
DOI
10.1145/3447548.3469053
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, the biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and in many other vertical domains, is challenging due to the scarcity of direct supervision from click logs. Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck. We propose a general approach for vertical search based on domain-specific pretraining and present a case study for the biomedical domain. Despite being substantially simpler and not using any relevance labels for training or development, our method performs comparably to or better than the best systems in the official TREC-COVID evaluation, a COVID-related biomedical search competition. Using distributed computing in modern cloud infrastructure, our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search, a new search experience for biomedical literature: https://aka.ms/biomedsearch.
Pages: 3717-3725
Number of pages: 9