Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

Cited by: 5
Authors
Wang, Yu [1]
Li, Jinchao [1]
Naumann, Tristan [1]
Xiong, Chenyan [1]
Cheng, Hao [1]
Tinn, Robert [1]
Wong, Cliff [1]
Usuyama, Naoto [1]
Rogahn, Richard [1]
Shen, Zhihong [1]
Qin, Yang [1]
Horvitz, Eric [1]
Bennett, Paul N. [1]
Gao, Jianfeng [1]
Poon, Hoifung [1]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
Keywords
Domain-specific pretraining; Search; Biomedical; NLP; COVID-19;
DOI
10.1145/3447548.3469053
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, the biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and in many other vertical domains, is challenging due to the scarcity of direct supervision from click logs. Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck. We propose a general approach for vertical search based on domain-specific pretraining and present a case study for the biomedical domain. Despite being substantially simpler and not using any relevance labels for training or development, our method performs comparably to or better than the best systems in the official TREC-COVID evaluation, a COVID-related biomedical search competition. Using distributed computing in modern cloud infrastructure, our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search, a new search experience for biomedical literature: https://aka.ms/biomedsearch.
Pages: 3717-3725
Number of pages: 9