Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

Cited by: 5
Authors
Wang, Yu [1]
Li, Jinchao [1]
Naumann, Tristan [1]
Xiong, Chenyan [1]
Cheng, Hao [1]
Tinn, Robert [1]
Wong, Cliff [1]
Usuyama, Naoto [1]
Rogahn, Richard [1]
Shen, Zhihong [1]
Qin, Yang [1]
Horvitz, Eric [1]
Bennett, Paul N. [1]
Gao, Jianfeng [1]
Poon, Hoifung [1]
Affiliations
[1] Microsoft Research, Redmond, WA 98052, USA
Keywords
Domain-specific pretraining; Search; Biomedical; NLP; COVID-19
DOI
10.1145/3447548.3469053
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and in many other vertical domains, is challenging due to the scarcity of direct supervision from click logs. Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck. We propose a general approach for vertical search based on domain-specific pretraining and present a case study for the biomedical domain. Despite being substantially simpler and not using any relevance labels for training or development, our method performs comparably to or better than the best systems in the official TREC-COVID evaluation, a COVID-related biomedical search competition. Using distributed computing in modern cloud infrastructure, our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search, a new search experience for biomedical literature: https://aka.ms/biomedsearch.
Pages: 3717-3725
Page count: 9