Rethinking domain adaptation for machine learning over clinical language

Times cited: 17
Authors
Laparra, Egoitz [1]
Bethard, Steven [1]
Miller, Timothy A. [2,3]
Affiliations
[1] Univ Arizona, Sch Informat, Tucson, AZ USA
[2] Boston Childrens Hosp, Computat Hlth Informat Program, Landmark Ctr 5516-7,Mail Stop BCH3187, Boston, MA 02115 USA
[3] Harvard Med Sch, Dept Pediat, Boston, MA 02115 USA
Funding
National Institutes of Health (NIH)
Keywords
machine learning; natural language processing; domain adaptation; shared resources
DOI
10.1093/jamiaopen/ooaa010
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Building clinical natural language processing (NLP) systems that work on widely varying data is an absolute necessity because of the expense of obtaining new training data. While domain adaptation research can have a positive impact on this problem, the most widely studied paradigms do not take into account the realities of clinical data sharing. To address this issue, we lay out a taxonomy of domain adaptation, parameterizing by what data is shareable. We show that the most realistic settings for clinical use cases are seriously understudied. To support research in these important directions, we make a series of recommendations, not just for domain adaptation but for clinical NLP in general, that ensure that data, shared tasks, and released models are broadly useful, and that initiate research directions where the clinical NLP community can lead the broader NLP and machine learning fields.
Pages: 146-150
Page count: 5
Number of references: 34