The impact of site-specific digital histology signatures on deep learning model accuracy and bias

Cited by: 123
Authors
Howard, Frederick M. [1]
Dolezal, James [1]
Kochanny, Sara [1]
Schulte, Jefree [2]
Chen, Heather [2]
Heij, Lara [3, 4]
Huo, Dezheng [5, 6]
Nanda, Rita [1, 6]
Olopade, Olufunmilayo I. [1, 6]
Kather, Jakob N. [7, 8, 9]
Cipriani, Nicole [2, 6]
Grossman, Robert L. [1, 6]
Pearson, Alexander T. [1, 6]
Affiliations
[1] Univ Chicago, Dept Med, Sect Hematol Oncol, 5841 S Maryland Ave, Chicago, IL 60637 USA
[2] Univ Chicago, Dept Pathol, 5841 S Maryland Ave, Chicago, IL 60637 USA
[3] Univ Hosp RWTH Aachen, Dept Surg & Transplantat, Aachen, Germany
[4] Univ Hosp RWTH Aachen, Inst Pathol, Aachen, Germany
[5] Univ Chicago, Dept Publ Hlth Sci, Chicago, IL 60637 USA
[6] Univ Chicago Comprehens Canc Ctr, Chicago, IL USA
[7] Univ Hosp RWTH Aachen, Dept Med 3, Aachen, Germany
[8] Univ Leeds, Leeds Inst Med Res St Jamess, Pathol & Data Analyt, Leeds, W Yorkshire, England
[9] Univ Heidelberg Hosp, Natl Ctr Tumor Dis, Med Oncol, Heidelberg, Germany
Keywords
COMPREHENSIVE GENOMIC CHARACTERIZATION; OPERATING CHARACTERISTIC CURVES; BREAST-CANCER; MITOSIS DETECTION; HEALTH-CARE; HISTOPATHOLOGY; ANCESTRY; RESOURCE; BIOLOGY; AREAS;
DOI
10.1038/s41467-021-24698-1
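Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code
07; 0710; 09;
Abstract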
The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. Site detection remains possible despite commonly used color normalization and augmentation methods, and we quantify the image characteristics constituting this site-specific digital histology signature. We demonstrate that these site-specific signatures lead to biased accuracy for prediction of features including survival, genomic mutations, and tumor stage. Furthermore, ethnicity can also be inferred from site-specific signatures, which must be accounted for to ensure equitable application of DL. These site-specific signatures can lead to overoptimistic estimates of model performance, and we propose a quadratic programming method that abrogates this bias by ensuring models are not trained and validated on samples from the same site.

Deep learning models have been trained on The Cancer Genome Atlas to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. Here, the authors demonstrate that site-specific histologic signatures can lead to biased estimates of accuracy for such models, and propose a method to minimize such bias.
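The constraint stated in the abstract, that a model should not be trained and validated on samples from the same site, can be illustrated with a small sketch. The Python snippet below is a simplified greedy stand-in and is not the authors' quadratic programming formulation, which this record does not describe. The function name `site_preserved_folds` and the toy site labels are hypothetical, and the sketch balances only the number of samples per fold.

```python
from collections import Counter


def site_preserved_folds(sites, n_folds=3):
    """Assign each sample to a fold so that no site is split across folds.

    Greedy heuristic (an assumption for illustration, not the paper's
    quadratic program): sites are placed largest-first into whichever
    fold currently holds the fewest samples.
    """
    site_sizes = Counter(sites)
    fold_sizes = [0] * n_folds
    site_to_fold = {}
    # Sort by size (descending), then by name, so the result is deterministic.
    for site, size in sorted(site_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(range(n_folds), key=lambda f: fold_sizes[f])
        site_to_fold[site] = target
        fold_sizes[target] += size
    # One fold index per sample, in the same order as the input.
    return [site_to_fold[s] for s in sites]


# Hypothetical toy data: five submitting sites with 5, 4, 3, 2, and 1 samples.
sites = ["A"] * 5 + ["B"] * 4 + ["C"] * 3 + ["D"] * 2 + ["E"] * 1
folds = site_preserved_folds(sites, n_folds=3)

# Every site should appear in exactly one fold.
folds_per_site = {}
for site, fold in zip(sites, folds):
    folds_per_site.setdefault(site, set()).add(fold)
print(folds_per_site)
print(Counter(folds))
```

Holding out one fold at a time then keeps every validation site unseen during training. A fuller treatment would presumably also balance the outcome of interest across folds, which this sketch does not attempt.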
Pages: 13
Related papers
81 total
  • [41] Population Differences in Breast Cancer: Survey in Indigenous African Women Reveals Over-Representation of Triple-Negative Breast Cancer
    Huo, Dezheng
    Ikpatt, Francis
    Khramtsov, Andrey
    Dangou, Jean-Marie
    Nanda, Rita
    Dignam, James
    Zhang, Bifeng
    Grushko, Tatyana
    Zhang, Chunling
    Oluwasola, Olayiwola
    Malaka, David
    Malami, Sani
    Odetunde, Abayomi
    Adeoye, Adewumi O.
    Iyare, Festus
    Falusi, Adeyinka
    Perou, Charles M.
    Olopade, Olufunmilayo I.
    [J]. JOURNAL OF CLINICAL ONCOLOGY, 2009, 27 (27) : 4515 - 4521
  • [42] IBM, 2017, IBM ILOG CPLEX 12 10
  • [43] Deep Learning Models for Histopathological Classification of Gastric and Colonic Epithelial Tumours
    Iizuka, Osamu
    Kanavati, Fahdi
    Kato, Kei
    Rambeau, Michael
    Arihiro, Koji
    Tsuneki, Masayuki
    [J]. SCIENTIFIC REPORTS, 2020, 10 (01)
  • [44] Proliferation in African breast cancer: Biology and prognostication in Nigerian breast cancer material
    Ikpatt, OF
    Kuopio, T
    Collan, Y
    [J]. MODERN PATHOLOGY, 2002, 15 (08) : 783 - 789
  • [45] A deep learning image-based intrinsic molecular subtype classifier of breast tumors reveals tumor heterogeneity that may affect survival
    Jaber, Mustafa I.
    Song, Bing
    Taylor, Clive
    Vaske, Charles J.
    Benz, Stephen C.
    Rabizadeh, Shahrooz
    Soon-Shiong, Patrick
    Szeto, Christopher W.
    [J]. BREAST CANCER RESEARCH, 2020, 22 (01)
  • [46] A machine learning-based prognostic predictor for stage III colon cancer
    Jiang, Dan
    Liao, Junhua
    Duan, Haihan
    Wu, Qingbin
    Owen, Gemma
    Shu, Chang
    Chen, Liangyin
    He, Yanjun
    Wu, Ziqian
    He, Du
    Zhang, Wenyan
    Wang, Ziqiang
    [J]. SCIENTIFIC REPORTS, 2020, 10 (01)
  • [47] Pan-cancer image-based detection of clinically actionable genetic alterations
    Kather, Jakob Nikolas
    Heij, Lara R.
    Grabsch, Heike I.
    Loeffler, Chiara
    Echle, Amelie
    Muti, Hannah Sophie
    Krause, Jeremias
    Niehues, Jan M.
    Sommer, Kai A. J.
    Bankhead, Peter
    Kooreman, Loes F. S.
    Schulte, Jefree J.
    Cipriani, Nicole A.
    Buelow, Roman D.
    Boor, Peter
    Ortiz-Bruechle, Nadina
    Hanby, Andrew M.
    Speirs, Valerie
    Kochanny, Sara
    Patnaik, Akash
    Srisuwananukorn, Andrew
    Brenner, Hermann
    Hoffmeister, Michael
    van den Brandt, Piet A.
    Jaeger, Dirk
    Trautwein, Christian
    Pearson, Alexander T.
    Luedde, Tom
    [J]. NATURE CANCER, 2020, 1 (08) : 789 - +
  • [48] Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer
    Kather, Jakob Nikolas
    Pearson, Alexander T.
    Halama, Niels
    Jaeger, Dirk
    Krause, Jeremias
    Loosen, Sven H.
    Marx, Alexander
    Boor, Peter
    Tacke, Frank
    Neumann, Ulf Peter
    Grabsch, Heike I.
    Yoshikawa, Takaki
    Brenner, Hermann
    Chang-Claude, Jenny
    Hoffmeister, Michael
    Trautwein, Christian
    Luedde, Tom
    [J]. NATURE MEDICINE, 2019, 25 (07) : 1054 - +
  • [49] Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study
    Kather, Jakob Nikolas
    Krisam, Johannes
    Charoentong, Pornpimol
    Luedde, Tom
    Herpel, Esther
    Weis, Cleo-Aron
    Gaiser, Timo
    Marx, Alexander
    Valous, Nektarios A.
    Ferber, Dyke
    Jansen, Lina
    Reyes-Aldasoro, Constantino Carlos
    Zoernig, Inka
    Jaeger, Dirk
    Brenner, Hermann
    Chang-Claude, Jenny
    Hoffmeister, Michael
    Halama, Niels
    [J]. PLOS MEDICINE, 2019, 16 (01)
  • [50] Kather JN., 2019, bioRxiv, P690206, DOI 10.1101/690206