BRACS: A Dataset for BReAst Carcinoma Subtyping in H&E Histology Images

Cited by: 40
Authors
Brancati, Nadia [1]
Anniciello, Anna Maria [2]
Pati, Pushpak [3, 4]
Riccio, Daniel [1, 5]
Scognamiglio, Giosue [2]
Jaume, Guillaume [3, 6]
De Pietro, Giuseppe [1]
Di Bonito, Maurizio [2]
Foncubierta, Antonio [3]
Botti, Gerardo [2]
Gabrani, Maria [3]
Feroce, Florinda [2]
Frucci, Maria [1]
Affiliations
[1] ICAR CNR, Inst High Performance Comp & Networking Res Counc, 111 Via Pietro Castellino, I-80131 Naples, Italy
[2] IRCCS, Natl Canc Inst, Fdn Pascale, 53 Via Mariano Semmola, I-80131 Naples, Italy
[3] IBM Res, Saumerstr 4, CH-8803 Zurich, Switzerland
[4] ETH, Ramistr 101, CH-8092 Zurich, Switzerland
[5] Univ Naples Federico II, Dept Elect Engn & Informat Technol, Via Claudio 21, I-80125 Naples, Italy
[6] EPFL Rte Cantonale, CH-1015 Lausanne, Switzerland
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2022 / Vol. 2022
Keywords
PATHOLOGY;
DOI
10.1093/database/baac093
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Breast cancer is the most commonly diagnosed cancer and accounts for the highest number of cancer-related deaths among women. Advances in diagnostic activities combined with large-scale screening policies have significantly lowered the mortality rates for breast cancer patients. However, the manual inspection of tissue slides by pathologists is cumbersome, time-consuming and subject to significant inter- and intra-observer variability. Recently, the advent of whole-slide scanning systems has empowered the rapid digitization of pathology slides and enabled the development of Artificial Intelligence (AI)-assisted digital workflows. However, AI techniques, especially Deep Learning, require a large amount of high-quality annotated data to learn from. Constructing such task-specific datasets poses several challenges, such as data-acquisition level constraints, time-consuming and expensive annotations and anonymization of patient information. In this paper, we introduce the BReAst Carcinoma Subtyping (BRACS) dataset, a large cohort of annotated Hematoxylin and Eosin (H&E)-stained images to advance AI development in the automatic characterization of breast lesions. BRACS contains 547 Whole-Slide Images (WSIs) and 4539 Regions Of Interest (ROIs) extracted from the WSIs. Each WSI and its respective ROIs are annotated by the consensus of three board-certified pathologists into different lesion categories. Specifically, BRACS includes three lesion types, i.e., benign, malignant and atypical, which are further subtyped into seven categories. It is, to the best of our knowledge, the largest annotated dataset for breast cancer subtyping both at WSI and ROI levels. Furthermore, by including the understudied atypical lesions, BRACS offers a unique opportunity for leveraging AI to better understand their characteristics. We encourage AI practitioners to develop and evaluate novel algorithms on the BRACS dataset to further breast cancer diagnosis and patient care.
Pages: 10