DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Cited by: 14
Authors
Bertucci D. [1]
Hamid M.M. [1]
Anand Y. [1]
Ruangrotsakun A. [1]
Tabatabai D. [1]
Perez M. [1]
Kahng M. [1]
Affiliations
[1] Oregon State University, United States
Keywords
data-centric AI; error analysis; image data; treemaps; visual analytics; visualization for machine learning
DOI
10.1109/TVCG.2022.3209425
Abstract
In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning (ML). ML practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interest at multiple levels of abstraction. Our case studies with widely used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap. © 2022 IEEE.
Pages: 320-330
Page count: 10
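The abstract describes organizing images by extracting a hierarchical cluster structure from their high-dimensional representations and laying that hierarchy out as a treemap. The following is a minimal sketch of that general idea, not the authors' implementation. The abstract does not name a clustering or layout algorithm, so average-linkage agglomerative clustering and a slice-and-dice layout are assumptions here, as are the function names `build_hierarchy` and `treemap` and the toy 2-D "embeddings".

```python
from math import dist


def build_hierarchy(points):
    """Average-linkage agglomerative clustering (an assumed choice).

    Returns a nested tuple tree whose leaves are point indices.
    """
    # Each cluster is (tree, member_indices); a leaf tree is just the index.
    clusters = [(i, [i]) for i in range(len(points))]

    def avg_distance(a, b):
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        _, x, y = min(
            (avg_distance(clusters[x][1], clusters[y][1]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merged = ((clusters[x][0], clusters[y][0]),
                  clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


def count_leaves(tree):
    return 1 if isinstance(tree, int) else sum(count_leaves(c) for c in tree)


def treemap(tree, x, y, w, h, out=None):
    """Slice-and-dice layout (an assumed choice).

    Splits each rectangle along its longer side, giving every child an
    area proportional to the number of leaves (images) it contains.
    """
    if out is None:
        out = {}
    if isinstance(tree, int):
        out[tree] = (x, y, w, h)
        return out
    total = count_leaves(tree)
    offset = 0.0
    for child in tree:
        share = count_leaves(child) / total
        if w >= h:
            treemap(child, x + offset * w, y, share * w, h, out)
        else:
            treemap(child, x, y + offset * h, w, share * h, out)
        offset += share
    return out


# Toy stand-ins for image embeddings: two tight groups far apart.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
tree = build_hierarchy(embeddings)
rects = treemap(tree, 0.0, 0.0, 100.0, 50.0)
for index, rect in sorted(rects.items()):
    print(index, rect)
```

In this sketch, nearby embeddings end up as siblings in the tree and therefore as adjacent rectangles. Zooming into one subtree would amount to calling `treemap` again on that subtree with the full canvas size.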