DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Cited by: 14
Authors
Bertucci D. [1]
Hamid M.M. [1]
Anand Y. [1]
Ruangrotsakun A. [1]
Tabatabai D. [1]
Perez M. [1]
Kahng M. [1]
Affiliations
[1] Oregon State University, United States
Keywords
data-centric AI; error analysis; image data; treemaps; visual analytics; visualization for machine learning
DOI
10.1109/TVCG.2022.3209425
Abstract
In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning (ML). ML practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interest at multiple levels of abstraction. Our case studies with widely used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap. © 2022 IEEE.
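The core idea in the abstract (derive a cluster hierarchy from image representations, then lay it out as a treemap) can be illustrated with a minimal sketch. This is not the authors' implementation: the centroid-linkage clustering, the slice-and-dice layout, the `Node`, `build_hierarchy`, and `layout` names, and the toy 2-D vectors standing in for image embeddings are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    items: list                                   # indices of images under this node
    children: list = field(default_factory=list)  # empty for a single image


def _centroid(vectors, items):
    """Mean vector of the given items."""
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in items) / len(items) for d in range(dim)]


def build_hierarchy(vectors):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    whose centroids are closest (assumed linkage, chosen for brevity)."""
    nodes = [Node([i]) for i in range(len(vectors))]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = math.dist(_centroid(vectors, nodes[i].items),
                              _centroid(vectors, nodes[j].items))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = Node(nodes[i].items + nodes[j].items, [nodes[i], nodes[j]])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def layout(node, x, y, w, h, depth=0, max_depth=2, out=None):
    """Slice-and-dice treemap: divide the rectangle among children in
    proportion to image count, alternating split direction per level.
    Stops at max_depth, which mimics viewing one level of abstraction."""
    if out is None:
        out = []
    if not node.children or depth == max_depth:
        out.append((sorted(node.items), (x, y, w, h)))
        return out
    offset = 0.0
    for child in node.children:
        frac = len(child.items) / len(node.items)
        if depth % 2 == 0:   # split along x
            layout(child, x + offset * w, y, frac * w, h, depth + 1, max_depth, out)
        else:                # split along y
            layout(child, x, y + offset * h, w, frac * h, depth + 1, max_depth, out)
        offset += frac
    return out


# Toy stand-ins for high-dimensional image embeddings (hypothetical data).
vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
root = build_hierarchy(vectors)
cells = layout(root, 0.0, 0.0, 100.0, 100.0, max_depth=2)
for group, rect in cells:
    print(group, rect)
```

Raising `max_depth` splits the same rectangles into finer groups, which loosely corresponds to zooming into a region at a lower level of abstraction. A real system would likely use a library clustering routine and a squarified layout rather than these naive versions.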
Pages: 320-330
Page count: 10