DEEPLENS: Interactive Out-of-distribution Data Detection in NLP Models

Cited by: 1
Authors
Song, Da [1]
Wang, Zhijie [1]
Huang, Yuheng [1]
Ma, Lei [1, 2]
Zhang, Tianyi [3]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] Univ Tokyo, Tokyo, Japan
[3] Purdue Univ, West Lafayette, IN, USA
Source
PROCEEDINGS OF THE 2023 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2023 | 2023
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Interactive Visualization; Out-of-distribution Detection; Machine Learning; NLP
DOI
10.1145/3544548.3580741
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Machine Learning (ML) has been widely used in Natural Language Processing (NLP) applications. A fundamental assumption in ML is that training data and real-world data follow a similar distribution. However, a deployed ML model may suffer from out-of-distribution (OOD) issues due to distribution shifts in real-world data. Though many algorithms have been proposed to detect OOD data in text corpora, there is still a lack of interactive tool support for ML developers. In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora. Users can efficiently explore different OOD types in DeepLens with the help of a text clustering method. Users can also dig into a specific text by inspecting salient words highlighted through neuron activation analysis. In a within-subjects user study with 24 participants, participants using DeepLens were able to accurately find nearly twice as many types of OOD issues, with 22% more confidence, compared with a variant of DeepLens that has no interaction or visualization support.
Pages: 17