Object Detection for Graphical User Interface: Old Fashioned or Deep Learning or a Combination?

Cited by: 79
Authors
Chen, Jieshan [1]
Xie, Mulong [1]
Xing, Zhenchang [1, 3]
Chen, Chunyang [2]
Xu, Xiwei [3]
Zhu, Liming [3, 5]
Li, Guoqiang [4]
Affiliations
[1] Australian Natl Univ, Canberra, ACT, Australia
[2] Monash Univ, Clayton, Vic, Australia
[3] CSIRO, Data61, Canberra, ACT, Australia
[4] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[5] Univ New South Wales, Kensington, NSW, Australia
Source
PROCEEDINGS OF THE 28TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '20) | 2020
Keywords
Object Detection; User Interface; Deep Learning; Computer Vision; IMAGES;
DOI
10.1145/3368089.3409691
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Detecting Graphical User Interface (GUI) elements in GUI images is a domain-specific object detection task. It supports many software engineering tasks, such as GUI animation and testing, GUI search and code generation. Existing studies of GUI element detection directly borrow mature methods from the computer vision (CV) domain, including old-fashioned ones that rely on traditional image processing features (e.g., Canny edges, contours), and deep learning models that learn to detect from large-scale GUI data. Unfortunately, these CV methods were not originally designed with awareness of the unique characteristics of GUIs and GUI elements, or of the high localization accuracy that the GUI element detection task requires. We conduct the first large-scale empirical study of seven representative GUI element detection methods on over 50k GUI images to understand the capabilities, limitations and effective designs of these methods. This study not only sheds light on the technical challenges to be addressed but also informs the design of new GUI element detection methods. We accordingly design a new GUI-specific old-fashioned method for non-text GUI element detection, which adopts a novel top-down coarse-to-fine strategy, and combine it with a mature deep learning model for GUI text detection. Our evaluation on 25,000 GUI images shows that our method significantly advances the state-of-the-art performance in GUI element detection.
Pages: 1202-1214
Page count: 13