Towards an Automated Classification of Spreadsheets

Cited by: 0
Authors
Mendes, Jorge [1 ,2 ]
Do, Kha N. [3 ]
Saraiva, Joao [1 ,2 ]
Affiliations
[1] INESC TEC, HASLab, Oporto, Portugal
[2] Univ Minho, HASLab, Braga, Portugal
[3] Vietnam Natl Univ, Univ Sci, Ho Chi Minh, Vietnam
Source
SOFTWARE TECHNOLOGIES: APPLICATIONS AND FOUNDATIONS (STAF 2016) | 2016 / Vol. 9946
Keywords
Spreadsheets; Data mining; Classification
DOI
10.1007/978-3-319-50230-4_26
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains, such as financial or database spreadsheets. In this paper, we introduce a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained on the EUSES corpus and validated with cross-validation, reaching an accuracy of up to 89%. The best algorithm was then applied to the larger Enron corpus to gain some insight into it and to demonstrate the usefulness of this work.
Pages: 346-355
Page count: 10
Related papers
  • [1] Towards an Automated Classification of Software Libraries
    Auch M.
    Balluff M.
    Mandl P.
    Wolff C.
    SN Computer Science, 5 (4)
  • [2] Towards automated bone fracture classification
    Funk, MW
    El-Kwae, EA
    Kellam, JF
    MEDICAL IMAGING: 2001: IMAGE PROCESSING, PTS 1-3, 2001, 4322 : 755 - 765
  • [3] Towards Multilingual Automated Classification Systems
    Musaev, Aibek
    Pu, Calton
    2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017), 2017, : 2333 - 2337
  • [4] Automated test case generation for spreadsheets
    Fisher, M
    Cao, MM
    Rothermel, G
    Cook, CR
    Burnett, MM
    ICSE 2002: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, 2002, : 141 - 151
  • [5] Towards the automated classification system of worn surfaces
    Wolski, M.
    Woloszynski, T.
    Stachowiak, G. W.
    Podsiadlo, P.
    PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART J-JOURNAL OF ENGINEERING TRIBOLOGY, 2020, 234 (08) : 1265 - 1274
  • [6] Automated Refactoring of Nested-IF Formulae in Spreadsheets
    Zhang, Jie
    Han, Shi
    Hao, Dan
    Zhang, Lu
    Zhang, Dongmei
    ESEC/FSE'18: PROCEEDINGS OF THE 2018 26TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, 2018, : 833 - 838
  • [7] FINANCIAL PROJECTIONS - AUTOMATED MODELS WITH ELECTRONIC SPREADSHEETS
    WALLACH, W
    SMALL BUSINESS COMPUTERS, 1984, 8 (02) : 38 - &
  • [8] Automated Repair of Data Faults in Templated Spreadsheets
    Wang, Xiaoyan
    Yu, Quan
    Yang, Guowei
    2018 25TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC 2018), 2018, : 705 - 706
  • [9] Towards rapid and automated vulnerability classification of concrete buildings
    Iturburu, Lissette
    Kwannandar, Jean
    Dyke, Shirley J.
    Liu, Xiaoyu
    Zhang, Xin
    Ramirez, Julio
    EARTHQUAKE ENGINEERING AND ENGINEERING VIBRATION, 2023, 22 (02) : 309 - 332
  • [10] Towards the development of an automated wear particle classification system
    Stachowiak, G. W.
    Podsiadlo, P.
    TRIBOLOGY INTERNATIONAL, 2006, 39 (12) : 1615 - 1623