Towards an Automated Classification of Spreadsheets

Authors
Mendes, Jorge [1, 2]
Do, Kha N. [3]
Saraiva, Joao [1, 2]
Affiliations
[1] INESC TEC, HASLab, Oporto, Portugal
[2] Univ Minho, HASLab, Braga, Portugal
[3] Vietnam Natl Univ, Univ Sci, Ho Chi Minh, Vietnam
Source
SOFTWARE TECHNOLOGIES: APPLICATIONS AND FOUNDATIONS (STAF 2016), 2016, Vol. 9946
Keywords
Spreadsheets; Data mining; Classification
DOI
10.1007/978-3-319-50230-4_26
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Many spreadsheets in the wild have neither documentation nor a categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains, such as financial or database spreadsheets. In this paper, we introduce a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation on the EUSES corpus, achieving up to 89% accuracy. The best-performing algorithm was then applied to the larger Enron corpus to gain insight into that corpus and to demonstrate the usefulness of this work.
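The abstract states only that existing classification algorithms were trained on spreadsheet-specific features and evaluated with cross-validation. The sketch below illustrates that kind of evaluation loop under explicit assumptions. It is not the authors' implementation. The three features (fractions of formula, numeric, and text cells), the domain labels, and the toy data are all hypothetical. A simple 1-nearest-neighbour classifier stands in for whichever algorithms the paper actually used.

```python
import math
from typing import List, Sequence, Tuple

# HYPOTHETICAL feature vector per spreadsheet:
# (fraction of formula cells, fraction of numeric cells, fraction of text cells)
Sample = Tuple[Sequence[float], str]


def nearest_neighbour(train: List[Sample], x: Sequence[float]) -> str:
    """1-NN stand-in classifier: label of the closest training spreadsheet (Euclidean distance)."""
    features, label = min(train, key=lambda s: math.dist(s[0], x))
    return label


def k_fold_accuracy(data: List[Sample], k: int) -> float:
    """Mean accuracy over k folds; fold i holds every k-th sample starting at index i."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and the number of samples")
    fold_accuracies = []
    for i in range(k):
        test = data[i::k]                                        # held-out fold
        train = [s for j, s in enumerate(data) if j % k != i]    # remaining folds
        correct = sum(nearest_neighbour(train, x) == y for x, y in test)
        fold_accuracies.append(correct / len(test))
    return sum(fold_accuracies) / k


# Toy, made-up data for illustration only (not from EUSES or Enron).
toy: List[Sample] = [
    ((0.40, 0.50, 0.10), "financial"),
    ((0.35, 0.55, 0.10), "financial"),
    ((0.45, 0.45, 0.10), "financial"),
    ((0.02, 0.30, 0.68), "database"),
    ((0.01, 0.25, 0.74), "database"),
    ((0.03, 0.35, 0.62), "database"),
]

print(k_fold_accuracy(toy, k=3))  # → 1.0 on this trivially separable toy set
```

Interleaved folds (`data[i::k]`) keep the sketch dependency-free. A real evaluation would normally shuffle or stratify the folds by domain before splitting.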
Pages: 346-355
Number of pages: 10