Towards an Automated Classification of Spreadsheets

Cited by: 0
Authors
Mendes, Jorge [1, 2]
Do, Kha N. [3]
Saraiva, Joao [1, 2]
Affiliations
[1] INESC TEC, HASLab, Oporto, Portugal
[2] Univ Minho, HASLab, Braga, Portugal
[3] Vietnam Natl Univ, Univ Sci, Ho Chi Minh, Vietnam
Source
SOFTWARE TECHNOLOGIES: APPLICATIONS AND FOUNDATIONS (STAF 2016) | 2016, Vol. 9946
Keywords
Spreadsheets; Data mining; Classification
DOI
10.1007/978-3-319-50230-4_26
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains, such as financial or database spreadsheets. In this paper, we introduce a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation using the EUSES corpus, reaching an accuracy of up to 89%. The best algorithm was then applied to the larger Enron corpus to gain some insight into it and to demonstrate the usefulness of this work.
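The train-and-validate loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the features or the algorithms used, so the two features (fraction of formula cells, fraction of text cells), the nearest-centroid classifier, and the six toy samples are hypothetical stand-ins, not the authors' method or data.

```python
from collections import defaultdict

def centroid_fit(X, y):
    """Compute the mean feature vector of each class label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def centroid_predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cross_validate(X, y, k):
    """k-fold cross-validation: fold i holds out the samples with index % k == i."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        centroids = centroid_fit([X[i] for i in train_idx],
                                 [y[i] for i in train_idx])
        correct += sum(centroid_predict(centroids, X[i]) == y[i]
                       for i in test_idx)
    return correct / len(X)

# Hypothetical per-spreadsheet features:
# (fraction of formula cells, fraction of text cells). Made-up toy values.
X = [(0.60, 0.10), (0.05, 0.70), (0.55, 0.15),
     (0.10, 0.65), (0.65, 0.05), (0.02, 0.80)]
y = ["financial", "database", "financial",
     "database", "financial", "database"]

accuracy = cross_validate(X, y, k=3)
print(f"cross-validated accuracy on toy data: {accuracy:.2f}")
```

The accuracy printed here reflects only the toy data and says nothing about the reported EUSES result. In practice, the model selected by cross-validation would then be refit on the full labelled corpus before being applied to an unlabelled one such as Enron.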
Pages: 346-355
Page count: 10