Towards an Automated Classification of Spreadsheets

Times cited: 0
Authors
Mendes, Jorge [1, 2]
Do, Kha N. [3]
Saraiva, Joao [1, 2]
Affiliations
[1] INESC TEC, HASLab, Oporto, Portugal
[2] Univ Minho, HASLab, Braga, Portugal
[3] Vietnam Natl Univ, Univ Sci, Ho Chi Minh, Vietnam
Source
SOFTWARE TECHNOLOGIES: APPLICATIONS AND FOUNDATIONS (STAF 2016) | 2016 / Vol. 9946
Keywords
Spreadsheets; Data mining; Classification
DOI
10.1007/978-3-319-50230-4_26
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains, such as financial or database. In this paper, we introduce a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation on the EUSES corpus, reaching an accuracy of up to 89%. The best algorithm was then applied to the larger Enron corpus to gain some insight into it and to demonstrate the usefulness of this work.
Pages: 346-355
Page count: 10
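
The workflow summarized in the abstract (spreadsheet-specific features, an off-the-shelf classifier, cross-validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three features (fractions of formula, numeric, and text cells), the two domain labels and their centers, the synthetic data, and the nearest-centroid classifier, which stands in for whichever algorithms the authors actually evaluated.

```python
import random
from collections import defaultdict

# Hypothetical per-spreadsheet features (not from the paper):
# (fraction of formula cells, fraction of numeric cells, fraction of text cells)
def make_toy_dataset(seed=0, per_class=30):
    rng = random.Random(seed)
    # Assumed class centers, chosen only so the toy classes are separable.
    centers = {
        "financial": (0.40, 0.50, 0.10),
        "database": (0.05, 0.35, 0.60),
    }
    data = []
    for label, center in centers.items():
        for _ in range(per_class):
            # Add Gaussian noise and clamp each feature to [0, 1].
            features = tuple(
                min(1.0, max(0.0, c + rng.gauss(0, 0.05))) for c in center
            )
            data.append((features, label))
    rng.shuffle(data)
    return data

def train_centroids(samples):
    """Compute the mean feature vector of each class."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cross_validate(data, k=10):
    """k-fold cross-validation: train on k-1 folds, test on the held-out one."""
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, test_fold in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        centroids = train_centroids(train)
        correct += sum(predict(centroids, f) == y for f, y in test_fold)
    return correct / len(data)

if __name__ == "__main__":
    data = make_toy_dataset()
    print(f"10-fold accuracy on toy data: {cross_validate(data):.2f}")
```

Because the synthetic classes are well separated, the accuracy printed here says nothing about real corpora such as EUSES or Enron; the sketch only shows how held-out folds yield an accuracy estimate.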