Towards an Automated Classification of Spreadsheets

Citations: 0
Authors
Mendes, Jorge [1, 2]
Do, Kha N. [3]
Saraiva, Joao [1, 2]
Affiliations
[1] INESC TEC, HASLab, Oporto, Portugal
[2] Univ Minho, HASLab, Braga, Portugal
[3] Vietnam Natl Univ, Univ Sci, Ho Chi Minh, Vietnam
Source
SOFTWARE TECHNOLOGIES: APPLICATIONS AND FOUNDATIONS (STAF 2016) | 2016 / Vol. 9946
Keywords
Spreadsheets; Data mining; Classification;
DOI
10.1007/978-3-319-50230-4_26
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains, such as financial or database. In this paper, we introduce a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation using the EUSES corpus, reaching an accuracy of up to 89%. The best algorithm was applied to the larger Enron corpus in order to gain some insight from it and to demonstrate the usefulness of this work.
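The workflow the abstract describes (extract spreadsheet-specific features, train a classifier, validate with cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record does not name the algorithms or the features used, so the nearest-centroid classifier, the three cell-ratio features, and the sample values below are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical feature vectors: (formula-cell ratio, numeric-cell ratio, text-cell ratio).
# Values and labels are invented for illustration; they are not taken from the EUSES corpus.
SAMPLES = [
    ((0.60, 0.30, 0.10), "financial"),
    ((0.55, 0.35, 0.10), "financial"),
    ((0.65, 0.25, 0.10), "financial"),
    ((0.50, 0.40, 0.10), "financial"),
    ((0.05, 0.35, 0.60), "database"),
    ((0.10, 0.30, 0.60), "database"),
    ((0.00, 0.40, 0.60), "database"),
    ((0.05, 0.25, 0.70), "database"),
]


def train_centroids(samples):
    """Compute the mean feature vector (centroid) of each class."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cross_validate(samples, k):
    """k-fold cross-validation: each fold is held out once for testing."""
    folds = [samples[i::k] for i in range(k)]
    correct = 0
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        centroids = train_centroids(train)
        correct += sum(predict(centroids, f) == label for f, label in test_fold)
    return correct / len(samples)


accuracy = cross_validate(SAMPLES, k=4)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

On real corpora the features would be computed from the spreadsheet files themselves, and several classifiers would be compared on the same folds before picking the best one to apply to unseen data.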
Pages: 346-355
Page count: 10