Improving Information Systems Sustainability by Applying Machine Learning to Detect and Reduce Data Waste

Cited by: 1
Authors
Savarimuthu, Bastin Tony Roy [1 ]
Corbett, Jacqueline [2 ]
Yasir, Muhammad [1 ]
Lakshmi, Vijaya [3 ]
Affiliations
[1] Univ Otago, Informat Sci, Dunedin, New Zealand
[2] Univ Laval, Management Informat Syst, Fac Business Adm, Quebec City, PQ, Canada
[3] Univ Laval, Management Informat Syst, Quebec City, PQ, Canada
Source
COMMUNICATIONS OF THE ASSOCIATION FOR INFORMATION SYSTEMS | 2023 / Vol. 53
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Data Waste; Information Systems; Information Management; Sustainability; Machine Learning; Deep Learning; Reviews; DESIGN SCIENCE RESEARCH; ONLINE REVIEWS; METHODOLOGY; MANAGEMENT; ENERGY;
DOI
10.17705/1CAIS.05308
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
Big data are key building blocks for creating information value. However, information systems are increasingly plagued with useless waste data that can impede their effective use and threaten sustainability objectives. Using a constructive design science approach, this work first defines digital data waste. It then develops an ensemble artifact comprising two components. The first component comprises 13 machine learning models for detecting data waste. Applying these to 35,576 online reviews in two domains reveals data waste of 1.9% for restaurant reviews compared to 35.8% for app reviews. Machine learning can accurately identify 83% to 99.8% of data waste; deep learning models are particularly promising, with accuracy ranging from 96.4% to 99.8%. The second component comprises a sustainability cost calculator to quantify the social, economic, and environmental benefits of reducing data waste. Eliminating 5,948 useless reviews in the sample would save 6.9 person hours, $2.93 in server, middleware, and client costs, and 9.52 kg of carbon emissions. Extrapolating these results to reviews on the internet shows substantially greater savings. This work contributes to design knowledge relating to sustainable information systems by highlighting data waste as a new class of problem and by designing approaches for addressing it.
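The totals reported in the abstract can be broken down per review with simple arithmetic. The sketch below is illustrative only: it uses just the figures quoted above, and the function names and the linear scaling in `extrapolate` are assumptions made here, not the authors' sustainability cost calculator.

```python
# Totals as quoted in the abstract.
TOTAL_REVIEWS = 35_576      # reviews analysed across both domains
WASTE_REVIEWS = 5_948       # reviews identified as data waste
PERSON_HOURS_SAVED = 6.9    # social benefit
COST_SAVED_USD = 2.93       # server, middleware, and client costs
CARBON_SAVED_KG = 9.52      # environmental benefit


def waste_share_percent() -> float:
    """Share of the whole sample that is data waste, in percent."""
    return WASTE_REVIEWS / TOTAL_REVIEWS * 100


def per_review_savings() -> dict:
    """Average saving per eliminated review, derived from the totals."""
    return {
        "seconds": PERSON_HOURS_SAVED * 3600 / WASTE_REVIEWS,
        "usd": COST_SAVED_USD / WASTE_REVIEWS,
        "grams_co2": CARBON_SAVED_KG * 1000 / WASTE_REVIEWS,
    }


def extrapolate(n_waste_reviews: int) -> dict:
    """Scale the sample totals to n_waste_reviews.

    Assumption: savings grow linearly with the number of reviews removed.
    The paper's own extrapolation method may differ.
    """
    factor = n_waste_reviews / WASTE_REVIEWS
    return {
        "person_hours": PERSON_HOURS_SAVED * factor,
        "usd": COST_SAVED_USD * factor,
        "kg_co2": CARBON_SAVED_KG * factor,
    }


if __name__ == "__main__":
    print(f"Waste share of sample: {waste_share_percent():.1f}%")
    for name, value in per_review_savings().items():
        print(f"Per review, {name}: {value:.4f}")
    print(extrapolate(1_000_000))
```

Under these figures, each eliminated review corresponds to roughly 4 seconds of human time and about 1.6 g of carbon emissions, and the 5,948 waste reviews make up about 16.7% of the combined sample.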
Pages: 189 / 213
Page count: 27