Enhancing Ovarian Tumor Dataset Analysis Through Data Mining Preprocessing Techniques

Cited by: 0
Authors
Shetty, Roopashri [1]
Geetha, M. [1]
Dinesh Acharya, U. [1]
Shyamala, G. [2]
Affiliations
[1] Manipal Acad Higher Educ, Manipal Inst Technol, Dept Comp Sci & Engn, Manipal 576104, Karnataka, India
[2] Manipal Acad Higher Educ, Kasturba Med Coll, Dept Obstet & Gynaecol, Manipal 576104, Karnataka, India
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Data mining; Tumors; Ovarian cancer; Imputation; Feature extraction; Cleaning; Accuracy; Classification algorithms; Supervised learning; Medical diagnosis; classification; data mining; preprocessing; supervised learning technique; ALGORITHM; CLASSIFICATION; IMPUTATION; SELECTION;
DOI
10.1109/ACCESS.2024.3450520
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Early detection and treatment of ovarian cancer face considerable hurdles because the disease is complex and often lethal. Its high mortality and heterogeneity make it a significant challenge for oncology. In-depth study of ovarian tumor datasets is crucial for improving knowledge of this illness and for developing new diagnostic and treatment approaches. The quality of the data used for training and analysis has a substantial impact on how well computational models predict and characterize ovarian cancer, since data mining methods rely heavily on data quality. To improve the accuracy and reliability of subsequent analyses, this work examines the key preprocessing methods applied to an ovarian tumor dataset. A novel ovarian tumor dataset was collected; in its raw form it contains missing values, incomplete data, noisy data, redundant data, and outliers, all of which degrade the performance of mining results. In this study, we explore data mining preprocessing methods for enhancing the analysis of ovarian tumor datasets. Using feature selection, data cleaning, normalization, and dimensionality reduction, we aim to improve the quality of the data and make it easier to find significant patterns and biomarkers linked to ovarian cancer. The work emphasizes the importance of preprocessing in maximizing the potential of ovarian tumor datasets and in expanding the field's understanding of this debilitating illness, with the goal of improving detection and treatment. Accuracy, sensitivity, and specificity are used as performance indicators to assess the effectiveness of preprocessing. After preprocessing, an accuracy of 88% is achieved when tumors are classified as benign or malignant using Logistic Regression. When each feature selection technique is applied to the dataset, the features obtained through Recursive Feature Elimination and feature importance yield a higher accuracy of 92% with Logistic Regression and Support Vector Machine classifiers. The knowledge gained from these preprocessing techniques is expected to result in more precise and trustworthy computational models, which could improve patient outcomes in ovarian cancer.
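The abstract names missing-value handling, normalization, and three evaluation indicators (accuracy, sensitivity, specificity) but gives no implementation details. The following is a minimal sketch of those steps in plain Python, not the authors' method: median imputation and min-max scaling are assumed choices, the function names are hypothetical, the toy values are invented for illustration, and the label encoding (1 = malignant, 0 = benign) is an assumption.

```python
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median.

    Median imputation is an assumed strategy; the source does not say
    which imputation method was used.
    """
    columns = list(zip(*rows))
    medians = [median(v for v in col if v is not None) for col in columns]
    return [
        [medians[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def min_max_normalize(rows):
    """Rescale every column to the range [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of positive (assumed malignant) cases detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of negative (assumed benign) cases recognized.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    # Invented toy rows with two numeric features; None marks a missing value.
    raw = [[35.0, 4.2], [None, 8.9], [120.0, None], [60.0, 5.5]]
    cleaned = min_max_normalize(impute_median(raw))
    print(cleaned)

    # Invented labels and predictions, only to exercise the metric function.
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

Reporting sensitivity and specificity alongside accuracy matters for a benign-versus-malignant task because accuracy alone can hide a classifier that misses most malignant cases when the classes are imbalanced.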
Pages: 122300-122312
Page count: 13