Enhancing Ovarian Tumor Dataset Analysis Through Data Mining Preprocessing Techniques

Cited by: 0

Authors:

Shetty, Roopashri [1]

Geetha, M. [1]

Dinesh Acharya, U. [1]

Shyamala, G. [2]

Affiliations:

[1] Manipal Acad Higher Educ, Manipal Inst Technol, Dept Comp Sci & Engn, Manipal 576104, Karnataka, India

[2] Manipal Acad Higher Educ, Kasturba Med Coll, Dept Obstet & Gynaecol, Manipal 576104, Karnataka, India

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Data mining; Tumors; Ovarian cancer; Imputation; Feature extraction; Cleaning; Accuracy; Classification algorithms; Supervised learning; Medical diagnosis; classification; data mining; preprocessing; supervised learning technique; ALGORITHM; CLASSIFICATION; IMPUTATION; SELECTION;

DOI:

10.1109/ACCESS.2024.3450520

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The early detection and treatment of ovarian cancer are hindered by the disease's complexity and lethality. Its high mortality and heterogeneity make it a significant challenge for oncology. In-depth study of ovarian tumor datasets is crucial for improving knowledge of this disease and for developing new diagnostic and treatment approaches. The quality of the data used for training and analysis substantially affects how well computational models predict and characterize ovarian cancer, and data mining methods depend heavily on data quality. This work therefore examines the preprocessing methods applied to an ovarian tumor dataset in order to improve the accuracy and reliability of subsequent analyses. A novel ovarian tumor dataset was collected; the raw data contain missing values, incomplete records, noise, redundancy, and outliers, all of which degrade mining results. Using feature selection, data cleaning, normalization, and dimensionality reduction, we aim to improve data quality and make it easier to find significant patterns and biomarkers linked to ovarian cancer. The work emphasizes the importance of preprocessing in realizing the potential of ovarian tumor datasets and in advancing the field's understanding of this debilitating illness, with the goal of improving detection and treatment. Accuracy, sensitivity, and specificity are used as performance indicators to assess the effect of preprocessing. After preprocessing, the dataset is classified as benign or malignant with an accuracy of 88% using Logistic Regression. When each feature selection technique is applied to the dataset, the features obtained through Recursive Feature Elimination and feature importance yield a higher accuracy of 92% with Logistic Regression and Support Vector Machine classifiers. The knowledge gained from these preprocessing techniques is expected to lead to more precise and trustworthy computational models, which could improve outcomes for ovarian cancer patients.
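
The abstract names accuracy, sensitivity, and specificity as its performance indicators. As a minimal sketch of how these three quantities are computed for a benign/malignant classifier, the Python below derives them from confusion-matrix counts. The function names and the example labels are hypothetical and made up for illustration; they are not taken from the paper or its dataset, and label 1 is assumed to mean malignant.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn), assuming label 1 = malignant and 0 = benign."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        # Fraction of all cases classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of malignant cases correctly flagged (true positive rate).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of benign cases correctly cleared (true negative rate).
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up labels for illustration only (not from the paper's dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters for medical data because a missed malignant case and a false alarm on a benign case carry different costs, and accuracy alone does not separate the two.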

Pages: 122300-122312

Page count: 13

