Copy number variation in plasma as a tool for lung cancer prediction using Extreme Gradient Boosting (XGBoost) classifier

Cited by: 57
Authors
Yu, Daping [1, 2]
Liu, Zhidong [1, 2]
Su, Chongyu [1, 2]
Han, Yi [1, 2]
Duan, XinChun [1, 2]
Zhang, Rui [1, 2]
Liu, Xiaoshuang [3]
Yang, Yang [4]
Xu, Shaofa [1, 2]
Affiliations
[1] Capital Med Univ, Beijing Chest Hosp, Thorac Surg Dept, Area 1st,9 Compound,Beiguan St, Beijing, Peoples R China
[2] Beijing TB & Thorac Tumor Res Inst, Area 1st,9 Compound,Beiguan St, Beijing, Peoples R China
[3] Ping An Hlth Technol, Beijing, Peoples R China
[4] Beijing Gencode Diagnost Lab, Beijing, Peoples R China
Keywords
cfDNA; CNV; early diagnosis; lung cancer; XGBoost; CIRCULATING-TUMOR DNA; HEPATOCELLULAR-CARCINOMA; RISK-FACTORS
DOI
10.1111/1759-7714.13204
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Background: Lung cancer (LC) is the leading cause of cancer death and usually presents at an advanced stage, although early detection would increase the benefit of treatment. Blood is particularly favored in clinical research because it allows relatively noninvasive analyses. Copy number variation (CNV) is a common genetic change in tumor genomes, and many studies have indicated that CNV detected in cell-free DNA (cfDNA) from plasma could be a feasible biomarker for cancer diagnosis.
Methods: In this study, we assessed whether chromosomal arm-level CNV from cfDNA can serve as a biomarker for lung cancer diagnosis in a small cohort of 40 patients and 41 healthy controls. Arm-level CNV distributions were analyzed based on z score, and the machine-learning algorithm Extreme Gradient Boosting (XGBoost) was applied for cancer prediction.
Results: Amplifications tended to emerge on chromosome arms 3q, 8q, 12p, and 7q. Deletions were frequently detected on chromosome arms 22q, 3p, 5q, 16q, 10q, and 15q. With a trained XGBoost classifier, specificity and sensitivity both reached 100% in the test group (12 patients and 13 healthy controls). In addition, five-fold cross-validation proved the stability of the model.
Conclusion: Our results suggested that the integration of four arm-level CNVs and the concentration of cfDNA into the trained XGBoost classifier provides a potential method for detecting lung cancer.
Key points
Significant findings of the study: Healthy individuals have different arm-level CNV profiles from cancer patients. Amplifications tend to emerge on chromosome arms 3q, 8q, 12p, and 7q, and deletions tend to emerge on chromosome arms 22q, 3p, 5q, 16q, 10q, and 15q.
What this study adds: cfDNA concentration and arms 10q, 3q, 8q, 3p, and 22q are key features for prediction. A trained XGBoost classifier is a potential method for lung cancer detection.
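The abstract mentions two computations without giving formulas: an arm-level z score and the sensitivity and specificity of the classifier on the test group. The following is a minimal sketch of one common way to compute them, not the authors' pipeline. It assumes each sample is summarized as a read fraction per chromosome arm and that z scores are taken against the healthy-control cohort; the function names and the read fractions are hypothetical, and the XGBoost training step is not shown.

```python
from statistics import mean, stdev


def arm_z_scores(sample, controls):
    """Z score of each arm's read fraction relative to the control cohort.

    Assumption: `sample` and every entry of `controls` map an arm name
    (e.g. "3q") to the fraction of reads assigned to that arm.
    """
    z = {}
    for arm, value in sample.items():
        reference = [control[arm] for control in controls]
        z[arm] = (value - mean(reference)) / stdev(reference)
    return z


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = cancer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical read fractions, invented for illustration only.
controls = [
    {"3q": 0.030, "22q": 0.012},
    {"3q": 0.032, "22q": 0.010},
    {"3q": 0.031, "22q": 0.011},
]
patient = {"3q": 0.036, "22q": 0.008}
z = arm_z_scores(patient, controls)  # positive for 3q, negative for 22q

# Test group sizes from the abstract: 12 patients and 13 healthy controls,
# here with every sample classified correctly.
y_true = [1] * 12 + [0] * 13
y_pred = list(y_true)
sens, spec = sensitivity_specificity(y_true, y_pred)  # (1.0, 1.0)
```

In this sketch a positive z score suggests a gain on that arm and a negative one suggests a loss, matching the direction of the 3q amplification and 22q deletion reported in the abstract. With only 25 test samples, a single misclassified patient would lower sensitivity from 12/12 to 11/12.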
Pages: 95-102
Number of pages: 8