Copy number variation in plasma as a tool for lung cancer prediction using Extreme Gradient Boosting (XGBoost) classifier

Cited by: 57
Authors
Yu, Daping [1 ,2 ]
Liu, Zhidong [1 ,2 ]
Su, Chongyu [1 ,2 ]
Han, Yi [1 ,2 ]
Duan, XinChun [1 ,2 ]
Zhang, Rui [1 ,2 ]
Liu, Xiaoshuang [3 ]
Yang, Yang [4 ]
Xu, Shaofa [1 ,2 ]
Affiliations
[1] Capital Med Univ, Beijing Chest Hosp, Thorac Surg Dept, Area 1st, 9 Compound, Beiguan St, Beijing, Peoples R China
[2] Beijing TB & Thorac Tumor Res Inst, Area 1st, 9 Compound, Beiguan St, Beijing, Peoples R China
[3] Ping An Hlth Technol, Beijing, Peoples R China
[4] Beijing Gencode Diagnost Lab, Beijing, Peoples R China
Keywords
cfDNA; CNV; early diagnosis; lung cancer; XGBoost; CIRCULATING-TUMOR DNA; HEPATOCELLULAR-CARCINOMA; RISK-FACTORS
DOI
10.1111/1759-7714.13204
Chinese Library Classification (CLC) number
R73 [Oncology]
Subject classification code
100214
Abstract
Background: Lung cancer (LC) is the leading cause of cancer death and usually presents at an advanced stage, whereas early detection would increase the benefits of treatment. Blood is particularly favored in clinical research because it allows relatively noninvasive analysis. Copy number variation (CNV) is a common genetic change in tumor genomes, and many studies have indicated that CNVs detected in plasma cell-free DNA (cfDNA) could serve as a biomarker for cancer diagnosis.

Methods: In this study, we assessed the feasibility of using chromosomal arm-level CNVs in cfDNA as a biomarker for lung cancer diagnosis in a small cohort of 40 patients and 41 healthy controls. Arm-level CNV distributions were analyzed based on z-scores, and the machine-learning algorithm Extreme Gradient Boosting (XGBoost) was applied for cancer prediction.

Results: Amplifications tended to occur on chromosome arms 3q, 8q, 12p, and 7q, and deletions were frequently detected on 22q, 3p, 5q, 16q, 10q, and 15q. When applied to the test group (12 patients and 13 healthy controls), the trained XGBoost classifier achieved 100% sensitivity and 100% specificity. In addition, five-fold cross-validation supported the stability of the model.

Conclusion: Our results suggest that integrating four arm-level CNVs and the cfDNA concentration into a trained XGBoost classifier could provide a potential method for lung cancer detection.

Key points
Significant findings of the study: Healthy individuals have arm-level CNV profiles that differ from those of cancer patients. Amplifications tend to occur on chromosome arms 3q, 8q, 12p, and 7q, and deletions on 22q, 3p, 5q, 16q, 10q, and 15q.
What this study adds: cfDNA concentration and arms 10q, 3q, 8q, 3p, and 22q are key features for prediction. A trained XGBoost classifier is a potential method for lung cancer detection.
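The abstract names three computational steps without detailing them: arm-level CNV scoring by z-score, five-fold cross-validation, and sensitivity/specificity on a held-out test group. The Python sketch below illustrates those steps under stated assumptions and is not the authors' pipeline. The input format (one normalized read fraction per chromosome arm), the healthy-reference baseline, the |z| > 3 call threshold, and all function names are hypothetical. The XGBoost model itself is not reproduced here; per the abstract, arm-level CNVs and cfDNA concentration are the features such a classifier would be trained on within each fold.

```python
from statistics import mean, stdev

# Hypothetical cutoff; the abstract does not state the z-score threshold used.
Z_THRESHOLD = 3.0


def arm_z_scores(sample, reference):
    """z = (x - mean_ref) / sd_ref for every chromosome arm in the reference.

    sample:    {arm: normalized read fraction} for one plasma cfDNA sample
    reference: {arm: [read fractions from healthy controls]} (assumed baseline)
    """
    return {
        arm: (sample[arm] - mean(values)) / stdev(values)
        for arm, values in reference.items()
    }


def call_cnvs(z_scores, threshold=Z_THRESHOLD):
    """Label each arm 'gain', 'loss', or 'neutral' by comparing z to +/-threshold."""
    calls = {}
    for arm, z in z_scores.items():
        if z > threshold:
            calls[arm] = "gain"
        elif z < -threshold:
            calls[arm] = "loss"
        else:
            calls[arm] = "neutral"
    return calls


def stratified_folds(labels, k=5):
    """Split sample indices into k folds with a similar class ratio in each.

    Indices of each class are dealt round-robin across folds (no shuffling).
    """
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def sensitivity_specificity(y_true, y_pred):
    """Return (TP / (TP + FN), TN / (TN + FP)), with 1 = cancer, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy values for illustration only, not study data.
    reference = {"3q": [0.030, 0.031, 0.029, 0.030], "3p": [0.028, 0.027, 0.028, 0.029]}
    z = arm_z_scores({"3q": 0.036, "3p": 0.028}, reference)
    print(call_cnvs(z))  # 3q is called as a gain, 3p as neutral
    print([len(f) for f in stratified_folds([1] * 40 + [0] * 41)])  # [17, 16, 16, 16, 16]
```

Dealing each class round-robin keeps the patient/control ratio similar across folds, which matters for a cohort as small as 40 patients and 41 controls; a real analysis would normally shuffle the indices first.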
Pages: 95-102
Number of pages: 8