Evaluation of prediction and classification performances in different machine learning models for patient-specific quality assurance of head-and-neck VMAT plans

Cited by: 15
Authors
Kusunoki, Terufumi [1, 2]
Hatanaka, Shogo [2 ]
Hariu, Masatsugu [2 ]
Kusano, Yohsuke [1 ]
Yoshida, Daisaku [1, 3]
Katoh, Hiroyuki [3 ]
Shimbo, Munefumi [2 ]
Takahashi, Takeo [2 ]
Affiliations
[1] Kanagawa Canc Ctr, Sect Med Phys & Engn, Yokohama, Kanagawa, Japan
[2] Saitama Med Univ, Saitama Med Ctr, Dept Radiat Oncol, Kawagoe, Saitama, Japan
[3] Kanagawa Canc Ctr, Dept Radiat Oncol, Yokohama, Kanagawa, Japan
Keywords
classification; machine learning models; prediction; quality assurance; VMAT; VOLUMETRIC-MODULATED ARC; GAMMA PASSING RATE; RADIATION-THERAPY; IMRT; COMPLEXITY; TRANSMISSION; RADIOTHERAPY; DOSIMETRY; INDEX; BEAM;
DOI
10.1002/mp.15393
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification codes
1002; 100207; 1009
Abstract
Purpose: This study evaluates how well different machine learning models predict and classify the gamma passing rate (GPR), and selects the best model for machine learning-based patient-specific quality assurance (PSQA).

Methods: Measurement-based verification of 356 head-and-neck volumetric modulated arc therapy (VMAT) plans was performed using a diode array phantom (Delta4 Phantom), and GPR values were calculated at 2%/2 mm with global normalization and at 3%/2 mm with local normalization. Four machine learning models were used to predict the GPR: ridge regression (RIDGE), random forest (RF), support vector regression (SVR), and stacked generalization (STACKING). Each model was trained on 260 plans, and its prediction accuracy was evaluated on the remaining 96 plans using the prediction error between the measured and predicted GPR. For the classification evaluation, a lower control limit for the measured GPR and a lower control limit for the predicted GPR (LCLp) were defined to identify whether a GPR value represents a "pass" or a "fail." LCLp values at the 99% and 99.9% confidence levels were calculated as the upper prediction limits for the GPR estimated from the linear regression between the measured and predicted GPR.

Results: Low measured GPR values tended to be overestimated. The maximum prediction errors for RIDGE, RF, SVR, and STACKING were 3.2%, 2.9%, 2.3%, and 2.2% at the global 2%/2 mm criterion and 6.3%, 6.6%, 6.1%, and 5.5% at the local 3%/2 mm criterion, respectively. At the global 2%/2 mm criterion with the 99% LCLp, the sensitivity was 100% for all models except RIDGE, and the specificity was 76.1% for RIDGE, RF, and SVR and 66.3% for STACKING; however, the specificity decreased markedly when the 99.9% LCLp was used. At the local 3%/2 mm criterion, only STACKING showed 100% sensitivity with the 99% LCLp. The decrease in specificity with the 99.9% LCLp was smaller than at the global 2%/2 mm criterion, and the specificity for RIDGE, RF, SVR, and STACKING was 61.3%, 61.3%, 72.0%, and 66.8%, respectively.

Conclusions: STACKING predicted low GPR values more accurately than the other machine learning models. Applying the LCLp to a regression model enabled consistent evaluation of quantitative and qualitative GPR predictions. Adjusting the confidence level of the LCLp helped balance sensitivity against specificity. We suggest that STACKING can assist the safe and efficient operation of PSQA.
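The LCLp and pass/fail evaluation described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the regression direction (predicted GPR regressed on measured GPR, evaluated at the measured-GPR control limit) is one reading of the abstract, a "fail" is treated as the positive class, and a standard normal quantile stands in for the Student's t quantile to keep the example within the Python standard library.

```python
import math
from statistics import NormalDist, mean


def upper_prediction_limit(x, y, x0, confidence=0.99):
    """One-sided upper prediction limit at x0 for the regression of y on x.

    Assumption: a normal quantile replaces the Student's t quantile,
    which is only reasonable for fairly large samples.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    z = NormalDist().inv_cdf(confidence)
    half_width = z * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return intercept + slope * x0 + half_width


def sensitivity_specificity(measured, predicted, lcl, lcl_p):
    """Treat 'fail' (GPR below its control limit) as the positive class."""
    tp = fn = tn = fp = 0
    for m, p in zip(measured, predicted):
        actual_fail = m < lcl      # fail according to the measurement
        flagged_fail = p < lcl_p   # fail according to the prediction
        if actual_fail and flagged_fail:
            tp += 1
        elif actual_fail:
            fn += 1
        elif flagged_fail:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical GPR values (%), invented purely for illustration.
measured = [99.0, 98.5, 97.0, 95.0, 92.0, 90.0]
predicted = [98.8, 98.0, 97.5, 96.0, 94.5, 93.0]
lcl = 93.0  # hypothetical lower control limit for the measured GPR

for conf in (0.99, 0.999):
    lcl_p = upper_prediction_limit(measured, predicted, lcl, conf)
    print(conf, round(lcl_p, 2),
          sensitivity_specificity(measured, predicted, lcl, lcl_p))
```

Under these assumptions, a higher confidence level raises the LCLp, so more plans are flagged as failing: sensitivity can only stay the same or rise, while specificity can only stay the same or fall. That matches the trade-off the abstract reports between the 99% and 99.9% LCLp.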
Pages: 727-741
Number of pages: 15
Related papers
49 in total
  • [1] Use of metrics to quantify IMRT and VMAT treatment plan complexity: A systematic review and perspectives
    Antoine, Mikael
    Ralite, Flavien
    Soustiel, Charles
    Marsac, Thomas
    Sargos, Paul
    Cugny, Audrey
    Caron, Jerome
    [J]. PHYSICA MEDICA-EUROPEAN JOURNAL OF MEDICAL PHYSICS, 2019, 64 : 98 - 108
  • [2] Evaluation of the Delta4 phantom for IMRT and VMAT verification
    Bedford, James L.
    Lee, Young K.
    Wai, Philip
    South, Christopher P.
    Warrington, Alan P.
    [J]. PHYSICS IN MEDICINE AND BIOLOGY, 2009, 54 (09) : N167 - N176
  • [3] Breiman L, 1996, MACH LEARN, V24, P49
  • [4] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [5] Integration of AI and Machine Learning in Radiotherapy QA
    Chan, Maria F.
    Witztum, Alon
    Valdes, Gilmer
    [J]. FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2020, 3
  • [6] Examination of the properties of IMRT and VMAT beams and evaluation against pre-treatment quality assurance results
    Crowe, S. B.
    Kairn, T.
    Middlebrook, N.
    Sutherland, B.
    Hill, B.
    Kenny, J.
    Langton, C. M.
    Trapp, J. V.
    [J]. PHYSICS IN MEDICINE AND BIOLOGY, 2015, 60 (06) : 2587 - 2601
  • [7] Quantification of beam complexity in intensity-modulated radiation therapy treatment plans
    Du, Weiliang
    Cho, Sang Hyun
    Zhang, Xiaodong
    Hoffman, Karen E.
    Kudchadker, Rajat J.
    [J]. MEDICAL PHYSICS, 2014, 41 (02)
  • [8] IMRT commissioning: Multiple institution planning and dosimetry comparisons, a report from AAPM Task Group 119
    Ezzell, Gary A.
    Burmeister, Jay W.
    Dogan, Nesrin
    LoSasso, Thomas J.
    Mechalakos, James G.
    Mihailidis, Dimitris
    Molineu, Andrea
    Palta, Jatinder R.
    Ramsey, Chester R.
    Salter, Bill J.
    Shi, Jie
    Xia, Ping
    Yue, Ning J.
    Xiao, Ying
    [J]. MEDICAL PHYSICS, 2009, 36 (11) : 5359 - 5373
  • [9] A closer look at RapidArc® radiosurgery plans using very small fields
    Fog, Lotte S.
    Rasmussen, Jens F. B.
    Aznar, Marianne
    Kjaer-Kristoffersen, Flemming
    Vogelius, Ivan R.
    Engelholm, Svend Aage
    Bangsgaard, Jens Peter
    [J]. PHYSICS IN MEDICINE AND BIOLOGY, 2011, 56 (06) : 1853 - 1863
  • [10] Predicting VMAT patient-specific QA results using a support vector classifier trained on treatment plan characteristics and linac QC metrics
    Granville, Dal A.
    Sutherland, Justin G.
    Belec, Jason G.
    La Russa, Daniel J.
    [J]. PHYSICS IN MEDICINE AND BIOLOGY, 2019, 64 (09)