Evaluation of prediction and classification performances in different machine learning models for patient-specific quality assurance of head-and-neck VMAT plans

Cited by: 15

Authors:

Kusunoki, Terufumi [1, 2]

Hatanaka, Shogo [2]

Hariu, Masatsugu [2]

Kusano, Yohsuke [1]

Yoshida, Daisaku [1, 3]

Katoh, Hiroyuki [3]

Shimbo, Munefumi [2]

Takahashi, Takeo [2]

Affiliations:

[1] Kanagawa Canc Ctr, Sect Med Phys & Engn, Yokohama, Kanagawa, Japan

[2] Saitama Med Univ, Saitama Med Ctr, Dept Radiat Oncol, Kawagoe, Saitama, Japan

[3] Kanagawa Canc Ctr, Dept Radiat Oncol, Yokohama, Kanagawa, Japan

Source:

MEDICAL PHYSICS | 2022 / Volume 49 / Issue 01

Keywords:

classification; machine learning models; prediction; quality assurance; VMAT; VOLUMETRIC-MODULATED ARC; GAMMA PASSING RATE; RADIATION-THERAPY; IMRT; COMPLEXITY; TRANSMISSION; RADIOTHERAPY; DOSIMETRY; INDEX; BEAM;

DOI:

10.1002/mp.15393

Chinese Library Classification number:

R8 [Special Medicine]; R445 [Diagnostic Imaging];

Subject classification code:

1002 ; 100207 ; 1009 ;

Abstract:

Purpose: The purpose of this study is to evaluate the prediction and classification performances of the gamma passing rate (GPR) for different machine learning models and to select the best model for achieving machine learning-based patient-specific quality assurance (PSQA).

Methods: The measurement verification of 356 head-and-neck volumetric modulated arc therapy plans was performed using a diode array phantom (Delta4 Phantom), and GPR values at 2%/2 mm with global normalization and 3%/2 mm with local normalization were calculated. Machine learning models, including ridge regression (RIDGE), random forest (RF), support vector regression (SVR), and stacked generalization (STACKING), were used to predict the GPR. Each machine learning model was trained using 260 plans, and the prediction accuracy was evaluated using the remaining 96 plans. The prediction error between the measured and predicted GPR was evaluated. For the classification evaluation, the lower control limit for the measured GPR and the lower control limit for the predicted GPR (LCLp) were defined to identify whether the GPR values represent a "pass" or a "fail." LCLp values with 99% and 99.9% confidence levels were calculated as the upper prediction limits for the GPR estimated from the linear regression between the measured and predicted GPR.

Results: There was an overestimation trend of the low measured GPR. The maximum prediction errors for RIDGE, RF, SVR, and STACKING were 3.2%, 2.9%, 2.3%, and 2.2% at the global 2%/2 mm and 6.3%, 6.6%, 6.1%, and 5.5% at the local 3%/2 mm, respectively. In the global 2%/2 mm, the sensitivity was 100% for all the machine learning models except RIDGE when using 99% LCLp. The specificity was 76.1% for RIDGE, RF, and SVR and 66.3% for STACKING; however, the specificity decreased dramatically when 99.9% LCLp was used. In the local 3%/2 mm, however, only STACKING showed 100% sensitivity when using 99% LCLp. The decrease in the specificity using 99.9% LCLp was smaller than that in the global 2%/2 mm, and the specificity for RIDGE, RF, SVR, and STACKING was 61.3%, 61.3%, 72.0%, and 66.8%, respectively.

Conclusions: STACKING had better prediction accuracy for low GPR values than other machine learning models. Applying LCLp to a regression model enabled the consistent evaluation of quantitative and qualitative GPR predictions. Adjusting the confidence level of the LCLp helped improve the balance between the sensitivity and specificity. We suggest that STACKING can assist the safe and efficient operation of PSQA.
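
The pass/fail evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a plan whose measured GPR falls below the lower control limit counts as a true "fail" (the positive class) and that a plan is flagged when its predicted GPR falls below LCLp. The function name `classify_gpr`, the GPR values, and both thresholds are hypothetical and chosen only for illustration.

```python
def classify_gpr(measured, predicted, lcl_measured, lcl_predicted):
    """Return (sensitivity, specificity) for GPR-based pass/fail classification.

    Assumption: a measured "fail" (measured GPR < lcl_measured) is the positive class,
    and a plan is flagged when predicted GPR < lcl_predicted (LCLp).
    """
    tp = fn = tn = fp = 0
    for m, p in zip(measured, predicted):
        truly_fails = m < lcl_measured   # ground truth from the measurement
        flagged = p < lcl_predicted      # decision from the model prediction
        if truly_fails and flagged:
            tp += 1
        elif truly_fails:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical GPR values (%), not data from the study.
measured = [99.1, 97.5, 92.0, 98.8, 90.5, 96.4]
predicted = [98.7, 97.9, 95.1, 98.2, 94.0, 95.5]
lcl = 93.0  # hypothetical lower control limit for the measured GPR

# A lower LCLp misses one failing plan; a higher LCLp catches both
# but also flags one passing plan.
low_lclp = classify_gpr(measured, predicted, lcl, 94.5)
high_lclp = classify_gpr(measured, predicted, lcl, 96.0)
print(low_lclp, high_lclp)
```

With these hypothetical numbers, raising LCLp increases the sensitivity and lowers the specificity, the same trade-off the abstract reports when moving from the 99% to the 99.9% confidence level.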

References

Pages: 727-741

Number of pages: 15

49 entries in total

[1] Use of metrics to quantify IMRT and VMAT treatment plan complexity: A systematic review and perspectives. Antoine, Mikael; Ralite, Flavien; Soustiel, Charles; Marsac, Thomas; Sargos, Paul; Cugny, Audrey; Caron, Jerome. [J]. PHYSICA MEDICA-EUROPEAN JOURNAL OF MEDICAL PHYSICS, 2019, 64 : 98 - 108
[2] Evaluation of the Delta4 phantom for IMRT and VMAT verification. Bedford, James L.; Lee, Young K.; Wai, Philip; South, Christopher P.; Warrington, Alan P. [J]. PHYSICS IN MEDICINE AND BIOLOGY, 2009, 54 (09) : N167 - N176
[3] Breiman L, 1996, MACH LEARN, V24, P49
[4] Random forests. Breiman, L. [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
[5] Integration of AI and Machine Learning in Radiotherapy QA. Chan, Maria F.; Witztum, Alon; Valdes, Gilmer. [J]. FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2020, 3
[6] Examination of the properties of IMRT and VMAT beams and evaluation against pre-treatment quality assurance results. Crowe, S. B.; Kairn, T.; Middlebrook, N.; Sutherland, B.; Hill, B.; Kenny, J.; Langton, C. M.; Trapp, J. V. [J]. PHYSICS IN MEDICINE AND BIOLOGY, 2015, 60 (06) : 2587 - 2601
[7] Quantification of beam complexity in intensity-modulated radiation therapy treatment plans. Du, Weiliang; Cho, Sang Hyun; Zhang, Xiaodong; Hoffman, Karen E.; Kudchadker, Rajat J. [J]. MEDICAL PHYSICS, 2014, 41 (02)
[8] IMRT commissioning: Multiple institution planning and dosimetry comparisons, a report from AAPM Task Group 119. Ezzell, Gary A.; Burmeister, Jay W.; Dogan, Nesrin; LoSasso, Thomas J.; Mechalakos, James G.; Mihailidis, Dimitris; Molineu, Andrea; Palta, Jatinder R.; Ramsey, Chester R.; Salter, Bill J.; Shi, Jie; Xia, Ping; Yue, Ning J.; Xiao, Ying. [J]. MEDICAL PHYSICS, 2009, 36 (11) : 5359 - 5373
[9] A closer look at RapidArc® radiosurgery plans using very small fields. Fog, Lotte S.; Rasmussen, Jens F. B.; Aznar, Marianne; Kjaer-Kristoffersen, Flemming; Vogelius, Ivan R.; Engelholm, Svend Aage; Bangsgaard, Jens Peter. [J]. PHYSICS IN MEDICINE AND BIOLOGY, 2011, 56 (06) : 1853 - 1863
[10] Predicting VMAT patient-specific QA results using a support vector classifier trained on treatment plan characteristics and linac QC metrics. Granville, Dal A.; Sutherland, Justin G.; Belec, Jason G.; La Russa, Daniel J. [J]. PHYSICS IN MEDICINE AND BIOLOGY, 2019, 64 (09)
