Objectives
To develop and test AI-integrated biopsy avoidance strategies that improve the specificity of screening breast ultrasound (US).

Materials and methods
This retrospective study included consecutive asymptomatic women with BI-RADS 3, 4a, 4b, 4c, or 5 masses on screening breast US exams acquired at two hospitals between December 2019 and December 2020 (development cohort) and between June 2020 and December 2020 (external validation cohort). If more than one lesion was present, the most suspicious lesion was analyzed. Logistic regression was used to develop the AI-integrated biopsy avoidance strategies, in which BI-RADS 4a masses were downgraded to BI-RADS 3 if the AI classification was "both planes benign" in women of any age or "benign and malignant" in women ≤ 45 years of age. Diagnostic performance metrics were calculated for both cohorts and compared with the radiologists' initial assessments using a noninferiority test for sensitivity (relative noninferiority margin, 5%) and the McNemar test for specificity.

Results
The development and external validation cohorts consisted of 393 women (median age, 45 years [IQR, 40-50 years]) with 101 malignancies and 166 women (median age, 47 years [IQR, 42-51 years]) with 31 malignancies, respectively. In the external validation cohort, the developed strategy improved specificity from 53.3% (72/135; 95% CI: 45.0, 62.1) to 80.7% (109/135; 95% CI: 74.2, 87.5; p < 0.001) while maintaining sensitivity (100% for both [31/31; 95% CI: 98.9, 100]) and would have avoided 61.7% (37/60; 95% CI: 48.2, 73.7) of benign biopsies of BI-RADS 4a masses.

Conclusion
A strategy integrating AI classification in two orthogonal planes, age, and BI-RADS classification improved the specificity of screening breast US while maintaining noninferior sensitivity.
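
The downgrade rule described in the methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the `Mass` record, the function names, and the convention that BI-RADS 4a and above leads to a biopsy recommendation are introduced here for illustration. Only the two AI labels, the 45-year cutoff, and the counts used in the final check come from the abstract.

```python
from dataclasses import dataclass

# AI labels quoted in the abstract; any other label leaves the category unchanged.
BOTH_BENIGN = "both planes benign"
MIXED = "benign and malignant"

AGE_CUTOFF_YEARS = 45  # cutoff stated in the abstract

# Assumption: categories 4a and above carry a biopsy recommendation.
BIOPSY_CATEGORIES = {"4a", "4b", "4c", "5"}


@dataclass
class Mass:
    """Hypothetical record for the most suspicious mass in one woman."""
    birads: str  # radiologist's initial category: "3", "4a", "4b", "4c", "5"
    ai: str      # AI classification across the two orthogonal planes
    age: int     # patient age in years


def adjusted_birads(mass: Mass) -> str:
    """Apply the downgrade rule; only BI-RADS 4a masses can change."""
    if mass.birads != "4a":
        return mass.birads
    if mass.ai == BOTH_BENIGN:
        return "3"  # downgraded regardless of age
    if mass.ai == MIXED and mass.age <= AGE_CUTOFF_YEARS:
        return "3"  # downgraded only in women aged 45 or younger
    return "4a"


def recommends_biopsy(birads: str) -> bool:
    """True if the category falls in the assumed biopsy range."""
    return birads in BIOPSY_CATEGORIES


def specificity(true_negatives: int, benign_total: int) -> float:
    """Share of benign masses that are not sent to biopsy."""
    return true_negatives / benign_total


if __name__ == "__main__":
    example = Mass(birads="4a", ai=MIXED, age=44)
    print(adjusted_birads(example), recommends_biopsy(adjusted_birads(example)))

    # Counts reported for the external validation cohort.
    print(f"{100 * specificity(72, 135):.1f}%")   # radiologists' initial assessment
    print(f"{100 * specificity(109, 135):.1f}%")  # AI-integrated strategy
```

The reported counts are internally consistent: the 37 benign BI-RADS 4a masses that the strategy downgrades account exactly for the rise in true negatives from 72 to 109 among the 135 benign masses.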