Key Points

Question: Is the diagnostic performance of a common thyroid nodule gene expression classifier in the initial validation study consistent with results of postmarketing studies?

Findings: In this systematic review and meta-analysis of 19 studies involving 2568 cytologically indeterminate thyroid nodules, the diagnostic performance of the gene expression classifier reported in the initial validation study could not explain the results in subsequent publications and was significantly different for atypia or follicular lesion of undetermined significance compared with follicular neoplasm specimens.

Meaning: The initial validation study cohort did not appear to be representative of the populations to which the gene expression classifier has subsequently been applied.

Importance: In the United States, the most used molecular test for the evaluation of cytologically indeterminate thyroid nodules is the Afirma gene expression classifier (GEC).

Objective: To evaluate the GEC's diagnostic performance through a novel approach to assess whether the findings of the initial validation study are consistent with the results of postmarketing studies.

Data Sources: PubMed was systematically searched from inception through October 26, 2017, using the terms gene expression classifier or Afirma or GEC and thyroid.

Study Selection: Studies included were those in which the GEC diagnostic performance could be calculated on consecutively resected cytologically indeterminate thyroid nodules.

Data Extraction and Synthesis: Two observers independently assessed study eligibility and risk of bias using the quality assessment tool for observational cohort and cross-sectional studies of the National Heart, Lung, and Blood Institute. Summary data were extracted by a reviewer and reviewed independently by another. Study authors were contacted if missing data were needed. Data were pooled using a random-effects model. PRISMA and MOOSE guidelines were followed.
Main Outcomes and Measures: Evaluation of the linear correlation between the benign call rate (BCR) and the positive predictive value (PPV).

Results: Of the 137 retrieved titles, 19 (13.9%) were included, comprising a total of 2568 thyroid nodules. Based on a simulation using the sensitivity and specificity reported in the initial validation study, the observed BCR and PPV values in postmarketing studies would have to be explained by different underlying prevalence rates of cancer (15% vs 30%), which is an impossible event. Furthermore, the overall correlation between BCR and PPV for independent studies fell outside the PPV 95% CI of the initial validation study (95% CI, 0.17-0.32) at the BCR of pooled independent studies (0.45) and was just at the limit of the BCR 95% CI of the initial validation study (95% CI, 0.32-0.45) at the PPV of pooled independent studies (0.45). The diagnostic performance was statistically significantly better for atypia or follicular lesions of undetermined significance (diagnostic odds ratio [DOR], 5.67; 95% CI, 4.23-7.60) compared with follicular neoplasms (DOR, 2.24; 95% CI, 1.45-3.47).

Conclusions and Relevance: The findings suggest that the initial validation study cohort was not representative of the populations in whom the GEC has been used, calling into question its reported diagnostic performance, including its negative predictive value.

This systematic review and meta-analysis uses PubMed to find postmarketing studies of a common molecular test for cytologically indeterminate thyroid nodules in an effort to clinically validate the test.
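
The consistency check described in the Results rests on standard identities: for a fixed sensitivity and specificity, both the BCR and the PPV are determined by the cancer prevalence, so an observed BCR and an observed PPV should each imply the same prevalence. The following is a minimal Python sketch of that reasoning, not the authors' simulation code. The function names are hypothetical, and the sensitivity (0.90) and specificity (0.50) are illustrative placeholders rather than the values reported in the initial validation study; only the pooled BCR and PPV of 0.45 come from the abstract.

```python
def benign_call_rate(prevalence, sensitivity, specificity):
    """Fraction of all nodules the test calls benign (true + false negatives)."""
    return prevalence * (1 - sensitivity) + (1 - prevalence) * specificity


def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value: true positives over all positive calls."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def prevalence_from_bcr(bcr, sensitivity, specificity):
    """Invert benign_call_rate: prevalence implied by an observed BCR."""
    return (specificity - bcr) / (sensitivity + specificity - 1)


def prevalence_from_ppv(ppv_value, sensitivity, specificity):
    """Invert ppv: prevalence implied by an observed PPV."""
    numerator = ppv_value * (1 - specificity)
    return numerator / (sensitivity * (1 - ppv_value) + numerator)


if __name__ == "__main__":
    # ASSUMPTION: placeholder test characteristics, chosen for illustration only.
    sens, spec = 0.90, 0.50
    # Pooled values for independent studies as quoted in the abstract.
    observed_bcr, observed_ppv = 0.45, 0.45

    p_bcr = prevalence_from_bcr(observed_bcr, sens, spec)
    p_ppv = prevalence_from_ppv(observed_ppv, sens, spec)
    print(f"prevalence implied by BCR: {p_bcr:.3f}")
    print(f"prevalence implied by PPV: {p_ppv:.3f}")
```

If the two implied prevalences differ substantially, the assumed sensitivity and specificity cannot account for both observations in a single population, which is the kind of inconsistency the abstract describes.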