Objectives: To systematically review the literature on the inter- and intra-rater reliability of scoring ultrasound (US) and MRI changes in hand osteoarthritis (OA).

Methods: MEDLINE, EMBASE, CINAHL, Web of Science and AMED were searched from inception to January 2020. Kappa (κ), weighted kappa (κw) and intra-class correlation coefficients (ICCs) for dichotomous, semi-quantitative and summated scores, respectively, and their 95% CIs were pooled using a random-effects model. Heterogeneity between studies was assessed, and reliability estimates were interpreted using the Landis-Koch classification.

Results: Fifty studies met the inclusion criteria (29 US, 17 MRI and 4 involving both modalities). For US, the pooled inter-rater κ (95% CI) was substantial for osteophytes [0.66 (0.54, 0.79)], grey-scale synovitis [0.64 (0.32, 0.97)] and power Doppler [0.76 (0.47, 1.05)]. US intra-rater reliability was almost perfect for osteophytes [0.82 (0.80, 0.84)], central bone erosions (CBEs) [0.83 (0.78, 0.89)] and effusion [0.83 (0.74, 0.91)], and substantial for grey-scale synovitis [0.64 (0.49, 0.79)] and power Doppler [0.70 (0.59, 0.80)]. For MRI, inter-rater reliability of dichotomous assessment was substantial for CBEs [0.75 (0.67, 0.83)] and synovitis [0.69 (0.51, 0.87)] but slight for osteophytes [0.14 (0.04, 0.25)], whereas inter-rater reliability of the sum scores for osteophytes, CBEs, joint space narrowing (JSN) and bone marrow lesions (BMLs) was almost perfect (0.81-0.89). Intra-rater reliability of the MRI sum scores was almost perfect for synovitis [0.92 (0.87, 0.96)], BMLs [0.88 (0.78, 0.98)], osteophytes [0.86 (0.74, 0.98)], CBEs [0.83 (0.66, 1.00)] and JSN [0.91 (0.87, 0.91)].

Conclusion: US and MRI are both reliable for detecting hand OA features. US may be preferred owing to its low cost and increasing availability.
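To make the pooling and interpretation steps concrete, here is a minimal Python sketch. It assumes the DerSimonian-Laird estimator for the between-study variance (the abstract says only "random-effects model", so the estimator is an assumption). It also assumes that each study's standard error can be back-calculated from a symmetric 95% CI and that κ, κw or ICC values are pooled on their raw scale. The function names and the example inputs are hypothetical illustrations, not data or code from the review. The pooled interval is a normal-approximation (Wald-type) interval, which is not bounded at 1; that is one way upper limits such as 1.05 can arise.

```python
import math
from typing import List, NamedTuple

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


class Pooled(NamedTuple):
    estimate: float
    ci_low: float
    ci_high: float
    tau2: float  # between-study variance (DerSimonian-Laird)
    i2: float    # I^2 heterogeneity, as a percentage


def se_from_ci(low: float, high: float) -> float:
    """Back-calculate a standard error from a symmetric 95% CI (assumption)."""
    return (high - low) / (2 * Z95)


def dersimonian_laird(estimates: List[float], ses: List[float]) -> Pooled:
    """Random-effects pooling using the DerSimonian-Laird tau^2 estimator."""
    k = len(estimates)
    w = [1.0 / se ** 2 for se in ses]  # fixed-effect inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sw_re
    se_pooled = math.sqrt(1.0 / sw_re)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return Pooled(pooled, pooled - Z95 * se_pooled, pooled + Z95 * se_pooled, tau2, i2)


def landis_koch(value: float) -> str:
    """Map a reliability coefficient to its Landis-Koch agreement band."""
    if value < 0:
        return "poor"
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical per-study kappas and 95% CIs, for illustration only.
    kappas = [0.62, 0.71, 0.58]
    cis = [(0.45, 0.79), (0.55, 0.87), (0.40, 0.76)]
    r = dersimonian_laird(kappas, [se_from_ci(lo, hi) for lo, hi in cis])
    print(f"{r.estimate:.2f} ({r.ci_low:.2f}, {r.ci_high:.2f}) "
          f"I2={r.i2:.0f}% -> {landis_koch(r.estimate)}")
```

Applying the Landis-Koch bands to the pooled point estimate, rather than to the CI limits, mirrors how the results above are worded. A wide interval such as 0.32 to 0.97 can still span several bands.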