Several yield-based drought tolerance indices are in use, but their efficiency in plant breeding programs is not well documented. I therefore examined the repeatability, similarity and accuracy of these indices under different levels of drought stress (mild, moderate, severe) in durum wheat. In a 4-year experiment (2007-2010), 24 durum genotypes (breeding lines, landraces, and old and new varieties) were evaluated under rainfed and irrigated conditions at the Dryland Agricultural Research Institute of Iran (Sararood Station). Seven indices, namely stress tolerance index (STI), geometric mean productivity (GMP), mean productivity (MP), tolerance index (TOL), stress susceptibility index (SSI), yield stability index (YSI) and yield index (YI), were calculated for each level of stress. Discrimination among genotypes on the basis of mean values was better under severe stress than under mild stress. Pearson's correlation coefficients and Kendall's coefficient of concordance were used to estimate the repeatability and similarity of the indices, and the coefficient of variation (CV%) of correlations re-sampled by the bootstrap method was used to quantify their predictive efficiency. Correlations between index estimates obtained at different levels of stress were not highly repeatable, suggesting that a single level of stress cannot serve as the basis for quantifying drought tolerance in durum genotypes. MP, followed by STI and GMP, showed moderate repeatability with high accuracy, indicating that their efficiency in screening genotypes does not depend strongly on the nature of the stress. SSI, TOL and YSI, which had variable concordance values, were inaccurate for identifying drought-tolerant genotypes.
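The abstract names the indices and statistics but not their formulas, so the sketch below shows one way to compute them under stated assumptions. The index formulas are the standard textbook definitions, with Yp and Ys the yields of a genotype under irrigated (non-stress) and rainfed (stress) conditions and Ȳp, Ȳs the means over all genotypes. These are assumed, not taken from this study. Kendall's W is computed without a tie correction. The bootstrap CV is assumed to be taken over Pearson correlations between the values of one index at two stress levels, with genotypes resampled with replacement. All function names are hypothetical.

```python
import math
import random
import statistics


def drought_indices(yp, ys):
    """Standard yield-based indices (assumed definitions) for each genotype.

    yp, ys: yields of the same genotypes under non-stress and stress.
    """
    yp_bar = statistics.mean(yp)   # mean non-stress yield over genotypes
    ys_bar = statistics.mean(ys)   # mean stress yield over genotypes
    si = 1 - ys_bar / yp_bar       # stress intensity, used by SSI
    out = []
    for p, s in zip(yp, ys):
        out.append({
            "STI": p * s / yp_bar ** 2,
            "GMP": math.sqrt(p * s),
            "MP": (p + s) / 2,
            "TOL": p - s,
            "SSI": (1 - s / p) / si,
            "YSI": s / p,
            "YI": s / ys_bar,
        })
    return out


def ranks(values):
    """Rank 1 = largest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def kendall_w(score_sets):
    """Kendall's W for m score sets (e.g. stress levels) over n genotypes."""
    m, n = len(score_sets), len(score_sets[0])
    rank_sets = [ranks(s) for s in score_sets]
    totals = [sum(rs[j] for rs in rank_sets) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_cv(x, y, n_boot=1000, seed=0):
    """CV% of bootstrap Pearson correlations between x and y."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    while len(rs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample genotypes
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples where a variable is constant.
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        rs.append(pearson(xs, ys))
    return 100 * statistics.stdev(rs) / abs(statistics.mean(rs))
```

For example, `kendall_w([mp_mild, mp_severe])` would measure how consistently MP ranks the genotypes across two hypothetical stress levels, and `bootstrap_cv(mp_mild, mp_severe)` how stable their correlation is under resampling. A lower CV% would indicate a more accurate index in the sense used above.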