Studies have found non-negligible differences in cortical thickness estimates across versions of the software used to process and quantify MRI-based cortical measurements, raising concerns that such differences could affect the validity of results. For diagnostic classification, however, inter-subject stability matters more than agreement in absolute thickness estimates across versions. We investigated the effect of a change in software version on the classification of older persons into healthy, mild cognitive impairment, and Alzheimer's disease groups. Using MRI samples of 100 older normal controls, 100 persons with mild cognitive impairment, and 100 Alzheimer's disease patients obtained from the Alzheimer's Disease Neuroimaging Initiative database, we performed standard reconstruction processing with FreeSurfer image analysis suite versions 4.1.0, 4.5.0, and 5.1.0. Pair-wise comparisons of cortical thickness between FreeSurfer versions revealed significant differences, ranging from 1.6% (4.1.0 vs. 4.5.0) to 5.8% (4.1.0 vs. 5.1.0) across the cortical mantle. However, the change of version had very little effect on detectable differences in cortical thickness between diagnostic groups, and classification accuracy differed little between versions when entorhinal thickness was used for diagnostic classification. This led us to conclude that differences in absolute thickness estimates across software versions did not, in this case, imply a lack of validity, that classification results appeared reliable across software versions, and that classification results obtained in studies using different FreeSurfer versions can be reliably compared. Hum Brain Mapp 37:1831-1841, 2016. (c) 2016 Wiley Periodicals, Inc.