Previous models have investigated how variation in the 'difficulty' of demands over the demand space affects diversity, and hence the reliability of fault-tolerant software built from 'diverse' versions. These models are essentially static, taking a single snapshot view of the system. In this paper we consider a generalisation in which the individual versions are allowed to evolve, and their reliability to grow, through debugging. In particular, we examine the trade-off that occurs in testing between, on the one hand, the increasing reliability of individual versions and, on the other hand, the possible diminution of diversity.