The Steered Response Power using the Phase Transform weight (SRP-PHAT) has been shown to be robust in noisy and reverberant conditions. Volume contraction has also been applied effectively to trap the global maximum in densely hilly 3-D search spaces such as the SRP. However, previous methods have suffered when the space contains peaks from multiple talkers in close proximity, as is likely in a conversational cocktail-party setting. We present a volume contraction algorithm called Multi-Stage Rejection Sampling (MSRS) for detecting multiple peaks in the SRP-PHAT space. Our method not only circumvents sorting, a computationally expensive step in volume contraction algorithms, but also automatically divides the search volume into sub-volumes for robust detection of multiple peaks. We discuss some modifications to the standard SRP-PHAT functional and present results, all on real-room data, for a white-noise baseline, an eight-speaker teleconferencing setup, and a fully unconstrained cocktail-party situation with about 21 people in the room.
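The sketch below evaluates the *standard* SRP-PHAT functional at a single candidate 3-D point. It sums, over all microphone pairs, the PHAT-weighted generalized cross-correlation steered to the time difference of arrival (TDOA) that the point implies. It shows neither the modified functional nor MSRS. The helper names `gcc_phat` and `srp_phat`, the free-field propagation model, the default c = 343 m/s, and the small regularizer in the PHAT weight are illustrative assumptions.

```python
import numpy as np


def gcc_phat(x1, x2, nfft):
    """One-sided cross-power spectrum of two frames with the PHAT weight.

    PHAT divides each bin by its magnitude so only phase (delay) information
    remains; the 1e-12 term is an assumed regularizer against division by zero.
    """
    X1 = np.fft.rfft(x1, n=nfft)
    X2 = np.fft.rfft(x2, n=nfft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12
    return cross


def srp_phat(frames, mic_pos, point, fs, c=343.0):
    """Standard SRP-PHAT value at one candidate point (free-field assumption).

    frames:  (M, N) array, one time frame per microphone.
    mic_pos: (M, 3) microphone coordinates.
    point:   (3,) candidate source location, same units as mic_pos.
    """
    M, N = frames.shape
    nfft = 2 * N  # zero-pad so the correlation is linear, not circular
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    dists = np.linalg.norm(mic_pos - point, axis=1)
    power = 0.0
    for k in range(M):
        for l in range(k + 1, M):
            # TDOA between mics k and l implied by the candidate point
            tau = (dists[k] - dists[l]) / c
            G = gcc_phat(frames[k], frames[l], nfft)
            # Steering in frequency: approximately proportional to the
            # GCC-PHAT evaluated at lag tau (one-sided spectrum)
            power += np.real(np.sum(G * np.exp(2j * np.pi * freqs * tau)))
    return power
```

A peak search would evaluate `srp_phat` over many candidate points. Exhaustive grid search is the simplest choice, and volume contraction methods such as the one described above aim to avoid it.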