Our goal is to develop a robust out-of-core sorting program for a
distributed-memory cluster. The literature contains two dominant
paradigms for out-of-core sorting algorithms: merging-based and
partitioning-based. We explore a third paradigm, that of oblivious
algorithms. Unlike the two dominant paradigms, oblivious algorithms
do not depend on the input keys and therefore lead to predetermined
I/O and communication patterns in an out-of-core setting.
Predetermined I/O and communication patterns facilitate overlapping
I/O, communication, and computation for efficient implementation. We
have developed several out-of-core sorting programs using the paradigm
of oblivious algorithms. Our baseline implementation, 3-pass
columnsort, was based on Leighton's columnsort algorithm. Though
efficient in terms of I/O and communication, 3-pass columnsort has a
restriction on the maximum problem size. As our first effort toward
relaxing this restriction, we developed two implementations: subblock
columnsort and M-columnsort. Both of these implementations
incur substantial performance costs: subblock columnsort performs
additional disk I/O, and M-columnsort needs substantial
amounts of extra communication and computation. In this paper we
present slabpose columnsort, a new oblivious algorithm that we have
designed explicitly for the out-of-core setting. Slabpose columnsort
relaxes the problem-size restriction at no extra I/O or communication
cost. Experimental evidence on a Beowulf cluster shows that unlike
subblock columnsort and M-columnsort, slabpose columnsort runs
almost as fast as 3-pass columnsort. To the best of our knowledge,
our implementations are the first out-of-core multiprocessor sorting
algorithms that make no assumptions about the keys and produce output
that is perfectly load balanced and in the striped order assumed by
the Parallel Disk Model.