Parallel concatenated turbo codes, introduced a decade ago by Berrou et al., have attracted vast interest in the information-theoretic channel coding research community because of their unrivaled ability to perform close to channel capacity limits on the AWGN channel, combined with feasible encoding and decoding. Nevertheless, the performance figures usually attained by turbo coding schemes are often obtained through a greedy choice of interleaver size (the larger the better) and a number of decoding iterations so high that the iterative turbo-decoding algorithm can be assumed to converge and the error-rate performance approaches that of a hypothetical maximum-likelihood decoder. These assumptions conflict with the feasibility of a useful real-world implementation of an iterative turbo decoder, because large interleavers require large amounts of memory and imply long decoding delays. Moreover, large interleavers obviously imply large codewords and difficult frame-synchronization issues. This paper defines two complexity indexes related to turbo-decoder implementation issues, namely power requirements in terms of compare-and-exchange operations and latency in terms of required storage, and shows how the relevant parameters trade off in a turbo-decoder hardware design. Finally, we present a feasible, constrained sub-optimal design methodology and compare our designs with some of the greedy designs available in the literature.
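The dependence of the two complexity indexes on the design parameters can be illustrated with a minimal cost-model sketch. This is not the paper's exact definition: the function name, the memory/iteration parameters, and the proportionality factors are simplifying assumptions, chosen only to show that compare-and-exchange count grows with both interleaver size and iteration count, while storage grows with interleaver size alone.

```python
def decoder_cost(N, iterations, memory=3):
    """Illustrative (hypothetical) cost model for an iterative turbo decoder.

    N          -- interleaver size (information block length)
    iterations -- number of decoding iterations
    memory     -- constituent convolutional encoder memory (assumed value)
    """
    S = 2 ** memory  # trellis states per constituent code

    # Rough compare-and-exchange count: forward recursion, backward
    # recursion, and LLR selection each scan N trellis steps over S states;
    # two SISO passes per iteration. The factor 3 is an assumed
    # proportionality constant, not a measured one.
    ce_per_siso = 3 * N * S
    compare_exchange = 2 * iterations * ce_per_siso

    # Rough storage (in metric words): state metrics for one SISO pass,
    # plus interleaver pattern and extrinsic/systematic LLR buffers.
    storage = N * S + 3 * N

    return compare_exchange, storage

# Doubling the iterations doubles the power index but leaves storage
# (hence latency) unchanged; doubling N scales both.
ce_a, st_a = decoder_cost(N=1024, iterations=8)
ce_b, st_b = decoder_cost(N=1024, iterations=16)
```

Under this toy model the greedy choices criticized above are visible directly: the power index scales linearly with the iteration count, while the latency index is driven entirely by the interleaver size.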