Recent advances in network coding research have dramatically changed the underlying structure of optimal multicast routing algorithms and made them efficiently computable. While most such algorithm designs assume that a single file or layer is being multicast, layered coding introduces new challenges into this paradigm because of its cumulative decoding nature. Layered coding is designed to handle heterogeneity in receiver capacities, and a node may decode layer k only if it successfully receives all layers 1 through k. We show that recently proposed optimization models for layered multicast do not correctly address this challenge. We argue that, in order to achieve the absolute maximum throughput (or minimum cost), it is necessary to decouple application-layer throughput from network-layer throughput. In particular, a node should be able to receive a nonconsecutive layer or a partial layer even if it cannot decode and use it (e.g., for playback in media streaming applications). The rationale is that nodes at critical network locations need to receive data solely to help other peers. We present a mathematical programming model that addresses these challenges and achieves the absolute optimal performance. Simulation results show considerable throughput gains (or cost reductions) compared with previous models across a broad range of network scenarios. We further generalize our model to study the optimal progression of layer sizes. We show that this optimization is non-convex, and apply a Simulated Annealing algorithm to solve it, with a flexible trade-off between solution quality and running time. We verify the effectiveness of the new model and the Simulated Annealing algorithm through extensive simulations, and offer insights into the relation between optimal layer sizes and the node capacity distribution.
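To make the distinction between application-layer and network-layer throughput concrete, the following is a minimal sketch, not the authors' model. It assumes each layer has a known size and that a node's per-layer reception is given as an amount of data received for that layer; the function names and the numbers in the example are hypothetical. Under cumulative decoding, a node can use only the prefix of fully received layers 1 through k, while everything it receives (including partial or nonconsecutive layers it may forward to peers) counts toward network-layer throughput.

```python
from typing import Sequence


def network_throughput(received: Sequence[float]) -> float:
    """Total data a node receives, counting partial and nonconsecutive layers
    (hypothetical helper: data it may relay to peers even if undecodable)."""
    return sum(received)


def application_throughput(received: Sequence[float], sizes: Sequence[float]) -> float:
    """Decodable rate under cumulative decoding: the total size of layers 1..k,
    where k is the largest index such that every layer 1..k is fully received."""
    total = 0.0
    for got, size in zip(received, sizes):
        if got < size:  # layer incomplete: it and every higher layer are undecodable
            break
        total += size
    return total


# Hypothetical example: three layers with sizes 2, 1, 1 (arbitrary units).
sizes = [2.0, 1.0, 1.0]
# A relay node receives all of layer 1, half of layer 2, and all of layer 3.
relay = [2.0, 0.5, 1.0]
print(network_throughput(relay))              # 3.5
print(application_throughput(relay, sizes))   # 2.0
```

Here the relay carries 3.5 units of network-layer traffic but can decode only layer 1; the extra data is still useful if downstream peers need it, which is the kind of reception a model that equates the two throughputs would rule out.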