The proliferation of communication-intensive real-time applications with "elastic" timeliness constraints, such as streaming stored video, requires a new design for end-host communication subsystems. The design should (i) provide per-flow or per-service-class guarantees, (ii) maximize the aggregate utility of the communication service across all clients, (iii) adapt gracefully to transient overload, and (iv) avoid, if possible, starving lower-priority service classes during periods of sustained overload. We propose a QoS-optimization algorithm and a communication subsystem architecture that satisfy these requirements. The architecture provides each client its contracted QoS while adapting gracefully to transient overload and resource shortage. We introduce the concept of a flexible QoS contract, which specifies multiple acceptable levels of service (QoS levels for short) and their corresponding rewards for each client. Allowing clients to specify multiple QoS levels permits the server to perform QoS optimization and to degrade a client's QoS under transient overload predictably, as specified in the QoS contract. Clients receive a money-back guarantee if the server violates the contracted QoS. The proposed resource-management mechanism maximizes the server's total reward under resource constraints. We implemented and evaluated the architecture on a Pentium-based PC platform running The Open Group (TOG) MK7.2 kernel, demonstrating that our communication subsystem meets its design goals.
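
The reward-maximization idea can be illustrated with a small sketch. It is not the algorithm of this work, which the abstract does not spell out. It is a generic multiple-choice knapsack solved by dynamic programming, under assumptions made only for illustration: a single resource measured in integer units, a contract given as a list of hypothetical (demand, reward) pairs with one pair per QoS level, and every client admitted at exactly one of its levels. The names `Contract` and `choose_levels` are invented here.

```python
from typing import List, Tuple

# Hypothetical representation: one (resource_demand, reward) pair per QoS level.
# Demands are integer units of a single resource (an assumption of this sketch).
Contract = List[Tuple[int, float]]


def choose_levels(contracts: List[Contract], capacity: int) -> Tuple[float, List[int]]:
    """Pick exactly one QoS level per client to maximize total reward.

    Returns (best_total_reward, chosen_level_index_per_client).
    Raises ValueError if even the cheapest levels do not fit in `capacity`.
    """
    neg_inf = float("-inf")
    # best[c]: maximum reward for the clients processed so far within budget c.
    best = [0.0] * (capacity + 1)
    picks: List[List[int]] = []  # picks[i][c]: level chosen for client i at budget c

    for contract in contracts:
        new_best = [neg_inf] * (capacity + 1)
        pick = [-1] * (capacity + 1)
        for c in range(capacity + 1):
            for idx, (demand, reward) in enumerate(contract):
                if demand <= c and best[c - demand] != neg_inf:
                    candidate = best[c - demand] + reward
                    if candidate > new_best[c]:
                        new_best[c] = candidate
                        pick[c] = idx
        best = new_best
        picks.append(pick)

    if best[capacity] == neg_inf:
        raise ValueError("no feasible assignment of QoS levels within capacity")

    # Walk back from the last client to recover the chosen level of each client.
    levels: List[int] = []
    c = capacity
    for i in range(len(contracts) - 1, -1, -1):
        idx = picks[i][c]
        levels.append(idx)
        c -= contracts[i][idx][0]
    levels.reverse()
    return best[capacity], levels


# Two illustrative clients; the numbers are made up for the example.
contracts = [
    [(4, 10.0), (2, 6.0), (1, 1.0)],  # client A: three acceptable QoS levels
    [(3, 8.0), (1, 3.0)],             # client B: two acceptable QoS levels
]
print(choose_levels(contracts, 7))  # ample capacity: both clients get their top level
print(choose_levels(contracts, 5))  # overload: client A is degraded to its second level
```

With 5 units available, degrading client A to its second level and keeping client B at its top level yields a total reward of 14, which beats the alternative of keeping A at its top level (total 13). This is the kind of predictable, contract-driven degradation the abstract describes, reduced to a toy setting.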