Multimodal interfaces are inherently flexible, a key feature that makes them well suited both to universal access and to next-generation mobile computing. Recent studies have also demonstrated that multimodal architectures can improve the performance stability and overall robustness of the recognition-based component technologies they incorporate (e.g., speech, vision, pen input). This paper reviews data from two recent studies in which a multimodal architecture suppressed errors and stabilized system performance for accented speakers and during mobile use. It concludes with a discussion of key issues in the design of future multimodal interfaces for diverse user groups.