Current voice and document terminals are of limited use because they lack intelligence. Newer terminals, such as the visual telephone, broaden the range of media that communication networks support, but their functionality remains limited. The introduction of ISDN services must be accompanied by an increase in the intelligence of all terminals. This presentation reviews the evolution of voice synthesis technology and of recognition systems for characters, scenes, images, and human faces. It also discusses feasibility studies performed on two prototype systems built by NTT.