The integration of language and vision capabilities in computers can be treated purely as a multimedia task, without requiring any theoretical assumptions. However, it is worth exploring whether the two modalities have anything significant in common, particularly in light of the claim that most non-technical language use is metaphorical. What consequences would that claim have for the underlying relationship between language and vision: could vision, too, be largely metaphorical? The conclusion (see also Wilks 1978b and Wilks and Okada, in press) is that visual processing can embody structural ambiguity (whether compositional or not), but nothing analogous to metaphor. Metaphor is essentially connected with the extension of sense, and only symbols can have senses. But if it makes no sense to say that a figure can be metaphorical (unless it embodies symbolic elements), then, alas, it must also make no sense to say that it is literally anything. Only a symbol can be literally something. A hat is a hat is a hat, but never, ever literally so.