The increasing demand for wireless video transmission requires new transmission paradigms. This paper reviews our recent work on one such paradigm: caching popular video files on wireless devices and combining this caching with device-to-device (D2D) communication, so that users can obtain files from other wireless devices in their vicinity. The D2D communication is controlled by the base station (BS) and occurs only between devices within a small area (a cluster), which allows high frequency reuse. The cluster size can be optimized to maximize the overall system throughput. The caching strategy of the devices can be either deterministic or random. Besides numerically optimizing the clustering and caching parameters, we also derive analytical upper and lower bounds on how the throughput scales with the number of devices in a cell. For highly concentrated video request distributions, the throughput scales linearly with the number of devices, while in other cases it can only grow more slowly.
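The cluster-size trade-off can be illustrated with a toy model. Larger clusters make it more likely that a requested file is cached nearby, but they leave fewer clusters that can transmit concurrently. The sketch below is not the paper's formulation. It rests on several assumptions: requests follow a Zipf distribution with exponent `gamma`; each device caches exactly one file, with device `j` of a cluster storing the `j`-th most popular file (deterministic caching); at most one D2D link is active per cluster; and clusters do not interfere. All function names are hypothetical.

```python
# Toy model of cluster-size optimization for cache-aided D2D.
# ASSUMPTIONS (illustrative only, not the paper's exact model):
#   - Zipf request distribution over m files with exponent gamma
#   - each device caches one file; device j in a cluster stores the j-th most popular file
#   - at most one D2D link is active per cluster, and clusters do not interfere


def zipf_pmf(m, gamma):
    """Return request probabilities for files 1..m under a Zipf law."""
    weights = [1.0 / (f ** gamma) for f in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_active_links(num_devices, cluster_size, pmf):
    """Expected number of concurrently active D2D links in the cell (hypothetical helper)."""
    n = cluster_size
    # Probability that a request targets any file cached somewhere in the cluster.
    hit_all = sum(pmf[:n])
    p_no_link = 1.0
    for j in range(n):
        # Device j is served by D2D if it wants a file held by another device in the cluster.
        p_served = hit_all - pmf[j]
        p_no_link *= 1.0 - p_served
    clusters = num_devices // n
    # A cluster contributes one link if at least one of its devices can be served.
    return clusters * (1.0 - p_no_link)


def best_cluster_size(num_devices, pmf):
    """Brute-force search for the cluster size maximizing expected active links."""
    candidates = range(2, min(num_devices, len(pmf)) + 1)
    return max(candidates, key=lambda n: expected_active_links(num_devices, n, pmf))


if __name__ == "__main__":
    for gamma in (0.5, 2.0):
        pmf = zipf_pmf(100, gamma)
        n_opt = best_cluster_size(200, pmf)
        links = expected_active_links(200, n_opt, pmf)
        print(f"gamma={gamma}: best cluster size {n_opt}, about {links:.1f} active links")
```

Under these assumptions, a more concentrated request distribution (larger `gamma`) favors small clusters and supports many more simultaneous links. This is qualitatively consistent with the scaling behavior described above, but the toy model does not reproduce the paper's bounds.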