Due to the increasing popularity of microblogging platforms, the number of messages (posts) related to public events, especially posts containing multimedia content, is steadily increasing. The images in such posts can convey much more information about an event than the accompanying text, which is typically very short (e.g., tweets). Although such messages can be quite informative regarding different aspects of the event, they also contain a lot of spam and redundancy, which makes it challenging to extract pertinent insights. In this work, we describe a summarization framework that, given a set of social media messages about an event, aims to select a subset of the images they contain that simultaneously maximizes the relevance of the selected images and minimizes their redundancy. To this end, we propose a topic modelling technique to capture the relevance of messages to event topics and a graph-based algorithm to produce a diverse ranking of the selected high-relevance images. A user-centred evaluation on a large Twitter dataset covering several real-world events demonstrates that the proposed method considerably outperforms a number of state-of-the-art summarization algorithms in terms of result relevance, while also being highly competitive in terms of diversity. Specifically, we obtain an improvement of 25% in precision over the second-best result, and of 7% in diversity.
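To make the relevance/redundancy trade-off concrete, the following is a minimal sketch of a greedy selection in the style of Maximal Marginal Relevance. It is an illustrative stand-in and not the topic model or graph-based ranking proposed in this work. The function name `select_diverse`, the `trade_off` parameter, and the toy scores are assumptions introduced only for this example; in practice the relevance scores and pairwise image similarities would come from whatever models are available.

```python
from typing import List, Sequence


def select_diverse(relevance: Sequence[float],
                   similarity: Sequence[Sequence[float]],
                   k: int,
                   trade_off: float = 0.7) -> List[int]:
    """Greedily pick up to k item indices, balancing relevance and redundancy.

    relevance[i]     -- assumed relevance score of image i (higher is better)
    similarity[i][j] -- assumed similarity between images i and j, in [0, 1]
    trade_off        -- hypothetical weight: 1.0 ranks by relevance only,
                        lower values penalize redundancy more strongly
    """
    candidates = set(range(len(relevance)))
    selected: List[int] = []

    while candidates and len(selected) < k:

        def score(i: int) -> float:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return trade_off * relevance[i] - (1.0 - trade_off) * redundancy

        # Iterating in sorted order breaks ties in favour of the lowest index.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy data: images 0 and 1 are near-duplicates, image 2 is distinct.
relevance = [0.9, 0.85, 0.4]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

print(select_diverse(relevance, similarity, k=2, trade_off=0.5))  # [0, 2]
```

With `trade_off=0.5` the near-duplicate image 1 is skipped in favour of the less relevant but distinct image 2, whereas `trade_off=1.0` reduces to a plain relevance ranking and returns both near-duplicates.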