Multi-instance learning studies problems in which labels are assigned to bags that contain multiple instances. In these settings, the relations between instances and labels are usually ambiguous. In contrast, multi-task learning focuses on the output space, in which an input sample is associated with multiple labels. In the real world, a sample may be associated with multiple labels derived from observing multiple aspects of the problem. Thus many real-world applications are naturally formulated as multi-instance multi-task (MIMT) problems. A common approach to MIMT is to solve each task independently under the multi-instance learning framework. On the other hand, convolutional neural networks (CNN) have demonstrated promising performance in single-instance single-label image classification tasks. However, how a CNN should handle multi-instance multi-label tasks remains an open problem, mainly due to the complex many-to-many relations between the input and output spaces. In this work, we propose a deep learning model, known as multi-instance multi-task convolutional neural networks (MIMT-CNN), in which a number of images representing a multi-task problem are taken as the inputs. A shared sub-CNN is then connected to each input image to form instance representations. Those sub-CNN outputs are subsequently aggregated as inputs to additional convolutional layers and fully connected layers to produce the final multi-label predictions. Through transfer learning, this model can incorporate image-level prior knowledge learned from large single-label single-task data sets in other domains. The bag-level representations in this model are hierarchically abstracted from instance-level representations by multiple layers. Experimental results on mouse brain gene expression pattern annotation data show that the proposed MIMT-CNN model achieves superior performance.
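
To illustrate the described architecture, the following is a minimal PyTorch sketch, not the authors' implementation: a shared sub-CNN is applied to every image in a bag, the per-instance representations are aggregated (mean pooling is used here as a stand-in for the paper's aggregation step), and additional convolutional and fully connected layers produce the multi-label outputs. All layer sizes, the number of instances, and the pooling choice are illustrative assumptions.

```python
# Hypothetical sketch of a MIMT-CNN-style model; layer sizes and the
# aggregation scheme are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class MIMTCNNSketch(nn.Module):
    def __init__(self, num_labels: int, feat_dim: int = 64):
        super().__init__()
        # Shared sub-CNN applied to every instance (image) in the bag.
        self.sub_cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        # Additional convolutional layers operating on the aggregated
        # instance representations (bag-level abstraction).
        self.agg_conv = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Fully connected layers producing one output per task/label.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_dim * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, num_labels),
        )

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        # bag: (batch, n_instances, 3, H, W)
        b, n, c, h, w = bag.shape
        inst = self.sub_cnn(bag.view(b * n, c, h, w))   # shared weights across instances
        inst = inst.view(b, n, *inst.shape[1:])
        pooled = inst.mean(dim=1)                       # aggregate instances (mean pooling here)
        out = self.agg_conv(pooled)
        return torch.sigmoid(self.classifier(out))      # multi-label probabilities

# Usage: a batch of 2 bags, each with 5 images, mapped to 10 labels.
# model = MIMTCNNSketch(num_labels=10)
# probs = model(torch.randn(2, 5, 3, 64, 64))  # shape (2, 10)
```

Sharing the sub-CNN across instances keeps the parameter count independent of bag size and also allows its weights to be initialized from a network pretrained on large single-label single-task data sets, matching the transfer-learning use described above.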