In recent years, a growing number of deep hashing methods have been applied to large-scale multi-label image retrieval. However, in existing deep network models, the extracted low-level features cannot effectively integrate multi-level semantic information and the similarity ranking information of pairwise multi-label images into a single hash coding learning scheme, so these models do not yield an efficient and accurate indexing method. Motivated by this, we propose a novel approach that uses the cosine distance between the semantic vectors of a pair of multi-label images to quantify the multi-level similarity between them. We use a residual network to learn the final feature representation of multi-label images, and we construct a deep hashing framework that extracts features and generates binary codes simultaneously. On the one hand, the improved model uses a deeper network with a more complex structure to strengthen low-level feature extraction. On the other hand, it is trained with a fine-tuning strategy, which accelerates convergence. Extensive experiments on two popular multi-label datasets demonstrate that the improved model outperforms the reference models in accuracy: the mean average precision improves by factors of 1.0432 and 1.1114 on the two datasets, respectively.
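As a minimal sketch of how a cosine measure over label vectors can grade similarity in more than two levels, the following Python snippet compares hypothetical binary label vectors. The abstract does not specify how the semantic vectors are built, so the binary multi-hot encoding, the function names, and the example labels are all assumptions made for illustration, not the method's actual implementation.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two label vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


def cosine_distance(a, b):
    """Cosine distance, defined here as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical multi-hot label vectors over four classes (assumed encoding).
img_a = [1, 1, 0, 0]  # two labels
img_b = [1, 0, 0, 0]  # shares one label with img_a
img_c = [0, 0, 1, 1]  # shares no label with img_a

# Identical label sets give the smallest distance, partially overlapping
# sets an intermediate one, and disjoint sets the largest.
d_same = cosine_distance(img_a, img_a)
d_partial = cosine_distance(img_a, img_b)
d_disjoint = cosine_distance(img_a, img_c)
```

Unlike a binary "shares at least one label" rule, this measure yields a graded value for partially overlapping label sets, which is the kind of multi-level similarity the approach aims to capture.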