Clinical images and metadata are both foundational to clinical diagnosis, and effectively fusing these two resources remains a major challenge in skin cancer detection. Although existing fusion methods have produced improved results, they perform only single-level fusion before decision making and extract features separately for each modality. This strategy weakens inter-modal synergy and yields coarse fused features. To enhance the multidimensional representation of images, we propose a Self-contrastive Feature Guidance based Multidimensional Collaborative Network (SGMC Net). Specifically, we split the fusion process into three steps, spatial-dimension fusion, channel-dimension fusion, and adaptive corrective output, to establish multidimensional collaboration between metadata and image features during feature extraction. Accordingly, we build three blocks: a spatial fusion block, a channel fusion block, and a feature rectification block. On this basis, we propose a Self-contrastive Feature Guidance method that uses the contrastive loss between shallow and deep image features as a supervisory signal, without data augmentation, to optimize the shallow features. Finally, extensive experiments were conducted on the PAD-UFES-20 and Derm7pt datasets; our method achieves an accuracy of 83.3%, surpassing other state-of-the-art models. We further validated the effectiveness of the feature guidance method, which improves the accuracy of SGMC18 by 5.2%.
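To make the self-contrastive guidance idea concrete, the sketch below shows one plausible way such a loss could be formed from shallow and deep features of the same images. The function name, the InfoNCE-style formulation, and the assumption that both features have already been projected to a common embedding dimension are ours for illustration and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def self_contrastive_guidance_loss(shallow_z, deep_z, temperature=0.1):
    # shallow_z, deep_z: (B, D) embeddings of the same images taken from a shallow
    # and a deep layer, assumed to be projected to a common dimension D beforehand.
    # The deep branch is detached so the loss only supervises the shallow features.
    z_s = F.normalize(shallow_z, dim=1)
    z_d = F.normalize(deep_z.detach(), dim=1)
    logits = z_s @ z_d.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(z_s.size(0), device=z_s.device)    # positive pair = same sample
    # InfoNCE-style objective: each sample's own deep feature is the positive,
    # other samples' deep features act as negatives, with no data augmentation.
    return F.cross_entropy(logits, targets)
```

Under this assumed formulation, the loss would be added to the classification objective so that shallow features are pulled toward the more discriminative deep representation of the same image during training.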