Foreign object detection in Social Internet of Things (SIoT) systems is challenging because it demands high positioning accuracy, the training samples are imbalanced, and environmental features interfere with the foreign objects in the image. This paper presents a DETR-based algorithm for detecting foreign objects in SIoT environments that addresses these three limitations of traditional image-based foreign object detection algorithms. The algorithm uses the attention mechanism to capture global context information and to fuse information over long distances, which yields more accurate features. The loss function of the DETR network is optimized to improve detection accuracy. In addition, a convolutional attention model is introduced to mitigate sample imbalance and to increase the saliency of foreign objects in the images, strengthening their feature representation in the detection network. Tests on environmental monitoring video data show that the algorithm effectively eliminates the influence of environmental features and achieves a detection accuracy of 96.6%.
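
The way attention captures global context can be illustrated with a minimal sketch of single-head scaled dot-product self-attention, the standard building block of Transformer-based detectors such as DETR. This is not the implementation used in this paper: the function name `self_attention`, the projection matrices, and the toy dimensions are all assumptions chosen for illustration, and NumPy stands in for a deep learning framework.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (n_positions, d_model) array, e.g. a flattened feature map.
    w_q, w_k, w_v: (d_model, d_model) projection matrices (hypothetical).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Similarity of every position with every other position.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of the values at ALL positions.
    return weights @ v, weights

# Toy input: 6 spatial positions with 8 channels each (assumed sizes).
rng = np.random.default_rng(0)
n_positions, d_model = 6, 8
x = rng.normal(size=(n_positions, d_model))
w_q, w_k, w_v = (rng.normal(scale=0.3, size=(d_model, d_model)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
```

Because every row of `weights` assigns a nonzero weight to every position, each output feature can draw on information from anywhere in the image, regardless of spatial distance. This is the sense in which attention fuses long-distance information.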