Artificial pollination can considerably improve pollination success and boost the fruit set and quality of chilli peppers grown in enclosed environments (e.g., greenhouses). However, it raises production costs and requires specialized operating skills. Precise and efficient detection of pepper flowers is therefore a critical step in the development of robotic pollinators and pollination drones. In this paper, we propose a pepper flower detection method based on YOLOv8 that combines multi-scale features, an attention mechanism, and conditional information. First, a CBAM structure that incorporates edge information is integrated into the backbone to expand the receptive field of feature extraction and facilitate the learning of long-range dependencies. Second, a BERT model is used to encode conditional information, which is integrated into the backbone via the ELAN layer to assist both training and inference. Finally, an improved MPDIoU is applied to increase detection accuracy and flexibility. Experimental results show that these modifications increase the network depth and reduce the number of parameters from 4M to 2.85M, while improving the mean average precision (mAP) by 3.1% over the baseline approach. The findings of this study can support object detection in crops. The chilli pepper flower dataset is available at https://drive.google.com/file/d/1cKNie_iAzx-K4iPLQRVdyiOKV1d9zHrF/view?usp=drive_link and the source code is available at https://drive.google.com/drive/folders/1ubNnKu7PWYAdUXvbs4Z2OBAVcSAQ3WLd?usp=drive_link.
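
To make the bounding-box similarity term concrete, the following is a minimal sketch of the standard MPDIoU measure as it is commonly defined: the IoU of two boxes minus the squared distances between their top-left corners and between their bottom-right corners, each normalized by the squared diagonal of the input image. It is not the improved variant proposed in this paper. The function names `mpdiou` and `mpdiou_loss` and the `(x1, y1, x2, y2)` pixel box format are assumptions made for illustration.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """Standard MPDIoU between two boxes (illustrative sketch).

    Boxes are (x1, y1, x2, y2) in pixels with x1 < x2 and y1 < y2
    (an assumed format). img_w and img_h are the input image size.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2  # top-left corners
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2  # bottom-right corners

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm


def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Regression loss derived from MPDIoU: 1 - MPDIoU."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)


if __name__ == "__main__":
    # Hypothetical predicted and ground-truth boxes in a 640x640 image.
    pred = (100.0, 120.0, 180.0, 200.0)
    truth = (110.0, 125.0, 185.0, 210.0)
    print(mpdiou(pred, truth, 640, 640))
    print(mpdiou_loss(pred, truth, 640, 640))
```

Unlike plain IoU, the corner-distance terms remain informative when the predicted and ground-truth boxes do not overlap, which matters for small targets such as flowers.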