Accurate detection of tea shoots under field conditions is vital to intelligent tea picking, field management, yield prediction, and phenotypic analysis. In the strategic planning of tea plantations, a multi-cultivar planting pattern is usually adopted to regulate production peaks and mitigate losses caused by pests, diseases, or natural disasters. Previous studies on tea shoot detection have mostly focused on a single cultivar, with limited exploration of multi-cultivar detection. Therefore, this study used deep learning methods to investigate the detection of multi-cultivar tea shoots and optimized the training strategy to improve detection performance. First, images of tea shoots of three cultivars were acquired under field conditions to construct different types of datasets according to the labeling method and the cultivars contained (named the multi-cultivar and hybrid label dataset, the multi-cultivar and multi-label dataset, and the single cultivar datasets). Second, several tea shoot detection models based on Faster R-CNN and the YOLO series were built and analyzed to determine the baseline network model. To improve detection performance, the training parameters of the baseline network model were optimized. Meanwhile, the training effects of different models were compared across the three types of datasets and across transfer learning with various pre-training weights. The results show that YOLOv7 achieved the highest detection performance, with a mean average precision of 82.4%, while consuming fewer computational resources: its computational cost was 105.1 giga floating-point operations (GFLOPs) and its speed was 59.5 frames per second. The model trained on the multi-cultivar and multi-label dataset with a batch size of 2, 200 epochs, and pre-training weights derived from the MS COCO dataset achieved a mean average precision of 87.1% in detecting multi-cultivar tea shoots. 
The optimal training strategy, obtained by optimizing the training parameters, applying the multi-label method, and using transfer learning, significantly improved the detection performance of the network model, raising the mean average precision by 4.7 percentage points. The study achieved accurate detection of multi-cultivar tea shoots, providing technical and theoretical support for the development of intelligent tea field management and mechanized picking.
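
The reported improvement can be checked with a short calculation. The sketch below uses only the two mean average precision values quoted above; the variable names are illustrative and not taken from the study. It shows that the 4.7 figure is an absolute difference in percentage points, which corresponds to a somewhat larger gain relative to the baseline.

```python
# Mean average precision values quoted in the abstract, in percent.
baseline_map = 82.4   # YOLOv7 baseline model
optimized_map = 87.1  # model trained with the optimized strategy

# Absolute gain in percentage points (rounded to avoid float noise).
absolute_gain = round(optimized_map - baseline_map, 1)

# Relative gain, expressed as a percentage of the baseline value.
relative_gain = round(100 * (optimized_map - baseline_map) / baseline_map, 1)

print(f"absolute gain: {absolute_gain} percentage points")
print(f"relative gain: {relative_gain} % of the baseline")
```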