YOLO (You Only Look Once), a 2D object detection method, is extremely fast because a single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. However, it makes more localization errors, and its training is relatively slow. Inspired by the idea of cluster centers in super-pixel segmentation and of anchor boxes in Faster R-CNN, in this paper we propose a modified method based on YOLO (M-YOLO for short). First, we replace YOLO's last fully connected layer with a convolutional layer, on which the cluster boxes (anchor boxes centered on cluster centers) completely cover the whole image at the beginning of training. As a result, the new structure speeds up the training process. Second, we increase the number of grid cells, i.e. cluster centers, from 7 × 7 up to a maximum of 17 × 17, and the number of predicted bounding boxes per grid cell, i.e. anchor boxes, from 2 up to a maximum of 9. This improves IOU performance. We also propose a new kind of NMS (non-maximum suppression) to address a problem that arises in M-YOLO. The experimental results show that M-YOLO improves localization accuracy by about 10% and also speeds up the convergence of training.
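The cluster-box idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes boxes in (x1, y1, x2, y2) pixel coordinates, a 448 × 448 input, and hypothetical anchor shapes given in grid-cell units, and all function names are hypothetical. The sketch places every anchor shape at every cell center of an S × S grid, checks on a sample of points that the resulting boxes cover the image, and computes the IOU used to judge localization. The proposed NMS variant is not described here and is not modeled.

```python
from itertools import product

# Hypothetical anchor shapes (width, height) in grid-cell units; the actual
# M-YOLO anchor sizes are not given in the abstract.
DEFAULT_ANCHORS = ((1.0, 1.0), (2.0, 1.0), (1.0, 2.0))


def cluster_boxes(img_w, img_h, s, anchors=DEFAULT_ANCHORS):
    """Centre every anchor shape on every cell centre of an s x s grid."""
    cell_w, cell_h = img_w / s, img_h / s
    boxes = []
    for row, col in product(range(s), range(s)):
        # Cell centre acts as the "cluster center".
        cx, cy = (col + 0.5) * cell_w, (row + 0.5) * cell_h
        for aw, ah in anchors:
            w, h = aw * cell_w, ah * cell_h
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def covers_image(boxes, img_w, img_h, samples=50):
    """Check that every point of a samples x samples lattice lies in some box."""
    for i, j in product(range(samples), range(samples)):
        x = (j + 0.5) * img_w / samples
        y = (i + 0.5) * img_h / samples
        if not any(b[0] <= x <= b[2] and b[1] <= y <= b[3] for b in boxes):
            return False
    return True


def best_iou(gt, boxes):
    """Highest IOU between a ground-truth box and any cluster box."""
    return max(iou(gt, b) for b in boxes)
```

With k anchor shapes, an S × S grid yields S · S · k cluster boxes, so the finer grid gives many more candidate boxes near any ground-truth object. For a small box such as (0, 0, 26, 26), `best_iou` is far higher on the 17 × 17 grid than on the 7 × 7 grid, which is consistent with the IOU gain the abstract attributes to denser cluster centers.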