LiDAR-based 3-D object detection is an essential part of an intelligent vehicle perception system, since subsequent decision-making and planning depend on it. Voxel region convolutional neural network (RCNN) is a fast and accurate two-stage, voxel-based 3-D object detection algorithm. However, its detection accuracy for certain categories is insufficient in complex traffic scenarios, so we propose the Voxel RCNN-HA algorithm. First, to address the weakness of Voxel RCNN in detecting pedestrians, we propose a hybrid detection head that balances the trade-offs between anchor-based and anchor-free approaches, significantly improving pedestrian detection performance while maintaining vehicle accuracy. Second, we introduce self-attention in the second stage and develop a Voxel region of interest (RoI) self-attention pooling module that captures both local and global features within each RoI, addressing the difficulty the original Voxel RoI pooling module has in capturing the global features of large objects. On the One Million Scenes (ONCE) dataset, the proposed Voxel RCNN-HA achieves 66.79% mean average precision (mAP) at 11.7 frames per second (FPS) and outperforms both Voxel RCNN and CenterPoints in detection accuracy. Additionally, experiments on the Waymo Open dataset and the Custom-Rslidar dataset further validate the effectiveness and generalization ability of the proposed method.
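To illustrate how self-attention can give each RoI grid point access to global context, the following is a minimal NumPy sketch of scaled dot-product self-attention applied to per-RoI grid features. It is not the Voxel RCNN-HA implementation: the function name `roi_self_attention`, the 6 x 6 x 6 grid, the 32-channel features, the random projection weights, and the residual connection are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def roi_self_attention(grid_feats, w_q, w_k, w_v):
    """Single-head self-attention over the grid points of one RoI.

    grid_feats: (G, C) local features, one row per RoI grid point
                (assumed to come from a local pooling step).
    w_q, w_k, w_v: (C, C) projection matrices (hypothetical parameters).
    Returns the attended features (G, C) and the attention map (G, G).
    """
    q = grid_feats @ w_q
    k = grid_feats @ w_k
    v = grid_feats @ w_v
    # Every grid point attends to every other grid point in the RoI,
    # so each output row mixes information from the whole box.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    # Residual connection (an assumption): keep the local feature and
    # add the globally aggregated one.
    return grid_feats + attn @ v, attn


rng = np.random.default_rng(0)
G, C = 6 * 6 * 6, 32  # assumed grid size and channel count
grid_feats = rng.standard_normal((G, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

out, attn = roi_self_attention(grid_feats, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Because the attention map is dense over all grid points, the receptive field of each output feature spans the full RoI regardless of object size, which is the property the abstract attributes to the self-attention pooling module.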