For hyperspectral image (HSI) classification, convolutional neural networks (CNNs) learn discriminative spectral-spatial features more effectively than traditional classification methods. However, because a CNN extracts HSI features through local receptive fields, the feature representation of the same pixel on the feature map can be inconsistent, which ultimately introduces noise into the classification results. To address this, we introduce an attention mechanism into the CNN model to improve its feature expressiveness, and we design a spectral-spatial attention aggregation network (SSAAN) for HSI classification with two attention branches. The squeeze-and-excitation spectral attention module (SESAM) automatically learns the importance of each feature channel of the HSI and uses these weights to enhance informative band features and suppress less useful ones. In the selective-kernel spatial attention module (SKSAM), 2D convolutions with different kernel sizes are first applied to the principal components retained after dimensionality reduction to extract shallow, middle, and deep features, and the pixel spatial information from these three paths is combined and aggregated. The feature maps produced by kernels of different sizes are then aggregated according to learned selection weights. Finally, the feature vectors from the spectral and spatial attention branches are concatenated to further improve the feature representation, and the classification result is obtained with a softmax function. Experiments on three real HSI data sets show that the proposed SSAAN achieves better classification performance than the state-of-the-art methods it is compared with.
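SESAM is described as a squeeze-and-excitation (SE) module: each channel is pooled to a single number, the resulting vector passes through a small bottleneck, and each channel is rescaled by a sigmoid weight. The sketch below shows that generic SE pattern in dependency-free Python on a tiny feature cube. The function name `se_channel_attention`, the weight shapes, and the reduction ratio `r` are illustrative assumptions, not the SSAAN configuration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def se_channel_attention(cube, w1, b1, w2, b2):
    """Squeeze-and-excitation reweighting of a C x H x W feature cube.

    Hypothetical sketch: w1 is (C // r) x C and w2 is C x (C // r), where r is
    an assumed reduction ratio. Returns (reweighted cube, channel weights).
    """
    # Squeeze: global average pooling collapses each band to one number.
    z = [sum(sum(row) for row in band) / (len(band) * len(band[0])) for band in cube]
    # Excitation: FC + ReLU bottleneck, then FC + sigmoid gives one weight per channel.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(w1, b1)]
    s = [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b) for row, b in zip(w2, b2)]
    # Scale: multiply every value of channel c by its weight s[c].
    out = [[[sc * v for v in row] for row in band] for sc, band in zip(s, cube)]
    return out, s


if __name__ == "__main__":
    random.seed(0)
    C, H, W, r = 4, 3, 3, 2
    cube = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1 = [[random.gauss(0, 1) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.gauss(0, 1) for _ in range(C // r)] for _ in range(C)]
    out, weights = se_channel_attention(cube, w1, [0.0] * (C // r), w2, [0.0] * C)
    print([round(w, 3) for w in weights])  # one weight in (0, 1) per band
```

Weights near 1 leave a band almost unchanged and weights near 0 suppress it, which is the "enhance useful bands, suppress less useful ones" behavior; in a real network `w1` and `w2` would be learned rather than random.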
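The SKSAM step that aggregates feature maps from kernels of different sizes according to selection weights follows the selective-kernel pattern: split the input into branches with different kernel sizes, fuse them, and let a softmax across branches decide how much each kernel size contributes to each channel. The sketch below is a generic, dependency-free version of that operation. For brevity it filters each channel separately (depthwise), and the function names, weight shapes, and bottleneck size `d` are illustrative assumptions; it does not reproduce SSAAN's three-path shallow/middle/deep design or its PCA preprocessing.

```python
import math


def conv2d_same(img, kernel):
    """2D cross-correlation (the usual CNN 'convolution'), zero padding, odd kernel."""
    k, p = len(kernel), len(kernel) // 2
    H, W = len(img), len(img[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    y, x = i + u - p, j + v - p
                    if 0 <= y < H and 0 <= x < W:
                        acc += kernel[u][v] * img[y][x]
            out[i][j] = acc
    return out


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    t = sum(e)
    return [v / t for v in e]


def selective_kernel(feat, branch_kernels, w_fc, w_select):
    """Selective-kernel fusion of a C x H x W map over M kernel-size branches.

    Hypothetical sketch: branch_kernels[m][c] is a k_m x k_m kernel for channel c,
    w_fc is d x C, and w_select[m] is C x d. Returns (output, attention), where
    attention[c] holds the M branch weights for channel c (they sum to 1).
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    M = len(branch_kernels)
    # Split: each branch filters every channel with its own kernel size.
    U = [[conv2d_same(feat[c], kern[c]) for c in range(C)] for kern in branch_kernels]
    # Fuse: element-wise sum of the branches plus global average pooling, in one pass.
    s = [sum(U[m][c][i][j] for m in range(M) for i in range(H) for j in range(W)) / (H * W)
         for c in range(C)]
    # Compact descriptor z = ReLU(w_fc s).
    z = [max(0.0, sum(w * sc for w, sc in zip(row, s))) for row in w_fc]
    # Select: per-branch logits, then softmax across branches for each channel.
    logits = [[sum(w * zd for w, zd in zip(w_select[m][c], z)) for m in range(M)]
              for c in range(C)]
    attn = [softmax(row) for row in logits]
    out = [[[sum(attn[c][m] * U[m][c][i][j] for m in range(M)) for j in range(W)]
            for i in range(H)] for c in range(C)]
    return out, attn
```

With all-zero selection weights every branch gets weight 1/M, so the module reduces to averaging the branches; learned weights let each channel favor the small or large receptive field.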
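The final stage concatenates the spectral and spatial feature vectors and produces class probabilities with softmax. A minimal sketch of such a head follows, assuming a single linear layer between the concatenation and the softmax; the function name `classify_fused`, the layer count, and the weights are hypothetical and not taken from SSAAN.

```python
import math


def classify_fused(spectral_vec, spatial_vec, weights, biases):
    """Concatenate two branch feature vectors, apply a linear layer, then softmax.

    Hypothetical sketch: weights is K x (len(spectral_vec) + len(spatial_vec))
    for K classes and biases has length K. Returns K class probabilities.
    """
    fused = list(spectral_vec) + list(spatial_vec)  # concatenation of the two branches
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, biases)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class is the index of the largest probability; concatenation keeps the spectral and spatial evidence side by side so the linear layer can weigh both.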