Although Javanese is a widely spoken local language in Indonesia, tools that support Javanese language learning are still very limited. Like other languages, Javanese has homographs: words that are spelled the same but take different meanings depending on how they are pronounced. This can confuse learners, since one of the distinctive features of Javanese lies in the pronunciation of its vowels and consonants. To address this problem, a method for classifying Javanese vowel sounds with a convolutional neural network (CNN) is proposed. To extract features from the audio data, Mel-frequency spectral coefficients (MFSC) are used rather than the standard Mel-frequency cepstral coefficients (MFCC). The proposed CNN architecture consists of three convolutional and pooling layers, a fully connected layer, and a logistic regression unit as the output layer. Experimental results show that the CNN with dropout regularization classifies five types of Javanese vowel sounds with an accuracy of 94%.
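
To make the difference between MFSC and MFCC concrete, the following is a minimal NumPy sketch of MFSC extraction, understood here as log mel filterbank energies with no final discrete cosine transform. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 16 kHz sample rate, 25 ms frames, 10 ms hop, 512-point FFT, and 40 filters are all hypothetical choices not given in the abstract.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i:i + 3]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        # Triangle: the lower of the two slopes, clipped at zero.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfsc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """Log mel filterbank energies, shape (n_frames, n_filters).

    All default parameter values are assumptions made for this sketch.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Build overlapping frames by index arithmetic, then apply a Hamming window.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft samples).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # MFSC stops at the log energies; MFCC would apply a DCT to them next.
    return np.log(energies + 1e-10)
```

With these assumed defaults, one second of 16 kHz audio yields a 98 x 40 array. Skipping the DCT keeps neighbouring filterbank channels locally correlated along the frequency axis, which is the kind of structure a convolutional layer can exploit when the array is treated as a two-dimensional input.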