We consider the problem of monitoring driver attention and alertness in a multimodal setup using images from RGB and infrared (IR) cameras, by employing regular vine copulas. The task is modeled as a classification problem with nine classes, comprising one safe-driving class and eight anomaly classes, taken from the driver monitoring dataset (DMD). High-level feature representations are extracted separately from the RGB and IR images using the ResNet50 architecture. These individual features exhibit classwise correlation, and we propose a fusion scheme for classification based on regular vine copulas to exploit this correlation. A detailed performance comparison of various classifiers, including random forest (RF), AdaBoost, k-nearest neighbor, support vector machine, naive Bayes, multi-layer perceptron, and a combination of a linear layer and a rectified linear activation unit (LinR), is carried out. Our experiments demonstrate that the copula-based approach outperforms conventional classification with individual RGB- and IR-based features in terms of classification accuracy. In particular, LinR outperforms all the other techniques in the noiseless case, with a training accuracy of 93.80. Further, we study the effect of salt-and-pepper and Gaussian noise on the classification performance, and show that the copula-based fusion architecture with the RF classifier outperforms the other algorithms/architectures.
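To illustrate how classwise correlation between the two modalities can be exploited, the following is a minimal sketch and not the architecture evaluated in this work. It assumes a single scalar feature per modality (the ResNet50 features are high-dimensional), two hypothetical classes named `safe` and `anomaly`, Gaussian marginals, equal class priors, and synthetic data. With only two variables a vine reduces to a single pair copula, so a bivariate Gaussian copula is used here in place of a full regular vine with selected pair-copula families. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the normal-score transform


def clip(u, eps=1e-9):
    """Keep a probability strictly inside (0, 1) so inv_cdf is defined."""
    return min(max(u, eps), 1.0 - eps)


def gaussian_copula_logpdf(u1, u2, rho):
    """Log-density of the bivariate Gaussian copula at (u1, u2)."""
    z1 = STD_NORMAL.inv_cdf(clip(u1))
    z2 = STD_NORMAL.inv_cdf(clip(u2))
    r2 = rho * rho
    quad = r2 * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2
    return -0.5 * math.log(1.0 - r2) - quad / (2.0 * (1.0 - r2))


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def fit_class(samples):
    """Fit marginals and copula parameter from (rgb, ir) pairs of one class."""
    rgb = [s[0] for s in samples]
    ir = [s[1] for s in samples]
    m_rgb = NormalDist.from_samples(rgb)  # assumed Gaussian marginal
    m_ir = NormalDist.from_samples(ir)    # assumed Gaussian marginal
    # Map each feature to normal scores through its fitted marginal CDF.
    z_rgb = [STD_NORMAL.inv_cdf(clip(m_rgb.cdf(x))) for x in rgb]
    z_ir = [STD_NORMAL.inv_cdf(clip(m_ir.cdf(x))) for x in ir]
    rho = max(-0.99, min(0.99, pearson(z_rgb, z_ir)))
    return m_rgb, m_ir, rho


def class_log_density(model, x_rgb, x_ir):
    """log f(x_rgb, x_ir | class) = log marginals + log copula density."""
    m_rgb, m_ir, rho = model
    log_marginals = (math.log(max(m_rgb.pdf(x_rgb), 1e-300))
                     + math.log(max(m_ir.pdf(x_ir), 1e-300)))
    return log_marginals + gaussian_copula_logpdf(
        m_rgb.cdf(x_rgb), m_ir.cdf(x_ir), rho)


def predict(models, x_rgb, x_ir):
    """Pick the class with the largest joint log-density (equal priors assumed)."""
    return max(models, key=lambda c: class_log_density(models[c], x_rgb, x_ir))


def synthetic_class(rng, rho, n=500):
    """Draw n correlated standard-normal pairs (synthetic stand-in for features)."""
    out = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append((a, b))
    return out


rng = random.Random(0)
# Both hypothetical classes share the same marginals; only the dependence differs.
models = {
    "safe": fit_class(synthetic_class(rng, 0.8)),
    "anomaly": fit_class(synthetic_class(rng, -0.8)),
}
print(predict(models, 1.5, 1.5))
print(predict(models, 1.5, -1.5))
```

In this toy setting the two classes have identical marginal distributions, so a classifier that sees only one modality cannot separate them; the copula term alone carries the discriminative information. This is the intuition behind fusing RGB and IR features through their classwise dependence rather than classifying on each modality individually.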