Emotion recognition has broad applications in psychology, computer science, and artificial intelligence. However, the complexity of emotional states makes approaches based on a single modality less robust. In this study, we introduce a multi-view feature fusion algorithm based on convolutional neural networks (CNNs) for emotion recognition. First, imaging photoplethysmography (IPPG) signals are extracted from facial videos. Heart rate variability (HRV) features are then derived from the IPPG signals, and branch convolutional neural networks learn multi-view representations of the emotional attributes in the IPPG and facial video signals. We validated the method on the public DEAP dataset, achieving accuracies of 72.37% and 70.82% on the arousal and valence dimensions, respectively. The multi-view strategy improves recognition accuracy by 7.23% (arousal) and 5.31% (valence) over methods that rely on facial expressions alone. These results show that the proposed method captures multimodal emotional cues from facial videos without additional sensors, improving both the accuracy and robustness of emotion recognition.
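The two-branch fusion pipeline summarized above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the branch designs, kernel sizes, signal lengths, and variable names (`face_stream`, `hrv_stream`) are all assumptions, and a real implementation would use trained 2-D CNN branches over video frames alongside HRV features from the IPPG signal.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv_relu(x, kernel):
    """One toy 'branch' layer: valid-mode 1-D convolution followed by ReLU."""
    return np.maximum(np.convolve(x, kernel, mode="valid"), 0.0)


def branch_features(signal, kernel, pool=4):
    """One CNN branch: conv -> ReLU -> average pooling -> feature vector."""
    h = conv_relu(signal, kernel)
    n = (len(h) // pool) * pool  # truncate so the length divides evenly
    return h[:n].reshape(-1, pool).mean(axis=1)


# Hypothetical inputs standing in for the two views: a facial-expression
# feature stream and an HRV series derived from the IPPG signal.
face_stream = rng.standard_normal(64)
hrv_stream = rng.standard_normal(32)

# Each view passes through its own branch (random, untrained kernels here).
face_feat = branch_features(face_stream, kernel=rng.standard_normal(5))
hrv_feat = branch_features(hrv_stream, kernel=rng.standard_normal(5))

# Multi-view fusion: concatenate the branch outputs, then a linear head
# scores the two emotion dimensions (arousal, valence).
fused = np.concatenate([face_feat, hrv_feat])
W = rng.standard_normal((2, fused.size))
scores = W @ fused  # one score per dimension
print(scores.shape)
```

The design choice illustrated here is late fusion: each modality is encoded by its own branch before concatenation, so a weak or noisy view degrades, rather than destroys, the fused representation.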