Capacitive sensing is a prominent technology that offers several advantages over existing sensing systems, and its adoption spans numerous domains. While touch-based capacitive sensing has been widely adopted, noncontact capacitive sensing faces significant challenges due to the inherent complexity of the signal and its susceptibility to external interference from nearby objects and environmental conditions. To address these challenges, this study introduces a novel capacitive proximity sensing-based model for recognizing hand motions, designed so that the decision-making process of the classification model can be interpreted. We leverage this model to build an end-to-end framework that controls a device interface through ergonomic interaction enabled by capacitive proximity sensing. The framework comprises several stages: signal extraction, signal processing, motion detection, motion-frame extraction, motion classification, and interface control. These stages are rigorously examined and compared, achieving a motion detection rate of 98.6%, a motion-frame extraction rate of 98.4%, and a classification accuracy of 99.37%, measured in real-time operation on an extensive dataset of 1000 motion samples. In addition, we introduce an interpretability methodology that assesses each sensor's contribution to motion recognition across six deep learning models, yielding consistent results and consensus insights into their learning mechanisms. As part of our contributions, we publicly release the "CAPPRO" dataset, consisting of 1000 hand motions of ten motion types collected from four subjects. Our framework underscores the potential of capacitive proximity sensing across application domains, marking a significant advance toward accurate and interpretable hand motion classification for interface control.