Computer vision (CV) technologies offer a labor-saving and convenient means of identifying malicious human behaviors. However, existing approaches often neglect the robustness, generalization, and interpretability of their computational frameworks. In this paper, we conduct a case study of armed boundary sabotage, a common but sometimes difficult-to-detect behavior. Our framework consists of a computer vision module (CVM) and a reasoning module (RM): the CVM extracts key information from raw videos, and the RM produces the final reasoning results. To cope with the transient and ambiguous nature of such scenarios, we propose a soft-constrained human-object interaction analysis process in the CVM. In addition, we implement two reasoning methods in the RM: a data-based reasoning method and a language-based reasoning method. The results show that the soft-constrained human-object interaction analysis process is effective and practical, reaching an optimal testing accuracy of 0.7871. Both proposed reasoning methods are promising for identifying malicious human behaviors, and the advanced language-based reasoning method outperforms the others, with the highest precision of 0.8750 and a perfect recall of 1.0000. These proposals are also shown to perform well in the other external intrusion scenarios from our previous work. Finally, comparisons with related works show that our approach achieves state-of-the-art results.