Multiple interaction modes, such as gestures and physical objects, coexist in the process of AR (Augmented Reality) embodied cognition. On the basis of a unified representation of AR embodied cognition and the complementary advantages of different modal information, multimodal interaction information can be semantically aligned and organically fused to achieve efficient and reliable transmission of uncertain or ambiguous interaction information between humans and AR systems. However, existing research offers limited insight into how multimodal fusion natural interactions affect higher-level interaction intention understanding, higher-quality cognitive behaviour utility, and lower interaction cognitive load in dynamic and complex AR embodied cognitive scenes. This article provides an in-depth analysis of the intervention and regulatory mechanisms of multimodal fusion natural interactions in AR embodied cognition. Through the rational organization and redesign of AR multimodal fusion interaction methods, the AR-EMFNI (AR Embodied Multimodal Fusion Natural Interactions) method is proposed, comprising five stages: deep interaction intention knowledge base construction, interaction mode enhancement fusion, real interaction intention reasoning, trust evaluation and optimization, and interaction task assistance guidance. A total of 109 participants majoring in Rail Transit Signal and Control were recruited, and five AR multimodal interactive situation systems and 3D-printed embodied cognitive interactive behaviour systems were developed around ZD6 switch machine assembly tasks. The experiments collected four types of data: knowledge acquisition, interactive intention reasoning, embodied interactive behaviour, and questionnaires. ANOVA was conducted with the AR interaction mode as the moderating variable. The experimental results indicate that AR gesture interaction enhances learners' natural closeness and direct presence in AR interaction behaviour, but requires robust recognition of learners' dynamic and complex hand movements. AR physical interaction improves the coverage and systematicity of knowledge transfer, but limits the selective construction and interactive presentation of AR embodied cognitive content. AR touch interaction reduces learners' cognitive load on AR devices, but requires careful configuration of implicit interaction parameters during the AR interaction process. The AR-SMFI (AR Simple Multimodal Fusion Interaction) condition shows that AR multimodal human-computer interaction is not simply the superposition and aggregation of input information, but rather the mutual supplementation and organic integration of information across different interaction modalities. Compared with the other four AR interaction methods, AR-EMFNI improved cognitive test performance by 15%, achieved 81% accuracy in intention reasoning, and reduced cognitive load by 36%. It effectively addresses the concurrent transmission of uncertain and ambiguous multimodal interaction information in AR embodied cognition, and promotes higher-level knowledge transfer, higher-quality embodied interaction behaviour, and a stronger flow experience.
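
The abstract does not detail the fusion algorithm itself. As a minimal sketch of one plausible reading of the interaction mode enhancement fusion, real interaction intention reasoning, trust evaluation, and assistance guidance stages, the following Python code performs a reliability-weighted late fusion over per-modality intent posteriors; all intent labels, trust weights, probabilities, and thresholds here are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Hypothetical intent labels for a ZD6 switch machine assembly task.
INTENTS = ["pick_part", "align_part", "fasten_bolt", "inspect", "request_help"]

def fuse_intent_posteriors(posteriors_by_modality, reliability):
    """Reliability-weighted late fusion of per-modality intent posteriors.

    posteriors_by_modality: modality name -> probability vector over INTENTS
    reliability: modality name -> trust weight in [0, 1] (trust evaluation stage)
    Returns the fused posterior and the inferred intent.
    """
    fused = np.zeros(len(INTENTS))
    total_weight = 0.0
    for modality, p in posteriors_by_modality.items():
        w = reliability.get(modality, 0.0)
        fused += w * np.asarray(p)   # modalities supplement, not just stack
        total_weight += w
    fused /= max(total_weight, 1e-9)  # renormalize over contributing modalities
    return fused, INTENTS[int(np.argmax(fused))]

# Example: the gesture channel is ambiguous between picking and aligning;
# physical-object and touch cues disambiguate it toward "align_part".
posteriors = {
    "gesture":  [0.40, 0.35, 0.10, 0.10, 0.05],
    "physical": [0.10, 0.60, 0.15, 0.10, 0.05],
    "touch":    [0.05, 0.55, 0.20, 0.15, 0.05],
}
trust = {"gesture": 0.6, "physical": 0.9, "touch": 0.8}

fused, intent = fuse_intent_posteriors(posteriors, trust)
if fused.max() < 0.5:  # low confidence -> fall back to task assistance guidance
    print("Low fusion confidence; trigger interaction task assistance guidance")
else:
    print(f"Inferred intent: {intent} (p={fused.max():.2f})")
```

The point of the weighting is that an unreliable channel (here, gesture recognition under dynamic, complex hand movements) is down-weighted rather than discarded, which matches the abstract's claim that fusion is mutual supplementation rather than simple superposition.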
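For the statistical analysis, a one-way ANOVA with AR interaction mode as the between-subjects factor can be run as sketched below; the group sizes total the reported 109 participants, but the score arrays are simulated placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated cognitive-test scores per AR interaction mode (placeholder data).
scores = {
    "gesture":  rng.normal(70, 8, 22),
    "physical": rng.normal(72, 8, 22),
    "touch":    rng.normal(71, 8, 22),
    "AR-SMFI":  rng.normal(74, 8, 21),
    "AR-EMFNI": rng.normal(82, 8, 22),
}

# One-way ANOVA: does interaction mode affect cognitive test performance?
f_stat, p_value = stats.f_oneway(*scores.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```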