Automatic Speech-to-Speech Translation (S2ST) is crucial for VR, enabling immersive experiences and global accessibility. Cascade pipelines are often used for this task, but they face challenges in low-resource languages due to data scarcity, system complexity, and maintenance burden. End-to-end models, though promising, are still in early development. This study explores the latest SeamlessM4T model, an end-to-end S2ST architecture that shows strong potential for VR applications, and discusses its strengths and limitations in the context of educational VR for low-resource languages.