This article designs a framework for an online music classroom in which the actions of performance robots are optimized to improve their performances. By combining speech recognition with control algorithms, the framework adjusts the robot's actions in real time according to the emotion and rhythm of the music, increasing student engagement and interactivity. Music recognition uses time-frequency analysis and pattern recognition to identify the type of music being played from the frequency and amplitude distribution patterns in the audio signal. Voice recognition applies feature analysis and speech recognition algorithms to the speech signal so that the system can provide targeted feedback and interaction. Experiments and analysis verify the feasibility and effectiveness of using performance robots based on speech sensor recognition and artificial intelligence in online music classrooms. Performance robots can mimic the ways humans perform music, including singing, playing instruments, and dancing, which helps students understand and imitate musical performances. The online music classroom application presented in this article combines voice sensing recognition, artificial intelligence, and performance robots to provide students with a more convenient, personalized, and efficient music learning experience.
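The time-frequency analysis mentioned above can be illustrated with a minimal sketch. The article does not specify its feature set or classifier, so everything here is an assumption for illustration only: the function names (`frame_spectrum`, `frame_features`, `time_frequency_features`), the frame and hop sizes, and the choice of peak frequency, spectral centroid, and RMS amplitude as per-frame descriptors of the frequency and amplitude distribution. A pattern recognition stage would then operate on these feature vectors.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitude spectrum (bins 0..N/2) of one frame via a naive DFT."""
    n = len(frame)
    # A Hann window reduces spectral leakage at the frame edges.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


def frame_features(frame, sample_rate):
    """Hypothetical per-frame descriptors of frequency and amplitude."""
    n = len(frame)
    mags = frame_spectrum(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags) or 1.0  # avoid division by zero on silent frames
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    peak = freqs[max(range(len(mags)), key=mags.__getitem__)]
    rms = math.sqrt(sum(x * x for x in frame) / n)
    return {"peak_hz": peak, "centroid_hz": centroid, "rms": rms}


def time_frequency_features(signal, sample_rate, frame_len=256, hop=128):
    """Slide a window over the signal and collect features per frame."""
    return [
        frame_features(signal[start:start + frame_len], sample_rate)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


if __name__ == "__main__":
    # Synthetic 500 Hz tone standing in for a recorded note (assumption).
    rate = 8000
    tone = [0.5 * math.sin(2 * math.pi * 500 * t / rate) for t in range(1024)]
    for feats in time_frequency_features(tone, rate)[:2]:
        print(feats)
```

The naive DFT keeps the sketch dependency-free; a practical system would more likely use an FFT routine from a numerical library and a richer feature set.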