Generative artificial intelligence (AI), particularly ChatGPT, is transforming sectors ranging from fitness applications and accounting software to politics and pharmaceuticals. Drones are versatile aerial vehicles with broad applications in videography, military operations, and surveying. However, programming them and using them effectively often requires extensive training. This research addresses these challenges by using ChatGPT's advanced reasoning and prompt-based training capabilities to enable drones to operate autonomously in a variety of settings, from everyday tasks to emergencies such as search-and-rescue missions. Building on Microsoft Research's PromptCraft-Robotics, the project integrates new algorithms with GPT-4 Vision to improve the efficiency, speed, and accuracy of command execution. The integration also incorporates feedback from additional sensor data, allowing the drones to interpret user prompts with greater contextual understanding. Initial results show a significant improvement in command response time and accuracy, enabling the drones to interpret and execute complex voice commands across different environments. This paper presents a multimodal framework that extends the capabilities of voice-controlled robotic systems and broadens the scope of AI applications in real-time systems. It lays the groundwork for customized AI-driven systems, including robots tailored to diverse applications, and for progress toward artificial general intelligence (AGI).