Real-time object detection and tracking are critical for applications such as robotics and autonomous systems, yet embedded platforms make it difficult to balance speed, accuracy, and efficiency: existing approaches often fail to deliver both high accuracy and real-time performance under tight resource constraints, and the central challenge for practical deployment remains reconciling detection speed, tracking consistency, and hardware efficiency. This work proposes a deep learning (DL) framework optimized for embedded systems that provides high accuracy, low latency, and efficient resource utilization in real-world applications. The framework integrates a novel Botox Optimization Algorithm-tuned Adaptive CNN (BOA-ACNN) for real-time object recognition and tracking. The dataset comprises annotated video sequences capturing diverse scenarios with vehicles, pedestrians, and dynamic camera motion. A Kalman filter provides real-time motion prediction and noise smoothing, enhancing tracking stability, while SIFT features improve detection robustness under varying scales and environmental conditions. The BOA performs hyperparameter fine-tuning and the ACNN carries out efficient real-time detection and tracking, achieving a latency of 97.5 µs, a throughput of 200.4 activations/µs, a precision of 96.2%, a recall of 97%, an F1-score of 97%, a mAP of 98.4%, and an overall accuracy of 98.97%. The framework thus enables real-time object identification and tracking with high accuracy and low latency on embedded devices, demonstrating strong performance in practical applications.
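To make the tracking component concrete, the sketch below shows a constant-velocity Kalman filter of the kind commonly paired with frame-by-frame detections for motion prediction and noise smoothing. This is a minimal illustration, not the paper's implementation: the state layout, time step, and noise covariances (`process_var`, `meas_var`) are illustrative assumptions, and the framework's tuned parameters are not reproduced here.

```python
import numpy as np

class KalmanTracker2D:
    """Constant-velocity Kalman filter for a 2-D object centroid (sketch).

    State vector: [x, y, vx, vy]; measurement: detected [x, y].
    All noise parameters below are illustrative assumptions.
    """

    def __init__(self, x0, y0, dt=1.0, process_var=1e-2, meas_var=1e-1):
        self.x = np.array([x0, y0, 0.0, 0.0], dtype=float)  # initial state
        # Constant-velocity motion model: position advances by velocity * dt.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        # We only observe position, not velocity.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.P = np.eye(4)                 # state covariance
        self.Q = process_var * np.eye(4)   # process (model) noise
        self.R = meas_var * np.eye(2)      # measurement (detector) noise

    def predict(self):
        # Propagate state and covariance one frame ahead.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, zx, zy):
        # Fuse a new, possibly noisy detection into the state estimate.
        z = np.array([zx, zy], dtype=float)
        y = z - self.H @ self.x                   # innovation
        S = self.H @ self.P @ self.H.T + self.R   # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In use, `predict()` is called once per frame to extrapolate the object's position (bridging missed detections), and `update()` is called whenever the detector returns a matched box, yielding a smoothed trajectory instead of jittery raw detections.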