Intrusion detection in Internet of Things (IoT) networks is essential for identifying and mitigating security breaches and unauthorized access to connected devices. As IoT devices continue to advance, securing interconnected systems against malicious attacks is critical to ensuring data privacy, system integrity, and user safety. However, traditional intrusion detection systems (IDSs) often struggle to adapt to novel and evolving threats, and developing a system that can autonomously learn and improve its detection capabilities to identify and mitigate emerging threats remains challenging. To address this issue, we propose a reinforcement learning (RL)-based approach for enhancing cybersecurity in IoT networks. First, preprocessing techniques, including handling of missing values and outliers and min-max normalization, were applied. Then, domain-specific features related to network traffic patterns, including packet size distribution, packet count, and packet rate, were extracted, and statistical measures such as mean, variance, and entropy were computed from these features to capture temporal and spatial variations in the network data. In addition, deep learning models, including long short-term memory (LSTM) networks and convolutional neural networks (CNNs), were used to automatically extract high-level features from raw data such as network logs and sensor readings. The extracted feature sets were concatenated in a feature fusion layer, which enabled efficient dimensionality reduction via principal component analysis (PCA). A hybrid optimization approach combining green anaconda optimization with a chaotic learning osprey optimization algorithm was also introduced for feature selection. Finally, an RL-based IDS (RL-IDS) that integrates recurrent neural networks (RNNs) and autoencoders with a deep Q-network (DQN) was proposed to enhance threat detection. Experimental results demonstrate the effectiveness of the proposed method, which achieves 99.45% accuracy with a 70% training split and 99.80% with an 80% split. Precision and F-measure reach 99.99% and 99.89%, respectively, outperforming existing models such as CNN, DNN, LSTM, and GAN-DRL while reducing both false positives and false negatives.
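To make the described pipeline concrete, the following is a minimal sketch of the preprocessing, statistical feature extraction, and PCA-based fusion steps, assuming flow records arrive as a pandas DataFrame and packet sizes as a NumPy array. The column handling, outlier clipping percentiles, window size, histogram binning, and component count are illustrative assumptions, not the exact configuration evaluated in the experiments.

```python
# Illustrative sketch only (assumed data layout and parameters), not the
# implementation evaluated in this work: min-max scaling, per-window
# mean/variance/entropy features, and PCA-based feature fusion.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from scipy.stats import entropy


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values, clip outliers, and min-max normalize numeric columns."""
    df = df.copy()
    num = df.select_dtypes(include=np.number).columns
    df[num] = df[num].fillna(df[num].median())
    # Clip outliers to the 1st/99th percentiles (one common, assumed choice).
    df[num] = df[num].clip(df[num].quantile(0.01), df[num].quantile(0.99), axis=1)
    df[num] = MinMaxScaler().fit_transform(df[num])
    return df


def traffic_statistics(packet_sizes: np.ndarray, window: int = 50) -> np.ndarray:
    """Compute mean, variance, and entropy of packet sizes over fixed windows."""
    feats = []
    for i in range(0, len(packet_sizes) - window + 1, window):
        w = packet_sizes[i:i + window]
        hist, _ = np.histogram(w, bins=10, density=True)
        feats.append([w.mean(), w.var(), entropy(hist + 1e-12)])
    return np.asarray(feats)


def fuse_and_reduce(feature_blocks: list[np.ndarray], n_components: int = 8) -> np.ndarray:
    """Concatenate feature sets (fusion layer) and reduce dimensionality with PCA."""
    fused = np.concatenate(feature_blocks, axis=1)
    k = min(n_components, *fused.shape)
    return PCA(n_components=k).fit_transform(fused)
```

The reduced feature matrix produced by such a pipeline would then serve as the state representation consumed by the DQN-based RL-IDS described above.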