Background: Renal tumors represent a significant diagnostic challenge in radiology because benign, malignant, and normal findings have overlapping visual characteristics on computed tomography (CT) scans. Manual interpretation is time-consuming and susceptible to inter-observer variability, which underscores the need for automated, reliable classification systems that support early and accurate diagnosis. Materials and Methods: We propose KidneyNeXt, a custom convolutional neural network (CNN) architecture designed for multi-class classification of renal tumors from CT images. The model integrates multi-branch convolutional pathways, grouped convolutions, and hierarchical feature extraction blocks to enhance representational capacity. Transfer learning, with ImageNet-1K pretraining followed by fine-tuning, was employed to improve generalization across diverse datasets. Performance was evaluated on three CT datasets: a clinically curated retrospective dataset (3199 images), the Kaggle CT KIDNEY dataset (12,446 images), and the KAUH: Jordan dataset (7770 images). All images were resized to 224 × 224 pixels, with no data augmentation, and split into training, validation, and test subsets. Results: KidneyNeXt achieved high classification performance on all three datasets. On the clinical dataset, the model achieved 99.76% accuracy and a macro-averaged F1 score of 99.71%. On the Kaggle CT KIDNEY dataset, it reached 99.96% accuracy and a 99.94% F1 score. On the KAUH dataset, it reached 99.74% accuracy and a 99.72% F1 score. The model was robust to class imbalance and inter-class similarity, with minimal misclassification rates and stable learning dynamics throughout training. Conclusions: The KidneyNeXt architecture offers a lightweight yet highly effective solution for the classification of renal tumors from CT images.
Its consistently high performance across multiple datasets highlights its potential for real-world clinical deployment as a reliable decision support tool. Interpretability was addressed using Grad-CAM visualizations, which provided class-specific attention maps highlighting the regions that contributed to the model's predictions. Future work may explore the integration of clinical metadata and multimodal imaging to further enhance diagnostic precision and interpretability.
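
For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy and a macro-averaged F1 score can be computed from true and predicted labels. It is not the authors' evaluation code; the function name `accuracy_and_macro_f1` and the toy labels are hypothetical and chosen only for illustration. The macro average weights every class equally, which is why it can fall well below accuracy when a minority class is misclassified.

```python
from collections import Counter


def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for two equal-length label sequences.

    Hypothetical helper for illustration only, not taken from the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p, but it was wrong
            fn[t] += 1          # missed an instance of class t

    labels = sorted(set(y_true) | set(y_pred))
    per_class_f1 = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        per_class_f1.append(2 * tp[c] / denom if denom else 0.0)

    accuracy = sum(tp.values()) / len(y_true)
    macro_f1 = sum(per_class_f1) / len(per_class_f1)  # unweighted mean over classes
    return accuracy, macro_f1


# Toy, made-up labels (not data from the study): one benign case is missed.
y_true = ["normal"] * 8 + ["benign"] + ["malignant"]
y_pred = ["normal"] * 8 + ["normal"] + ["malignant"]
acc, f1 = accuracy_and_macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, macro F1={f1:.3f}")
```

In this toy case the accuracy is 0.9 while the macro-averaged F1 is about 0.647, because the single benign sample contributes a per-class F1 of zero. Reporting both metrics, as the abstract does, therefore gives some indication of behavior under class imbalance.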