Recently, region proposal networks (RPNs) have been combined with the Siamese network for tracking, showing excellent accuracy with high efficiency. Nevertheless, previously proposed one-stage Siamese-RPN trackers degenerate in the presence of similar distractors and large scale variation. To address these issues, we propose a multi-stage tracking framework, Siamese Cascaded RPN (C-RPN), which consists of a sequence of RPNs cascaded from deep high-level to shallow low-level layers in a Siamese network. Compared to previous solutions, C-RPN has several advantages: (1) Each RPN is trained using the outputs of the RPN in the previous stage. This process stimulates hard negative sampling, resulting in more balanced training samples. Consequently, the RPNs become sequentially more discriminative in distinguishing difficult background (i.e., similar distractors). (2) Multi-level features are fully leveraged through a novel feature transfer block (FTB) for each RPN, further improving the discriminability of C-RPN using both high-level semantic and low-level spatial information. (3) With multiple steps of regression, C-RPN progressively refines the location and shape of the target in each RPN using the anchor boxes adjusted in the previous stage, which makes localization more accurate. C-RPN is trained end-to-end with a multi-task loss function. In inference, C-RPN is deployed as is, without any temporal adaptation, for real-time tracking. In extensive experiments on OTB-2013, OTB-2015, VOT-2016, VOT-2017, LaSOT, and TrackingNet, C-RPN consistently achieves state-of-the-art results and runs in real time.
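The cascade idea in points (1) and (3) can be illustrated with a minimal sketch. It is not the C-RPN implementation: the stage functions below are hypothetical toy stand-ins for trained RPNs, the filtering rule (drop anchors whose negative confidence exceeds a threshold) and the threshold value are assumptions about how easy negatives might be removed between stages, and the box decoding is the usual RPN parameterization rather than anything stated in this abstract.

```python
import math


def apply_deltas(anchor, delta):
    """Decode a regression output against an anchor given as (cx, cy, w, h)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = delta
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def cascade(anchors, stages, neg_threshold=0.95):
    """Pass anchors through a sequence of stages.

    Each stage is a callable returning (neg_conf, deltas) for the given
    anchors. Anchors that a stage confidently rejects are dropped (assumed
    filtering rule); the survivors are refined and handed to the next stage.
    """
    for stage in stages:
        neg_conf, deltas = stage(anchors)
        anchors = [
            apply_deltas(a, d)
            for a, d, n in zip(anchors, deltas, neg_conf)
            if n <= neg_threshold
        ]
    return anchors


def make_toy_stage(target):
    """Hypothetical stand-in for a trained RPN stage (illustration only)."""
    tx, ty, tw, th = target

    def stage(anchors):
        neg_conf, deltas = [], []
        for cx, cy, w, h in anchors:
            # Toy classifier: anchors far from the target are easy negatives.
            far = math.hypot(cx - tx, cy - ty) > 100
            neg_conf.append(1.0 if far else 0.0)
            # Toy regressor: predict half of the ideal correction.
            deltas.append((
                0.5 * (tx - cx) / w,
                0.5 * (ty - cy) / h,
                0.5 * math.log(tw / w),
                0.5 * math.log(th / h),
            ))
        return neg_conf, deltas

    return stage


target = (50.0, 50.0, 20.0, 40.0)
anchors = [(40.0, 60.0, 30.0, 30.0), (300.0, 300.0, 30.0, 30.0)]
stages = [make_toy_stage(target) for _ in range(3)]
refined = cascade(anchors, stages)
```

In this toy setup the distant anchor is discarded at the first stage, and the remaining anchor moves closer to the target at every stage, which mirrors the progressive refinement described above. A real tracker would replace `make_toy_stage` with learned classification and regression heads.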