Remote sensing images often contain extensive cloud cover, which incurs substantial costs during transmission and storage; cloud detection can reduce these costs. Although current cloud detection methods perform well at extracting large, thick clouds, problems remain, such as missed detection of small and thin clouds and false detection in non-cloud areas. We therefore propose a deep learning framework called DB-Net. It consists of three main modules: a feature extraction module (FEM), a cascaded feature enhancement module (CFEM), and a feature fusion module (FFM). In the FEM, we leverage the advantages of both convolutional neural networks and Transformers by using two branches, reducing the loss of semantic information. To strengthen the capture of multi-scale semantic information, the CFEM replaces regular convolutions with deformable convolutions to adaptively capture cloud features of various sizes, and adopts a cascaded structure to enhance information interaction across scales. Furthermore, to focus on small and thin clouds and suppress non-cloud background information, we design the FFM with attention mechanisms to enhance the target information in the features extracted by the FEM and CFEM. Extensive experiments were conducted on the GF1-WHU dataset, with comparisons against mainstream cloud detection networks. The results indicate that the proposed DB-Net reduces cloud information omission, effectively focuses on thin and small clouds, and improves overall cloud detection performance.
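To illustrate the kind of attention-gated fusion the FFM performs, the sketch below fuses two feature maps (e.g., one from each FEM branch, or FEM and CFEM outputs) with a squeeze-and-excitation-style channel attention gate. This is a minimal NumPy sketch under assumed shapes and gate weights (`w1`, `w2` are hypothetical fully connected weights), not the paper's exact FFM design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_fuse(feat_a, feat_b, w1, w2):
    """Fuse two (C, H, W) feature maps with a channel attention gate.

    feat_a, feat_b : branch features (e.g., CNN and Transformer branches)
    w1 (R, C), w2 (C, R) : weights of the squeeze-excitation bottleneck
    """
    fused = feat_a + feat_b                  # element-wise sum of branch features
    squeeze = fused.mean(axis=(1, 2))        # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)   # FC reduction + ReLU
    gate = sigmoid(w2 @ hidden)              # FC expansion + sigmoid -> (C,) in (0, 1)
    return fused * gate[:, None, None]       # re-weight channels by attention

# Toy usage with random features and weights.
rng = np.random.default_rng(0)
C, H, W, R = 8, 4, 4, 2
a = rng.standard_normal((C, H, W))
b = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((R, C))
w2 = rng.standard_normal((C, R))
out = channel_attention_fuse(a, b, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because the gate is a per-channel sigmoid, informative channels (e.g., those responding to thin-cloud texture) can be emphasized while background channels are attenuated, which is the general mechanism the FFM relies on.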