Doppler through-wall radar (TWR) excels in indoor target localization. The traditional localization method employs the short-time Fourier transform (STFT) for time-frequency analysis (TFA), but its estimates become erroneous when the instantaneous frequencies (IFs) of multiple targets cross or lie close together. This article presents an algorithm that uses a data fusion network (DF-Net) to enhance the Wigner-Ville distribution (WVD) by eliminating its cross-terms. DF-Net takes both the WVD spectrogram and the complex signal as inputs and encodes them with complex convolutions. A channel weight reassignment (CWR) module and a multilayer residual down-up sampling (MRDUS) module then refine the WVD spectrogram and remove the cross-terms. The target IFs extracted from the enhanced spectrogram enable accurate localization. DF-Net has been validated through simulations and real-world experiments: it performs well when IFs cross and also remains robust in high-noise environments. Compared with state-of-the-art methods, the proposed algorithm reduces the target localization error and the IF error by approximately 59.2% and 68.8%, respectively.
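To make the cross-term problem concrete, the following minimal Python sketch computes a plain discrete WVD of two constant-frequency components and shows that, at the chosen time instant, the strongest peak sits midway between the two true frequencies. It is not the DF-Net and not the authors' implementation; the function name `wigner_ville`, the sampling rate, and the tone frequencies are illustrative assumptions chosen only to show the artifact that the network is designed to remove.

```python
import numpy as np


def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a complex signal.

    Returns a real array of shape (N, N): rows are time samples n,
    columns are frequency bins k, where bin k corresponds to
    k * fs / (2 * N) Hz (the lag product doubles the frequency).
    """
    x = np.asarray(x, dtype=complex)
    n_samples = x.size
    wvd = np.zeros((n_samples, n_samples))
    for n in range(n_samples):
        # Largest symmetric lag that stays inside the signal.
        max_lag = min(n, n_samples - 1 - n)
        lags = np.arange(-max_lag, max_lag + 1)
        # Instantaneous autocorrelation x[n + m] * conj(x[n - m]).
        kernel = np.zeros(n_samples, dtype=complex)
        kernel[lags % n_samples] = x[n + lags] * np.conj(x[n - lags])
        # The kernel is Hermitian in the lag, so its FFT is real.
        wvd[n] = np.fft.fft(kernel).real
    return wvd


# Illustrative two-component signal (assumed values, not from the article).
fs = 256.0                      # sampling rate in Hz
n_samples = 256
t = np.arange(n_samples) / fs
f1, f2 = 20.0, 60.0             # true component frequencies in Hz
signal = np.exp(2j * np.pi * f1 * t) + np.exp(2j * np.pi * f2 * t)

wvd = wigner_ville(signal)
freqs = np.arange(n_samples) * fs / (2 * n_samples)

mid = n_samples // 2            # time slice in the middle of the record
strongest = freqs[np.argmax(np.abs(wvd[mid]))]
print(f"true IFs: {f1} Hz and {f2} Hz, strongest WVD peak: {strongest} Hz")
```

In this slice the dominant peak falls at the midpoint frequency (f1 + f2) / 2 rather than at either true IF, which is why ridge extraction on a raw WVD can fail for multiple targets. The cross-term amplitude oscillates along the time axis, so other time slices can look different.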