Through-the-wall 3D imaging is a promising approach for sensing concealed human targets because it provides more detailed spatial information than 2D imaging. Conventional imaging techniques, such as the back projection (BP) algorithm used in multiple-input multiple-output (MIMO) through-wall radar (TWR), suffer from poor resolution due to the limited aperture and from artifacts introduced by grating lobes. In this paper, we propose an imaging neural network based on multi-layer perceptrons (MLPs) to enhance TWR image formation. The network takes radar range-channel profiles directly as input and outputs high-quality 3D imaging results. It uses MLP-Mixer as the backbone to effectively integrate features from the different radar channels. In addition, we design a dataset construction method that uses point clouds captured by a stereo camera, which provides high-resolution labels while naturally avoiding grating lobes and image broadening. Although trained solely on the simulated dataset, the network achieves high-quality image formation on real measured data. To further mitigate target flickering and ghost false alarms, we fine-tune the network using a small amount of real measured data.
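The backbone mentioned above is MLP-Mixer, which fuses information across radar channels. As a rough illustration, and not the authors' network, the sketch below implements one standard MLP-Mixer block (token mixing followed by feature mixing, each preceded by LayerNorm and wrapped in a residual connection) in NumPy. It assumes, hypothetically, that each token is one TX/RX channel and that its feature vector is that channel's range profile. The array sizes, hidden widths, and helper names (`mixer_block`, `init_params`) are illustrative assumptions, and LayerNorm's learnable scale and shift are omitted for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row (last axis) to zero mean and unit variance (no affine)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP applied along the last axis of x."""
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_block(x, params):
    """One MLP-Mixer block.

    x has shape (num_tokens, hidden_dim). Assumption for illustration: tokens
    are radar channels (TX/RX pairs) and hidden_dim indexes range bins.
    """
    # Token mixing: one shared MLP mixes across radar channels for each range bin.
    y = layer_norm(x).T                      # (hidden_dim, num_tokens)
    x = x + mlp(y, *params["token"]).T       # residual connection
    # Feature mixing: one shared MLP mixes range features within each channel.
    x = x + mlp(layer_norm(x), *params["channel"])
    return x

def init_params(num_tokens, hidden_dim, token_hidden, channel_hidden, rng):
    """Random (untrained) weights with 1/sqrt(fan_in) scaling; hypothetical helper."""
    def dense(n_in, n_out):
        return rng.normal(0.0, n_in ** -0.5, (n_in, n_out)), np.zeros(n_out)
    w1, b1 = dense(num_tokens, token_hidden)
    w2, b2 = dense(token_hidden, num_tokens)
    v1, c1 = dense(hidden_dim, channel_hidden)
    v2, c2 = dense(channel_hidden, hidden_dim)
    return {"token": (w1, b1, w2, b2), "channel": (v1, c1, v2, c2)}

# Hypothetical sizes: a 4 x 4 MIMO array gives 16 channels, each with 128 range bins.
rng = np.random.default_rng(0)
num_channels, num_range_bins = 16, 128
profile = rng.normal(size=(num_channels, num_range_bins))  # stand-in for a range-channel profile
params = init_params(num_channels, num_range_bins, 32, 256, rng)
out = mixer_block(profile, params)
print(out.shape)  # (16, 128)
```

Because the token-mixing MLP is shared across all range bins, every radar channel can exchange information with every other channel in a single step, which is the property that makes the Mixer a natural fit for fusing multichannel radar data. Note the naming clash: in MLP-Mixer terminology "channel mixing" acts on the feature axis, which here corresponds to range bins rather than radar channels.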