In multiple-input multiple-output (MIMO) communication systems, precoding is an essential component for increasing the data rate: it projects the transmit signal onto the well-conditioned subspace of the channel. Singular value decomposition (SVD) is often used to construct a precoding matrix from the right-singular vectors of the channel matrix, which is known to maximize channel capacity for Gaussian inputs. For non-Gaussian inputs, however, the precoder should directly maximize the mutual information (MI) between the channel input and output, since the theoretical channel capacity is achievable only under the Gaussian-input assumption. In this paper, we investigate a data-driven approach to learning the MI-maximizing precoder for finite-alphabet inputs. In particular, we focus on training a single model capable of producing the optimal precoder for various MIMO system configurations, which differ in the number of antennas, the number of data streams, and the modulation order, so that it can be deployed easily in practice without the overhead of storing and managing multiple models. Simulation results show that the proposed learned precoder achieves higher MI values and lower block error rates than the conventional capacity-maximizing SVD-based precoder in various scenarios.
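To make the contrast between the two precoders concrete, the setting can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ is the channel matrix, $\mathbf{P} \in \mathbb{C}^{N_t \times N_s}$ the precoder for $N_s$ data streams, $\mathbf{x}$ the symbol vector, $\mathcal{X}$ the modulation constellation, $P_T$ a transmit power budget, and $\boldsymbol{\Lambda}$ a diagonal power allocation (for example, water-filling). The block assumes the `amsmath` package.

```latex
\begin{align}
  % Assumed linear channel model with additive white Gaussian noise
  \mathbf{y} &= \mathbf{H}\mathbf{P}\mathbf{x} + \mathbf{n},
  \qquad \mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_r}), \\
  % SVD of the channel matrix
  \mathbf{H} &= \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathsf{H}}, \\
  % Conventional precoder: the N_s dominant right-singular vectors
  \mathbf{P}_{\mathrm{SVD}} &= \mathbf{V}_{[:,\,1:N_s]}\,\boldsymbol{\Lambda}^{1/2}, \\
  % MI-maximizing precoder for finite-alphabet inputs
  \mathbf{P}^{\star} &= \arg\max_{\mathbf{P}:\ \operatorname{tr}(\mathbf{P}\mathbf{P}^{\mathsf{H}}) \le P_T}
  I(\mathbf{x};\mathbf{y}),
  \qquad \mathbf{x} \text{ uniform on } \mathcal{X}^{N_s}.
\end{align}
```

With Gaussian $\mathbf{x}$, the third line attains the maximum of the fourth; with $\mathbf{x}$ drawn from a finite constellation the two generally differ, which is the gap the learned precoder targets.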