This work developed meta-learning control policies that adapt online to changing conditions and generate diverse, robust locomotion. The proposed method continually updates the interaction model, samples feasible action sequences, evaluates them on the estimated state-action trajectories, and then applies the actions that maximize the reward. To achieve online model adaptation, the method learns a separate latent vector for each training condition and selects among them online using the 10 most recent samples, collected within 0.2 s. We design appropriate state spaces and reward functions and optimize feasible actions in an MPC fashion. Because the actions are sampled directly in the joint space under constraints, no prior design or training of specific gaits is required. We further demonstrated that the robot can detect unexpected changes during the interaction and adapt its control policy in less than 0.2 s. Extensive validation on the SpotMicro robot in a physics simulation shows adaptive and robust locomotion under changing ground friction, external pushes, and altered robot dynamics, including motor failures and the amputation of a whole leg.
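The adaptation-and-planning loop summarized above can be illustrated with a minimal sketch. It is not the implementation used in this work: the toy `predict` function stands in for the learned interaction model, choosing the latent vector by lowest prediction error over the recent transitions is an assumed selection rule, and random shooting is an assumed sampling scheme. All function names, the horizon, the sample count, and the joint limits are hypothetical.

```python
import numpy as np

def predict(state, action, latent):
    # Hypothetical stand-in for the learned dynamics model:
    # the latent vector scales how strongly actions move the state.
    return state + latent * action

def select_latent(latents, states, actions, next_states):
    # Assumed rule: keep the latent whose predictions best match
    # the most recent transitions (e.g. the last 10 samples).
    errors = [np.mean((predict(states, actions, z) - next_states) ** 2)
              for z in latents]
    return int(np.argmin(errors))

def plan_action(state, latent, reward_fn, rng,
                horizon=5, n_samples=256, low=-1.0, high=1.0):
    # Random-shooting MPC: sample action sequences inside the joint limits,
    # roll each one out through the model, and return the first action
    # of the sequence with the highest predicted return.
    dim = state.shape[0]
    seqs = rng.uniform(low, high, size=(n_samples, horizon, dim))
    s = np.repeat(state[None, :], n_samples, axis=0)
    returns = np.zeros(n_samples)
    for t in range(horizon):
        s = predict(s, seqs[:, t], latent)
        returns += reward_fn(s)
    return seqs[np.argmax(returns), 0]

rng = np.random.default_rng(0)
# Two illustrative training conditions, e.g. nominal and low friction.
latents = [np.array([1.0, 1.0]), np.array([0.2, 0.2])]

# Ten recent transitions generated under the second condition.
states = rng.normal(size=(10, 2))
actions = rng.uniform(-1.0, 1.0, size=(10, 2))
next_states = predict(states, actions, latents[1])

idx = select_latent(latents, states, actions, next_states)

# Toy reward: stay close to a target state.
target = np.array([0.5, -0.5])
reward_fn = lambda s: -np.sum((s - target) ** 2, axis=-1)
action = plan_action(np.zeros(2), latents[idx], reward_fn, rng)
```

In a receding-horizon loop, only `action` would be applied before the latent selection and the planning step are repeated with the newest samples.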