Large eddy simulation (LES) of the resonant inertial response of the upper ocean to strong wind forcing is carried out; the results are used to evaluate the performance of each of the two second-order turbulence closure models presented by Mellor and Yamada (Rev Geophys Space Phys 20:851-875, 1982) (MY) and by Nakanishi and Niino (J Meteorol Soc Jpn 87:895-912, 2009) (NN). The major difference between MY and NN is in the formulation of the stability functions and the turbulent length scale, both strongly linked with turbulent fluxes; in particular, the turbulent length scale in NN, unlike that in MY, is allowed to decrease with increasing density stratification. We find that MY underestimates and NN overestimates the development of mixed layer features, for example, the strong entrainment at the base of the oceanic mixed layer and the accompanying decrease of sea surface temperature. Considering that the stability functions in NN perform better than those in MY in reproducing the vertical structure of turbulent heat flux, we slightly modify NN to find that the discrepancy between LES and NN can be reduced by more strongly restricting the turbulent length scale with increasing density stratification.