We propose non-linear generative models, referred to as Sparse Spectral Latent Variable Models (SLVM), that combine the advantages of spectral embeddings with those of parametric latent variable models: they (1) provide stable latent spaces that preserve global or local geometric properties of the modeled data; (2) offer low-dimensional generative models with probabilistic, bi-directional mappings between latent and ambient spaces; and (3) are probabilistically consistent (i.e., reflect the data distribution, both jointly and marginally) and efficient to learn and use. We show that SLVMs compare favorably with competing methods based on PCA, GPLVM, or GTM for the reconstruction of typical human motions such as walking, running, pantomime, or dancing in a benchmark dataset. Empirically, we observe that SLVMs are effective for the automatic 3D reconstruction of low-dimensional human motion in movies.