In the quantum state tomography problem, one wishes to estimate an unknown $d$-dimensional mixed quantum state $\rho$, given few copies. We show that $O(d/\epsilon)$ copies suffice to obtain an estimate $\hat{\rho}$ that satisfies $\|\hat{\rho} - \rho\|_F^2 \le \epsilon$ (with high probability). An immediate consequence is that $O(\mathrm{rank}(\rho) \cdot d/\epsilon^2) \le O(d^2/\epsilon^2)$ copies suffice to obtain an $\epsilon$-accurate estimate in the standard trace distance. This improves on the best known prior result of $O(d^3/\epsilon^2)$ copies for full tomography, and even on the best known prior result of $O(d^2 \log(d/\epsilon)/\epsilon^2)$ copies for spectrum estimation. Our result is the first to show that nontrivial tomography can be obtained using a number of copies that is just linear in the dimension. Next, we generalize these results to show that one can perform efficient principal component analysis on $\rho$. Our main result is that $O(kd/\epsilon^2)$ copies suffice to output a rank-$k$ approximation $\hat{\rho}$ whose trace-distance error is at most $\epsilon$ more than that of the best rank-$k$ approximator to $\rho$. This subsumes our above trace-distance tomography result and generalizes it to the case when $\rho$ is not guaranteed to be of low rank. A key part of the proof is the analogous generalization of our spectrum-learning results: we show that the largest $k$ eigenvalues of $\rho$ can be estimated to trace-distance error $\epsilon$ using $O(k^2/\epsilon^2)$ copies. In turn, this result relies on a new coupling theorem concerning the Robinson-Schensted-Knuth algorithm that should be of independent combinatorial interest.
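
As a rough sketch of why the Frobenius-norm bound yields the stated trace-distance bounds, one can apply Cauchy-Schwarz to the singular values of the error matrix. The symbols $A$, $\sigma_i$, $\epsilon'$, and $r$ below are notation introduced here for illustration. The rank-dependent step assumes the error matrix has rank at most $2r$ (for instance, if the estimate is itself taken to have rank at most $r = \mathrm{rank}(\rho)$); that assumption is ours and is not stated in the abstract. Constant factors, including the choice between $\|\cdot\|_1$ and $\tfrac{1}{2}\|\cdot\|_1$ for trace distance, are absorbed into the $O(\cdot)$.

```latex
% Cauchy-Schwarz on the nonzero singular values \sigma_i(A) of a matrix A:
\[
  \|A\|_1 \;=\; \sum_{i=1}^{\mathrm{rank}(A)} \sigma_i(A)
  \;\le\; \sqrt{\mathrm{rank}(A)}\,\Bigl(\sum_{i} \sigma_i(A)^2\Bigr)^{1/2}
  \;=\; \sqrt{\mathrm{rank}(A)}\;\|A\|_F .
\]
% Take A = \hat{\rho} - \rho and suppose \|A\|_F^2 \le \epsilon',
% which the abstract says is achievable with O(d/\epsilon') copies.
%
% General case: rank(A) \le d always holds, so choose \epsilon' = \epsilon^2/d.
\[
  \|\hat{\rho} - \rho\|_1 \;\le\; \sqrt{d\,\epsilon'} \;=\; \epsilon ,
  \qquad
  O(d/\epsilon') \;=\; O(d^2/\epsilon^2) \ \text{copies}.
\]
% Low-rank case (ASSUMPTION: rank(A) \le 2r with r = rank(\rho)):
% choose \epsilon' = \epsilon^2/(2r).
\[
  \|\hat{\rho} - \rho\|_1 \;\le\; \sqrt{2r\,\epsilon'} \;=\; \epsilon ,
  \qquad
  O(d/\epsilon') \;=\; O(r d/\epsilon^2) \ \text{copies}.
\]
```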