Flash memory is a non-volatile computer memory comprised of blocks of cells, wherein each cell can take on q different values or levels. While increasing the cell level is easy, reducing the level of a cell can be accomplished only by erasing an entire block. Since block erasures are highly undesirable, coding schemes - known as floating codes or flash codes - have been designed in order to maximize the number of times that information stored in a flash memory can be written (and re-written) prior to incurring a block erasure. An (n, k, t)(q) flash code C is a coding scheme for storing k information bits in n cells in such a way that any sequence of up to t writes (where a write is a transition 0 -> 1 or 1 -> 0 in any one of the k bits) can be accommodated without a block erasure. The total number of available level transitions in n cells is n(q-1), and the write deficiency of C, defined as delta(C) = n(q-1) - t, is a measure of how close the code comes to perfectly utilizing all these transitions. For k > 6 and large n, the best previously known construction of flash codes achieves a write deficiency of O(qk(2)). On the other hand, the best known lower bound on write deficiency is Omega(qk). In this paper, we present a new construction of flash codes that approaches this lower bound to within a factor logarithmic in k. To this end, we first improve upon the so-called "indexed" flash codes, due to Jiang and Bruck, by eliminating the need for index cells in the Jiang-Bruck construction. Next, we further increase the number of writes by introducing a new multi-stage (recursive) indexing scheme. We then show that the write deficiency of the resulting flash codes is O(qk log k) if q >= log(2) k, and at most O(k log(2) k) otherwise.