The following source coding problem was introduced by Birk and Kol: a sender holds a word $x \in \{0,1\}^n$, and wishes to broadcast a codeword to $n$ receivers, $R_1, \ldots, R_n$. The receiver $R_i$ is interested in $x_i$, and has prior side information comprising some subset of the $n$ bits. This corresponds to a directed graph $G$ on $n$ vertices, where $(i, j)$ is an edge iff $R_i$ knows the bit $x_j$. An index code for $G$ is an encoding scheme which enables each $R_i$ to always reconstruct $x_i$, given his side information. The minimal word length of an index code was studied by Bar-Yossef, Birk, Jayram, and Kol (FOCS'06). They introduced a graph parameter, $\mathrm{minrk}_2(G)$, which completely characterizes the length of an optimal linear index code for $G$. They showed that in various cases linear codes attain the optimal word length, and conjectured that linear index coding is in fact always optimal. In this work, we disprove the main conjecture of Bar-Yossef, Birk, Jayram, and Kol in the following strong sense: for any $\epsilon > 0$ and sufficiently large $n$, there is an $n$-vertex graph $G$ so that every linear index code for $G$ requires codewords of length at least $n^{1-\epsilon}$, and yet a nonlinear index code for $G$ has a word length of $n^{\epsilon}$. This is achieved by an explicit construction, which extends Alon's variant of the celebrated Ramsey construction of Frankl and Wilson. In addition, we study optimal index codes in various, less restricted, natural models, and prove several related properties of the graph parameter $\mathrm{minrk}(G)$.
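To make the definition of an index code concrete, the following minimal sketch considers the simplest case, which is an illustration and not the construction of this work: every receiver knows all bits except its own, so the side-information graph is the complete directed graph. Broadcasting the single parity bit of the word is then a linear index code of length 1. The function names `encode`, `decode`, and `check_clique_code` are hypothetical and chosen only for this example.

```python
from itertools import product


def encode(x):
    """Sender: broadcast one bit, the XOR (parity) of all bits of x."""
    parity = 0
    for bit in x:
        parity ^= bit
    return parity


def decode(codeword, side_info):
    """Receiver R_i: XOR the codeword with every known bit x_j (j != i).

    side_info maps each known index j to the bit x_j. Because the receiver
    knows all bits other than x_i, the result is exactly x_i.
    """
    result = codeword
    for bit in side_info.values():
        result ^= bit
    return result


def check_clique_code(n):
    """Brute-force check: every receiver decodes correctly for every x."""
    for x in product((0, 1), repeat=n):
        codeword = encode(x)
        for i in range(n):
            # Assumption for this sketch: R_i knows x_j for all j != i.
            side_info = {j: x[j] for j in range(n) if j != i}
            if decode(codeword, side_info) != x[i]:
                return False
    return True


print(check_clique_code(4))  # True
```

Here one broadcast bit replaces the $n$ bits needed without side information. For general graphs $G$ the optimal length is far harder to determine, which is the question addressed above.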