Graph coloring is an assignment of colors to the vertices of a graph such that no two adjacent vertices get the same color. It is a key building block in many applications. Finding a coloring with a minimal number of colors is often only part of the problem. In addition, the solution also needs to be computed quickly. Several parallel implementations exist, but they may suffer from low parallelism depending on the input graph. We present an approach that increases the parallelism without affecting the coloring quality. On 18 test graphs, our technique yields an average of 3.4 times more parallelism. Our CUDA implementation running on a Titan V is 2.9 times faster on average and uses as few or fewer colors as the best GPU codes from the literature.