With the share of machine learning (ML) workloads in data centers rapidly increasing, cloud providers are beginning to incorporate accelerators such as tensor processing units (TPUs) to improve the energy efficiency of applications. However, without optimizing application parameters, users may underutilize accelerators and end up wasting energy and money. This paper presents TPUPoint to facilitate the development of efficient applications on TPU-based cloud platforms. TPUPoint automatically classifies repetitive patterns into phases and identifies the most timing-critical operations in each phase. Further, TPUPoint can associate phases with checkpoints to allow fast-forwarding through applications, thereby significantly reducing the time and money spent optimizing them. By running TPUPoint on a wide array of representative ML workloads, we found that computation is no longer the most time-consuming operation; instead, the infeed and reshape operations, which exchange and realign data, have become the most significant. TPUPoint's advantages significantly increase the potential for discovering optimal parameters to quickly balance the complex workload pipeline of feeding data into a system, reformatting the data, and computing results.