Emerging nonvolatile memories used as embedded memories offer lower leakage power and higher memory density than static random access memory (SRAM) and embedded dynamic random access memory (eDRAM) at the same technology node. However, emerging memories generally suffer from limited cycling endurance. For read/write-intensive applications, this limited endurance can become a bottleneck that limits the lifetime of the overall system. In this work, the prototype 3-D stackable ferroelectric random access memory (FeRAM) reported by Intel is considered as the global buffer memory of a tensor processing unit (TPU)-like architecture. An endurance-aware compiler is proposed to evaluate the maximum number of deep neural network (DNN) trainings the system can sustain, given the experimentally measured endurance limit. In addition, the proposed compiler applies two strategies to alleviate the endurance issue: wear leveling, and dual-mode operation between volatile and nonvolatile modes. Wear leveling and dual-mode operation increase the maximum number of trainings by 6× to 300× and 4× to 58×, respectively. Finally, a guideline relating system endurance (the maximum number of trainings) to a given memory device endurance is provided to bridge the gap between memory device engineers and system designers.
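To make the endurance argument concrete, the sketch below shows one simple way to bound the number of trainings from a per-training write trace. It is a minimal illustration under stated assumptions, not the proposed compiler: the function names, the toy trace, and the endurance value of 1,000,000 cycles are all hypothetical. Without wear leveling, lifetime is set by the most frequently written buffer address; with idealized wear leveling, writes are assumed to spread perfectly evenly across all buffer addresses, which gives an upper bound.

```python
from collections import Counter


def max_trainings_without_leveling(write_trace, endurance):
    """Lifetime limited by the hottest address.

    write_trace: list of buffer addresses written during ONE training
    (hypothetical trace format). endurance: write cycles per cell
    before failure (assumed value, not a measured one).
    """
    per_address = Counter(write_trace)      # writes per address per training
    hottest = max(per_address.values())     # worst-case address
    return endurance // hottest             # trainings until that address wears out


def max_trainings_ideal_leveling(write_trace, num_addresses, endurance):
    """Idealized wear leveling (upper bound).

    Assumes every training's writes are remapped so that all
    num_addresses buffer locations wear at exactly the same rate.
    """
    total_writes = len(write_trace)         # writes per training
    return (endurance * num_addresses) // total_writes


# Hypothetical toy trace: address 0 is a hot spot written 8 times,
# addresses 1-3 are written once each, in a 16-entry buffer.
trace = [0] * 8 + [1, 2, 3]
print(max_trainings_without_leveling(trace, 1_000_000))    # hot spot limits lifetime
print(max_trainings_ideal_leveling(trace, 16, 1_000_000))  # writes spread evenly
```

In this toy case the hot spot caps the system at 125,000 trainings, while ideal leveling raises the bound to about 1.45 million. Real wear leveling incurs remapping overhead and cannot spread writes perfectly, so the second figure should be read only as a ceiling.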