Single-ISA heterogeneous multi-core architectures, which consist of diverse superscalar cores, are becoming increasingly important in processor architecture. Running each program on a superscalar core that matches its characteristics helps reduce energy consumption and improve performance. However, designing a heterogeneous multi-core processor requires a large design and verification effort. We have therefore proposed FabHetero, which automatically generates diverse heterogeneous multi-core processors using FabScalar, FabCache, and FabBus; these tools generate various designs of superscalar cores, cache systems, and flexible shared bus systems, respectively. This paper extends our previous work and presents the details of FabCache. The previous paper did not describe the detailed design of the L1 data cache, did not implement mechanisms for high-end performance such as a non-blocking cache, and did not cover physical design or power estimation. To address these shortcomings, this paper describes the detailed design of FabCache, in particular the L1 data cache, to show its suitability for high-end processors. It also presents performance estimation and the physical design of caches generated by FabCache with arbitrary parameters such as cache capacity, line size, associativity, access latency, and line transmission width between levels of the cache hierarchy. According to the estimation results, FabCache generates cache systems with almost the same area and power consumption as a hand-tuned cache despite the overhead of automatic generation: the area share of the L1 instruction cache controller, including the extra circuits, is only 3.5%, and the increase in power consumption relative to the hand-tuned cache is less than 0.1%.