This paper presents IC realization of a random forest (RF) machine learning classifier. Algorithm-architecture-circuit is co-optimized to minimize the energy-delay product (EDP). Deterministic subsampling (DSS) and balanced decision trees result in reduced interconnect complexity and avoid irregular memory accesses. Low-swing analog in-memory computations embedded in a standard 6T SRAM enable massively parallel processing thereby minimizing the memory fetches and reducing the EDP further. The 65nm CMOS prototype achieves a 6.8x lower EDP compared to a conventional design at the same accuracy (94%) for an 8-class traffic sign recognition problem.