This paper describes recent progress towards porting a Unified Flow Solver OHS to heterogeneous parallel computing. The main challenge of porting CIF'S to graphics processing units (GPUs) comes from the dynamically adapted mesh, which causes irregular data access. We describe the implementation of CUDA kernels for three modules in UFS: the direct Boltzmann solver using discrete velocity method (DVM), the DSMC module, and the Lattice Boltzmann Method (LBM) solver, all using octree Cartesian mesh with adaptive Mesh Refinement (AMR). Double digit speedup on single GPU and good scaling for multi-GPU has been demonstrated.