We study regret minimization for risk-sensitive reinforcement learning (RL) with the exponential utility in episodic MDPs. Recent works establish both a lower bound of $\Omega\big(\frac{e^{|\beta|(H-1)/2}-1}{|\beta|}\sqrt{SAT}\big)$ and the best known upper bound of $\tilde{O}\big(\frac{e^{|\beta|H}-1}{|\beta|}\sqrt{H^2SAT}\big)$, where $H$ is the length of an episode, $S$ the size of the state space, $A$ the size of the action space, $T$ the total number of timesteps, and $\beta$ the risk parameter. The gap between the upper and lower bounds is exponential in $H$ and hence unsatisfactory. In this paper, we show that a variant of the UCB-ADVANTAGE algorithm removes a factor of $\sqrt{H}$ from the best previously known bound in any arbitrary MDP. To further sharpen the regret bound, we introduce a new mechanism of regret analysis and derive a problem-dependent regret bound for the algorithm that requires no prior knowledge of the MDP; this bound is much tighter in MDPs with special structure. In particular, we show that a regret matching the information-theoretic lower bound up to logarithmic factors can be attained within a rich class of MDPs, improving on the best previously known bound by an exponential factor. Furthermore, we derive a novel information-theoretic lower bound of $\Omega\big(\max_{h\in[H]} c^*_{v,h+1}\,\sqrt{SAT}/|\beta|\big)$, where $\max_{h\in[H]} c^*_{v,h+1}$ is a problem-dependent statistic. This lower bound shows that the problem-dependent regret bound achieved by the algorithm is optimal in its dependence on $\max_{h\in[H]} c^*_{v,h+1}$.
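To make the exponential gap explicit, a rough side-by-side comparison of the two bounds above (ignoring constants and logarithmic factors) gives
\[
\frac{\tilde{O}\big(\frac{e^{|\beta|H}-1}{|\beta|}\sqrt{H^{2}SAT}\big)}{\Omega\big(\frac{e^{|\beta|(H-1)/2}-1}{|\beta|}\sqrt{SAT}\big)}
\;\approx\; H\cdot\frac{e^{|\beta|H}-1}{e^{|\beta|(H-1)/2}-1}
\;\approx\; H\, e^{|\beta|(H+1)/2},
\]
so the ratio of the best known upper bound to the lower bound grows exponentially in $|\beta|H$.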