This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The variance metric concerns reward variability over the whole process, with future deviations discounted to their present values. This discounted mean-variance optimization yields a reward function that depends on the discounted mean, and this dependency renders traditional dynamic programming methods inapplicable because it breaks a crucial property: time-consistency. To deal with this unorthodox problem, we introduce a pseudo mean that transforms the intractable MDP into a standard one with a redefined reward function, and we derive a discounted mean-variance performance difference formula. Building on the pseudo mean, we propose a unified algorithmic framework with a bilevel optimization structure for discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems, including, but not limited to, risk-averse variance and mean-variance optimizations in discounted and average MDPs. Furthermore, convergence analyses missing from the literature can also be completed within the proposed framework. Taking value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm.
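
As a point of reference only, and not a formulation taken from the paper itself, one common way to write a discounted mean-variance objective and the pseudo-mean reformulation sketched above is the following; the symbols $J_\pi$, $V_\pi$, $\mu_\pi$, $\lambda$, and $y$ are illustrative assumptions and may differ in detail from the paper's definitions:
\[
J_\pi = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty}\gamma^{t} r_t\right],
\qquad
V_\pi = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty}\gamma^{t}\bigl(r_t-\mu_\pi\bigr)^{2}\right],
\]
where $\mu_\pi$ denotes a discounted mean (e.g., the normalized quantity $(1-\gamma)J_\pi$) and the risk-averse objective is $\max_\pi \; J_\pi - \lambda V_\pi$ for a trade-off weight $\lambda \ge 0$. Because $\mu_\pi$ itself depends on the policy, the squared-deviation reward is not a standard MDP reward. Replacing $\mu_\pi$ with a policy-independent pseudo mean $y$ gives an inner problem with the redefined reward $\tilde r(s,a) = r(s,a) - \lambda\bigl(r(s,a) - y\bigr)^{2}$, which is a standard discounted MDP solvable by dynamic programming, while $y$ is adjusted in an outer loop; this illustrates the bilevel structure mentioned in the abstract under the stated assumptions.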