Purpose: According to oncologists, a single medication is insufficient to cure cancer completely; as a result, most patients receive two or more types of therapy, sometimes in succession, to control the progression of their disease. Methods: This comprehensive review explores the applications and challenges of multimodality treatment planning using the Markov decision process (MDP) framework. The benefits of employing MDP in treatment planning include its ability to incorporate multiple treatment modalities, optimize treatment sequences, and account for uncertainty in patient response. Its limitations include the complexity of modeling interactions between treatment modalities, the need for accurate input data, and the computational burden of solving large-scale MDP problems. Results: This review highlights the importance of weighing both the benefits and the limitations of MDP in multimodality treatment planning to enhance patient outcomes and optimize resource allocation in healthcare settings. In addition, through a case analysis, we examine how altering the terminal cost function affects the efficacy of MDP-optimal plans. Conclusion: We summarize the findings of several studies that have employed MDP in treatment planning and discuss the benefits, limitations, and potential future directions of this approach. The review also surveys the benefits of MDP-based multimodality treatment planning, including improved treatment outcomes, reduced toxicities, personalized treatment decisions, and optimized sequencing and timing of treatment modalities. © The Author(s), under exclusive licence to The Brazilian Society of Biomedical Engineering 2024.