Policy-Gradient-Based Reinforcement Learning for Maximizing Operator's Profit in Open-RAN

Cited by: 0
Authors
Sharara, Mahdi [1 ]
Hoteit, Sahar [2 ,3 ]
Carlinet, Yannick [1 ]
Masucci, Antonia Maria [1 ]
Perrot, Nancy [1 ]
Affiliations
[1] Orange Innovation Division, F-92320 Châtillon, France
[2] Université Paris-Saclay, Laboratoire des Signaux et Systèmes, CentraleSupélec, CNRS, F-91190 Gif-sur-Yvette, France
[3] Institut Universitaire de France (IUF), Paris, France
Keywords
Mixed integer linear programming; Open-RAN; Policy-gradient; Profit maximization; Reinforcement learning; Resource allocation; Radio
DOI
10.1007/s10922-025-09935-y
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Open radio access network (O-RAN) is a novel architecture that enables the disaggregation and virtualization of network components. By opening the interfaces that link these components, it provides new ways to mix and match them. O-RAN drives down network deployment costs and allows new players to enter the RAN market. It also lets network operators maximize resource utilization and deliver new network-edge services at a lower price, resulting in higher profits. In this context, we consider a computing resource allocation problem aimed at maximizing the operator's profit. Given that an operator receives payments from subscribers and incurs costs from the infrastructure provider, the objective is to maximize the difference between these payments and costs. We model the problem as a mixed integer linear program (MILP) and prove it to be NP-hard. The aim is to allocate computing resources to users while accounting for different processing priority levels. We then propose to solve the problem using policy-gradient-based reinforcement learning (RL). We consider several variants of the RL model, including different neural network architectures such as Feed-Forward and Bidirectional Long Short-Term Memory (BiLSTM) networks. Additionally, we consider both the REINFORCE and Actor-Critic algorithms, and we evaluate two strategies: Single Action per State (SAS) and Multiple Actions per State (MAS). Our simulation results demonstrate that MAS combined with a Feed-Forward neural network and the Actor-Critic algorithm outperforms all other RL variants in terms of the operator's profit, reaching up to 93.06% of the optimal profit obtained by solving the MILP formulation. Additionally, it has significantly lower complexity than solving the MILP to optimality: our numerical experiments show that the execution time is reduced by up to 99.53%.
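Abstracting away the full constraint set, the profit objective described above has the general form below; the symbols are illustrative and not the paper's notation:

\max_{x,\,y} \;\; \sum_{u \in \mathcal{U}} p_u \, x_u \;-\; \sum_{r \in \mathcal{R}} c_r \, y_r

where x_u = 1 if user u's processing request is served, p_u is the payment received from that subscriber, y_r is the amount of computing resource r leased from the infrastructure provider, and c_r is its unit cost.

The following is a minimal policy-gradient (REINFORCE) sketch of the kind of learning loop the abstract describes, written in Python with PyTorch. The toy environment, reward values, and network sizes are hypothetical and do not reproduce the authors' setup; the Actor-Critic variant they favor would additionally train a critic network whose value estimate is subtracted from the profit as a variance-reducing baseline.

import torch
import torch.nn as nn

N_USERS = 8                 # hypothetical number of users per scheduling round
N_LEVELS = 3                # hypothetical processing priority levels
CPU_COST = 0.4              # hypothetical infrastructure cost per CPU unit
PAYMENT = [0.5, 1.0, 1.5]   # hypothetical payment per priority level

# Simple feed-forward policy: user demands in, one action distribution per user out.
policy = nn.Sequential(
    nn.Linear(N_USERS, 64), nn.ReLU(), nn.Linear(64, N_USERS * N_LEVELS)
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for episode in range(2000):
    demand = torch.rand(N_USERS)                      # random CPU demand per user
    logits = policy(demand).view(N_USERS, N_LEVELS)
    dist = torch.distributions.Categorical(logits=logits)
    levels = dist.sample()                            # one priority level per user
    # Profit = subscriber payments minus CPU cost incurred (toy model).
    payments = torch.tensor([PAYMENT[int(l)] for l in levels])
    cpu_used = demand * (levels.float() + 1.0)        # higher priority uses more CPU
    profit = (payments - CPU_COST * cpu_used).sum()
    # REINFORCE update: scale the log-probability of the sampled actions by the reward.
    loss = -dist.log_prob(levels).sum() * profit.detach()
    opt.zero_grad()
    loss.backward()
    opt.step()

Sampling one priority level per user in a single forward pass mirrors the "multiple actions per state" strategy that the abstract reports as the best-performing variant.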
Pages: 28