Discrete Probabilistic Inference as Control in Multi-path Environments

Cited by: 0
Authors
Deleu, Tristan [1, 3]
Nouri, Padideh [2]
Malkin, Nikolay [1]
Precup, Doina [2, 4]
Bengio, Yoshua [1]
Affiliations
[1] Univ Montreal, Montreal, PQ, Canada
[2] McGill Univ, Montreal, PQ, Canada
[3] Valence Labs, Montreal, PQ, Canada
[4] Google DeepMind, London, England
Source
UNCERTAINTY IN ARTIFICIAL INTELLIGENCE | 2024 / Vol. 244
Abstract
We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the marginal distribution induced by the optimal MaxEnt RL policy is proportional to the original reward, regardless of the structure of the underlying MDP. We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward. Finally, we study empirically the performance of multiple MaxEnt RL and GFlowNet algorithms on multiple problems involving sampling from discrete distributions.
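The bias described in the abstract can be seen on a very small example. The sketch below is not taken from the paper: the toy DAG, the state names, and the helper `terminal_marginal` are all hypothetical, and the correction used (adding the log-probability of a uniform backward transition to each step's reward) is assumed here as one known form of reward correction from the related literature. Under those assumptions, soft value iteration gives the optimal MaxEnt RL policy exactly, and the marginal over final objects can be compared with and without the correction.

```python
import math
from collections import defaultdict

# Hypothetical toy DAG: object "x" can be built along two paths
# (s0 -> a -> x and s0 -> b -> x), object "y" along one (s0 -> b -> y).
CHILDREN = {"s0": ["a", "b"], "a": ["x"], "b": ["x", "y"], "x": [], "y": []}
REWARD = {"x": 1.0, "y": 1.0}        # equal rewards, so the target is 1/2 each
ORDER = ["s0", "a", "b", "x", "y"]   # a topological order of the states


def parents(state):
    """All states with an edge into `state`."""
    return [s for s, kids in CHILDREN.items() if state in kids]


def terminal_marginal(corrected):
    """Marginal over final objects under the optimal MaxEnt RL policy.

    If `corrected` is True, each transition s -> t receives the extra reward
    log P_B(s | t), with P_B uniform over the parents of t (an assumption).
    """
    def step_reward(s, t):
        return -math.log(len(parents(t))) if corrected else 0.0

    # Soft value iteration, from the final objects back to the initial state.
    value = {}
    for s in reversed(ORDER):
        if not CHILDREN[s]:
            value[s] = math.log(REWARD[s])  # terminal reward log R(x)
        else:
            value[s] = math.log(
                sum(math.exp(step_reward(s, t) + value[t]) for t in CHILDREN[s])
            )

    # Forward pass: push probability mass through the soft-optimal policy.
    reach = defaultdict(float)
    reach["s0"] = 1.0
    for s in ORDER:
        for t in CHILDREN[s]:
            policy = math.exp(step_reward(s, t) + value[t] - value[s])
            reach[t] += reach[s] * policy
    return {obj: reach[obj] for obj in REWARD}


print(terminal_marginal(corrected=False))  # "x" is over-sampled (about 2/3 vs 1/3)
print(terminal_marginal(corrected=True))   # about 1/2 each, matching the reward
```

Without the correction, each trajectory is weighted by the reward of its final object, so an object with two generating paths receives twice the mass. With the backward-probability term, the extra paths to "x" share its reward instead of multiplying it.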
Pages: 997-1021
Page count: 25