Discrete Probabilistic Inference as Control in Multi-path Environments

Times cited: 0

Authors:

Deleu, Tristan [1, 3]

Nouri, Padideh [2]

Malkin, Nikolay [1]

Precup, Doina [2, 4]

Bengio, Yoshua [1]

Affiliations:

[1] Univ Montreal, Montreal, PQ, Canada

[2] McGill Univ, Montreal, PQ, Canada

[3] Valence Labs, Montreal, PQ, Canada

[4] Google DeepMind, London, England

Source:

UNCERTAINTY IN ARTIFICIAL INTELLIGENCE | 2024 / Vol. 244

DOI:

Not available

Abstract:

We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the marginal distribution induced by the optimal MaxEnt RL policy is proportional to the original reward, regardless of the structure of the underlying MDP. We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward. Finally, we study empirically the performance of multiple MaxEnt RL and GFlowNet algorithms on multiple problems involving sampling from discrete distributions.
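The multi-path bias and the reward correction described in the abstract can be checked numerically on a small example. The sketch below is illustrative only, not the authors' code. It assumes a hypothetical set-building MDP: add one element of {0, 1, 2} per step, or stop and emit the current set x. It also assumes a hypothetical reward R(x) = 1 + sum(x). For the correction, it assumes one common form: each intermediate transition s -> s' receives reward log P_B(s | s') under a uniform backward policy P_B. A set x can be built in |x|! orders, so exact soft value iteration should give plain MaxEnt RL a terminal marginal proportional to |x|! R(x). The corrected rewards along any trajectory into x add up to log R(x) - log |x|!, which removes that multiplicity factor.

```python
import itertools
import math

# Hypothetical toy environment: build a subset of {0, ..., N-1} by adding one
# element per step, or take a "stop" action that emits the current set as the
# sampled object x. A set x can be built in len(x)! different orders, so the
# MDP has multiple paths to the same object.
N = 3
STATES = [frozenset(c) for k in range(N + 1)
          for c in itertools.combinations(range(N), k)]


def reward(x):
    # Hypothetical positive reward R(x); any positive function works here.
    return 1.0 + sum(x)


def children(s):
    # States reachable from s by adding one missing element.
    return [s | {i} for i in range(N) if i not in s]


def log_pb(s_next):
    # Assumed uniform backward policy: s_next has exactly len(s_next) parents.
    return -math.log(len(s_next))


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def step_reward(s_next, corrected):
    # Intermediate reward for s -> s_next: 0 in plain MaxEnt RL,
    # log P_B(s | s_next) in the corrected MDP.
    return log_pb(s_next) if corrected else 0.0


def soft_values(corrected):
    # Exact soft value iteration on the DAG, children before parents.
    # The stop action yields log R(s) and leads to a terminal state of value 0
    # (its only parent is s, so P_B = 1 and no correction applies there).
    V = {}
    for s in sorted(STATES, key=len, reverse=True):
        q = [math.log(reward(s))]
        q += [step_reward(c, corrected) + V[c] for c in children(s)]
        V[s] = logsumexp(q)
    return V


def terminal_distribution(corrected):
    # Marginal probability that the soft-optimal policy
    # pi(a | s) = exp(Q(s, a) - V(s)) stops at each set x.
    V = soft_values(corrected)
    reach = {s: 0.0 for s in STATES}
    reach[frozenset()] = 1.0
    p = {}
    for s in sorted(STATES, key=len):  # parents are processed before children
        p[s] = reach[s] * math.exp(math.log(reward(s)) - V[s])
        for c in children(s):
            reach[c] += reach[s] * math.exp(
                step_reward(c, corrected) + V[c] - V[s])
    return p


Z = sum(reward(x) for x in STATES)
target = {x: reward(x) / Z for x in STATES}
plain = terminal_distribution(corrected=False)
fixed = terminal_distribution(corrected=True)
for x in STATES:
    print(sorted(x), f"target={target[x]:.3f}",
          f"maxent={plain[x]:.3f}", f"corrected={fixed[x]:.3f}")
```

In this toy setting, the plain MaxEnt policy places 24/49 (about 0.49) of its mass on {0, 1, 2}, which is reachable through 3! = 6 orderings, against a target of 0.2. The corrected policy recovers R(x)/Z exactly. Exact dynamic programming replaces learning here so that only the effect of the reward correction is isolated.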

Pages: 997-1021

Page count: 25
