Reward-based online learning in non-stationary environments: Adapting a P300-speller with a "Backspace" key

Cited: 0
Authors
Dauce, Emmanuel [1]
Proix, Timothee [2]
Ralaivola, Liva [3]
Affiliations
[1] Ecole Cent Marseille, INSERM, UMR S 1106, Marseille, France
[2] Aix Marseille Univ, INSERM, UMR S 1106, Marseille, France
[3] Aix Marseille Univ, CNRS, UMR 7279, Marseille, France
Source
2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015
Keywords
Online learning; Reinforcement learning; Policy gradient; Brain-Computer Interfaces; P300 speller; Classification
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We adapt a policy gradient approach to the problem of reward-based online learning of a non-invasive EEG-based "P300" speller. We first clarify the nature of the P300-speller classification problem and present a general regularized gradient-ascent formula. We then show that when the reward is immediate and binary (namely "bad response" or "good response"), each update is expected to improve the classifier's accuracy, whether the actual response is correct or not. We also estimate the robustness of the method to occasional mistaken rewards, showing that learning efficacy may decrease only linearly with the rate of invalid rewards. The effectiveness of our approach is tested in a series of simulations reproducing the conditions of real experiments. A first experiment shows a systematic improvement of the spelling rate for all subjects in the absence of initial calibration. A second experiment considers online recovery after electrode failure. Combined with a specific failure-detection algorithm, the spelling-error information (typically contained in a "backspace" hit) is shown to help the policy gradient adapt the P300 classifier to the new situation, provided the feedback is reliable enough (namely, a reliability greater than 70%).
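The reward-driven update sketched in the abstract can be illustrated in code. The following is a minimal sketch, assuming a linear classifier with a softmax selection policy over the K candidate symbols; the function names and hyperparameters (policy_gradient_step, eta, lam) and the exact L2 regularization are illustrative assumptions, not the paper's published formulation.

    import numpy as np

    def softmax(z):
        # Numerically stable softmax over classifier scores.
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def policy_gradient_step(w, X, action, reward, eta=0.01, lam=1e-3):
        # One regularized gradient-ascent step on the expected reward.
        #   w      : (d,) weights of the linear P300 classifier
        #   X      : (K, d) aggregated EEG features, one row per candidate symbol
        #   action : index of the symbol actually spelled
        #   reward : +1 ("good response") or -1 ("bad response"),
        #            e.g. inferred from a subsequent "backspace" hit
        p = softmax(X @ w)                 # selection probabilities
        grad_logp = X[action] - p @ X      # gradient of log pi(action | X)
        return w + eta * (reward * grad_logp - lam * w)

With a binary reward, a negative reward pushes probability mass away from the chosen symbol while a positive one reinforces it, so the update carries information whether the spelled symbol was correct or not, consistent with the claim in the abstract.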
Pages: 8
Related papers
8 records in total
  • [1] An RBF online learning scheme for non-stationary environments based on fuzzy means and Givens rotations
    Karamichailidou, Despina
    Koletsios, Sotirios
    Alexandridis, Alex
    NEUROCOMPUTING, 2022, 501 : 370 - 386
  • [2] An Online Learning Framework for UAV Target Search Missions in Non-Stationary Environments
    Khial, Noor
    Mhaisen, Naram
    Mabrok, Mohamed
    Mohamed, Amr
    2024 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CCECE 2024, 2024, : 753 - 758
  • [3] An online semi-supervised P300 speller based on extreme learning machine
    Wang, Junjie
    Gu, Zhenghui
    Yu, Zhuliang
    Li, Yuanqing
    NEUROCOMPUTING, 2017, 269 : 148 - 151
  • [4] Stream-Based Active Learning with Verification Latency in Non-stationary Environments
    Castellani, Andrea
    Schmitt, Sebastian
    Hammer, Barbara
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 260 - 272
  • [5] P-MARL: Prediction-Based Multi-Agent Reinforcement Learning for Non-Stationary Environments
    Marinescu, Andrei
    Dusparic, Ivana
    Taylor, Adam
    Cahill, Vinny
    Clarke, Siobhan
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS (AAMAS'15), 2015, : 1897 - 1898
  • [6] Graph-based method for autonomous adaptation in online learning of non-stationary data
    Alvarenga, W. J.
    Costa, A. C. A. A.
    Campos, F. V.
    Torres, L. C. B.
    Braga, A. P.
    INFORMATION SCIENCES, 2025, 700
  • [7] Prediction-Based Multi-Agent Reinforcement Learning in Inherently Non-Stationary Environments
    Marinescu, Andrei
    Dusparic, Ivana
    Clarke, Siobhan
    ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS, 2017, 12 (02)
  • [8] Adaptation method of the exploration ratio based on the orientation of equilibrium in multi-agent reinforcement learning under non-stationary environments
    Okano, T.
    Noda, I.
    Fuji Technology Press, 21 : 939 - 947