LO²net: Global–Local Semantics Coupled Network for scene-specific video foreground extraction with less supervision

Cited by: 0
Authors
Tao Ruan
Shikui Wei
Yao Zhao
Baoqing Guo
Zujun Yu
Affiliations
[1] Beijing Jiaotong University, School of Mechanical, Electronic and Control Engineering
[2] Beijing Jiaotong University, Frontiers Science Center for Smart High-speed Railway System
[3] Beijing Key Laboratory of Advanced Information Science and Network Technology
Keywords
Video foreground extraction; Scene-specific training; Deep neural network; Semantic edge; Multi-scale features;
DOI
10.1007/s10044-023-01193-5
Abstract
Video foreground extraction has been widely applied in quantitative fields and attracts great attention worldwide. Nevertheless, the performance of such a method can be easily degraded by complex environments. To tackle this problem, global semantics (e.g., background statistics) and local semantics (e.g., boundary areas) can be utilized to better distinguish foreground objects from a complex background. In this paper, we investigate how to effectively leverage these two kinds of semantics. For global semantics, two convolutional modules are designed to take advantage of data-level background priors and feature-level multi-scale characteristics, respectively; for local semantics, another module is further put forward to be aware of the semantic edges between foreground and background. The three modules are intertwined with each other, yielding a simple yet effective deep framework named the gLObal–LOcal Semantics Coupled Network (LO²Net), which is end-to-end trainable in a scene-specific manner.
Benefiting from the LO²Net, we achieve superior performance on multiple public datasets with less supervision than several state-of-the-art methods.
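The "data-level background prior" mentioned in the abstract can be illustrated with a minimal sketch. This is a hypothetical helper, not the paper's actual convolutional module: it estimates a static background as the per-pixel temporal median of past frames and measures how far the current frame deviates from it, yielding a map that is large where foreground is likely.

```python
import numpy as np

def background_prior(past_frames, frame):
    """Toy data-level background prior: deviation of the current
    frame from the per-pixel temporal median of past frames."""
    bg = np.median(past_frames, axis=0)   # static-background estimate
    prior = np.abs(frame - bg)            # large where foreground is likely
    return prior / (prior.max() + 1e-8)   # normalize to [0, 1]

# toy scene: a static zero-valued background, then a bright square appears
past_frames = np.zeros((10, 8, 8))
frame = np.zeros((8, 8))
frame[2:5, 2:5] = 1.0
prior = background_prior(past_frames, frame)
```

In the paper's framework such a prior would be one input among several; the network also couples it with feature-level multi-scale characteristics and a semantic-edge module, which this sketch does not attempt to model.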
Pages: 1671–1683 (12 pages)