SRRV: A Novel Document Object Detector Based on Spatial-Related Relation and Vision

Cited by: 10
Authors
Bi, Hengyue [1]
Xu, Canhui [1]
Shi, Cao [1]
Liu, Guozhu [1]
Li, Yuteng [1]
Zhang, Honghong [1]
Qu, Jing [1]
Affiliations
[1] Qingdao Univ Sci & Technol, Sch Informat Sci & Technol, Qingdao 266061, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Object detection; Proposals; Visualization; Cognition; Layout; Task analysis; Document object detection; spatial-related relation; graph convolutional network; feature representation; document layout analysis; VIDEO SALIENCY DETECTION; NETWORK
DOI
10.1109/TMM.2022.3165717
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Document object detection is a challenging task due to layout complexity and object diversity. Most existing methods focus mainly on vision information and neglect the inherent spatial relationships among document objects. To capture structural information and contextual dependencies, we propose a novel document object detector based on spatial-related relation and vision (SRRV). It consists of three parts: a vision feature extraction network, a relation feature aggregation network, and a result refinement network. The vision feature extraction network enhances information propagation in the hierarchical feature pyramid by adopting feature augmentation paths. The relation feature aggregation network combines a graph construction module and a graph learning module. Specifically, the graph construction module calculates spatial information from the geometric attributes of region proposals to encode relation information, while the graph learning module stacks Graph Convolutional Network (GCN) layers to aggregate relation information at a global scale. Both the vision and relation features are fed into the result refinement network for feature fusion and relational reasoning. Experiments on the PubLayNet, POD, and Article Regions datasets demonstrate that spatial relation information improves detection accuracy and yields more precise bounding box predictions.
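The relation branch described in the abstract (a graph built from proposal geometry, stacked GCN layers, then fusion with vision features) can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration only: the function names `spatial_adjacency`, `gcn_layer`, and `fuse` are hypothetical, the edge weight (exponential of normalized center distance) and the concatenation-based fusion are stand-ins, and random matrices replace learned weights. The record does not give the paper's actual formulas.

```python
import numpy as np


def spatial_adjacency(boxes):
    """Dense relation graph from proposal boxes given as (x1, y1, x2, y2).

    Assumed edge weight: exp(-center distance / max distance).
    This is an illustrative choice, not the formula used by SRRV.
    """
    boxes = np.asarray(boxes, dtype=float)
    centers = np.stack(
        [(boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2],
        axis=1,
    )
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    scale = dist.max() if dist.max() > 0 else 1.0
    # The diagonal is exp(0) = 1, so every node keeps a self-loop.
    return np.exp(-dist / scale)


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 X W)."""
    deg = adj.sum(axis=1)  # strictly positive because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ feats @ weight, 0.0)


def fuse(vision_feats, relation_feats):
    """Assumed fusion: concatenate the two feature sets per proposal."""
    return np.concatenate([vision_feats, relation_feats], axis=1)


# Toy usage: three proposals, 8-dimensional vision features.
rng = np.random.default_rng(0)
boxes = [[0, 0, 10, 10], [0, 12, 10, 22], [50, 50, 90, 90]]
vision = rng.normal(size=(3, 8))
w1 = rng.normal(size=(8, 8))  # random stand-ins for learned weights
w2 = rng.normal(size=(8, 8))

adj = spatial_adjacency(boxes)
relation = gcn_layer(adj, gcn_layer(adj, vision, w1), w2)  # two stacked layers
fused = fuse(vision, relation)
```

In this sketch, proposals that sit close together on the page exchange more information than distant ones, which is one simple way to express the idea that layout geometry carries contextual cues.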
Pages: 3788-3798
Page count: 11