InteractNet: Social Interaction Recognition for Semantic-rich Videos

Authors
Lyu, Yuanjie [1]
Qin, Penggang [1]
Xu, Tong [1]
Zhu, Chen [1, 2]
Chen, Enhong [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] BOSS Zhipin, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-modal analysis; video-and-language understanding; graph convolutional network
DOI
10.1145/3663668
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The rapid growth of online video platforms has created an urgent need for social interaction recognition techniques. Compared with simple short-term actions, long-term social interactions in semantic-rich videos reflect more complicated semantics, such as character relationships or emotions, and so better support downstream applications such as story summarization and fine-grained clip retrieval. However, social interactions last longer, overlap heavily with one another, and involve multiple characters, dynamic scenes, and multi-modal cues, so traditional solutions for short-term action recognition are likely to fail on this task. To address these challenges, in this article we propose a hierarchical graph-based system, named InteractNet, that recognizes social interactions from a multi-modal perspective. Specifically, our approach first generates a semantic graph for each sampled frame by integrating multi-modal cues, and then learns the node representations as short-term interaction patterns via an adapted GCN module. Global interaction representations are then accumulated through a sub-clip identification module, which filters out irrelevant information and resolves temporal overlaps between interactions. Finally, the associations among simultaneous interactions are captured and modelled by constructing a global-level character-pair graph, which is used to predict the final social interactions. Comprehensive experiments on publicly available datasets demonstrate the effectiveness of our approach compared with state-of-the-art baseline methods.
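As a rough illustration of the per-frame step described in the abstract, the sketch below applies one generic graph-convolution step to a toy frame graph. It is not the authors' adapted GCN module: the propagation rule is the standard symmetric-normalized form, and the node types (two characters and one object), the adjacency, the features, and the name `gcn_layer` are all assumptions made for this example.

```python
import math


def gcn_layer(adj, feats, weight):
    """One generic GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Hypothetical helper for illustration only; not the paper's module.
    """
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Symmetric normalization by node degree.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features.
    dim_in = len(feats[0])
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n))
            for d in range(dim_in)] for i in range(n)]
    # Linear projection followed by ReLU.
    dim_out = len(weight[0])
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(dim_in)))
             for o in range(dim_out)] for i in range(n)]


# Assumed toy frame graph: node 0 = character A, node 1 = character B,
# node 2 = an object. Character A is linked to both other nodes.
adj = [[0.0, 1.0, 1.0],
       [1.0, 0.0, 0.0],
       [1.0, 0.0, 0.0]]
feats = [[1.0, 0.0],   # made-up 2-d features per node
         [0.0, 1.0],
         [0.0, 1.0]]
weight = [[1.0, 0.0],  # identity weights, so the output shows pure aggregation
          [0.0, 1.0]]

out = gcn_layer(adj, feats, weight)
for row in out:
    print([round(v, 3) for v in row])
```

With identity weights, each output row is simply the degree-normalized mix of a node's own features and its neighbours' features. In a trained model the weight matrix would be learned, and the resulting node representations would feed later stages such as the sub-clip and character-pair modules mentioned above.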
Pages: 21