A Video-Based End-to-end Pipeline for Non-nutritive Sucking Action Recognition and Segmentation in Young Infants

Cited by: 2

Authors:

Zhu, Shaotong [1]

Wan, Michael [1,2]

Hatamimajoumerd, Elaheh [1]

Jain, Kashish [1]

Zlota, Samuel [1]

Kamath, Cholpady Vikram [1]

Rowan, Cassandra B. [5]

Grace, Emma C.

Goodwin, Matthew S. [3,4]

Hayes, Marie J. [5]

Schwartz-Mette, Rebecca A. [5]

Zimmerman, Emily [4]

Ostadabbas, Sarah [1]

Affiliations:

[1] Northeastern Univ, Dept Elect & Comp Engn, Augmented Cognit Lab, Boston, MA 02115 USA

[2] Northeastern Univ, Roux Inst, Portland, ME USA

[3] Northeastern Univ, Khoury Coll Comp Sci, Boston, MA USA

[4] Northeastern Univ, Bouve Coll Hlth Sci, Boston, MA USA

[5] Univ Maine, Psychol Dept, Orono, ME USA

Source:

MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT II | 2023 / Vol. 14221

Keywords:

Non-nutritive sucking; Action recognition; Action segmentation; Optical flow; Temporal convolution;

DOI:

10.1007/978-3-031-43895-0_55

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present an end-to-end computer vision pipeline to detect non-nutritive sucking (NNS), an infant sucking pattern with no nutrition delivered, as a potential biomarker for developmental delays, using off-the-shelf baby monitor video footage. One barrier to clinical (or algorithmic) assessment of NNS stems from its sparsity, requiring experts to wade through hours of footage to find minutes of the relevant activity. Our NNS activity segmentation algorithm tackles this problem by identifying periods of NNS with high certainty: up to 94.0% average precision and 84.9% average recall across 30 heterogeneous 60 s clips, drawn from our manually annotated NNS clinical in-crib dataset of 183 h of overnight baby monitor footage from 19 infants. Our method is based on an underlying NNS action recognition algorithm, which uses spatiotemporal deep learning networks and infant-specific pose estimation, achieving 94.9% accuracy in binary classification of 960 2.5 s balanced NNS vs. non-NNS clips. Tested on our second, independent, and public NNS in-the-wild dataset, NNS recognition classification reaches 92.3% accuracy, and NNS segmentation achieves 90.8% precision and 84.2% recall. Our code and the manually annotated NNS in-the-wild dataset can be found at https://github.com/ostadabbas/NNS-Detection-and-Segmentation. Supported by MathWorks and NSF-CAREER Grant #2143882.
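As a rough illustration of how clip-level recognition can feed segmentation, the sketch below merges per-window binary NNS predictions into time segments and scores them with precision and recall. It is not the authors' implementation: the function names `windows_to_segments` and `precision_recall`, the use of non-overlapping 2.5 s windows, and the window-level definition of precision and recall are all assumptions made for this example. The paper may aggregate windows and define its metrics differently.

```python
from typing import List, Sequence, Tuple


def windows_to_segments(
    labels: Sequence[int], window_s: float = 2.5
) -> List[Tuple[float, float]]:
    """Merge runs of consecutive positive windows into (start, end) times in seconds.

    Assumes non-overlapping windows of equal length (hypothetical setup).
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first window in the current positive run
    for i, lab in enumerate(labels):
        if lab and start is None:
            start = i
        elif not lab and start is not None:
            segments.append((start * window_s, i * window_s))
            start = None
    if start is not None:  # close a run that reaches the end of the clip
        segments.append((start * window_s, len(labels) * window_s))
    return segments


def precision_recall(
    pred: Sequence[int], truth: Sequence[int]
) -> Tuple[float, float]:
    """Window-level precision and recall for binary labels (1 = NNS)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = [0, 1, 1, 0, 0, 1]  # made-up per-window predictions
    annotated = [0, 1, 1, 1, 0, 0]  # made-up per-window ground truth
    print(windows_to_segments(predicted))
    print(precision_recall(predicted, annotated))
```

Under these assumptions, a 60 s clip corresponds to 24 windows of 2.5 s, so per-clip precision and recall can be averaged across clips to give summary figures of the kind reported in the abstract.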


Pages: 586-595

Page count: 10

