AN END-TO-END FOOD PORTION ESTIMATION FRAMEWORK BASED ON SHAPE RECONSTRUCTION FROM MONOCULAR IMAGE

Cited by: 1
Authors
Shao, Zeman [1]
Vinod, Gautham [1]
He, Jiangpeng [1]
Zhu, Fengqing [1]
Affiliations
[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Keywords
Dietary Assessment; Image-based Food Energy Estimation; 3D Shape Reconstruction; Deep Learning Framework
DOI
10.1109/ICME55011.2023.00166
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Dietary assessment is a key contributor to monitoring health status. Existing self-report methods are tedious and time-consuming, with substantial biases and errors. Image-based food portion estimation aims to estimate food energy values directly from food images, showing great potential for automated dietary assessment solutions. Existing image-based methods either use a single-view image, which limits performance, or incorporate multi-view images and depth information, which burdens the user. In this paper, we propose an end-to-end deep learning framework for food energy estimation from a monocular image through 3D shape reconstruction. We leverage a generative model to reconstruct the voxel representation of the food object from the input image to recover the missing 3D information. Our method is evaluated on Nutrition5k, a publicly available food image dataset, resulting in a Mean Absolute Error (MAE) of 40.05 kCal and a Mean Absolute Percentage Error (MAPE) of 11.47% for food energy estimation. Our method uses an RGB image as the only input at the inference stage and achieves competitive results compared to the existing method requiring both RGB and depth information.
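The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the standard per-sample definitions of MAE and MAPE; the abstract does not state the exact averaging convention used in the paper, and the function names and energy values below are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference, in the unit of the inputs (here kCal)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute error relative to the ground truth, as a percentage."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a ground-truth value is zero")
    ratios = (abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * sum(ratios) / len(y_true)


# Hypothetical ground-truth and predicted energies in kCal (not from the paper).
true_kcal = [200.0, 400.0, 500.0]
pred_kcal = [180.0, 440.0, 500.0]

mae = mean_absolute_error(true_kcal, pred_kcal)
mape = mean_absolute_percentage_error(true_kcal, pred_kcal)
print(f"MAE = {mae:.2f} kCal, MAPE = {mape:.2f}%")
```

MAE is expressed in kCal and is dominated by high-energy dishes, while MAPE normalizes each error by the true energy, which is why the two figures are usually reported together.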
Pages: 942-947
Number of pages: 6