TemporalFusion: Temporal Motion Reasoning with Multi-Frame Fusion for 6D Object Pose Estimation

Document Type

Conference Proceeding

Publication Date



6D object pose estimation is an essential task in vision-based robotic grasping and manipulation. Prior works extract spatial features by fusing the RGB image and depth without considering the temporal motion information, limiting their performance in heavy occlusion robotic grasping scenarios. In this paper, we present an end-to-end model named TemporalFusion, which integrates the temporal motion information from RGB-D images for 6D object pose estimation. The core of proposed TemporalFusion model is to embed and fuse the temporal motion information from multi-frame RGB-D sequences, which could handle heavy occlusion in robotic grasping tasks. Furthermore, the proposed deep model can also obtain stable pose sequences, which is essential for real-time robotic grasping tasks. We evaluated the proposed method in the YCB-Video dataset, and experimental results show our model outperforms state-of-the-art approaches. Our code is available at https://github.com/mufengjun260/TemporalFusion21.

Publication Source (Journal or Book title)

IEEE International Conference on Intelligent Robots and Systems

First Page


Last Page


This document is currently not available here.