EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos

Kutsev Bengisu Ozyoruk, Institute of Biomedical Engineering, Bogazici University, Turkey. Electronic address: bengisu.ozyoruk@boun.edu.tr.
Guliz Irem Gokceler, Institute of Biomedical Engineering, Bogazici University, Turkey.
Taylor L. Bobrow, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
Gulfize Coskun, Institute of Biomedical Engineering, Bogazici University, Turkey.
Kagan Incetan, Institute of Biomedical Engineering, Bogazici University, Turkey.
Yasin Almalioglu, Computer Science Department, University of Oxford, Oxford, UK.
Faisal Mahmood, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Cancer Data Science, Dana Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
Eva Curto, Institute for Systems and Robotics, University of Coimbra, Portugal.
Luis Perdigoto, Institute for Systems and Robotics, University of Coimbra, Portugal.
Marina Oliveira, Institute for Systems and Robotics, University of Coimbra, Portugal.
Hasan Sahin, Institute of Biomedical Engineering, Bogazici University, Turkey.
Helder Araujo, Institute for Systems and Robotics, University of Coimbra, Portugal.
Henrique Alexandrino, Faculty of Medicine, Clinical Academic Center of Coimbra, University of Coimbra, Coimbra, Portugal.
Nicholas J. Durr, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
Hunter B. Gilbert, Department of Mechanical and Industrial Engineering, Louisiana State University, Baton Rouge, LA, USA.
Mehmet Turan, Institute of Biomedical Engineering, Bogazici University, Turkey. Electronic address: mehmet.turan@boun.edu.tr.

Abstract

Deep learning techniques hold promise for developing dense topography reconstruction and pose estimation methods for endoscopic videos. However, currently available datasets do not support effective quantitative benchmarking. In this paper, we introduce a comprehensive endoscopic SLAM dataset consisting of 3D point cloud data for six porcine organs, capsule and standard endoscopy recordings, synthetically generated data, and a recording of a phantom colon acquired with a conventional endoscope in clinical use, with computed tomography (CT) scan ground truth. A Panda robotic arm, two commercially available capsule endoscopes, three conventional endoscopes with different camera properties, two high-precision 3D scanners, and a CT scanner were employed to collect data from eight ex-vivo porcine gastrointestinal (GI) tract organs and a silicone colon phantom model. In total, 35 sub-datasets with 6D pose ground truth are provided for the ex-vivo part: 18 sub-datasets for the colon, 12 for the stomach, and 5 for the small intestine; four of these contain polyp-mimicking elevations created by an expert gastroenterologist. To verify the applicability of this data for use with real clinical systems, we recorded a video sequence of a full-representation silicone colon phantom with a state-of-the-art colonoscope. Synthetic capsule endoscopy frames of the stomach, colon, and small intestine with both depth and pose annotations are included to facilitate the study of simulation-to-real transfer learning algorithms. Additionally, we propose Endo-SfMLearner, an unsupervised monocular depth and pose estimation method that combines residual networks with a spatial attention module to direct the network's focus toward distinguishable, highly textured tissue regions. The proposed approach uses a brightness-aware photometric loss to improve robustness against the rapid frame-to-frame illumination changes commonly seen in endoscopic videos. To exemplify the use of the EndoSLAM dataset, the performance of Endo-SfMLearner is extensively compared with state-of-the-art methods: SC-SfMLearner, Monodepth2, and SfMLearner. The code and a link to the dataset are publicly available at https://github.com/CapsuleEndoscope/EndoSLAM. A video demonstrating the experimental setup and procedure is available as Supplementary Video 1.
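To make the idea behind a brightness-aware photometric loss concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: the warped source frame is affinely re-aligned in brightness and contrast to the target frame before the photometric residual is computed, so that rapid illumination changes do not dominate the training signal. Function names, the alignment scheme, and the loss weighting are illustrative assumptions.

```python
# Hedged sketch of a brightness-aware photometric loss (assumed formulation,
# not the exact Endo-SfMLearner loss). Names and weights are illustrative.
import torch


def brightness_align(warped, target, eps=1e-6):
    """Affinely match the per-image mean/std of `warped` to `target`."""
    w_mean = warped.mean(dim=(2, 3), keepdim=True)
    w_std = warped.std(dim=(2, 3), keepdim=True)
    t_mean = target.mean(dim=(2, 3), keepdim=True)
    t_std = target.std(dim=(2, 3), keepdim=True)
    return (warped - w_mean) / (w_std + eps) * t_std + t_mean


def brightness_aware_photometric_loss(warped, target, alpha=0.85):
    """L1 plus a simple structural-similarity term, after brightness alignment."""
    warped = brightness_align(warped, target)
    l1 = (warped - target).abs().mean()
    # A global luminance/contrast similarity term stands in for full SSIM here.
    mu_w, mu_t = warped.mean(), target.mean()
    var_w, var_t = warped.var(), target.var()
    cov = ((warped - mu_w) * (target - mu_t)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_w * mu_t + c1) * (2 * cov + c2)) / (
        (mu_w ** 2 + mu_t ** 2 + c1) * (var_w + var_t + c2)
    )
    return alpha * (1 - ssim) / 2 + (1 - alpha) * l1


# Usage: compare a source frame warped into the target view (via the predicted
# depth and pose) against the target frame, both shaped (B, 3, H, W).
if __name__ == "__main__":
    warped = torch.rand(2, 3, 128, 160)
    target = torch.rand(2, 3, 128, 160)
    print(brightness_aware_photometric_loss(warped, target).item())
```

In an unsupervised depth-and-pose pipeline of this kind, such a loss replaces the plain photometric error; the brightness alignment step is what provides the robustness to the fast illumination changes described in the abstract.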