Depth Estimation with Neural Nets

Depth Estimation from 2D Monocular RGB Images using Serial U-Nets

Depth Estimates from 2D RGB Images from KITTI Dataset

Knowledge of environmental depth is required for successful autonomous vehicle navigation and VSLAM (visual simultaneous localization and mapping). Current autonomous vehicles rely on range-finding sensors such as LIDAR, RADAR, and SONAR, which suffer from drawbacks in both cost and accuracy. Vision-based systems offer the promise of cost-effective, accurate, and passive depth estimation that can compete with existing sensor technologies.

Existing research has shown that it is possible to estimate depth from 2D monocular camera images using convolutional neural networks. Recent advances suggest that depth estimation accuracy can be improved when networks trained for supplementary tasks, such as semantic segmentation, are incorporated into the architecture.

Recently, Kyle Cantrell, Dr. Carlos Morato, and I published a paper exploring a novel Serial U-Net (NU-Net) architecture. The Serial U-Net is a modular ensembling technique that combines the learned features from N-many U-Nets into a single pixel-by-pixel output. Serial U-Nets are designed to combine the benefits of semantic segmentation and transfer learning for improved depth estimation accuracy.
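To make the serial idea concrete, here is a minimal sketch of two small U-Nets chained in series with Keras, where the second U-Net receives the original RGB image concatenated with the first U-Net's output. The layer counts, filter sizes, and 192×640 input shape are illustrative placeholders, not the configuration from the paper.

```python
# Minimal sketch of serially composed U-Nets: the second U-Net refines the
# first U-Net's pixel-wise prediction. Sizes here are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, Model

def u_net(inputs, base_filters=32):
    """A tiny two-level U-Net that returns a single-channel pixel-wise map."""
    # Encoder
    c1 = layers.Conv2D(base_filters, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = layers.Conv2D(base_filters * 2, 3, padding="same", activation="relu")(p1)
    # Decoder with a skip connection back to the encoder
    u1 = layers.UpSampling2D(2)(c2)
    u1 = layers.Concatenate()([u1, c1])
    c3 = layers.Conv2D(base_filters, 3, padding="same", activation="relu")(u1)
    return layers.Conv2D(1, 1, activation="linear")(c3)

# Serial composition: the second U-Net sees the original RGB image together
# with the first U-Net's output and produces the final depth map.
rgb = layers.Input(shape=(192, 640, 3))
first_pass = u_net(rgb)
second_pass = u_net(layers.Concatenate()([rgb, first_pass]))

model = Model(inputs=rgb, outputs=second_pass)
model.compile(optimizer="adam", loss="mse")
model.summary()
```

Feeding the RGB image back in alongside the intermediate prediction lets each downstream U-Net refine, rather than merely transform, the previous estimate.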

Using the Serial U-Net architecture, we were able to create some pretty cool 3D reconstructions from a single RGB image:

3D Reconstruction from Single RGB Image
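These reconstructions pair each predicted depth value with its RGB pixel. As a rough illustration of the underlying geometry (not the MATLAB code linked below), the sketch here back-projects a depth map to an XYZRGB point cloud using the pinhole camera model; the intrinsics fx, fy, cx, and cy are placeholders for real calibration values, such as those shipped with KITTI.

```python
# Illustrative back-projection of a predicted depth map to a 3D point cloud
# using the pinhole camera model. Intrinsics are placeholders; real values
# come from camera calibration (e.g., the KITTI calibration files).
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Convert an (H, W) depth map and (H, W, 3) RGB image to an (N, 6)
    array of XYZRGB points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    valid = points[:, 2] > 0  # drop pixels with no depth
    return np.hstack([points[valid], colors[valid]])
```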

All of our code is freely available on GitHub. You can run our pre-trained network on an existing video or a live webcam feed. And of course, you can also train the network on your own dataset of RGB and depth image pairs (similar to those found in the NYU Depth V2 or KITTI datasets). 3D reconstruction MATLAB code is available here.
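For a sense of what webcam inference involves, here is a hypothetical loop built on OpenCV and a saved Keras model; the file name serial_unet.h5 and the 640×192 input size are assumptions, so substitute whatever the pre-trained weights in the repository actually expect.

```python
# Hypothetical webcam inference loop; "serial_unet.h5" and the 640x192 input
# size are placeholders for whatever the pre-trained model expects.
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("serial_unet.h5", compile=False)

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (640, 192)).astype(np.float32) / 255.0
    depth = model.predict(rgb[np.newaxis])[0, ..., 0]
    # Normalize for display only; predicted units depend on the training data.
    vis = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    cv2.imshow("depth", cv2.applyColorMap(vis, cv2.COLORMAP_JET))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```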

If you’re interested, you can find the full paper here.