Better depth from monocular image by reasoning globally and locally with MonoViT

AI News Clips by Morris Lee: News to help your R&D

Aug 12, 2022

MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer
arXiv paper abstract https://arxiv.org/abs/2208.03543v1
arXiv PDF paper https://arxiv.org/pdf/2208.03543v1.pdf
GitHub https://github.com/zxcqlf/monovit

Self-supervised monocular depth estimation is an attractive solution that does not require hard-to-source depth labels for training.

… However, their limited receptive field constrains existing network architectures to reason only locally, dampening the effectiveness of the self-supervised paradigm.
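
To make the "limited receptive field" point concrete, here is a small sketch, not taken from the paper, of the standard receptive-field recurrence for a stack of identical convolution layers. The function name and the layer settings in the example are illustrative assumptions. Self-attention, by contrast, lets every position attend to every other position in a single layer.

```python
def receptive_field(num_layers: int, kernel_size: int = 3, stride: int = 1) -> int:
    """Receptive field (in pixels) of a stack of identical conv layers.

    Uses the recurrence r_l = r_{l-1} + (k - 1) * j_{l-1} and j_l = j_{l-1} * s,
    starting from r_0 = 1 (a single pixel) and j_0 = 1.
    """
    r, jump = 1, 1
    for _ in range(num_layers):
        r += (kernel_size - 1) * jump  # each layer widens the view by (k - 1) input steps
        jump *= stride                 # striding spaces out where the next layer samples
    return r


# Ten 3x3 stride-1 convolutions: each output pixel sees only a 21x21 window.
print(receptive_field(10))            # 21
# Four 3x3 stride-2 convolutions: the window grows faster, to 31x31.
print(receptive_field(4, stride=2))   # 31
```

The receptive field grows only linearly with depth for stride-1 layers, so seeing across a whole image requires many layers or aggressive downsampling.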

… propose MonoViT, a brand-new framework combining the global reasoning enabled by ViT models with the flexibility of self-supervised monocular depth estimation.

By combining plain convolutions with Transformer blocks, … model can reason locally and globally, yielding depth prediction at a higher level of detail and accuracy, allowing MonoViT to achieve state-of-the-art performance on the established KITTI dataset …
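
The post does not describe MonoViT's blocks in detail, so the following is only a toy sketch of the general idea of pairing a convolution (local mixing with immediate neighbours) with self-attention (global mixing with every position). It works on a 1-D sequence of feature vectors in pure Python. The function names, the fixed kernel, the identity query/key/value projections, and the residual layout are all illustrative assumptions, not MonoViT's actual architecture; see the linked GitHub repository for that.

```python
import math
from typing import List

Vector = List[float]


def depthwise_conv(x: List[Vector], kernel: List[float]) -> List[Vector]:
    """Local mixing: the same 1-D kernel applied to each channel, zero padding."""
    n, d = len(x), len(x[0])
    pad = len(kernel) // 2
    out = []
    for i in range(n):
        acc = [0.0] * d
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < n:  # positions outside the sequence contribute zero
                for c in range(d):
                    acc[c] += w * x[idx][c]
        out.append(acc)
    return out


def self_attention(x: List[Vector]) -> List[Vector]:
    """Global mixing: single-head scaled dot-product attention, with the
    (assumed) simplification that queries, keys and values are the inputs."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


def hybrid_block(x: List[Vector], kernel: List[float]) -> List[Vector]:
    """Hypothetical block: convolution, then attention, each with a residual add."""
    local = [[a + b for a, b in zip(u, v)] for u, v in zip(x, depthwise_conv(x, kernel))]
    glob = self_attention(local)
    return [[a + b for a, b in zip(u, v)] for u, v in zip(local, glob)]


# Eight positions with two channels each; then change only the last position.
x = [[0.1 * i, 1.0 - 0.1 * i] for i in range(8)]
x_perturbed = [row[:] for row in x]
x_perturbed[-1] = [5.0, -5.0]
kernel = [0.25, 0.5, 0.25]

# The convolution output at position 0 cannot see position 7...
print(depthwise_conv(x, kernel)[0] == depthwise_conv(x_perturbed, kernel)[0])  # True
# ...but after the attention step, position 0 does respond to the change.
print(hybrid_block(x, kernel)[0] == hybrid_block(x_perturbed, kernel)[0])      # False
```

The demo shows the division of labour in miniature: the convolution keeps fine local structure, while attention lets each position draw on context from anywhere in the input.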

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b