Better depth from a monocular image by reasoning globally and locally with MonoViT


MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer
arXiv paper abstract https://arxiv.org/abs/2208.03543v1
arXiv PDF paper https://arxiv.org/pdf/2208.03543v1.pdf
GitHub https://github.com/zxcqlf/monovit

Self-supervised monocular depth estimation is an attractive solution that does not require hard-to-source depth labels for training.
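To illustrate how training can work without depth labels, here is a minimal sketch of a view-synthesis (photometric) loss on a toy 1-D scanline. It is a generic illustration of the self-supervised idea, not code from the MonoViT repository. The names `warp_scanline` and `photometric_l1`, the toy data, and the disparity sign convention are all assumptions made for this example. Real pipelines warp 2-D images using predicted depth and camera pose.

```python
def warp_scanline(source, disparity):
    """Rebuild the target scanline by sampling source at x - d(x).

    Uses linear interpolation and clamps sample positions to the valid range.
    The sign convention (x - d) is an assumption for this toy example.
    """
    n = len(source)
    out = []
    for x, d in enumerate(disparity):
        pos = min(max(x - d, 0.0), n - 1.0)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * source[lo] + frac * source[hi])
    return out


def photometric_l1(target, source, disparity):
    """Mean absolute difference between the target and the warped source."""
    recon = warp_scanline(source, disparity)
    return sum(abs(t - r) for t, r in zip(target, recon)) / len(target)


# Hypothetical 1-D "images": the target is the source shifted right by 2 pixels.
source = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0]

good = photometric_l1(target, source, [2.0] * 8)  # correct disparity
bad = photometric_l1(target, source, [0.0] * 8)   # wrong disparity
print(good, bad)  # → 0.0 1.0
```

The correct disparity reconstructs the target exactly, so the loss is lower than for the wrong guess. That difference is the training signal, and no ground-truth depth is needed to compute it.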

… However, their limited receptive field constrains existing network architectures to reason only locally, dampening the effectiveness of the self-supervised paradigm.

… propose MonoViT, a brand-new framework combining the global reasoning enabled by ViT models with the flexibility of self-supervised monocular depth estimation.

By combining plain convolutions with Transformer blocks, … model can reason locally and globally, yielding depth prediction at a higher level of detail and accuracy, allowing MonoViT to achieve state-of-the-art performance on the established KITTI dataset …
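The difference between local and global reasoning can be sketched with a toy 1-D example. This is not the MonoViT architecture. The functions `local_conv`, `global_attention`, and `hybrid_block` are hypothetical, the attention uses identity projections on scalar tokens, and the residual layout is an assumption chosen only to show the two kinds of receptive field side by side.

```python
import math


def local_conv(x, kernel=(0.25, 0.5, 0.25)):
    """1-D convolution with zero padding: each output sees only its 3 neighbours."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def global_attention(x):
    """Single-head self-attention on scalar tokens (identity projections, toy assumption)."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def hybrid_block(x):
    """Residual sum of a local branch and a global branch (hypothetical layout)."""
    return [xi + c + a for xi, c, a in zip(x, local_conv(x), global_attention(x))]


signal = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
perturbed = signal[:-1] + [5.0]  # change only the last, far-away element

conv_changed = local_conv(signal)[0] != local_conv(perturbed)[0]
attn_changed = global_attention(signal)[0] != global_attention(perturbed)[0]
print(conv_changed, attn_changed)  # → False True
```

Changing a distant element leaves the convolution output at position 0 untouched, while the attention output at the same position shifts. A block that sums both branches therefore has access to nearby detail and to context from the whole input.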

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Max Harlynking on Unsplash


AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.
