Better depth from a monocular image by reasoning globally and locally with MonoViT
MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer
arXiv paper abstract https://arxiv.org/abs/2208.03543v1
arXiv PDF paper https://arxiv.org/pdf/2208.03543v1.pdf
GitHub https://github.com/zxcqlf/monovit
Self-supervised monocular depth estimation is an attractive solution that does not require hard-to-source depth labels for training.
… However, their limited receptive field constrains existing network architectures to reason only locally, dampening the effectiveness of the self-supervised paradigm.
… propose MonoViT, a brand-new framework combining the global reasoning enabled by ViT models with the flexibility of self-supervised monocular depth estimation.
By combining plain convolutions with Transformer blocks, … model can reason locally and globally, yielding depth prediction at a higher level of detail and accuracy, allowing MonoViT to achieve state-of-the-art performance on the established KITTI dataset …
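To make the local versus global distinction concrete, here is a minimal toy sketch in plain Python. It is not MonoViT's architecture or code: the 1-D token sequence, the identity query/key/value projections, the fixed smoothing kernel, the additive fusion, and the function names (local_conv, global_attention, hybrid_block) are all assumptions chosen for illustration. It only shows that a small convolution sees nearby tokens while self-attention mixes information from every token.

```python
import math

def local_conv(features, kernel=(0.25, 0.5, 0.25)):
    """1-D convolution with zero padding: each output sees only nearby tokens."""
    half = len(kernel) // 2
    n, dim = len(features), len(features[0])
    out = []
    for i in range(n):
        acc = [0.0] * dim
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence count as zeros
                for d in range(dim):
                    acc[d] += w * features[j][d]
        out.append(acc)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(features):
    """Single-head self-attention with identity projections (a simplification).

    Every output token is a weighted average over ALL input tokens.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in features:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in features]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, features))
                    for d in range(dim)])
    return out

def hybrid_block(features):
    """Hypothetical fusion: add the local and global branches element-wise."""
    local = local_conv(features)
    glob = global_attention(features)
    return [[l + g for l, g in zip(lr, gr)] for lr, gr in zip(local, glob)]

# Eight toy "image tokens", each with two channels.
tokens = [[float(i), 1.0] for i in range(8)]
# Same tokens, but the far-away last token is changed.
perturbed = [row[:] for row in tokens]
perturbed[-1] = [100.0, 1.0]

fused = hybrid_block(tokens)
```

Changing the last token leaves the convolution output at the first token untouched, because the size-3 kernel never reaches that far, while the attention output at the first token shifts. That is the limited receptive field the abstract refers to, in miniature.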
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b