Complete occluded 3D scenes from images alone using a masked autoencoder with VoxFormer


VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion
arXiv paper abstract https://arxiv.org/abs/2302.12251
arXiv PDF paper https://arxiv.org/pdf/2302.12251.pdf

… complete 3D geometry of occluded objects and scenes … is vital for recognition and understanding.

… propose VoxFormer, a Transformer-based semantic scene completion framework that can output complete 3D volumetric semantics from only 2D images.

… framework adopts a two-stage design where … start from a sparse set of visible and occupied voxel queries from depth estimation, followed by a densification stage that generates dense 3D voxels from the sparse ones.
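To make the first stage concrete, here is a minimal sketch of turning an estimated depth map into a sparse set of occupied voxel queries. It assumes a pinhole camera (`fx`, `fy`, `cx`, `cy`), a voxel grid expressed in camera coordinates, and a depth of 0 meaning "no estimate"; the helper names `backproject` and `sparse_voxel_queries` are hypothetical. It is a simplified stand-in that omits any learned refinement the paper may apply to the depth-derived occupancy.

```python
import math

# Hypothetical helper: lift every pixel that has a depth estimate into a 3D point
# in camera coordinates with a pinhole model (fx, fy = focal lengths, cx, cy = principal point).
def backproject(depth, fx, fy, cx, cy):
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # a depth of 0 marks "no estimate" in this sketch
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

# Hypothetical helper: keep only voxels hit by at least one lifted point.
# These occupied, visible voxels play the role of the sparse queries.
def sparse_voxel_queries(points, voxel_size, grid_min, grid_dims):
    occupied = set()
    for p in points:
        idx = tuple(math.floor((p[k] - grid_min[k]) / voxel_size) for k in range(3))
        if all(0 <= idx[k] < grid_dims[k] for k in range(3)):  # drop points outside the scene grid
            occupied.add(idx)
    return sorted(occupied)

depth = [[1.0, 0.0],
         [1.0, 2.0]]
points = backproject(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
queries = sparse_voxel_queries(points, voxel_size=1.0, grid_min=(0.0, 0.0, 0.0), grid_dims=(2, 2, 2))
print(queries)  # [(0, 0, 1), (0, 1, 1)]
```

The pixel with depth 0 contributes nothing, and the point at (2, 2, 2) falls outside the 2x2x2 grid, so only two voxels become queries. Everything else in the grid is left for the densification stage to fill in.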

A key idea of this design is that the visual features on 2D images correspond only to the visible scene structures rather than the occluded or empty spaces.
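The sketch below illustrates that idea under simple assumptions: only the sparse query voxels read features from the image, by projecting each voxel centre into the image and taking the nearest pixel's feature. This nearest-pixel lookup is a deliberately plain substitute, not the attention mechanism the paper uses; the grid is assumed to be in camera coordinates, and all function names are hypothetical.

```python
import math

# Hypothetical helper: centre of voxel idx, assuming the grid is in camera coordinates.
def voxel_center(idx, voxel_size, grid_min):
    return tuple(grid_min[k] + (idx[k] + 0.5) * voxel_size for k in range(3))

# Pinhole projection of a 3D point to pixel coordinates; None if behind the camera.
def project(point, fx, fy, cx, cy):
    x, y, z = point
    if z <= 0:
        return None
    return fx * x / z + cx, fy * y / z + cy

# Gather an image feature for each sparse query only; voxels that are not queries
# (occluded or empty space) never read from the image.
def gather_query_features(queries, feature_map, voxel_size, grid_min, fx, fy, cx, cy):
    h, w = len(feature_map), len(feature_map[0])
    feats = {}
    for idx in queries:
        uv = project(voxel_center(idx, voxel_size, grid_min), fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = math.floor(uv[0] + 0.5), math.floor(uv[1] + 0.5)  # nearest pixel
        if 0 <= u < w and 0 <= v < h:  # queries that project outside the image get no feature
            feats[idx] = feature_map[v][u]
    return feats

feature_map = [[[0.0], [1.0]],
               [[2.0], [3.0]]]  # 2x2 image, 1 channel per pixel
feats = gather_query_features([(0, 0, 1), (1, 1, 0)], feature_map,
                              voxel_size=1.0, grid_min=(0.0, 0.0, 0.0),
                              fx=3.0, fy=3.0, cx=0.0, cy=0.0)
print(feats)  # {(0, 0, 1): [3.0]}
```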

… Once … obtain the set of sparse queries, … apply a masked autoencoder design to propagate the information to all the voxels by self-attention.
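A minimal sketch of the masked-autoencoder step, under stated assumptions: every voxel position gets a token, query voxels carry their gathered feature, all other voxels share one mask token, and a per-voxel positional embedding (assumed here; without it every masked voxel would look identical) is added before self-attention. The attention is a single head with identity query/key/value projections and no learned weights, so it only shows how information flows from visible voxels to masked ones; the real model is learned and may use a different attention variant. Voxels are indexed by flattened integers, and all names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def densify(num_voxels, query_feats, mask_token, pos_emb):
    """Known features at query voxels, a shared mask token elsewhere, then self-attention."""
    tokens = []
    for i in range(num_voxels):
        base = query_feats.get(i, mask_token)
        tokens.append([b + p for b, p in zip(base, pos_emb[i])])
    return self_attention(tokens)

# Voxel 0 is a visible query; voxels 1 and 2 start as mask tokens.
out = densify(3, {0: [1.0, 0.0]}, mask_token=[0.0, 0.0], pos_emb=[[0.0, 0.0]] * 3)
print([round(x, 3) for x in out[1]])  # [0.333, 0.0]
```

With zero positional embeddings (chosen only to keep the numbers readable), a masked voxel attends uniformly to all three tokens, so its output picks up one third of the visible voxel's feature: information has propagated from the sparse query into space the image never saw.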

… VoxFormer outperforms the state of the art with a relative improvement of 20.0% in geometry and 18.1% in semantics …
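A relative improvement is normally the gain divided by the previous best score. Assuming the usual semantic scene completion metrics (IoU for geometry, mIoU for semantics), with $m$ denoting the metric, the computation is:

```latex
\text{relative improvement} = \frac{m_{\text{VoxFormer}} - m_{\text{prior best}}}{m_{\text{prior best}}} \times 100\%
```

So a 20.0% relative improvement in geometry means VoxFormer's score is 1.2 times the previous best, which is not the same as being 20 percentage points higher.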

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Nathan Dumlao on Unsplash


AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.
