Complete 3D scene using only images having occlusions by masked autoencoder with VoxFormer

AI News Clips by Morris Lee: News to help your R&D

2 min readFeb 27, 2023

Complete 3D scene using only images having occlusions by masked autoencoder with VoxFormer

VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion
arXiv paper abstract https://arxiv.org/abs/2302.12251
arXiv PDF paper https://arxiv.org/pdf/2302.12251.pdf

… complete 3D geometry of occluded objects and scenes … is vital for recognition and understanding.

… propose VoxFormer, a Transformer-based semantic scene completion framework that can output complete 3D volumetric semantics from only 2D images.

… framework adopts a two-stage design where … start from a sparse set of visible and occupied voxel queries from depth estimation, followed by a densification stage that generates dense 3D voxels from the sparse ones.

A key idea of this design is that the visual features on 2D images correspond only to the visible scene structures rather than the occluded or empty spaces.

… Once … obtain the set of sparse queries, … apply a masked autoencoder design to propagate the information to all the voxels by self-attention.

… VoxFormer outperforms the state of the art with a relative improvement of 20.0% in geometry and 18.1% in semantics …

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Complete 3D scene using only images having occlusions by masked autoencoder with VoxFormer

Written by AI News Clips by Morris Lee: News to help your R&D

No responses yet