Segment a 3D scene with unknown objects using NeRF and ranking with the CLIP foundation model (OV-NeRF)
OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding
arXiv paper abstract https://arxiv.org/abs/2402.04648
arXiv PDF paper https://arxiv.org/pdf/2402.04648.pdf
The development of Neural Radiance Fields (NeRFs) has provided … open-vocabulary 3D semantic perception … However … methods that extract semantics … from Contrastive Language-Image Pretraining (CLIP) for … learning encounter difficulties
… propose OV-NeRF, which exploits the potential of pre-trained vision and language foundation models to enhance semantic field learning through proposed single-view and cross-view strategies.
First, from the single-view perspective, … introduce Region Semantic Ranking (RSR) regularization by leveraging 2D mask proposals derived from SAM to rectify the noisy semantics of each training view
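To make the single-view idea concrete, here is a minimal sketch of region-level ranking. It is not the paper's RSR implementation, whose details the abstract does not give. It assumes per-pixel class scores (for example, CLIP relevancy) and boolean region masks (for example, from SAM) are already available as arrays; the function name `region_rank_labels` and its inputs are hypothetical.

```python
import numpy as np

def region_rank_labels(relevancy, masks):
    """Assign one label per region by ranking region-averaged class scores.

    relevancy: (H, W, C) float array of per-pixel class scores (assumed input).
    masks: list of (H, W) boolean arrays, one per region proposal (assumed input).
    Returns an (H, W) integer label map.
    """
    # Start from the noisy per-pixel labels.
    labels = relevancy.argmax(axis=-1)
    for mask in masks:
        if not mask.any():
            continue  # skip empty proposals
        # Average the scores inside the region, then keep the top-ranked class.
        region_scores = relevancy[mask].mean(axis=0)  # shape (C,)
        labels[mask] = int(region_scores.argmax())
    return labels
```

In this sketch, a pixel whose own scores disagree with the rest of its region is overwritten by the region's top-ranked class. If masks overlap, later masks overwrite earlier ones, which is a simplification.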
… Second, from the cross-view perspective, … propose a Cross-view Self-enhancement (CSE) strategy to address the challenge raised by view-inconsistent semantics.
Rather than invariably utilizing the 2D inconsistent semantics from CLIP, CSE leverages the 3D consistent semantics generated from the well-trained semantic field itself for semantic field training, aiming to … enhance overall semantic consistency across different views.
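The cross-view idea can be illustrated with a small sketch of how supervision targets might be chosen during training. This is an assumption-laden illustration, not the paper's CSE procedure: the warm-up schedule, the function name `select_supervision`, and the parameter `warmup_steps` are all hypothetical. The only point shown is the switch from 2D CLIP-derived labels to labels rendered by the semantic field once the field is considered trained.

```python
import numpy as np

def select_supervision(clip_labels, field_probs, step, warmup_steps):
    """Choose per-pixel training targets for one view.

    clip_labels: (H, W) integer labels derived from 2D CLIP features (assumed input).
    field_probs: (H, W, C) class probabilities rendered from the semantic field (assumed input).
    step: current training step.
    warmup_steps: hypothetical step count after which the field is treated as well trained.
    """
    if step < warmup_steps:
        # Early in training the field is unreliable, so keep the 2D labels.
        return clip_labels.copy()
    # Later, use the field's own rendering as pseudo-labels.
    return field_probs.argmax(axis=-1)
```

Because the field is a single 3D representation, labels rendered from it for different views come from the same underlying prediction, which is the consistency the abstract refers to.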
… OV-NeRF outperforms current state-of-the-art methods … approach exhibits consistent superior results across various CLIP configurations, further verifying its robustness.
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b