Detect 3D object boxes directly from image and point cloud data using multi-modal features with CMT
Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
arXiv paper abstract https://arxiv.org/abs/2301.01283
arXiv PDF paper https://arxiv.org/pdf/2301.01283.pdf
GitHub https://github.com/junjie18/cmt
… propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection.
Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes.
The spatial alignment of multi-modal tokens is performed by encoding the 3D points into multi-modal features.
The core design of CMT is quite simple while its performance is impressive.
It achieves 74.1% NDS (state-of-the-art with single model) on nuScenes test set while maintaining faster inference speed.
Moreover, CMT has a strong robustness even if the LiDAR is missing …
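The abstract makes three claims. Spatial alignment comes from encoding 3D points into the token features. Boxes are produced directly from image and point cloud tokens, with no explicit view transformation. Robustness holds when LiDAR is missing. The NumPy sketch below illustrates that idea under several assumptions:

- Image tokens use PETR-style sampling of 3D points along each pixel's camera ray.
- LiDAR BEV tokens use points stacked above each BEV cell.
- Random-weight MLPs stand in for learned ones.
- A single attention step without learned projections stands in for the transformer decoder.

Every function name, size, intrinsic matrix and the box layout is hypothetical. None of it is taken from the CMT repository, and dropping LiDAR tokens is only one way to picture the missing-LiDAR case.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D = 16, 8  # hypothetical embedding size and number of depth/height samples

def ray_points(u, v, K, cam_to_ego, depths):
    # Back-project pixel (u, v) to a camera-frame ray direction with z = 1
    direction = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Points along the ray at each depth bin, camera frame, shape (D, 3)
    pts_cam = depths[:, None] * direction[None, :]
    # Move into the shared ego frame with a 4x4 camera-to-ego extrinsic
    pts_h = np.hstack([pts_cam, np.ones((len(depths), 1))])
    return (cam_to_ego @ pts_h.T).T[:, :3]

def bev_points(x, y, z_values):
    # Points stacked vertically above the BEV cell center (x, y), shape (Z, 3)
    return np.stack([np.full_like(z_values, x), np.full_like(z_values, y), z_values], axis=1)

def make_mlp(in_dim, out_dim, hidden=64):
    # Hypothetical 2-layer ReLU MLP with random weights (stands in for a learned one)
    W1 = rng.normal(0, 0.1, (hidden, in_dim))
    W2 = rng.normal(0, 0.1, (out_dim, hidden))
    return lambda x: W2 @ np.maximum(0.0, W1 @ x)

img_pe = make_mlp(D * 3, C)  # separate coordinate encoders per modality (an assumption)
bev_pe = make_mlp(D * 3, C)

def cross_attend(queries, tokens):
    # Single-head scaled dot-product attention; tokens act as both keys and values
    scores = queries @ tokens.T / np.sqrt(tokens.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens

K = np.array([[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]])
cam_to_ego = np.eye(4)
depths = np.linspace(1.0, 60.0, D)
heights = np.linspace(-3.0, 5.0, D)

# Image tokens: random stand-in feature + encoding of 3D points along each pixel's ray
pixels = [(800, 450), (400, 300), (1200, 600)]
img_tokens = np.stack([
    rng.normal(size=C) + img_pe(ray_points(u, v, K, cam_to_ego, depths).reshape(-1))
    for u, v in pixels
])

# LiDAR BEV tokens: random stand-in feature + encoding of 3D points above each cell
cells = [(10.0, -4.0), (25.0, 3.0)]
bev_tokens = np.stack([
    rng.normal(size=C) + bev_pe(bev_points(x, y, heights).reshape(-1))
    for x, y in cells
])

queries = rng.normal(size=(4, C))  # hypothetical object queries
fused = cross_attend(queries, np.concatenate([img_tokens, bev_tokens]))
camera_only = cross_attend(queries, img_tokens)  # LiDAR missing: just fewer tokens

box_head = make_mlp(C, 7)  # hypothetical head: (x, y, z, w, l, h, yaw) per query
boxes = np.stack([box_head(q) for q in fused])
```

Both modalities end up in one shared 3D-aware token space. The queries can therefore attend to whichever tokens are present, and no image-to-BEV view transformation is needed.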
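For context on the 74.1% figure: the nuScenes benchmark defines NDS (nuScenes Detection Score) as a weighted mix of mAP and five mean true-positive errors: translation, scale, orientation, velocity and attribute. The formula is NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))]. The small function below is a sketch of that formula. The inputs in the usage line are made-up illustrative values, not CMT's reported metrics.

```python
def nds(mAP, tp_errors):
    """nuScenes Detection Score from mAP and the five mean TP errors
    (mATE, mASE, mAOE, mAVE, mAAE); each error is clipped at 1."""
    if len(tp_errors) != 5:
        raise ValueError("expected five TP error metrics")
    # Convert each error into a score in [0, 1] (lower error -> higher score)
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries half the weight; the five TP scores share the other half
    return (5.0 * mAP + sum(tp_scores)) / 10.0

# Illustrative inputs only (not CMT's numbers)
score = nds(0.5, [0.3, 0.25, 0.3, 0.3, 0.12])
```

Because mAP carries half the weight, NDS rewards both finding objects and estimating their pose, size, velocity and attributes accurately.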
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b