Detect unknown objects in hierarchy by making image descriptions for training with DetCLIPv3
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
arXiv paper abstract https://arxiv.org/abs/2404.09216
arXiv PDF paper https://arxiv.org/pdf/2404.09216.pdf
… introduce DetCLIPv3, a high-performing detector that excels not only at open-vocabulary object detection, but also at generating hierarchical labels for detected objects.
DetCLIPv3 … three core designs: 1. Versatile model architecture … derive a robust open-set detection … with generation ability via the integration of a caption head.
2. High information density data: … auto-annotation … leveraging visual large language model to refine captions for … image-text pairs, providing … labels to enhance the training.
3. Efficient training strategy: … employ a pre-training stage with low-resolution inputs … enables the object captioner to … learn … visual concepts from … image-text paired data.
This is followed by a fine-tuning stage that leverages a small number of high-resolution samples to further enhance detection performance.
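The two-stage idea above (many cheap low-resolution steps, then a few high-resolution ones) can be illustrated with a small schedule sketch. This is not the authors' code: the `Stage` class, the `stage_at` and `relative_pixel_cost` helpers, and every number below are hypothetical placeholders, chosen only to show why most of the training can run at low resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    image_size: int  # hypothetical input side length in pixels
    num_steps: int   # hypothetical number of training steps
    data: str        # kind of data the stage draws on


# All values are illustrative placeholders, not figures from the paper.
SCHEDULE = [
    Stage("pretrain_low_res", image_size=320, num_steps=1000,
          data="image-text pairs"),
    Stage("finetune_high_res", image_size=1024, num_steps=100,
          data="small high-resolution set"),
]


def stage_at(step: int, schedule=SCHEDULE) -> Stage:
    """Return the stage that is active at a global training step."""
    start = 0
    for stage in schedule:
        if step < start + stage.num_steps:
            return stage
        start += stage.num_steps
    raise ValueError(f"step {step} is beyond the schedule")


def relative_pixel_cost(schedule=SCHEDULE) -> float:
    """Pixels processed, relative to running every step at the top resolution."""
    max_size = max(s.image_size for s in schedule)
    total_steps = sum(s.num_steps for s in schedule)
    used = sum(s.num_steps * s.image_size ** 2 for s in schedule)
    return used / (total_steps * max_size ** 2)


if __name__ == "__main__":
    print(stage_at(0).name, stage_at(1000).name)
    print(f"relative pixel cost: {relative_pixel_cost():.3f}")
```

With these placeholder numbers, the schedule processes under a fifth of the pixels that an all-high-resolution run of the same length would, which is the kind of saving the low-resolution pre-training stage is aiming for.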
… DetCLIPv3 demonstrates superior open-vocabulary detection … also … state-of-the-art … in dense captioning task on VG dataset, showcasing its strong generative capability.
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b