Detect unknown objects in hierarchy by making image descriptions for training with DetCLIPv3

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
arXiv paper abstract https://arxiv.org/abs/2404.09216
arXiv PDF paper https://arxiv.org/pdf/2404.09216.pdf

… introduce DetCLIPv3, a high-performing detector that excels not only at both open-vocabulary object detection, but also generating hierarchical labels for detected objects.
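
To make "hierarchical labels" concrete, here is a minimal Python sketch of how one detection with a fine-to-coarse label chain might be represented and grouped. The class name, its fields, the helper function, and the example labels are all illustrative assumptions; they are not DetCLIPv3's actual output format or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class HierarchicalDetection:
    """Hypothetical container for one detected object (not DetCLIPv3's real API)."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    score: float                            # detection confidence in [0, 1]
    labels: List[str]                       # ordered from most specific to most general

    def finest(self) -> str:
        # Most specific label, e.g. "golden retriever"
        return self.labels[0]

    def coarsest(self) -> str:
        # Most general label, e.g. "animal"
        return self.labels[-1]


def group_by_coarse_label(dets: List[HierarchicalDetection]) -> Dict[str, List[str]]:
    """Group the fine-grained labels under their most general parent label."""
    groups: Dict[str, List[str]] = {}
    for det in dets:
        groups.setdefault(det.coarsest(), []).append(det.finest())
    return groups


# Made-up detections, for illustration only.
detections = [
    HierarchicalDetection((10, 20, 200, 180), 0.91, ["golden retriever", "dog", "animal"]),
    HierarchicalDetection((220, 40, 400, 300), 0.84, ["tabby cat", "cat", "animal"]),
    HierarchicalDetection((5, 5, 60, 90), 0.77, ["street lamp", "lamp", "object"]),
]

groups = group_by_coarse_label(detections)
print(groups)
```

An ordered list keeps the sketch simple. A real system might store a tree or a set of parent links instead.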

DetCLIPv3 … three core designs: 1. Versatile model architecture … derive a robust open-set detection … with generation ability via the integration of a caption head.

2. High information density data: … auto-annotation … leveraging visual large language model to refine captions for … image-text pairs, providing … labels to enhance the training.

3. Efficient training strategy: … employ a pre-training stage with low-resolution inputs … enables the object captioner to … learn … visual concepts from … image-text paired data.

This is followed by a fine-tuning stage that leverages a small number of high-resolution samples to further enhance detection performance.
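
The two-stage schedule can be illustrated with a rough back-of-the-envelope sketch. The resolutions and sample counts below are placeholder assumptions chosen only to show the idea; they are not the values used in the paper. Pixel count is also only a crude proxy for training cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical description of one training stage (placeholder values only)."""
    name: str
    height: int       # input image height in pixels
    width: int        # input image width in pixels
    num_samples: int  # number of training images seen in this stage

    def pixels_per_image(self) -> int:
        return self.height * self.width

    def total_pixels(self) -> int:
        # Crude cost proxy: pixels processed over the whole stage
        return self.pixels_per_image() * self.num_samples


def relative_pixel_cost(a: Stage, b: Stage) -> float:
    """How many times more pixels one image costs in stage a than in stage b."""
    return a.pixels_per_image() / b.pixels_per_image()


# Assumed numbers: many low-resolution images, then few high-resolution ones.
pretrain = Stage("low-res pre-training", 320, 320, 1_000_000)
finetune = Stage("high-res fine-tuning", 1280, 1280, 10_000)

ratio = relative_pixel_cost(finetune, pretrain)
print(f"One fine-tuning image has {ratio:.0f}x the pixels of a pre-training image")
for stage in (pretrain, finetune):
    print(f"{stage.name}: {stage.total_pixels():,} pixels in total")
```

Under these assumed numbers, each high-resolution image carries 16 times the pixels of a low-resolution one, which is why a stage that has to cover a large volume of image-text pairs is cheaper to run at low resolution.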

… DetCLIPv3 demonstrates superior open-vocabulary detection … also … state-of-the-art … in dense captioning task on VG dataset, showcasing its strong generative capability.

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Chloe Bolton on Unsplash

AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant with 37+ years in artificial intelligence and related high-tech fields. An innovator with 66+ patents, ready to help a firm's R&D.