Detect unknown objects and attributes better by jointly training on both tasks with OvarNet
OvarNet: Towards Open-vocabulary Object Attribute Recognition
arXiv paper abstract https://arxiv.org/abs/2301.09506
arXiv PDF paper https://arxiv.org/pdf/2301.09506.pdf
… consider … detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario.
… make the following contributions: (i) … start with a naive two-stage approach for open-vocabulary object detection and attribute classification, termed CLIP-Attr.
The candidate objects are first proposed with an offline RPN and later classified for semantic category and attributes;
(ii) … combine all available datasets and train with a federated strategy to finetune the CLIP model, aligning the visual representation with attributes …
(iii) … train a Faster-RCNN type model end-to-end with knowledge distillation, that performs class-agnostic object proposals and classification on semantic categories and attributes …
(iv) … show that recognition of semantic category and attributes … largely outperform existing approaches that treat the two tasks independently, demonstrating strong generalization ability to novel attributes and categories.
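To make the idea in (i) and (iii) more concrete, here is a minimal sketch, not the authors' code. It assumes a region embedding and text embeddings for category and attribute names already exist (in the paper these would come from a CLIP-style model; here they are made-up toy vectors). The function names, the temperature, the threshold, and the choice of a KL-divergence distillation loss are all illustrative assumptions, since the abstract does not specify them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score_region(region_emb, text_embs, temperature=0.1):
    """Temperature-scaled cosine similarity of one region to each named text embedding."""
    return {name: cosine(region_emb, emb) / temperature
            for name, emb in text_embs.items()}

def classify_region(region_emb, category_embs, attribute_embs,
                    temperature=0.1, attr_threshold=0.5):
    """Pick one category (softmax) and any number of attributes (per-attribute sigmoid)."""
    cat_logits = score_region(region_emb, category_embs, temperature)
    names = list(cat_logits)
    probs = softmax([cat_logits[n] for n in names])
    category = names[probs.index(max(probs))]
    attr_logits = score_region(region_emb, attribute_embs, temperature)
    attributes = [n for n, logit in attr_logits.items()
                  if sigmoid(logit) > attr_threshold]
    return category, attributes

def distillation_loss(student_logits, teacher_logits):
    """KL(teacher || student) between softmax distributions (an assumed, common choice)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy data (hypothetical): the vocabulary is just a dict of names to vectors,
# so new categories or attributes can be added at inference time.
region = [0.9, 0.1, 0.8]
categories = {"cat": [1.0, 0.0, 0.0], "zebra": [0.0, 1.0, 0.0]}
attributes = {"striped": [0.0, 0.0, 1.0], "wet": [0.0, 0.0, -1.0]}

category, found_attributes = classify_region(region, categories, attributes)
print(category, found_attributes)
```

With these toy vectors the region is closest to "cat" and has positive similarity only to "striped". The distillation function shows one way a detector (student) could be pushed toward the scores of a finetuned CLIP-style classifier (teacher); the loss is zero when both produce the same distribution.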
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b