Segment unknown objects by training on independent image-mask and image-text pairs with Uni-OVSeg
Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision
arXiv paper abstract https://arxiv.org/abs/2402.08960
arXiv PDF paper https://arxiv.org/pdf/2402.08960.pdf
… open-vocabulary segmentation … rely on image-mask-text triplets, yet this … is labour-intensive
… liberate … correspondence between masks and texts by using independent image-mask and image-text pairs, which can be easily collected respectively.
With this unpaired mask-text supervision, … propose … weakly-supervised open-vocabulary segmentation framework (Uni-OVSeg) that leverages confident pairs of mask predictions and entities in text descriptions.
Using the independent image-mask and image-text pairs, … predict a set of binary masks and associate them with entities by resorting to the CLIP embedding space.
… using the large vision-language model (LVLM) to refine text descriptions and devise a multi-scale ensemble to stabilise the matching between masks and entities.
Compared to text-only weakly-supervised methods, … Uni-OVSeg achieves substantial improvements … and even surpasses fully-supervised methods …
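To make the mask-entity matching idea above more concrete, here is a minimal sketch under stated assumptions. It is not the authors' code. It assumes each predicted mask and each text entity has already been embedded into a shared CLIP-like space (the encoder is not shown). It also assumes the "multi-scale ensemble" can be approximated by averaging cosine similarities computed at several image scales, and that a pair counts as "confident" when its averaged score clears a threshold. The function names, the per-entity best-mask rule, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def multi_scale_similarity(
    mask_embs_per_scale: List[List[Vector]],
    entity_embs: List[Vector],
) -> List[List[float]]:
    """Average the mask-entity similarity matrix over several image scales.

    Hypothetical layout: mask_embs_per_scale[s][m] is the embedding of mask m
    pooled from the image at scale s (at least one scale is assumed), and
    entity_embs[e] is the text embedding of entity e. Returns sim[m][e].
    """
    num_scales = len(mask_embs_per_scale)
    num_masks = len(mask_embs_per_scale[0])
    sim = [[0.0] * len(entity_embs) for _ in range(num_masks)]
    for scale_embs in mask_embs_per_scale:
        for m, mask_emb in enumerate(scale_embs):
            for e, ent_emb in enumerate(entity_embs):
                sim[m][e] += cosine_similarity(mask_emb, ent_emb) / num_scales
    return sim


def confident_pairs(
    sim: List[List[float]], threshold: float
) -> List[Tuple[int, int, float]]:
    """For each entity, keep its best-scoring mask only if the score clears threshold.

    Returns (mask_index, entity_index, score) tuples; low-confidence
    entities are dropped rather than force-matched.
    """
    pairs: List[Tuple[int, int, float]] = []
    num_entities = len(sim[0]) if sim else 0
    for e in range(num_entities):
        best_m = max(range(len(sim)), key=lambda m: sim[m][e])
        score = sim[best_m][e]
        if score >= threshold:
            pairs.append((best_m, e, score))
    return pairs


if __name__ == "__main__":
    # Toy data (made-up numbers): 2 masks seen at 2 scales, 3-dim embeddings.
    masks = [
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # scale 1
        [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],  # scale 2
    ]
    entities = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # e.g. "dog", "sky"
    sim = multi_scale_similarity(masks, entities)
    # Mask 0 is kept for entity 0; entity 1 has no mask above the threshold.
    print(confident_pairs(sim, threshold=0.5))
```

Averaging scores across scales is one simple way to damp a mask whose match flips at a single scale. Thresholding then keeps only confident pairs as pseudo-supervision instead of forcing every entity onto some mask.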
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b