Segment unknown objects by training on independent image-mask and image-text pairs with Uni-OVSeg

Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision
arXiv paper abstract https://arxiv.org/abs/2402.08960
arXiv PDF paper https://arxiv.org/pdf/2402.08960.pdf

… open-vocabulary segmentation … rely on image-mask-text triplets, yet this … is labour-intensive

… liberate … correspondence between masks and texts by using independent image-mask and image-text pairs, which can be easily collected respectively.

With this unpaired mask-text supervision, … propose … weakly-supervised open-vocabulary segmentation framework (Uni-OVSeg) that leverages confident pairs of mask predictions and entities in text descriptions.

Using the independent image-mask and image-text pairs, … predict a set of binary masks and associate them with entities by resorting to the CLIP embedding space.
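
To make the mask-entity association concrete, here is a minimal Python sketch under stated assumptions. It assumes the mask and entity embeddings are already plain vectors in a shared CLIP-like space. Each mask is assigned to its most similar entity by cosine similarity, and only pairs above a confidence threshold are kept, in the spirit of the "confident pairs" idea. The helper names, the argmax assignment and the threshold value are illustrative choices, not the paper's actual matching procedure.

```python
import math

# Minimal sketch (hypothetical helpers, not the Uni-OVSeg code): assumes each
# predicted mask and each text entity has already been embedded into the same
# CLIP-like space, e.g. by pooling image features inside the mask.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_masks_to_entities(mask_embs, entity_embs, threshold=0.5):
    """Assign each mask to its most similar entity; keep only confident pairs.

    Returns (mask_index, entity_index, score) triples whose cosine similarity
    is at least `threshold`. Masks below the threshold are left unmatched.
    """
    if not entity_embs:
        return []
    pairs = []
    for i, m in enumerate(mask_embs):
        scores = [cosine(m, e) for e in entity_embs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Toy 3-d embeddings standing in for CLIP features (made-up values).
masks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
entities = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]  # e.g. "dog", "grass"
pairs = match_masks_to_entities(masks, entities, threshold=0.7)
print([(i, j) for i, j, _ in pairs])  # → [(0, 0), (1, 1)]; mask 2 is too ambiguous
```

The third mask scores about 0.64 against its best entity, so with a 0.7 threshold it is left unmatched rather than forced onto a weak label. A one-to-one assignment (for example Hungarian matching) would be a natural alternative to the per-mask argmax used here.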

… using the large vision-language model (LVLM) to refine text descriptions and devise a multi-scale ensemble to stabilise the matching between masks and entities.
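
The multi-scale ensemble can be illustrated in the same spirit. This sketch assumes a masks-by-entities similarity matrix is available at several scales and simply averages them before picking confident matches. What a "scale" means in Uni-OVSeg and how its ensemble combines scores are not stated here, so the crop-context interpretation, the plain mean and all function names are hypothetical.

```python
# Minimal sketch (hypothetical, not the authors' ensemble): assumes a
# masks-by-entities similarity matrix has been computed once per "scale",
# e.g. from mask crops taken with different amounts of surrounding context.

def ensemble_similarity(sim_per_scale):
    """Element-wise mean of several equally shaped similarity matrices."""
    n = len(sim_per_scale)
    rows, cols = len(sim_per_scale[0]), len(sim_per_scale[0][0])
    return [
        [sum(sim[i][j] for sim in sim_per_scale) / n for j in range(cols)]
        for i in range(rows)
    ]


def confident_matches(sim, threshold):
    """(mask, entity) index pairs where the row maximum reaches `threshold`."""
    pairs = []
    for i, row in enumerate(sim):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            pairs.append((i, j))
    return pairs


# Made-up scores for 2 masks x 2 entities at 3 scales.
scales = [
    [[0.60, 0.55], [0.20, 0.80]],  # tight crop
    [[0.40, 0.70], [0.30, 0.75]],  # medium crop
    [[0.80, 0.30], [0.25, 0.85]],  # wide crop
]
# Per scale, mask 0 flips between entity 0 and entity 1.
print([confident_matches(s, 0.5) for s in scales])
print(confident_matches(ensemble_similarity(scales), 0.5))  # → [(0, 0), (1, 1)]
```

Averaging lets one noisy scale (the medium crop, which pulls mask 0 toward entity 1) be outvoted by the others, which is one plausible reading of how an ensemble could stabilise the mask-entity matching.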

Compared to text-only weakly-supervised methods, … Uni-OVSeg achieves substantial improvements … and even surpasses fully-supervised methods …

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Mae Mu on Unsplash

AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.

No responses yet