Segment a scene from the words in its caption in one stage with PPMN

PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding
arXiv paper abstract https://arxiv.org/abs/2208.05647v1
arXiv PDF paper https://arxiv.org/pdf/2208.05647v1.pdf
GitHub https://github.com/dzh19990407/ppmn

Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual objects of things and stuff categories described by dense narrative captions of a still image.

… two-stage approach first extracts segmentation region proposals … then conducts coarse region-phrase matching to ground the candidate regions for each noun phrase.

However, the two-stage pipeline usually suffers from the performance limitation of low-quality proposals in the first stage … as well as complicated strategies designed for things and stuff

… To alleviate … drawbacks, … propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals
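To make "matching each phrase to its pixels" concrete, here is a minimal sketch, not the authors' implementation. It assumes each pixel and each noun phrase has already been encoded into a vector of the same size, scores every pixel-phrase pair with a dot product, and turns the score into a mask with a sigmoid and a 0.5 threshold. The function name `pixel_phrase_match`, the toy features, and the threshold are all hypothetical choices for illustration.

```python
import math


def pixel_phrase_match(pixel_feats, phrase_embs, threshold=0.5):
    """Return one binary mask per phrase (illustrative sketch only).

    pixel_feats: H x W grid of C-dimensional feature vectors (nested lists).
    phrase_embs: list of C-dimensional phrase embeddings.
    """
    masks = []
    for emb in phrase_embs:
        mask = []
        for row in pixel_feats:
            mask_row = []
            for feat in row:
                # Dot product scores how well this pixel matches the phrase.
                score = sum(f * e for f, e in zip(feat, emb))
                # Sigmoid maps the score to a matching probability.
                prob = 1.0 / (1.0 + math.exp(-score))
                mask_row.append(1 if prob > threshold else 0)
            mask.append(mask_row)
        masks.append(mask)
    return masks


# Toy 2x2 "image" with 2-dimensional features (made-up numbers).
pixel_feats = [
    [[2.0, 0.0], [0.0, 2.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
# Two hypothetical phrase embeddings: one matches the left column,
# the other matches the right column.
phrase_embs = [[1.0, -1.0], [-1.0, 1.0]]

masks = pixel_phrase_match(pixel_feats, phrase_embs)
print(masks[0])  # mask for the first phrase
print(masks[1])  # mask for the second phrase
```

No region proposals appear anywhere in this sketch: every pixel is scored against every phrase directly, which is the contrast with the two-stage pipeline drawn above.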

… model can exploit sufficient and finer cross-modal semantic correspondence from the supervision of densely annotated pixel-phrase pairs

… method achieves new state-of-the-art performance on the PNG benchmark with 4.0 absolute Average Recall gains.
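For readers unfamiliar with the metric, the sketch below shows one common way to compute an Average Recall score from per-phrase masks. It assumes Average Recall is the mean recall over evenly spaced IoU thresholds; the exact protocol of the PNG benchmark may differ, and the helper names `iou` and `average_recall`, the number of thresholds, and the sample IoU values are hypothetical.

```python
def iou(pred, gt):
    """Intersection over union of two binary masks given as nested lists."""
    inter = sum(p & g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    union = sum(p | g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    return inter / union if union else 0.0


def average_recall(ious, num_thresholds=11):
    """Mean recall over evenly spaced IoU thresholds in [0, 1].

    Assumption: a phrase counts as recalled at threshold t when its
    predicted mask has IoU >= t with the ground-truth mask.
    """
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    recalls = [sum(v >= t for v in ious) / len(ious) for t in thresholds]
    return sum(recalls) / len(recalls)


# IoU of one predicted mask against its ground truth (toy masks).
pred = [[1, 0], [1, 0]]
gt = [[1, 0], [0, 0]]
single_iou = iou(pred, gt)

# Made-up per-phrase IoUs for three phrases.
ar = average_recall([1.0, 0.5, 0.0])
print(single_iou, ar)
```

Under this reading, a gain of 4.0 absolute means the score rises by 4.0 percentage points, not by 4 percent of the previous value.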

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Reiseuhu on Unsplash


AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.
