Segment a scene using the words in its caption in one stage with PPMN
PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding
arXiv paper abstract https://arxiv.org/abs/2208.05647v1
arXiv PDF paper https://arxiv.org/pdf/2208.05647v1.pdf
GitHub https://github.com/dzh19990407/ppmn
Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual objects of things and stuff categories described by dense narrative captions of a still image.
… two-stage approach first extracts segmentation region proposals … then conducts coarse region-phrase matching to ground the candidate regions for each noun phrase.
However, the two-stage pipeline usually suffers from the performance limitation of low-quality proposals in the first stage … as well as complicated strategies designed for things and stuff
… To alleviate … drawbacks, … propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals
… model can exploit sufficient and finer cross-modal semantic correspondence from the supervision of densely annotated pixel-phrase pairs
… method achieves new state-of-the-art performance on the PNG benchmark with 4.0 absolute Average Recall gains.
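To make the pixel-phrase matching idea concrete, here is a minimal sketch in NumPy. It is not the authors' implementation: the function name pixel_phrase_match, the feature shapes, the dot-product scoring, and the 0.5 threshold are all assumptions chosen for illustration. It only shows the general pattern of scoring every pixel against every phrase embedding and keeping the pixels that respond, with no region proposals involved.

```python
import numpy as np


def pixel_phrase_match(pixel_feats, phrase_feats, threshold=0.5):
    """Hypothetical helper: match each phrase directly to pixels.

    pixel_feats:  array of shape (C, H, W), one C-dim feature per pixel.
    phrase_feats: array of shape (N, C), one C-dim embedding per noun phrase.
    Returns a boolean array of shape (N, H, W), one mask per phrase.
    """
    c, h, w = pixel_feats.shape
    flat = pixel_feats.reshape(c, h * w)      # (C, H*W)
    logits = phrase_feats @ flat              # (N, H*W) similarity scores
    probs = 1.0 / (1.0 + np.exp(-logits))     # sigmoid, assumed scoring choice
    return (probs > threshold).reshape(-1, h, w)


if __name__ == "__main__":
    # Toy 2x2 "image" with 2 feature channels (values are made up).
    feats = np.array([
        [[4.0, 4.0], [-4.0, -4.0]],   # channel 0 is high on the top row
        [[4.0, -4.0], [4.0, -4.0]],   # channel 1 is high on the left column
    ])
    phrases = np.array([
        [1.0, 0.0],   # phrase 0 responds to channel 0
        [0.0, 1.0],   # phrase 1 responds to channel 1
    ])
    print(pixel_phrase_match(feats, phrases).astype(int))
```

In this toy setup, each phrase gets its own mask straight from the pixel features, so the same code path would serve things and stuff alike. A real model would learn both sets of features from the densely annotated pixel-phrase pairs mentioned above.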
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b