Improved segmenting of objects in a video that are mentioned in a text query with ReferFormer
Language as Queries for Referring Video Object Segmentation
arXiv paper abstract https://arxiv.org/abs/2201.00487v1
arXiv PDF paper https://arxiv.org/pdf/2201.00487v1.pdf
GitHub https://github.com/wjn922/referformer
Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment the target object referred by a language expression in all video frames.
… propose a simple and unified framework built upon Transformer, termed ReferFormer.
It views the language as queries and directly attends to the most relevant regions in the video frames.
… all the queries are obligated to find the referred objects only.
… The object tracking is achieved naturally by linking the corresponding queries across frames.
… On Ref-Youtube-VOS, ReferFormer … exceeds the previous state-of-the-art performance by 8.4 points. …
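To make the "language as queries" idea above more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation (see the GitHub repository for that). It assumes each frame is already reduced to a few region feature vectors and that the text expression is already encoded as a query vector. The names `attend`, `segment_video`, `language_queries`, and `video` are hypothetical and chosen only for illustration. The sketch shows two things from the abstract: a query attends to the most relevant region in each frame, and tracking falls out of reusing the same query index in every frame.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, region_features):
    """Scaled dot-product attention weights of one query over one frame's regions."""
    scale = math.sqrt(len(query))
    return softmax([dot(query, region) / scale for region in region_features])


def segment_video(language_queries, video):
    """For each query, record the index of the region it attends to most in every frame.

    The same query index is reused across frames, so tracks[i] is the
    trajectory of query i through the video (the "linking" step).
    """
    tracks = [[] for _ in language_queries]
    for frame in video:
        for i, query in enumerate(language_queries):
            weights = attend(query, frame)
            best_region = max(range(len(weights)), key=weights.__getitem__)
            tracks[i].append(best_region)
    return tracks


# Toy data (assumed): 2 frames, 3 regions per frame, 2-dimensional features.
video = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]],
]
# One hypothetical text-conditioned query that matches the [1.0, 0.0] feature.
language_queries = [[4.0, 0.0]]

tracks = segment_video(language_queries, video)
print(tracks)  # [[0, 2]]
```

In this toy run the referred object sits in region 0 of the first frame and region 2 of the second, and the single query follows it without any separate tracking step. A real model would predict pixel masks rather than pick a region index, so treat this only as an intuition aid.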
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b