Improved segmenting of objects in a video that are mentioned in a text query with ReferFormer

Language as Queries for Referring Video Object Segmentation
arXiv paper abstract https://arxiv.org/abs/2201.00487v1
arXiv PDF paper https://arxiv.org/pdf/2201.00487v1.pdf
GitHub https://github.com/wjn922/referformer

Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment the target object referred by a language expression in all video frames.

… propose a simple and unified framework built upon Transformer, termed ReferFormer.

It views the language as queries and directly attends to the most relevant regions in the video frames.
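To make the "language as queries" idea concrete, here is a minimal sketch of one query attending over frame regions with scaled dot-product attention. It is not the authors' implementation: the 2-D embeddings, the `attend` helper, and the variable names are all hypothetical, and the real model uses a full Transformer over learned features.

```python
import math

def attend(query, features):
    """Single-head scaled dot-product attention of one query over region features.

    Returns one softmax weight per region (hypothetical helper, not from the paper).
    """
    scale = math.sqrt(len(query))
    logits = [sum(q * f for q, f in zip(query, feat)) / scale for feat in features]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed toy data: a sentence embedding used as the query,
# and three region embeddings from a single frame.
text_query = [1.0, 0.0]
frame_regions = [[0.1, 0.9], [2.0, 0.0], [0.5, 0.5]]

weights = attend(text_query, frame_regions)
best_region = max(range(len(weights)), key=weights.__getitem__)
print(best_region, [round(w, 3) for w in weights])
```

The region whose embedding aligns best with the text embedding receives the largest weight, which is the sense in which a language-conditioned query "directly attends to the most relevant regions".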

… all the queries are obligated to find the referred objects only.

… The object tracking is achieved naturally by linking the corresponding queries across frames.
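One way to read "linking the corresponding queries across frames" is that query index q in one frame refers to the same object as query index q in every other frame, so no separate matching step is needed. The sketch below works under that assumption: it averages each query's score over the frames and keeps the masks of the best query. The scores, mask labels, and the `link_queries` helper are made up for illustration and are not taken from the ReferFormer code.

```python
def link_queries(scores_per_frame):
    """Average each query's score over all frames and return the best query index.

    scores_per_frame[t][q] is the (assumed) confidence that query q in frame t
    covers the referred object.
    """
    num_frames = len(scores_per_frame)
    num_queries = len(scores_per_frame[0])
    averages = [
        sum(frame[q] for frame in scores_per_frame) / num_frames
        for q in range(num_queries)
    ]
    best = max(range(num_queries), key=averages.__getitem__)
    return best, averages

# Assumed toy data: 3 frames, 3 queries per frame.
scores = [
    [0.2, 0.9, 0.1],
    [0.3, 0.8, 0.2],
    [0.6, 0.7, 0.1],
]
# Placeholder labels standing in for per-frame, per-query mask predictions.
masks = [[f"mask_t{t}_q{q}" for q in range(3)] for t in range(3)]

best_query, avg = link_queries(scores)
# The track is simply the same query index read out in every frame.
track = [masks[t][best_query] for t in range(len(masks))]
print(best_query, track)
```

Because the link is by index, the track falls out of the query layout itself rather than from a post-hoc association step.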

… On Ref-Youtube-VOS, ReferFormer … exceeds the previous state-of-the-art performance by 8.4 points. …

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Farsai Chaikulngamdee on Unsplash

AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.
