Improved segmenting of objects in a video that are mentioned in a text query with ReferFormer

Language as Queries for Referring Video Object Segmentation
arXiv paper abstract https://arxiv.org/abs/2201.00487v1
arXiv PDF paper https://arxiv.org/pdf/2201.00487v1.pdf
GitHub https://github.com/wjn922/referformer

Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment the target object referred by a language expression in all video frames.

… propose a simple and unified framework built upon Transformer, termed ReferFormer.

It views the language as queries and directly attends to the most relevant regions in the video frames.
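
To make the "language as queries" idea concrete, here is a minimal sketch of a single text-derived query attending over the region features of one frame with plain scaled dot-product attention. It is an illustration only, not the authors' implementation: the function names (`attend`, `softmax`, `dot`), the 4-dimensional vectors, and the toy data are all assumptions made for this example, and the real model is a full Transformer operating on learned features.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, region_features):
    """Single-head scaled dot-product attention of one query over frame regions.

    Returns the attention weights and the weighted sum of region features.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, r) / scale for r in region_features])
    dim = len(region_features[0])
    pooled = [sum(w * r[d] for w, r in zip(weights, region_features))
              for d in range(dim)]
    return weights, pooled

# Hypothetical toy data: one embedding for the text expression and
# three region features taken from a single video frame.
text_query = [1.0, 0.0, 1.0, 0.0]
frame_regions = [
    [0.9, 0.1, 1.1, 0.0],  # region that resembles the expression
    [0.0, 1.0, 0.0, 1.0],  # unrelated region
    [0.2, 0.8, 0.1, 0.9],  # unrelated region
]

weights, pooled = attend(text_query, frame_regions)
best_region = max(range(len(weights)), key=weights.__getitem__)
print("attention weights:", [round(w, 3) for w in weights])
print("most relevant region index:", best_region)
```

In this toy setup the query puts most of its attention on the first region, which is the one whose features align with the text embedding.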

… all the queries are obligated to find the referred objects only.

… The object tracking is achieved naturally by linking the corresponding queries across frames.
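
The linking step can be sketched in a few lines. The sketch assumes that the same set of queries is used in every frame, so that query index q in one frame corresponds to query index q in the next, and that each query produces a confidence score and a mask per frame. Picking the query with the highest average score is an assumption made for this example, as are the function name `link_queries` and the placeholder mask labels; it is not taken from the excerpt above.

```python
def link_queries(per_frame_scores, per_frame_masks):
    """Link predictions across frames by query index.

    per_frame_scores[t][q]: confidence of query q in frame t.
    per_frame_masks[t][q]:  mask predicted by query q in frame t.
    Returns the chosen query index and its mask in every frame.
    """
    num_frames = len(per_frame_scores)
    num_queries = len(per_frame_scores[0])
    # Average each query's confidence over all frames.
    avg = [sum(per_frame_scores[t][q] for t in range(num_frames)) / num_frames
           for q in range(num_queries)]
    best_q = max(range(num_queries), key=avg.__getitem__)
    # The track is simply that query's mask in each frame, in order.
    track = [per_frame_masks[t][best_q] for t in range(num_frames)]
    return best_q, track

# Hypothetical toy data: 3 frames, 2 queries. Masks are placeholder labels.
scores = [
    [0.2, 0.9],
    [0.8, 0.7],  # query 0 wins this single frame, but not on average
    [0.1, 0.8],
]
masks = [
    ["f0_q0", "f0_q1"],
    ["f1_q0", "f1_q1"],
    ["f2_q0", "f2_q1"],
]

best_query, track = link_queries(scores, masks)
print("selected query:", best_query)
print("mask track:", track)
```

Because correspondence comes from the query index itself, no separate association step between frames is needed in this sketch.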

… On Ref-Youtube-VOS, ReferFormer … exceeds the previous state-of-the-art performance by 8.4 points. …

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Farsai Chaikulngamdee on Unsplash

AI News Clips by Morris Lee: News to help your R&D

I apply innovative technologies like machine learning, computer vision, and physics to further an organization's goals. I am a recognized innovator with 66 patents.