Segment an object in an image described by text more simply using SeqTR


SeqTR: A Simple yet Universal Network for Visual Grounding
arXiv paper abstract https://arxiv.org/abs/2203.16265v1
arXiv PDF paper https://arxiv.org/pdf/2203.16265v1.pdf
GitHub https://github.com/sean-zhuh/seqtr

… propose … network termed SeqTR for visual grounding tasks, e.g., phrase localization, referring expression comprehension (REC) and segmentation (RES).

… visual grounding often require substantial expertise in designing network architectures and loss functions, making them hard to generalize across tasks.

To simplify … cast visual grounding as a point prediction problem conditioned on image and text inputs, where either the bounding box or binary mask is represented as a sequence of discrete coordinate tokens.
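As a rough illustration of what "a sequence of discrete coordinate tokens" could look like, the sketch below quantizes pixel coordinates into integer bins. It is not taken from the SeqTR code: the bin count `NUM_BINS`, the function names, and the idea of describing a mask by points along its outline are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical size of the coordinate vocabulary (an assumption, not from the paper).
NUM_BINS = 1000


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a coordinate in [0, size] to an integer token in [0, num_bins - 1]."""
    token = int(value * num_bins / size)
    # Clamp so that out-of-range inputs and value == size stay inside the vocabulary.
    return max(0, min(token, num_bins - 1))


def points_to_tokens(
    points: Sequence[Tuple[float, float]],
    width: float,
    height: float,
    num_bins: int = NUM_BINS,
) -> List[int]:
    """Flatten (x, y) points into the token sequence [x0, y0, x1, y1, ...]."""
    tokens: List[int] = []
    for x, y in points:
        tokens.append(quantize(x, width, num_bins))
        tokens.append(quantize(y, height, num_bins))
    return tokens


def box_to_tokens(
    box: Tuple[float, float, float, float], width: float, height: float
) -> List[int]:
    """Encode a box (x1, y1, x2, y2) as its two corner points, i.e. four tokens."""
    x1, y1, x2, y2 = box
    return points_to_tokens([(x1, y1), (x2, y2)], width, height)


if __name__ == "__main__":
    # A box and a small triangular outline in a 640x480 image.
    print(box_to_tokens((64, 48, 320, 240), 640, 480))
    print(points_to_tokens([(0, 0), (320, 0), (320, 240)], 640, 480))
```

Under these assumptions a box becomes four tokens and a mask outline becomes two tokens per point, so both targets share one output format.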

… visual grounding … unified in … SeqTR network without task-specific branches or heads, e.g., the convolutional mask decoder for RES, which greatly reduces the complexity of multi-task modeling.
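To make the "no task-specific heads" idea concrete, here is a minimal sketch of how one shared decoding step might turn predicted tokens back into either a box or a set of outline points. Everything in it is hypothetical: `NUM_BINS`, the task labels `"rec"` and `"res"`, and the choice to map each token to the center of its bin are assumptions for illustration, not the SeqTR implementation.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical size of the coordinate vocabulary (an assumption, not from the paper).
NUM_BINS = 1000

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def tokens_to_points(
    tokens: Sequence[int], width: float, height: float, num_bins: int = NUM_BINS
) -> List[Point]:
    """Turn [x0, y0, x1, y1, ...] tokens into pixel coordinates at bin centers."""
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of coordinate tokens")
    points: List[Point] = []
    for i in range(0, len(tokens), 2):
        x = (tokens[i] + 0.5) * width / num_bins
        y = (tokens[i + 1] + 0.5) * height / num_bins
        points.append((x, y))
    return points


def decode(
    tokens: Sequence[int], width: float, height: float, task: str
) -> Union[Box, List[Point]]:
    """Use the same token-to-point step for both tasks; only the reading differs."""
    points = tokens_to_points(tokens, width, height)
    if task == "rec":
        # A box is read as exactly two corner points.
        if len(points) != 2:
            raise ValueError("a box needs exactly four tokens")
        (x1, y1), (x2, y2) = points
        return (x1, y1, x2, y2)
    if task == "res":
        # A mask is read as outline points; rasterizing them is not shown here.
        return points
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    print(decode([100, 100, 500, 500], 640, 480, task="rec"))
    print(decode([0, 0, 500, 0, 500, 500], 640, 480, task="res"))
```

The only task-dependent part in this sketch is how the final point list is interpreted, which is one way to picture a single output format serving both REC and RES.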

… SeqTR outperforms (or is on par with) the existing state-of-the-arts, proving that a simple yet universal approach for visual grounding is indeed feasible.

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Clay Banks on Unsplash


AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.
