Segment objects in a video that are mentioned in a text query

End-to-End Referring Video Object Segmentation with Multimodal Transformers
arXiv abstract https://arxiv.org/abs/2111.14821
arXiv PDF https://arxiv.org/pdf/2111.14821.pdf
GitHub https://github.com/mttr2021/MTTR

The referring video object segmentation task (RVOS) involves segmentation of a text-referred object instance in the frames of a given video.
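
To make the task's input and output concrete, here is a minimal Python sketch of the contract described above: a sequence of frames plus a text query goes in, and one binary mask per frame comes out. The names `RVOSSample` and `segment_referred_object` are hypothetical and are not taken from the MTTR code; the function body is a placeholder that returns empty masks and does not run any model.

```python
from dataclasses import dataclass
from typing import List

# Assumed toy representations (not from the paper or its repository):
# a frame is an H x W grid of pixel values, a mask is an H x W grid of booleans.
Frame = List[List[int]]
Mask = List[List[bool]]


@dataclass
class RVOSSample:
    """One RVOS input: the video frames and the text query referring to an object."""
    frames: List[Frame]
    query: str


def segment_referred_object(sample: RVOSSample) -> List[Mask]:
    """Placeholder segmenter: returns one all-False mask per frame.

    A real RVOS model would mark the pixels of the object named in
    sample.query as True in every frame where it appears.
    """
    return [[[False for _ in row] for row in frame] for frame in sample.frames]


# Toy video: 2 frames, each 3 rows by 4 columns.
sample = RVOSSample(
    frames=[[[0] * 4 for _ in range(3)] for _ in range(2)],
    query="a person riding a bike",
)
masks = segment_referred_object(sample)
```

The output has the same number of masks as there are frames, and each mask has the same height and width as its frame, which is the shape any RVOS method has to produce regardless of how it is implemented.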