Real-time video object segmentation by encoding frame smoothness and low memory with MAVOS

Efficient Video Object Segmentation via Modulated Cross-Attention Memory
arXiv paper abstract https://arxiv.org/abs/2403.17937
arXiv PDF paper https://arxiv.org/pdf/2403.17937.pdf
Project page https://github.com/Amshaker/MAVOS

Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation.

… approaches typically struggle on long videos due to increased GPU memory demands, as they frequently expand the memory bank every few frames.

… propose a transformer-based approach, named MAVOS, that introduces an optimized and dynamic long-term modulated cross-attention (MCA) memory to model temporal smoothness without requiring frequent memory expansion.

… MCA effectively encodes both local and global features at various levels of granularity while efficiently maintaining consistent speed regardless of the video length.
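To make the memory argument concrete, here is a minimal sketch of how the two strategies scale with video length. It is not the MAVOS implementation. The function names, the expansion interval of 5 frames, and the fixed capacity of 2 slots are all hypothetical values chosen for illustration.

```python
def expanding_bank_size(num_frames: int, expand_every: int) -> int:
    """Entries in a memory bank that stores the first frame and then
    adds one more entry every `expand_every` frames (hypothetical policy)."""
    if num_frames <= 0:
        return 0
    return 1 + (num_frames - 1) // expand_every


def fixed_memory_size(num_frames: int, capacity: int) -> int:
    """Entries in a memory that is updated in place and never grows
    beyond `capacity` slots (hypothetical stand-in for a fixed-size memory)."""
    return min(max(num_frames, 0), capacity)


if __name__ == "__main__":
    # Assumed settings, not taken from the paper.
    for n in (100, 1000, 10000):
        grow = expanding_bank_size(n, expand_every=5)
        fixed = fixed_memory_size(n, capacity=2)
        print(f"{n} frames: expanding bank = {grow} entries, fixed memory = {fixed} entries")
```

Under these assumptions the expanding bank grows linearly with the number of frames, while the fixed memory stays the same size, which is why per-frame cost can stay constant on long videos.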

… contributions leading to real-time inference and markedly reduced memory demands without any degradation in segmentation accuracy on long videos.

Compared to … transformer-based … MAVOS increases the speed by 7.6x … reducing the GPU memory by 87% with comparable segmentation performance on short and long video datasets …
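As a rough way to read those two figures, the sketch below applies a 7.6x speedup and an 87% memory reduction to a baseline. The baseline of 5 FPS and 10 GB is purely hypothetical and is not a number from the paper. Only the two ratios come from the abstract.

```python
def apply_reported_gains(baseline_fps: float, baseline_mem_gb: float,
                         speedup: float = 7.6, mem_reduction: float = 0.87):
    """Scale a baseline by the ratios quoted in the abstract.

    The baseline values passed in are assumptions supplied by the caller.
    """
    new_fps = baseline_fps * speedup
    new_mem_gb = baseline_mem_gb * (1.0 - mem_reduction)
    return new_fps, new_mem_gb


if __name__ == "__main__":
    # Hypothetical baseline: 5 FPS and 10 GB of GPU memory.
    fps, mem = apply_reported_gains(5.0, 10.0)
    print(f"{fps:.1f} FPS, {mem:.2f} GB")
```

An 87% reduction leaves 13% of the original footprint, so a method that needed 10 GB in this made-up example would need about 1.3 GB.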

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Jacky Watt on Unsplash

AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.
