Real-time video object segmentation by encoding frame smoothness and low memory with MAVOS
Efficient Video Object Segmentation via Modulated Cross-Attention Memory
arXiv paper abstract https://arxiv.org/abs/2403.17937
arXiv PDF paper https://arxiv.org/pdf/2403.17937.pdf
Project page https://github.com/Amshaker/MAVOS
Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation.
… approaches typically struggle on long videos due to increased GPU memory demands, as they frequently expand the memory bank every few frames.
… propose a transformer-based approach, named MAVOS, that introduces an optimized and dynamic long-term modulated cross-attention (MCA) memory to model temporal smoothness without requiring frequent memory expansion.
… MCA effectively encodes both local and global features at various levels of granularity while efficiently maintaining consistent speed regardless of the video length.
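To make the memory argument concrete, here is a minimal Python sketch. It is not the MAVOS implementation and does not model attention at all; it only counts how many frames a memory would hold over a video. The function names and the parameters `expand_every` and `slots` are hypothetical, and the expansion interval is an illustrative assumption rather than a value taken from the paper.

```python
def expanding_bank_sizes(num_frames, expand_every):
    """Frames held after each step when the bank grows every `expand_every` frames.

    `expand_every` is an assumed, illustrative interval.
    """
    sizes = []
    stored = 1  # the reference frame that comes with the given mask
    for t in range(1, num_frames + 1):
        if t % expand_every == 0:
            stored += 1  # a new frame is appended to the memory bank
        sizes.append(stored)
    return sizes


def fixed_memory_sizes(num_frames, slots=1):
    """Frames held after each step when the memory is updated in place.

    `slots` is a hypothetical constant capacity; it never grows with the video.
    """
    return [slots for _ in range(num_frames)]


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        growing = expanding_bank_sizes(n, expand_every=5)[-1]
        constant = fixed_memory_sizes(n)[-1]
        print(f"{n} frames: growing bank holds {growing}, fixed memory holds {constant}")
```

Under these assumptions the growing bank scales linearly with video length while the fixed memory stays flat, which is the property the excerpt attributes to the MCA memory. The sketch says nothing about segmentation accuracy.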
… contributions leading to real-time inference and markedly reduced memory demands without any degradation in segmentation accuracy on long videos.
Compared to … transformer-based … MAVOS increases the speed by 7.6x … reducing the GPU memory by 87% with comparable segmentation performance on short and long video datasets …
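The two reported figures can be read as fractions of the baseline. The small Python sketch below does only that arithmetic; the helper names are hypothetical, and since the excerpt elides which baseline is used, no absolute speed or memory numbers are assumed.

```python
def remaining_fraction(reduction_percent):
    """Fraction of the baseline still used after a percentage reduction."""
    return 1.0 - reduction_percent / 100.0


def time_fraction(speedup):
    """Fraction of the baseline per-frame time after a given speedup factor."""
    return 1.0 / speedup


if __name__ == "__main__":
    # Figures quoted in the excerpt: 87% less GPU memory, 7.6x speed.
    print(f"memory used: {remaining_fraction(87):.0%} of baseline")
    print(f"time per frame: {time_fraction(7.6):.1%} of baseline")
```

In other words, an 87% reduction leaves about 13% of the baseline GPU memory, and a 7.6x speedup means each frame takes roughly 13% of the baseline time.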
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b