Segment objects in videos by generating an auxiliary frame between adjacent frames with Chen
Space-time Reinforcement Network for Video Object Segmentation
arXiv paper abstract https://arxiv.org/abs/2405.04042
arXiv paper PDF https://arxiv.org/pdf/2405.04042
Recent video object segmentation (VOS) networks typically use memory-based methods: for each query frame, the mask is predicted by space-time matching against memory frames.
Although these methods perform well, they suffer from two issues: 1) Challenging data can destroy the space-time coherence between adjacent video frames.
2) Pixel-level matching leads to undesired mismatches caused by noise or distractors.
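The pixel-level matching these memory networks rely on can be sketched as an attention-style readout: every query pixel attends to every memory pixel, which is exactly where noisy or distractor pixels can attract weight. A minimal NumPy sketch, assuming flattened per-pixel key/value features (the function name and shapes are illustrative, not the paper's API):

```python
import numpy as np

def pixel_level_readout(query_keys, memory_keys, memory_values):
    """Attention-style memory readout typical of memory-based VOS.

    query_keys:    (Nq, C) key features, one per query-frame pixel
    memory_keys:   (Nm, C) key features, one per memory-frame pixel
    memory_values: (Nm, D) value features (e.g. mask embeddings)
    Returns (Nq, D): aggregated value for each query pixel.
    """
    # Similarity between every query pixel and every memory pixel
    affinity = query_keys @ memory_keys.T            # (Nq, Nm)
    # Softmax over memory pixels; distractor pixels can still receive
    # weight here, which is the mismatching issue noted above.
    affinity -= affinity.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(affinity)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ memory_values                   # (Nq, D)

rng = np.random.default_rng(0)
out = pixel_level_readout(rng.normal(size=(4, 8)),
                          rng.normal(size=(6, 8)),
                          rng.normal(size=(6, 2)))
print(out.shape)  # (4, 2)
```

The quadratic query-by-memory affinity also hints at why per-pixel matching is costly compared with the prototype-level matching proposed below.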
… first propose to generate an auxiliary frame between adjacent frames, serving as an implicit short-temporal reference for the query frame.
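In the paper the auxiliary frame is generated by the network; as a hypothetical stand-in, a linear blend of adjacent-frame features illustrates how an in-between reference sits temporally closer to the query than either real frame (the function and `alpha` parameter are assumptions for illustration only):

```python
import numpy as np

def auxiliary_frame(prev_feat, next_feat, alpha=0.5):
    # Hypothetical stand-in for the learned auxiliary-frame generator:
    # a linear blend of adjacent-frame features. The actual generation
    # is learned; the blend only shows the short-temporal-reference idea.
    return alpha * prev_feat + (1.0 - alpha) * next_feat

prev_f = np.zeros((2, 2))
next_f = np.ones((2, 2))
aux = auxiliary_frame(prev_f, next_f)
print(aux)  # midway between the two adjacent frames
```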
… learn a prototype for each video object, so that prototype-level matching can be performed between the query and memory.
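One common way to obtain a per-object prototype is masked average pooling over frame features, then matching query pixels to prototypes by cosine similarity. A minimal NumPy sketch under that assumption (the paper learns its prototypes; these helper names are illustrative):

```python
import numpy as np

def object_prototype(features, mask):
    """Masked average pooling: one prototype vector per object.

    features: (H, W, C) frame features
    mask:     (H, W) soft or binary object mask
    """
    w = mask[..., None]
    return (features * w).sum(axis=(0, 1)) / (w.sum() + 1e-6)

def prototype_match(query_features, prototypes):
    """Cosine similarity of each query pixel to each object prototype.

    query_features: (H, W, C); prototypes: (K, C)
    Returns (H, W, K): one similarity map per object.
    """
    q = query_features / (np.linalg.norm(query_features, axis=-1,
                                         keepdims=True) + 1e-6)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1,
                                     keepdims=True) + 1e-6)
    return q @ p.T

rng = np.random.default_rng(1)
feats = rng.normal(size=(5, 5, 8))
mask = (rng.random((5, 5)) > 0.5).astype(float)
proto = object_prototype(feats, mask)            # (8,)
sim = prototype_match(feats, proto[None, :])     # (5, 5, 1)
print(proto.shape, sim.shape)
```

Matching against K prototypes instead of every memory pixel reduces the comparison count per query pixel and is less exposed to individual noisy pixels, consistent with the motivation above.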
… network outperforms the state-of-the-art method … network exhibits a high inference speed of 32+ FPS.
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b