Perform many types of video segmentation with one model, without retraining, using TarViS
TarViS: A Unified Approach for Target-based Video Segmentation
arXiv paper abstract https://arxiv.org/abs/2301.02657
arXiv PDF paper https://arxiv.org/pdf/2301.02657.pdf
… video segmentation is currently fragmented into different tasks spanning multiple benchmarks … methods are overwhelmingly task-specific and cannot conceptually generalize to other tasks.
… propose TarViS: a novel, unified network architecture that can be applied to any task that requires segmenting a set of arbitrarily defined ‘targets’ in video.
… approach is flexible with respect to how tasks define these targets, since it models the latter as abstract ‘queries’ which are then used to predict pixel-precise target masks.
A single TarViS model can be trained jointly on a collection of datasets spanning different tasks, and can hot-swap between tasks during inference without any task-specific retraining.
… apply TarViS to four different tasks, namely Video Instance Segmentation (VIS), Video Panoptic Segmentation (VPS), Video Object Segmentation (VOS) and Point Exemplar-guided Tracking (PET).
… unified, jointly trained model achieves state-of-the-art performance on 5/7 benchmarks spanning these four tasks, and competitive performance on the remaining two.
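The query idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that each target is a query vector and that a mask is obtained by thresholding the dot product between the query and per-pixel features, which is a common design in query-based segmentation but is not stated in the excerpt. All names here (predict_masks, task_queries, segment) and all numeric values are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_masks(pixel_features: List[Vector], queries: List[Vector]) -> List[List[int]]:
    """Return one binary mask per query, flattened over all video pixels.

    Assumption: a pixel belongs to a target when the dot product of the
    target's query and the pixel's feature vector is positive.
    """
    return [[1 if dot(q, f) > 0.0 else 0 for f in pixel_features] for q in queries]


# Toy "video" flattened to four pixels, each with a 2-D feature vector.
# In a real model these would come from a shared backbone.
pixel_features: List[Vector] = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]

# Hypothetical query sets, one per task. The task only changes which
# queries are fed in; the mask prediction code stays the same.
task_queries: Dict[str, List[Vector]] = {
    "VIS": [[1.0, -1.0]],
    "VOS": [[0.0, 1.0], [-1.0, 0.0]],
}


def segment(task: str) -> List[List[int]]:
    """Swap tasks at inference time by swapping the query set only."""
    return predict_masks(pixel_features, task_queries[task])


if __name__ == "__main__":
    for task in ("VIS", "VOS"):
        print(task, segment(task))
```

In this sketch, switching from VIS to VOS changes only the entry looked up in task_queries, which mirrors the "hot-swap between tasks" claim at a very small scale.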
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b