Video correspondence by self-supervised learning with lower training-data cost, learning spatial then temporal features, with Li
--
Spatial-then-Temporal Self-Supervised Learning for Video Correspondence
arXiv paper abstract https://arxiv.org/abs/2209.07778v1
arXiv paper PDF https://arxiv.org/pdf/2209.07778v1.pdf
Learning temporal correspondence from unlabeled videos is of vital importance in computer vision, and has been tackled by different kinds of self-supervised pretext tasks.
… propose a spatial-then-temporal pretext task to address the training data cost problem.
… use contrastive learning from unlabeled still image data to obtain appearance-sensitive features.
… switch to unlabeled video data and learn motion-sensitive features by reconstructing frames.
… propose a global correlation distillation loss to retain the appearance sensitivity learned in the first step, as well as a local correlation distillation loss in a pyramid structure to combat temporal discontinuity.
… method surpasses the state-of-the-art self-supervised methods on a series of correspondence-based tasks …
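To make the two-step recipe above concrete, here is a minimal PyTorch-style sketch of my reading of it: step one trains appearance-sensitive features with a contrastive (InfoNCE-style) loss on still images, and step two switches to video and learns motion-sensitive features by reconstructing one frame from another through a feature affinity. The tiny encoder, temperature, and affinity-based reconstruction below are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of the spatial-then-temporal idea (illustrative, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for a ResNet-style backbone (assumption).
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1),
)

def info_nce(z1, z2, tau=0.07):
    """Step 1: contrastive loss between two augmented views of still images."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                               # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)      # positives on the diagonal
    return F.cross_entropy(logits, labels)

def frame_reconstruction_loss(ref_frame, tgt_frame):
    """Step 2: reconstruct the target frame from the reference frame via feature affinity."""
    f_ref, f_tgt = encoder(ref_frame), encoder(tgt_frame)    # (B, C, H, W)
    B, C, H, W = f_ref.shape
    f_ref, f_tgt = f_ref.flatten(2), f_tgt.flatten(2)        # (B, C, HW)
    affinity = torch.softmax(f_tgt.transpose(1, 2) @ f_ref / C ** 0.5, dim=-1)  # (B, HW, HW)
    ref_small = F.interpolate(ref_frame, size=(H, W)).flatten(2)  # reference colours at feature resolution
    tgt_small = F.interpolate(tgt_frame, size=(H, W)).flatten(2)
    recon = ref_small @ affinity.transpose(1, 2)             # copy reference colours to target positions
    return F.l1_loss(recon, tgt_small)

# Toy usage with random tensors.
imgs_a, imgs_b = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
loss_step1 = info_nce(encoder(imgs_a).mean(dim=(2, 3)), encoder(imgs_b).mean(dim=(2, 3)))
clip = torch.randn(4, 2, 3, 64, 64)                          # (B, T, C, H, W) video clip
loss_step2 = frame_reconstruction_loss(clip[:, 0], clip[:, 1])
```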
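The two distillation losses mentioned above can be sketched in the same spirit, assuming a frozen copy of the step-one (image-pretrained) encoder acts as the teacher while the student keeps training on video: the global correlation distillation loss matches dense frame-to-frame correlations to retain appearance sensitivity, and the local correlation distillation loss does the same within small windows, which the paper applies at several pyramid levels. The KL formulation, temperature, and window size here are my assumptions.

```python
# Hedged sketch of the correlation distillation losses (my reading, not the released code).
import torch
import torch.nn.functional as F

def global_corr(f_a, f_b, tau=0.07):
    """Dense frame-to-frame correlation, softmax-normalised over positions of f_b."""
    a = F.normalize(f_a.flatten(2), dim=1)                   # (B, C, HW)
    b = F.normalize(f_b.flatten(2), dim=1)
    return torch.softmax(a.transpose(1, 2) @ b / tau, dim=-1)  # (B, HW, HW)

def local_corr(f_a, f_b, window=5, tau=0.07):
    """Correlation of each position in f_a with a window x window neighbourhood in f_b."""
    B, C, H, W = f_a.shape
    a = F.normalize(f_a, dim=1).flatten(2).unsqueeze(2)      # (B, C, 1, HW)
    patches = F.unfold(F.normalize(f_b, dim=1), window, padding=window // 2)
    patches = patches.view(B, C, window * window, H * W)     # (B, C, w*w, HW)
    return torch.softmax((a * patches).sum(1) / tau, dim=1)  # (B, w*w, HW)

def distill(corr_student, corr_teacher):
    """KL divergence pulling the student's correlation toward the frozen teacher's."""
    return F.kl_div((corr_student + 1e-8).log(), corr_teacher, reduction="batchmean")

# Toy usage: feat_t* come from a frozen copy of the step-1 (image-pretrained) encoder,
# feat_s* from the encoder being fine-tuned on video in step 2.
feat_t1, feat_t2 = torch.randn(2, 128, 16, 16), torch.randn(2, 128, 16, 16)
feat_s1, feat_s2 = torch.randn(2, 128, 16, 16), torch.randn(2, 128, 16, 16)
loss_global = distill(global_corr(feat_s1, feat_s2), global_corr(feat_t1, feat_t2))
# The paper applies the local loss in a pyramid of scales; a single scale is shown here.
loss_local = distill(local_corr(feat_s1, feat_s2), local_corr(feat_t1, feat_t2))
```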
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b