Find the moment in a video that matches a description, using self-supervised learning with automatically generated queries (MPGN)
Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval
arXiv paper abstract https://arxiv.org/abs/2210.12617v1
arXiv PDF paper https://arxiv.org/pdf/2210.12617v1.pdf
Video corpus moment retrieval (VCMR) is the task of retrieving the most relevant video moment from a large video corpus using a natural language query.
… Previous works … relied on the expensive query annotations for VCMR, i.e., the corresponding moment intervals.
… propose a self-supervised learning … Modal-specific Pseudo Query Generation Network (MPGN) … selects candidate temporal moments via subtitle-based moment sampling.
Then, it generates pseudo queries exploiting both visual and textual information from the selected temporal moments.
Through the multimodal information in the pseudo queries, … show that MPGN successfully learns to localize the video corpus moment without any explicit annotation.
… showing competitive results compared with both supervised models and unsupervised setting models.
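
The two steps named in the abstract, subtitle-based moment sampling and pseudo query generation, can be illustrated with a small Python sketch. This is not the authors' implementation: the `Subtitle` class, the functions `sample_moment` and `make_pseudo_query`, the `max_span` parameter, and the hand-written visual tags are all hypothetical names chosen for illustration. The sketch assumes a candidate moment is a run of consecutive subtitles and that a pseudo query simply joins the subtitle text with visual cues; the paper's actual sampling and generation procedures may differ.

```python
import random
from dataclasses import dataclass


@dataclass
class Subtitle:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def sample_moment(subtitles, max_span=3, rng=None):
    """Pick 1..max_span consecutive subtitles and return their time interval.

    Assumption: a candidate moment is the interval covered by a contiguous
    run of subtitles. This is an illustrative stand-in, not the paper's rule.
    """
    rng = rng or random.Random()
    span = rng.randint(1, min(max_span, len(subtitles)))
    first = rng.randint(0, len(subtitles) - span)
    chosen = subtitles[first:first + span]
    return chosen[0].start, chosen[-1].end, chosen


def make_pseudo_query(chosen, visual_tags):
    """Combine the textual cue (subtitles) with a visual cue (tags).

    Assumption: visual_tags stands in for whatever a visual model would
    produce for the sampled moment; here they are supplied by hand.
    """
    text = " ".join(s.text for s in chosen)
    if visual_tags:
        return f"{text} [visual: {', '.join(visual_tags)}]"
    return text


# Toy subtitles for a single video (made-up data for the example).
subs = [
    Subtitle(0.0, 2.5, "Where were you last night?"),
    Subtitle(2.5, 5.0, "I was at the lab."),
    Subtitle(5.0, 8.0, "You missed dinner again."),
    Subtitle(8.0, 11.0, "I know, I am sorry."),
]

start, end, chosen = sample_moment(subs, rng=random.Random(0))
query = make_pseudo_query(chosen, ["two people", "kitchen"])
print(f"moment: {start:.1f}s to {end:.1f}s")
print(f"pseudo query: {query}")
```

Each (pseudo query, moment interval) pair produced this way could serve as a training example in place of a human-written query annotation, which is the idea behind the self-supervised setup described above.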
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b