Find location in video matching a sentence with TAN

--

Find location in video matching a sentence with TAN

Temporal Alignment Networks for Long-term Video
arXiv paper abstract https://arxiv.org/abs/2204.02968
arXiv PDF paper https://arxiv.org/pdf/2204.02968.pdf

The objective … is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to:

(1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment.

The challenge is to train such networks from large-scale datasets … where the associated text sentences have significant noise, and are only weakly aligned when relevant.

… describe a novel co-training method that enables to denoise and train on raw instructional videos without using manual annotation, despite the considerable noise

… proposed model … outperforms strong baselines (CLIP, MIL-NCE) on this alignment dataset by a significant margin

… apply the trained model in the zero-shot settings to multiple downstream video understanding tasks and achieve state-of-the-art results …

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Matt Popovich on Unsplash

--

--

AI News Clips by Morris Lee: News to help your R&D
AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related hitech technologies 37+ years. Am innovator with 66+ patents and ready to help a firm's R&D.

No responses yet