Survey of transformers for video


Video Transformers: A Survey
arXiv paper abstract
arXiv PDF paper

… Transformers a promising tool for solving video-related tasks, but some adaptations are required.

… In this survey … analyse and summarize the main contributions and trends for adapting Transformers to model video data.

… delve into how videos are embedded and tokenized, finding a very widespread use of large CNN backbones to reduce dimensionality and a predominance of patches and frames as tokens.
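To make "patches as tokens" concrete, here is a minimal sketch in plain Python. It assumes non-overlapping square patches cut from each frame and flattened into vectors. The function names `patchify_frame` and `tokenize_clip` are hypothetical, chosen for illustration, and the sketch skips the learned projection or CNN backbone that a real model would apply to each token.

```python
def patchify_frame(frame, patch_size):
    """Split one frame (a list of pixel rows) into flattened square patches."""
    height, width = len(frame), len(frame[0])
    if height % patch_size or width % patch_size:
        raise ValueError("frame size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch row by row into a single token vector.
            patch = [
                frame[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


def tokenize_clip(clip, patch_size):
    """Turn a clip (a list of frames) into one flat sequence of patch tokens."""
    return [patch for frame in clip for patch in patchify_frame(frame, patch_size)]


# Toy clip (assumed data): 2 frames of 4x4 single-channel pixels.
clip = [
    [[f * 100 + r * 4 + c for c in range(4)] for r in range(4)]
    for f in range(2)
]
tokens = tokenize_clip(clip, patch_size=2)
print(len(tokens), len(tokens[0]))
```

With frames as tokens the same clip would give only 2 tokens, one per frame. With 2x2 patches it gives 8, which shows how quickly the sequence grows once patches are used.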

… study how the Transformer layer has been tweaked to handle longer sequences, generally by reducing the number of tokens in a single attention operation.
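A rough way to see why fewer tokens per attention operation helps is to count query-key pairs. The sketch below compares joint attention over all tokens of a clip with one illustrative restriction, where each token attends first to tokens in its own frame and then to the same position across frames. This particular scheme and the function names are assumptions for illustration, not necessarily what the survey covers, and the clip dimensions are made up.

```python
def joint_attention_pairs(num_frames, tokens_per_frame):
    """Query-key pairs when every token attends to every token in the clip."""
    total = num_frames * tokens_per_frame
    return total * total


def factorized_attention_pairs(num_frames, tokens_per_frame):
    """Query-key pairs when each token attends within its frame (spatial),
    then to the same patch position across frames (temporal)."""
    total = num_frames * tokens_per_frame
    return total * (tokens_per_frame + num_frames)


# Assumed example: 8 frames, 14 x 14 = 196 patch tokens per frame.
num_frames, tokens_per_frame = 8, 14 * 14
joint = joint_attention_pairs(num_frames, tokens_per_frame)
factorized = factorized_attention_pairs(num_frames, tokens_per_frame)
print(joint, factorized, round(joint / factorized, 1))
```

For this assumed clip, the restricted scheme needs several times fewer query-key pairs than joint attention, and the gap widens as the clip gets longer.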

… explore how other modalities are integrated with video and

… conduct a performance comparison on the most common benchmark for Video Transformers (i.e., action classification), finding them to outperform 3D CNN counterparts with equivalent FLOPs and no significant parameter increase.






AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.