Survey of video captioning using deep learning
A Review of Deep Learning for Video Captioning
arXiv paper abstract https://arxiv.org/abs/2304.11431
arXiv PDF paper https://arxiv.org/pdf/2304.11431.pdf
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work in the fields of computer vision, natural language processing (NLP), linguistics, and human-computer interaction.
In essence, VC involves understanding a video and describing it with language.
Captioning is used in a host of applications, from creating more accessible interfaces (e.g., low-vision navigation) to video question answering (V-QA), video retrieval, and content generation.
This survey covers deep learning-based VC, including, but not limited to, attention-based architectures, graph networks, reinforcement learning, adversarial networks, and dense video captioning (DVC).
… discuss the datasets and evaluation metrics used in the field, as well as the limitations, applications, challenges, and future directions of VC.
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b