Answer questions about an image using a structured information graph with SA-VQA


SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering
arXiv paper abstract https://arxiv.org/abs/2201.10654v1
arXiv PDF paper https://arxiv.org/pdf/2201.10654v1.pdf

Visual Question Answering (VQA) … is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality representations.

Previous approaches … employ entity-level alignments, such as the correlations between the visual regions and their semantic labels, or the interactions across question words and object features.

These attempts aim to improve the cross-modality representations, while ignoring their internal relations.

… propose to apply structured alignments, which work with graph representation of visual and textual content

… solve … by first converting different modality entities into sequential nodes and the adjacency graph, then incorporating them for structured alignments.
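To make the "sequential nodes plus adjacency graph" idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the toy scene-graph triples, the two-dimensional node features, and the helper names `build_graph` and `masked_attention` are all assumptions for illustration. The attention step is an ordinary dot-product attention restricted by the adjacency matrix, used here as a simple stand-in for structure-aware alignment within one graph; the paper's cross-modal alignment between visual and textual graphs is more involved.

```python
import math


def build_graph(triples):
    """Turn (subject, relation, object) triples into a node sequence
    and a symmetric adjacency matrix with self-loops."""
    nodes, index = [], {}
    for subj, _rel, obj in triples:
        for name in (subj, obj):
            if name not in index:
                index[name] = len(nodes)
                nodes.append(name)
    n = len(nodes)
    # Self-loops let every node attend to itself.
    adj = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for subj, _rel, obj in triples:
        i, j = index[subj], index[obj]
        adj[i][j] = adj[j][i] = 1
    return nodes, adj


def masked_attention(features, adj):
    """Dot-product attention where node i only attends to nodes j
    with adj[i][j] == 1 (its graph neighbours and itself)."""
    dim = len(features[0])
    output = []
    for i, query in enumerate(features):
        scores = []
        for j, key in enumerate(features):
            if adj[i][j]:
                dot = sum(a * b for a, b in zip(query, key))
                scores.append(dot / math.sqrt(dim))
            else:
                scores.append(float("-inf"))  # masked out
        top = max(scores)  # finite because of the self-loop
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        output.append(
            [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        )
    return output


# Hypothetical toy scene graph for an image.
triples = [("man", "holding", "umbrella"), ("umbrella", "above", "dog")]
nodes, adj = build_graph(triples)

# Made-up 2-d features, one row per node in `nodes` order.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aligned = masked_attention(features, adj)

for name, vec in zip(nodes, aligned):
    print(name, [round(v, 3) for v in vec])
```

In this sketch "man" and "dog" are not directly connected, so neither contributes to the other's updated vector; only nodes linked in the graph exchange information. That is the intuition behind using internal relations rather than treating entities independently.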

… model, without any pretraining, outperforms the state-of-the-art methods on the GQA dataset, and beats the non-pretrained state-of-the-art methods on the VQA-v2 dataset.

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Alina Grubnyak on Unsplash


AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.
