Get images matching text plus image queries, and get descriptions of images

ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Google AI Blog https://ai.googleblog.com/2021/05/align-scaling-up-visual-and-vision.html

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
arXiv paper abstract https://arxiv.org/abs/2102.05918
arXiv PDF paper https://arxiv.org/pdf/2102.05918.pdf

… In this paper, we leverage a noisy dataset of over one billion image alt-text pairs
… A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss. We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme.
… The aligned visual and language representations also set new state-of-the-art results on Flickr30K and MSCOCO benchmarks, even when compared with more sophisticated cross-attention models. The representations also enable cross-modality search with complex text and text + image queries.
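
To make the excerpts above concrete, here is a minimal NumPy sketch of a dual-encoder contrastive loss and an embedding-based search. It is not the authors' code. The function names (`l2_normalize`, `contrastive_loss`, `search`), the fixed `temperature` value, and the toy vectors standing in for encoder outputs are all assumptions made for illustration. Building a text + image query by adding the two embeddings is also an assumption about one simple way such a query could be formed.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.05):
    """Symmetric contrastive loss over a batch of matched image/text pairs.

    Row i of image_emb is assumed to match row i of text_emb; every other
    row in the batch acts as a negative. The temperature is a hypothetical
    fixed value chosen for this sketch.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature          # (batch, batch) similarities
    diag = np.arange(logits.shape[0])           # positions of matched pairs
    image_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()
    text_to_image = -log_softmax(logits, axis=0)[diag, diag].mean()
    return image_to_text + text_to_image

def search(query_emb, index_emb, top_k=3):
    """Return indices of the top_k index embeddings by cosine similarity."""
    scores = l2_normalize(index_emb) @ l2_normalize(query_emb)
    return np.argsort(-scores)[:top_k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for encoder outputs: 4 pairs, 8 dimensions each.
    image_emb = rng.normal(size=(4, 8))
    text_emb = image_emb + 0.01 * rng.normal(size=(4, 8))  # nearly aligned

    print("loss, aligned pairs:   ", contrastive_loss(image_emb, text_emb))
    print("loss, mismatched pairs:", contrastive_loss(image_emb, np.roll(text_emb, 1, axis=0)))

    # Text + image query (assumption: combine by adding the two embeddings).
    combined_query = image_emb[0] + text_emb[1]
    print("nearest images:", search(combined_query, image_emb, top_k=2))
```

Because images and text live in the same embedding space, the same `search` function serves text-to-image, image-to-text, and combined queries; only the query vector changes.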

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Noman Shahid on Unsplash

AI News Clips by Morris Lee: News to help your R&D

Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.
