Image retrieval with a sketch by combining photo and sketch information with XModalViT

Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval
arXiv paper abstract https://arxiv.org/abs/2210.10486v1
arXiv PDF paper https://arxiv.org/pdf/2210.10486v1.pdf

Representation learning for sketch-based image retrieval has mostly been tackled by learning embeddings that discard modality-specific information.

… instances from different modalities can often provide complementary information … propose a cross-attention framework for Vision Transformers (XModalViT) that fuses modality-specific information instead of discarding it.

… framework first maps paired datapoints from the individual photo and sketch modalities to fused representations that unify information from both modalities.
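To make the fusion step more concrete, here is a minimal sketch of generic single-head cross-attention between photo tokens and sketch tokens, written in NumPy. It is not the paper's implementation: the token counts, the embedding size, the shared random projection weights, and the mean-pool-and-concatenate step are all assumptions chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: `queries` tokens attend over `context` tokens."""
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (num_queries, num_context)
    return softmax(scores, axis=-1) @ v       # (num_queries, dim)

rng = np.random.default_rng(0)
dim = 16
# Hypothetical patch tokens for one paired photo and sketch.
photo_tokens = rng.normal(size=(10, dim))
sketch_tokens = rng.normal(size=(8, dim))
# Assumed projection weights (random here; learned in a real model).
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

# Each modality queries the other one.
photo_to_sketch = cross_attention(photo_tokens, sketch_tokens, w_q, w_k, w_v)
sketch_to_photo = cross_attention(sketch_tokens, photo_tokens, w_q, w_k, w_v)

# Assumption: pool each stream by its mean and concatenate into one fused vector.
fused = np.concatenate([photo_to_sketch.mean(axis=0), sketch_to_photo.mean(axis=0)])
print(fused.shape)
```

The point of the sketch is only that the fused vector depends on both inputs at once, which is why a fusion network like this needs a photo and a sketch together and cannot be used directly for retrieval.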

… then decouple the input space of the aforementioned modality fusion network into independent encoders of the individual modalities via contrastive and relational cross-modal knowledge distillation.
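The two distillation signals mentioned here can be sketched roughly as follows. This is a simplified illustration, not the paper's exact losses: it assumes an InfoNCE-style contrastive term that pulls each student embedding toward the fused (teacher) embedding of the same pair, and a relational term that matches the pairwise similarity structure of a batch. The function names, the temperature, and the random data are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def contrastive_distillation_loss(student, teacher, temperature=0.1):
    """InfoNCE-style loss: row i of `student` should match row i of `teacher`."""
    s, t = l2_normalize(student), l2_normalize(teacher)
    logits = s @ t.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def relational_distillation_loss(student, teacher):
    """Match the batch's pairwise cosine-similarity matrix to the teacher's."""
    s, t = l2_normalize(student), l2_normalize(teacher)
    return np.mean((s @ s.T - t @ t.T) ** 2)

rng = np.random.default_rng(0)
batch, dim = 6, 32
teacher_fused = rng.normal(size=(batch, dim))     # stand-in for fused representations
student_random = rng.normal(size=(batch, dim))    # untrained single-modality encoder output
student_aligned = teacher_fused + 0.01 * rng.normal(size=(batch, dim))

loss_random = contrastive_distillation_loss(student_random, teacher_fused)
loss_aligned = contrastive_distillation_loss(student_aligned, teacher_fused)
print(loss_aligned < loss_random)
print(relational_distillation_loss(teacher_fused, teacher_fused))
```

In a setup like this, one student per modality (a photo encoder and a sketch encoder) would be trained against the same fused teacher, so that each can later embed its own modality without needing the paired input.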

Such encoders can then be applied to downstream tasks like cross-modal retrieval.
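As a rough illustration of the retrieval step, the snippet below ranks a small gallery of photo embeddings by cosine similarity to a sketch embedding. The embeddings are hand-written toy vectors, and `rank_photos` is a hypothetical helper; in practice the vectors would come from the independent sketch and photo encoders.

```python
import numpy as np

def rank_photos(sketch_embedding, photo_embeddings):
    """Return gallery indices sorted from most to least similar, plus the similarities."""
    s = sketch_embedding / np.linalg.norm(sketch_embedding)
    p = photo_embeddings / np.linalg.norm(photo_embeddings, axis=1, keepdims=True)
    sims = p @ s                      # cosine similarity per gallery photo
    return np.argsort(-sims), sims

# Toy gallery of three photo embeddings and one sketch query.
gallery = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query = np.array([0.1, 0.9, 0.0])

order, sims = rank_photos(query, gallery)
print(order)  # [1 2 0]
```

Because the photo embeddings do not depend on the query, they can be computed once for the whole gallery and reused for every incoming sketch.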

… demonstrate the expressive capacity of the learned representations by performing a wide range of experiments and achieving state-of-the-art results …

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by pure julia on Unsplash

AI News Clips by Morris Lee: News to help your R&D

I apply innovative technologies like machine learning, computer vision, and physics to further an organization's goals. I am a recognized innovator with 66 patents.