Training Computer Vision Transformers without Natural Images


Can Vision Transformers Learn without Natural Images?

Deep AI

The authors pre-train Vision Transformers without any natural image collection or annotation labor, using automatically generated fractal images instead. They experimentally verify that, despite using no natural images in the pre-training phase, their method partially outperforms self-supervised learning, and that the learned features transfer well to natural image datasets. For example, fine-tuning accuracy on CIFAR-10: their proposal 97.6%, vs. 97.4% for SimCLRv2 and 98.0% for ImageNet pre-training.
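The fractal images behind this result can be produced procedurally. Below is a minimal sketch (assuming NumPy; the function name and parameters are illustrative, not from the paper) of rendering one fractal with a random iterated function system (IFS), the mechanism that FractalDB-style datasets are built on:

```python
import numpy as np

def random_ifs_image(n_maps=4, n_points=50_000, size=64, seed=0):
    """Render a grayscale fractal with the 'chaos game' on a random
    iterated function system of 2-D affine maps x' = A x + b.
    Illustrative sketch only; FractalDB adds category search and
    filtering steps not shown here."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(n_maps, 2, 2))
    b = rng.uniform(-1.0, 1.0, size=(n_maps, 2))
    # Rescale each map so its spectral norm is at most 0.8,
    # keeping the iteration contractive (points stay bounded).
    for k in range(n_maps):
        norm = np.linalg.norm(A[k], 2)
        if norm > 0.8:
            A[k] *= 0.8 / norm
    x = np.zeros(2)
    img = np.zeros((size, size))
    for _ in range(n_points):
        k = rng.integers(n_maps)          # pick a map at random
        x = A[k] @ x + b[k]               # apply the affine map
        # Map points from roughly [-4, 4] into pixel coordinates.
        px = int((x[0] + 4.0) / 8.0 * (size - 1))
        py = int((x[1] + 4.0) / 8.0 * (size - 1))
        if 0 <= px < size and 0 <= py < size:
            img[py, px] = 1.0
    return img

img = random_ifs_image()
print(img.shape)
```

Varying the seed and map parameters yields an effectively unlimited supply of distinct fractal categories, which is what removes the need for natural images and human labels.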

arXiv paper abstract
arXiv PDF paper

Unlimited computer fractals can help train AI to see
Large datasets like ImageNet have supercharged the last 10 years of AI vision, but they are hard to produce and contain bias. Computer-generated datasets provide an alternative.
M.I.T. Technology Review

Web site
Oral presentation



Photo by Mike Erskine on Unsplash



AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant with 37+ years in artificial intelligence and related high-tech fields. An innovator with 66+ patents, ready to help a firm's R&D.