Can Vision Transformers Learn without Natural Images?

The authors pre-train Vision Transformers without collecting or annotating any images. Their experiments show that the approach partially outperforms self-supervised learning methods even though no natural images are used in the pre-training phase, and that the resulting model can still interpret natural image datasets to a large extent. For example, accuracy on the CIFAR-10 dataset is 97.6% for their proposal, 97.4% for SimCLRv2, and 98.0% for ImageNet pre-training.

Unlimited computer fractals can help train AI to see
Large datasets like ImageNet have supercharged the last 10 years of AI vision, but they are hard to produce and contain bias. Computer-generated datasets provide an alternative.
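
As a rough illustration of how a labeled image dataset might be produced from formulas alone, the sketch below renders fractals with an iterated function system (IFS) and the "chaos game". This is an assumption for illustration, not a method stated in the summaries above: the function names (random_ifs, render_fractal, make_dataset), the parameter ranges, and the idea that each random set of affine maps defines one class are all hypothetical choices.

```python
import random


def random_ifs(rng, n_maps=4):
    """Sample affine maps (a, b, c, d, e, f) for x' = a*x + b*y + e, y' = c*x + d*y + f.

    Assumed range: matrix entries in [-0.45, 0.45] keep the Frobenius norm
    at or below 0.9, so every map is a contraction and the points stay bounded.
    """
    maps = []
    for _ in range(n_maps):
        a, b, c, d = (rng.uniform(-0.45, 0.45) for _ in range(4))
        e, f = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        maps.append((a, b, c, d, e, f))
    return maps


def render_fractal(maps, rng, size=32, n_points=5000, burn_in=20):
    """Run the chaos game and rasterize the visited points to a size x size binary grid."""
    x, y = 0.0, 0.0
    points = []
    for i in range(n_points + burn_in):
        a, b, c, d, e, f = rng.choice(maps)
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= burn_in:  # skip early points that are not yet near the attractor
            points.append((x, y))

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero for degenerate shapes
    span_y = (max(ys) - min_y) or 1.0

    image = [[0] * size for _ in range(size)]
    for px, py in points:
        col = min(size - 1, int((px - min_x) / span_x * size))
        row = min(size - 1, int((py - min_y) / span_y * size))
        image[row][col] = 1
    return image


def make_dataset(n_classes, images_per_class, seed=0):
    """Build (image, label) pairs; the label is the index of the IFS that drew the image."""
    rng = random.Random(seed)
    dataset = []
    for label in range(n_classes):
        maps = random_ifs(rng)  # one hypothetical "class" per parameter set
        for _ in range(images_per_class):
            dataset.append((render_fractal(maps, rng), label))
    return dataset


if __name__ == "__main__":
    image, label = make_dataset(n_classes=1, images_per_class=1)[0]
    print("label:", label)
    for row in image:
        print("".join("#" if pixel else "." for pixel in row))
```

Because the labels come from the generating parameters, a dataset built this way needs no human annotation, and new classes can be added simply by sampling more parameter sets.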
