Vision transformer morphed to CNN works better
Visformer: The Vision-friendly Transformer
arXiv paper abstract https://arxiv.org/abs/2104.12533
arXiv paper PDF https://arxiv.org/pdf/2104.12533.pdf
GitHub https://github.com/danczs/Visformer
… rapid development … Transformer module to vision …
… there is still a growing body of evidence showing that these models suffer from over-fitting, especially when the training data is limited.
… gradually transition a Transformer-based model to a convolution-based model.
… With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy, and the advantage becomes more significant when the model complexity is lower or the training set is smaller. …
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b