Better document understanding without OCR using Donut transformer
Donut: Document Understanding Transformer without OCR
arXiv paper abstract https://arxiv.org/abs/2111.15664
arXiv PDF paper https://arxiv.org/pdf/2111.15664.pdf
Understanding document images (e.g., invoices) has been an important research topic …
… current Visual Document Understanding (VDU) systems have come to be designed based on OCR.
… suffer from critical problems induced by the OCR, e.g., (1) expensive computational costs and (2) performance degradation due to the OCR error propagation.
… propose a novel VDU model that is end-to-end trainable without underpinning OCR framework.
… pre-train the model to mitigate the dependencies on large-scale real document images.
… achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets …
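To make "end-to-end without OCR" a bit more concrete: the excerpts above do not describe the model's output, but as described in the paper, Donut reads the image and directly generates a sequence of field tokens such as <s_total>…</s_total>, which is then turned into structured data. The sketch below is a simplified, hypothetical converter for that kind of tagged string. The function name and the sample field names are assumptions made for illustration, and this is not the official implementation.

```python
import re

# Opening field tag in the assumed "<s_name>" style, e.g. <s_total>
OPEN_TAG = re.compile(r"<s_(\w+)>")


def tagged_sequence_to_dict(sequence: str) -> dict:
    """Convert a tagged string into a nested dict (simplified sketch).

    Assumes every field is written as <s_name>value</s_name> and that a
    field never contains a nested field with the same name.
    """
    result = {}
    pos = 0
    while True:
        match = OPEN_TAG.search(sequence, pos)
        if match is None:
            break
        key = match.group(1)
        close_tag = f"</s_{key}>"
        end = sequence.find(close_tag, match.end())
        if end == -1:
            break  # unclosed field: stop instead of guessing
        inner = sequence[match.end():end]
        # Recurse when the field body itself contains fields
        if OPEN_TAG.search(inner):
            result[key] = tagged_sequence_to_dict(inner)
        else:
            result[key] = inner.strip()
        pos = end + len(close_tag)
    return result


# Hypothetical receipt-style output, invented for this example
sample = (
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu>"
    "<s_total>4.50</s_total>"
)
parsed = tagged_sequence_to_dict(sample)
print(parsed)
```

Because the model emits the structure itself, no separate OCR step produces text boxes that a second stage must then interpret, which is where the OCR error propagation mentioned above would otherwise enter.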
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b