Better document understanding without OCR using Donut transformer


Donut: Document Understanding Transformer without OCR
arXiv paper abstract
arXiv PDF paper

Understanding document images (e.g., invoices) has been an important research topic …

… current Visual Document Understanding (VDU) systems have come to be designed based on OCR.

… suffer from critical problems induced by the OCR, e.g., (1) expensive computational costs and (2) performance degradation due to the OCR error propagation.

… propose a novel VDU model that is end-to-end trainable without underpinning OCR framework.

… pre-train the model to mitigate the dependencies on large-scale real document images.

… achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets …






AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.

