Channel: Andrej Baranovskij Blog

Preparing Dataset for Donut Fine-Tuning (part 1, Document AI)

I explain the dataset I will use to fine-tune the Donut model. I show how PDFs are converted to image files for further processing and OCR data extraction. In the next step, the JSON data is converted into a format that the Sparrow annotation processing/review tool understands.
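The PDF-to-image step could be sketched in Python roughly as follows. This is a minimal sketch, not the author's code. It assumes the third-party `pdf2image` package (a wrapper around Poppler), and the helper names `pdf_to_images` and `page_image_path`, the JPEG output, the 200 DPI default and the `<stem>_page_<n>.jpg` naming scheme are all hypothetical choices made for illustration.

```python
from pathlib import Path


def page_image_path(out_dir: Path, pdf_stem: str, page_index: int) -> Path:
    """Build a deterministic, 1-based file name per page (hypothetical scheme)."""
    return out_dir / f"{pdf_stem}_page_{page_index + 1}.jpg"


def pdf_to_images(pdf_path: str, out_dir: str, dpi: int = 200) -> list[Path]:
    """Render every page of a PDF to a JPEG file and return the written paths."""
    # Imported lazily so page_image_path works without pdf2image/Poppler installed.
    from pdf2image import convert_from_path

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pdf = Path(pdf_path)

    written = []
    # convert_from_path returns one PIL.Image per PDF page.
    for i, page in enumerate(convert_from_path(str(pdf), dpi=dpi)):
        target = page_image_path(out, pdf.stem, i)
        page.save(target, "JPEG")
        written.append(target)
    return written
```

For example, `pdf_to_images("docs/invoice.pdf", "images")` (hypothetical paths) would write `images/invoice_page_1.jpg`, `images/invoice_page_2.jpg`, and so on, ready for the OCR step. A higher DPI usually gives the OCR engine more detail, at the cost of larger image files.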
