Channel: Andrej Baranovskij Blog
Browsing latest articles
Browse All 733 View Live

Box Annotations in Sparrow for Structured Data Extraction

Check out my video on Box Annotations in Sparrow for Structured Data Extraction! I’ll show you how the Qwen2.5 vision model pulls bounding box annotations from images based on what you need. Plus,...

View Article


PaddleOCR 3.0: Supercharge Your AI

I upgraded to PaddleOCR 3.0 and explain the new PaddleOCR API integration. My goal is to integrate OCR result output with Vision LLM processing to enhance large-scale, structured table data output.  

View Article


Solving Vision LLM Number Formatting Issues Using PaddleOCR and Sparrow

Discover how to fix number formatting errors in vision LLMs like Mistral! In this video, I show how Mistral misreads "56,000" as "56000" and how combining PaddleOCR’s text extraction with Sparrow’s...

View Article
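
One way to picture the idea in the entry above is to treat the OCR text as the reference for how a number is written and use it to restore separators the vision model dropped. This is a minimal sketch under that assumption, not Sparrow's or PaddleOCR's actual code; `restore_number_format` and the sample OCR tokens are hypothetical.

```python
import re


def restore_number_format(llm_value: str, ocr_tokens: list[str]) -> str:
    """Return the OCR spelling of a number when its digits match the LLM value.

    Hypothetical helper for illustration only. It compares digits alone, so it
    ignores signs and cannot tell "5.60" from "560".
    """
    llm_digits = re.sub(r"\D", "", llm_value)
    if not llm_digits:
        return llm_value  # nothing numeric to compare
    for token in ocr_tokens:
        # Compare digit sequences so "56,000" and "56000" count as the same number.
        if re.sub(r"\D", "", token) == llm_digits:
            return token
    return llm_value  # no OCR match: keep the LLM output unchanged


ocr_tokens = ["Total", "56,000", "USD"]  # made-up OCR output
print(restore_number_format("56000", ocr_tokens))
```

A real pipeline would also need to handle decimals, currency symbols, and repeated values on the same page.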

Boost Vision LLM Accuracy with OCR Text Integration

I show an interesting approach where I send both an image and OCR text to a Vision LLM. The prompt is constructed to instruct the Vision LLM to prioritize the OCR text. This allows the use of a Vision...

View Article
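
The prompt construction described in the entry above could look roughly like this. It is a sketch only: `build_prompt`, the prompt wording, and the sample query are assumptions, not the prompt used in the video.

```python
def build_prompt(query: str, ocr_text: str) -> str:
    """Combine an extraction query with OCR text and tell the model to trust the OCR text."""
    return (
        "Extract the requested fields from the attached image.\n"
        "OCR text for the same image is given below. When the image and the "
        "OCR text disagree, prioritize the OCR text.\n\n"
        f"Requested fields: {query}\n\n"
        f"OCR text:\n{ocr_text}"
    )


# Made-up query and OCR text; the image itself would be sent alongside this prompt.
prompt = build_prompt('{"total": "str"}', "Total 56,000 USD")
print(prompt)
```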

How to Extract Financial Statement Data with Sparrow & Vision LLM

Extract financial statement data with Sparrow and Vision LLM in this quick tutorial! Sparrow auto-detects tables, builds clear grids, and uses OCR for accurate Vision LLM results, preventing errors....

View Article


Solving Upwork Client Task with Sparrow

I show how Sparrow can be used to handle a complex Upwork task with accurate table data extraction. The key requirement is to prevent Vision LLM hallucinations; this is achieved by Sparrow hybrid data...

View Article

Vision LLM with MLX: Extracting Electric Meter Data in Production

In this video, I share my experience using the MLX backend to run Vision LLM (with MLX-VLM) for structured data extraction in a production environment. See how I used Sparrow to accurately read...

View Article

Structured Data Query with Sparrow AI Agent

Sparrow comes with an option to extract structured data with a query. In this video I explain how you can define such a query to fetch array and field data.

View Article
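
To illustrate what a query mixing field and array data might look like, here is a hedged sketch. The query shape, the field names, and the `matches_query` helper are hypothetical and are not taken from Sparrow's documentation.

```python
import json

# Hypothetical query: one scalar field plus one array of row objects.
query = {
    "invoice_number": "str",
    "items": [{"description": "str", "amount": "str"}],
}


def matches_query(query, data) -> bool:
    """Check that data has the same keys and nesting as the query template."""
    if isinstance(query, dict):
        return (
            isinstance(data, dict)
            and set(data) == set(query)
            and all(matches_query(query[k], data[k]) for k in query)
        )
    if isinstance(query, list):
        # A list in the query is assumed to hold one template applied to every row.
        return isinstance(data, list) and all(matches_query(query[0], row) for row in data)
    return not isinstance(data, (dict, list))  # leaf: any scalar value is accepted


# Made-up model response for the query above.
response = json.loads(
    '{"invoice_number": "A-17", "items": [{"description": "Paper", "amount": "56,000"}]}'
)
print(matches_query(query, response))
```

The check only compares structure; it says nothing about whether the extracted values are correct.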


PaddleOCR 3.1 Setup in FastAPI

I explain how to run PaddleOCR 3.1 from a FastAPI app.

View Article


Financial Table Structure Analysis with Computer Vision

I explain new functionality I'm implementing in Sparrow to pre-process tables with a grid structure. This greatly improves table data extraction by Vision LLMs.

View Article

My Experience with PyCharm AI Assistant

I explain my experience with PyCharm AI Assistant and show an example of how code changes can be reviewed one by one before they are accepted into your codebase.

View Article

Advanced Structured Data Processing in Sparrow

I added instruction and validation functionality to Sparrow. This allows business logic to be processed with document data directly through a Sparrow query. For example, it allows checking whether given fields...

View Article

Ollama vs MLX Inference Speed on Mac Mini M4 Pro 64GB

MLX runs faster on the first inference, but thanks to model caching or other optimizations by Ollama, the second and subsequent inferences run faster on Ollama.

View Article


Ollama Support in Sparrow and Update to Latest MLX

I explain what's new in Sparrow and what was updated in the recent version.

View Article

Qwen3-VL New Models Comparison and Performance on Mac Mini M4

I run and compare the newest Qwen3-VL models in Sparrow. Qwen3-VL models run fast and provide good accuracy.

View Article


Qwen3-VL Accuracy Differences on Ollama vs MLX

I ran a couple of tests with structured data extraction using the newest Qwen3-VL model on a Mac Mini M4 Pro with 64GB. I discovered that the same Qwen3-VL model with the same level of quantization performs...

View Article

Comparing Qwen3-VL AI Models for OCR Task

I'm comparing the Qwen3-VL 8B BF16 and Qwen3-VL 30B Q8 models for OCR and structured data extraction tasks. Based on my findings, the quantized 30B model runs faster and with better accuracy than the...

View Article


Ollama and MLX-VLM Accuracy Review (Qwen3-VL and Mistral Small 3.2)

I ran detailed tests to compare accuracy for the same models (Qwen3-VL and Mistral Small 3.2) running on Ollama and MLX-VLM (recent 0.3.7 version). MLX-VLM runs faster, but with lower accuracy....

View Article

Structured Data Retrieval with Sparrow using OCR and Vision LLM [Improved...

I explain the improvements I'm adding to Sparrow to achieve better accuracy for structured data. I'm using a method where I run an OCR step first, then construct an advanced prompt with injected OCR data....

View Article

New Ministral 3 14B vs Mistral Small 3.2 24B Review

I review data retrieval accuracy and inference speed for the new Ministral 3 14B model vs the older Mistral Small 3.2 24B. The older and larger 24B model wins this time.

View Article

DeepSeek OCR Review

I'm testing structured data extraction with DeepSeek OCR. It works well, with data accuracy and performance good enough to disrupt traditional cloud-based document processing solutions.

View Article


DeepSeek OCR Markdown Processing in Sparrow for Large Tables

I describe new functionality in Sparrow where DeepSeek OCR is used to extract text data in markdown format, and in the next step instruction LLM inference is used to convert the data into structured...

View Article


Vision LLM Output Control for Better OCR with Prompt Hints

I explain my approach to enforcing better OCR output from vision LLMs with prompt hints. This makes it possible to set rules for output data validation and formatting.

View Article

Get Vision LLMs to Follow Your Rules: Prompt-Guided JSON Formatting

A JSON query helps fetch structured output with a Vision LLM and extract document data. I describe how to improve such output with additional rules provided through the LLM prompt. In this video I share...

View Article

GLM-OCR vs DeepSeek OCR 2: Which One Wins at Markdown Extraction?

I compare two OCR models using real test cases: GLM OCR and DeepSeek OCR2. Both are evaluated on their ability to extract document content and convert it into well-structured Markdown. I demonstrate...

View Article
