Channel: Andrej Baranovskij Blog
Browsing latest articles
Browse All 711 View Live

Visual LLM Structured Output Validation with Sparrow

I explain how Sparrow validates the structured output of visual LLMs to ensure it complies with the JSON schema provided in the query. This process helps prevent errors and hallucinations generated by...
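
The check described above can be illustrated with a minimal sketch. This is not Sparrow's actual implementation: the `validate_output` function, the `SCHEMA` layout (field name mapped to a Python type), and the field names are all assumptions made for illustration.

```python
import json


def validate_output(raw_output, schema):
    """Return a list of problems; an empty list means the output matches the schema."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    # Every field requested in the schema must be present with the right type.
    for field, expected_type in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    # Fields the query never asked for are treated as possible hallucinations.
    for field in data:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


# Hypothetical schema: field names and types are illustrative only.
SCHEMA = {"invoice_number": str, "total": (int, float)}

print(validate_output('{"invoice_number": "A-1", "total": 10.5, "tax": 1}', SCHEMA))
```

Rejecting unexpected fields, and not only missing ones, is the part that targets hallucinated output: a field the query never requested is flagged instead of being passed through.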


Batch Inference with Qwen2 Vision LLM (Sparrow)

I share several tips on how to optimize Qwen2 Vision LLM performance for batch processing.


Sparrow Apple MLX Backend on Mac Mini M4 (Qwen2 72B 4bit)

I show how I’m running the Qwen2 72B 4bit model locally on a Mac Mini M4 for Sparrow’s backend. MLX (and MLX-VLM) is the main platform I’m using for local data extraction in Sparrow.  

Structured Output from Multipage PDF with Sparrow (Qwen2 Vision LLM and MLX)

I explain how multipage PDFs are handled in Sparrow to extract structured data in a single call.  

Streamlined Table Data Extraction with Sparrow | Table Transformer, Qwen2 VL,...

Learn how to streamline table data extraction with Sparrow, Table Transformer, Qwen2 VL, and MLX on the Mac Mini M4 Pro. Simplify your workflow and get accurate results!  


Stateless MLX Inference with FastAPI in Sparrow

I show how to run inference with MLX in stateless mode, where the loaded model is released after inference completes. This is useful when inference requests are infrequent, and it helps to reclaim...
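
The load, infer, release cycle can be sketched roughly as follows. This is only an illustration of the pattern, not Sparrow's code: `run_stateless`, `load_fn`, and `infer_fn` are hypothetical names, and a real backend would pass in its own model loader and generation call.

```python
import gc


def run_stateless(load_fn, infer_fn, request):
    """Load a model, run one inference, then drop the model reference."""
    model = load_fn()  # the model is loaded per request, not kept between calls
    try:
        return infer_fn(model, request)
    finally:
        # Drop the only reference and ask the garbage collector to reclaim it.
        # Any backend-specific cache clearing would also belong here.
        del model
        gc.collect()


# Stand-in model used only to make the sketch runnable.
class EchoModel:
    def generate(self, prompt):
        return prompt.upper()


load_calls = []


def load_echo_model():
    load_calls.append(1)
    return EchoModel()


first = run_stateless(load_echo_model, lambda m, r: m.generate(r), "invoice total")
second = run_stateless(load_echo_model, lambda m, r: m.generate(r), "due date")
print(first, second, len(load_calls))
```

A web handler (for example a FastAPI endpoint) could call such a function once per request. The trade-off is that every request pays the model load time in exchange for freeing memory between requests.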

Vision LLM Structured Output with Sparrow

I show how Sparrow UI Shell works with both image and PDF docs to process and extract structured data with Vision LLM (Qwen2) in the MLX backend.  

Apple MLX Vision LLM Server with Ngrok, FastAPI and Sparrow

I show how I run the Apple MLX backend on my local Mac Mini M4 Pro 64GB and access it from the Web through Ngrok, with an automatically provisioned HTTPS certificate.


Improving Qwen-VL Structured Output with Image Cropping

I explain how I'm improving structured output results from Qwen-VL with image cropping in Sparrow.


Building Web UI Apps with Python Gradio – A Java Developer’s Perspective

I explain how to build Web UI apps with the Python Gradio framework. I used to work with Java and built enterprise Web UI apps with JSF. Based on this experience, I can tell that Gradio is...

Structured Data Extraction with Sparrow Agent: Vision LLM & Prefect in Action

Discover how to streamline your data extraction process with Sparrow Agent! In this tutorial, I showcase how Sparrow Agent leverages Vision LLM to intelligently handle complex data tasks, while Prefect...

Querying Non Existing Fields with Qwen2.5 Vision LLM

I describe how Sparrow helps to query non-existent fields with Qwen2.5 Vision LLM. I run it locally with MLX and MLX-VLM.

Building AI Agent for Local Structured JSON Output

I explain the key steps of building an AI agent to process a document and extract structured JSON data locally. I'm running it with Sparrow and using the Qwen VL model for the vision processing backend and OCR. The...


Temporary Files Cleaner for Gradio Web App

Learn how to implement an automatic temporary file cleanup solution for Gradio web applications. This tutorial shows you how to prevent disk space issues by periodically removing old upload files and...
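
The core of such a cleaner can be sketched with the standard library alone. This is a minimal illustration, not the tutorial's code: `remove_old_files`, the one-hour threshold, and the file names are assumptions, and the scheduling part (a timer or background thread that calls the function periodically) is left out.

```python
import os
import tempfile
import time
from pathlib import Path


def remove_old_files(directory, max_age_seconds, now=None):
    """Delete files under `directory` older than `max_age_seconds`; return their names."""
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletion does not disturb the iteration.
    for path in list(Path(directory).rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Usage sketch: one stale file and one fresh file in a throwaway directory.
with tempfile.TemporaryDirectory() as upload_dir:
    stale = Path(upload_dir) / "stale.png"
    fresh = Path(upload_dir) / "fresh.png"
    stale.write_bytes(b"old")
    fresh.write_bytes(b"new")
    two_hours_ago = time.time() - 2 * 3600
    os.utime(stale, (two_hours_ago, two_hours_ago))  # backdate the modification time
    removed = remove_old_files(upload_dir, max_age_seconds=3600)
    remaining = sorted(p.name for p in Path(upload_dir).iterdir())

print(removed, remaining)
```

Using the modification time as the age signal keeps the sketch simple. An application that re-reads uploads long after writing them might prefer a different criterion.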

Oracle DB 23ai Free Connection Pool in Python

I describe how to connect to Oracle DB from Python. I explain why a DB connection pool is important for better performance. The connection is made through the oracledb thin mode, without installing Oracle Client.
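
A pooled connection in thin mode might look roughly like the sketch below. The user, password, DSN, and pool sizes are placeholder assumptions (adjust them to your own database), and `get_pool` and `fetch_all` are hypothetical helper names. The `oracledb` import is deferred so that the module loads even where the driver is not installed.

```python
from functools import lru_cache

# Placeholder settings: credentials, DSN, and pool sizes are assumptions.
POOL_SETTINGS = {
    "user": "demo_user",
    "password": "demo_password",
    "dsn": "localhost:1521/FREEPDB1",
    "min": 1,        # connections opened when the pool is created
    "max": 4,        # upper bound on open connections
    "increment": 1,  # how many connections to add when the pool grows
}


@lru_cache(maxsize=1)
def get_pool():
    """Create the pool once and reuse it for every later call."""
    import oracledb  # python-oracledb; thin mode is the default, no Oracle Client needed

    return oracledb.create_pool(**POOL_SETTINGS)


def fetch_all(sql):
    """Borrow a connection from the pool, run a query, and return all rows."""
    with get_pool().acquire() as connection:  # released back to the pool on exit
        with connection.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetchall()
```

With a reachable database, `fetch_all("select 1 from dual")` would borrow a connection, run the query, and hand the connection back. The performance benefit comes from reusing already opened connections, which avoids the cost of a new login on every request.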


Extract Structured Data from Documents with Sparrow (Free Tier Available)

I built Sparrow for document data extraction 🚀 It's fully open-source and runs locally on your machine. You can extract structured data from any document using powerful Mistral 24B 8bit and Qwen 2.5 72B...

Dashboard with Gradio Python

This video showcases the Sparrow dashboard, where you can view statistics on document data extraction events processed by Sparrow. This elegant dashboard is built with Python using Gradio, a...


Running Vision Models on Apple Silicon with MLX-VLM

I show and explain how to run Qwen and Mistral vision models on Apple Silicon with MLX-VLM. I share technical tips about how to run both models and show how to pass a query prompt.

Vision LLM on Mac Mini M4 Pro: Real-World MLX Performance

I discuss the real-world MLX performance of Sparrow for structured data extraction with public access. The current Sparrow online instance runs on a Mac Mini M4 Pro with 64GB of memory. On average, it...

Local LLM Instruction Processing with Sparrow

I explain how to execute instructions with a payload using a local LLM. This is useful when you want to process your data with an LLM and provide contextual instructions, specifying the desired outcome...

LLM Microservice with Instruction Calling

I describe the idea of implementing interaction with an LLM through the concept of a microservice with instruction calling. This works great for enterprise application use cases, such as data validation,...


Structured Data Annotation with Qwen2.5 VL and MLX-VLM

Qwen2.5 VL can provide bounding box coordinates and confidence values for extracted structured data. This is useful for visual data review and reporting. I will explain with a practical example what...


Box Annotations in Sparrow for Structured Data Extraction

Check out my video on Box Annotations in Sparrow for Structured Data Extraction! I’ll show you how the Qwen2.5 vision model pulls bounding box annotations from images based on what you need. Plus,...

PaddleOCR 3.0: Supercharge Your AI

I upgraded to PaddleOCR 3.0 and explain the new PaddleOCR API integration. My goal is to integrate OCR result output with Vision LLM processing to enhance large-scale, structured table data output.  

Solving Vision LLM Number Formatting Issues Using PaddleOCR and Sparrow

Discover how to fix number formatting errors in vision LLMs like Mistral! In this video, I show how Mistral misreads "56,000" as "56000" and how combining PaddleOCR’s text extraction with Sparrow’s...
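
One possible way to combine the two sources is sketched below. It is an assumption about how such a correction could work, not a description of Sparrow's actual logic: `restore_number_format` and the token list are hypothetical. The function looks for a single OCR token whose digits match the LLM value and reuses that token's original formatting.

```python
import re


def digits_only(text):
    """Strip everything except digits, so '56,000' and '56000' compare equal."""
    return re.sub(r"\D", "", text)


def restore_number_format(llm_value, ocr_tokens):
    """Return the OCR spelling of `llm_value` if exactly one numeric token matches."""
    target = digits_only(llm_value)
    if not target:
        return llm_value
    candidates = {
        token
        for token in ocr_tokens
        # Consider only tokens made of digits and separators, e.g. "56,000".
        if re.fullmatch(r"\d[\d.,]*", token) and digits_only(token) == target
    }
    # Fall back to the LLM value when there is no match or the match is ambiguous.
    return candidates.pop() if len(candidates) == 1 else llm_value


print(restore_number_format("56000", ["Total", "56,000", "USD"]))
```

Falling back to the LLM value when zero or several OCR tokens match keeps the correction conservative: the formatting is changed only when the OCR text gives one unambiguous spelling.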
