HunyuanOCR

HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) vision-language model (VLM) developed by Tencent Hunyuan. It unifies the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and subtitle or structured-output generation) into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight, at about 1 billion parameters, it reports state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and other multimodal models on benchmark suites. HunyuanOCR handles complex documents: multi-column layouts, tables, mathematical formulas, mixed languages, handwritten or stylized fonts, receipts, tickets, and video-frame subtitles. The project provides code, pretrained weights, and inference instructions, so it can be deployed locally or on a server and integrated with applications.

Features

End-to-end OCR Vision-Language Model: detection, recognition, layout parsing, translation, and structured output generation in a single inference pass
Lightweight (~1 billion parameters) yet achieves state-of-the-art performance across benchmarks for complex documents, multilingual text, handwritten/stylized fonts, receipts, tickets, and more
Supports complex layouts including columns, tables, formulas, multi-language text, mixed fonts/styles, and video subtitles/frames
Produces structured outputs (e.g., JSON, HTML, Markdown, LaTeX, translated text), enabling downstream processing like automated form filling or data extraction
Open-source, with code, pretrained weights, and inference scripts that can be integrated locally or into production workflows
Efficient inference pipeline (a native-resolution encoder, an adaptive visual adapter, and a lightweight LLM), lowering computational cost compared to much larger models
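
As an illustration of the downstream processing that structured output makes possible, the following is a minimal sketch in Python. It assumes the model was prompted to return a JSON object and that the reply may or may not be wrapped in a Markdown code fence. The function names, field names, and sample string are hypothetical and are not part of the HunyuanOCR code or its documented output format.

```python
import json
import re

# Three backticks, built programmatically so this listing stays easy to embed.
FENCE = "`" * 3
FENCED_JSON = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def parse_structured_output(raw: str) -> dict:
    """Parse a JSON object from model output, tolerating a surrounding code fence."""
    text = raw.strip()
    match = FENCED_JSON.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def fill_form(data: dict, required: list[str]) -> dict:
    """Keep only the required fields, failing loudly if any are missing."""
    missing = [key for key in required if key not in data]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return {key: data[key] for key in required}


# Hypothetical model reply for a receipt image (not real HunyuanOCR output).
sample = (
    FENCE + "json\n"
    '{"merchant": "Example Store", "total": "42.50", "date": "2025-11-26"}\n'
    + FENCE
)

receipt = parse_structured_output(sample)
form = fill_form(receipt, ["merchant", "total"])
print(form)
```

Validating the parsed object before using it is a deliberate choice here: model output is free text, so a form-filling step is safer when it rejects malformed or incomplete replies instead of passing them on.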

Additional Project Details

Operating Systems: Linux
Programming Language: Python
Related Categories: Python OCR Software, Python AI Models
Registered: 2025-11-26

OCR expert VLM powered by Hunyuan's native multimodal architecture
