Search Results for "character recognition code"

Showing 435 open source projects for "character recognition code"

1

Tesseract OCR

Open Source OCR Engine

Tesseract is an open source optical character recognition (OCR) engine and command line program. OCR is a technology that recognizes text characters within a digital image. The latest version of Tesseract places greater focus on line recognition; however, it still supports the legacy Tesseract OCR engine, which recognizes character patterns. Tesseract can recognize over 100 languages out of the box and can be trained to recognize other languages. ...

Downloads: 2,233 This Week

Last Update: 2025-12-26
2

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle

PaddleOCR offers exceptional, multilingual, and practical Optical Character Recognition (OCR) tools that help users train better models and put them into practice. Inspired by PaddlePaddle, PaddleOCR is an ultra-lightweight OCR system with multilingual recognition, digit recognition, vertical text recognition, and long text recognition. It features the PPOCR series of high-quality pre-trained models, which includes the ultra-lightweight ppocr_mobile series, the general ppocr_server series, and the ultra-lightweight compressed ppocr_mobile_slim series. ...

Downloads: 70 This Week

Last Update: 2026-01-29
3

Umi-OCR

OCR software, free and offline

Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines.

Downloads: 51 This Week

Last Update: 2026-01-15
4

DeepSeek-OCR

Contexts Optical Compression

DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...

Downloads: 11 This Week

Last Update: 2026-01-27
5

InsightFace

State-of-the-art 2D and 3D Face Analysis Project

InsightFace is an open-source, integrated Python library for state-of-the-art 2D and 3D deep face analysis. It efficiently implements a wide variety of state-of-the-art algorithms for face recognition, face detection, and face alignment, which are optimized for both training and deployment. Research institutes and industrial organizations can benefit from the InsightFace library.

Downloads: 375 This Week

Last Update: 2024-08-12
6

Tesseract.js

A pure JavaScript Multilingual OCR

Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. The Tesseract.js library supports more than 100 languages, automatic text orientation and script detection, and a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser or on a server with Node.js. Tesseract.js is a JavaScript library that gets words in almost any spoken language out of images. The main Tesseract.js functions (e.g. recognize, detect) take an image...

Downloads: 21 This Week

Last Update: 2025-12-15
7

GLM-OCR

Accurate × Fast × Comprehensive

GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. ...

Downloads: 17 This Week

Last Update: 5 days ago
8

Patch-NetVLAD

Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition

This repository contains code for the CVPR2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition".

Downloads: 5 This Week

Last Update: 2024-07-11
9

whisper.cpp

Port of OpenAI's Whisper model in C/C++

whisper.cpp is a lightweight, C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model—designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples. whisper.cpp supports integer quantization of the Whisper ggml models. ...

Downloads: 392 This Week

Last Update: 2026-01-15
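
The integer quantization mentioned in the whisper.cpp description can be illustrated with a minimal sketch. This is a generic symmetric 8-bit scheme applied to one block of weights, not ggml's actual storage format; the function names `quantize_block` and `dequantize_block` are hypothetical and exist only for this example.

```python
def quantize_block(values):
    """Symmetric int8 quantization of one block of floats.

    Returns (scale, ints) where each int lies in [-127, 127].
    Hypothetical helper for illustration; not part of whisper.cpp or ggml.
    """
    amax = max((abs(v) for v in values), default=0.0)
    # An all-zero (or empty) block gets a dummy scale to avoid dividing by zero.
    scale = amax / 127 if amax else 1.0
    return scale, [round(v / scale) for v in values]


def dequantize_block(scale, ints):
    """Recover approximate floats from the stored scale and integers."""
    return [q * scale for q in ints]


if __name__ == "__main__":
    weights = [0.0, 1.0, -2.0, 0.5]
    scale, ints = quantize_block(weights)
    print(scale, ints)
    print(dequantize_block(scale, ints))
```

Storing one floating-point scale per block plus small integers is what shrinks the model; the price is a rounding error of at most about half a scale step per value.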
10

Concordia

Crowdsourcing platform for full text transcription and tagging

...It was developed by the Library of Congress so that volunteers of all backgrounds could transcribe and tag digitized images of manuscripts and typed materials from the Library’s collections that could not otherwise be done by optical character recognition.

Downloads: 1 This Week

Last Update: 5 days ago
11

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. ...

Downloads: 19 This Week

Last Update: 2026-02-03
12

HunyuanOCR

OCR expert VLM powered by Hunyuan's native multimodal architecture

HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation) into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and even other multimodal models on benchmark suites. ...

Downloads: 0 This Week

Last Update: 2026-01-13
13

Self-Operating Computer

A framework to enable multimodal models to operate a computer

...Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen. The framework supports features like Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting to enhance visual grounding capabilities. It is designed to be compatible with macOS, Windows, and Linux (with X server installed), and is released under the MIT license.

1 Review

Downloads: 8 This Week

Last Update: 2025-02-28
14

Pot Desktop

A cross-platform software for text translation and recognition

Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. ...

Downloads: 8 This Week

Last Update: 2025-11-28
15

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 134 This Week

Last Update: 3 days ago
16

SCAIL

Towards Studio-Grade Character Animation via In-Context Learning of 3D

SCAIL is a project developed by the ZAI Organization, focusing on AI-driven research initiatives. While specific documentation about SCAIL’s exact goals and implementation is limited from the repository context alone, the project appears to be part of a collection of machine learning and AI research tools that facilitate scalable model development, evaluation, or application workflows. Given its listing alongside other ZAI projects like speech recognition and text-to-speech systems, SCAIL...

Downloads: 2 This Week

Last Update: 2026-01-30
17

Agently

AI Agent Application Development Framework

Build AI-agent-native applications with very little code. Interact with AI agents in code using structured data and chained-call syntax. Enhance AI agents using plugins instead of rebuilding a whole new agent. Agently is a development framework that helps developers build AI-agent-native applications quickly. You can use and build AI agents in your code in an extremely simple way.

Downloads: 1 This Week

Last Update: 2026-01-30
18

DocTR

Library for OCR-related tasks powered by Deep Learning

...Seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly: 3 lines of code to load a document and extract text with a predictor. State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract. Easy integration (available templates for browser demo & API deployment). End-to-end OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in the word). ...

Downloads: 9 This Week

Last Update: 2026-02-04
19

Unredact

A simple tool for reading in poorly redacted documents

Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and linguistic patterns to produce candidate reconstructions. It accepts a variety of input formats, automatically identifies redacted regions, and then generates text suggestions that are presented alongside visual overlays so users can choose or refine outputs.

Downloads: 39 This Week

Last Update: 2026-02-03
20

MBeautifier

MBeautifier is a MATLAB source code formatter and beautifier

MBeautifier is a lightweight M-Script-based MATLAB source code formatter usable directly in the MATLAB Editor.

Downloads: 5 This Week

Last Update: 2025-04-17
21

WeChatTweak-macOS

A dynamic library tweak for WeChat macOS

...Right-click the Dock icon to log in to the new WeChat account. Command line execution: open -n /Applications/WeChat.app. Message processing enhancements: supports exporting any emoji and QR code recognition. Supports right-click to copy a link directly or open it in the system default browser. No phone authentication required to reopen the app. UI settings panel, support for Alfred workflow, and support for Launchbar action. In order to reduce maintenance costs and ensure update speed, only the latest App Store version of the client is supported by default.

Downloads: 2 This Week

Last Update: 2025-12-15
22

Python Client For NLP Cloud

NLP Cloud serves high performance pre-trained or custom models for NER

NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, blog post generation, source code generation, question answering, automatic speech recognition, machine translation, language detection, semantic search, semantic similarity, tokenization, POS tagging, embeddings, and dependency parsing. It is ready for production, served through a REST API. You can either use the NLP Cloud pre-trained models, fine-tune your own models, or deploy your own models.

Downloads: 3 This Week

Last Update: 2024-11-27
23

Rapid LaTeX OCR

Formula recognition based on LaTeX-OCR and ONNXRuntime

rapid_latex_ocr is a tool that converts formula images to LaTeX format. The inference code in the repo is modified from LaTeX-OCR; all models have been converted to ONNX format and the inference code has been simplified, so inference is faster and easier to deploy. The repo contains only inference code for ONNX-format models, based on ONNXRuntime or OpenVINO, and does not contain model training code.

Downloads: 1 This Week

Last Update: 2024-11-03
24

Qwen2-Audio

Repo of Qwen2-Audio chat & pretrained large audio language model

...It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.), and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. Code & examples provided with Hugging Face transformers, and usage via AutoProcessor, model classes etc. High performance on many standard benchmarks: ASR, speech-emotion recognition, vocal sound classification, speech translation etc.

Downloads: 1 This Week

Last Update: 2025-09-23
25

pixelmatch

The smallest, simplest JavaScript pixel-level image comparison library

The smallest, simplest and fastest JavaScript pixel-level image comparison library, originally created to compare screenshots in tests. Features accurate detection of anti-aliased pixels and perceptual color difference metrics. Inspired by Resemble.js and Blink-diff. Unlike these libraries, pixelmatch is around 150 lines of code, has no dependencies, and works on raw typed arrays of image data, so it is fast and can be used in any environment (Node or browsers). Compares two images,...

Downloads: 3 This Week

Last Update: 2025-02-21
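
The idea of comparing two images directly on raw arrays of pixel data, as the pixelmatch description puts it, can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not pixelmatch's algorithm: the function name `count_mismatched_pixels`, the plain squared-RGB distance, and the way the threshold is scaled are all hypothetical, and the sketch ignores the alpha channel and omits the anti-aliasing detection and perceptual color metrics that the description mentions.

```python
def count_mismatched_pixels(img1, img2, width, height, threshold=0.1):
    """Count pixels whose RGB values differ by more than `threshold`.

    `img1` and `img2` are flat RGBA byte sequences (4 bytes per pixel).
    `threshold` ranges from 0 (strict) to 1 (lenient).
    Hypothetical helper for illustration only.
    """
    expected = width * height * 4
    if len(img1) != expected or len(img2) != expected:
        raise ValueError("each buffer must hold width * height RGBA pixels")

    # Largest possible squared RGB distance (black vs. white).
    max_delta = 3 * 255 ** 2
    limit = threshold * threshold * max_delta

    mismatched = 0
    for i in range(0, expected, 4):
        dr = img1[i] - img2[i]
        dg = img1[i + 1] - img2[i + 1]
        db = img1[i + 2] - img2[i + 2]
        # Alpha (index i + 3) is deliberately ignored in this sketch.
        if dr * dr + dg * dg + db * db > limit:
            mismatched += 1
    return mismatched


if __name__ == "__main__":
    black_white = bytes([0, 0, 0, 255, 255, 255, 255, 255])
    black_black = bytes([0, 0, 0, 255, 0, 0, 0, 255])
    print(count_mismatched_pixels(black_white, black_black, 2, 1))
```

Working on flat byte buffers keeps the comparison free of any image-decoding dependency, which is the portability property the description highlights.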