Search Results for "image to text converter"

Showing 1378 open source projects for "image to text converter"

  • 1
    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    Qwen-Image is a powerful 20-billion parameter foundation model designed for advanced image generation and precise editing, with a particular strength in complex text rendering across diverse languages, especially Chinese. Built on the MMDiT architecture, it achieves remarkable fidelity in integrating text seamlessly into images while preserving typographic details and layout coherence.
    Downloads: 20 This Week
  • 2
    Z-Image

    Image generation model with single-stream diffusion transformer

    ...The project includes several variants: Z-Image-Turbo, a distilled version optimized for speed and low resource consumption; Z-Image-Base, the full-capacity foundation model; and Z-Image-Edit, fine-tuned for image editing tasks. Despite its compact size, Z-Image produces outputs that closely rival those from much larger models — including strong rendering of bilingual (English and Chinese) text inside images, accurate prompt adherence, and good layout and composition.
    Downloads: 84 This Week
  • 3
    GLM-Image

    GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image

    GLM-Image is an open-source generative AI model designed to create high-fidelity images from text prompts using a hybrid architecture that combines autoregressive semantic understanding with diffusion-based detail refinement. It excels at generating images that include complex layouts and detailed text content, making it especially useful for posters, diagrams, infographics, social media graphics, and visual content that requires precise text placement and semantic alignment. ...
    Downloads: 4 This Week
  • 4
    Image Toolbox

    Image Toolbox is a powerful picture editor, which can crop

    Image Toolbox is a powerful picture editor, which can crop, apply filters, add some drawings, erase background, edit EXIF, or even create a PDF file.
    Downloads: 22 This Week
  • 5
    Evernote to Markdown converter

    Convert Evernote .enex files to Markdown

    Evernote2md is a CLI tool to convert Evernote notes exported in *.enex format to a directory with markdown files.
    Downloads: 7 This Week
  • 6
    LongCat-Image

    Foundation model for image generation

    ...The model excels at both text-to-image generation and instruction-guided image editing, offering users versatile capabilities for creative and practical tasks—whether generating art, mockups, or adjusting existing visuals with fine control.
    Downloads: 2 This Week
  • 7
    EPUB to Audiobook Converter

    EPUB to audiobook converter, optimized for Audiobookshelf

    EPUB to Audiobook Converter is a tool designed to convert EPUB ebooks into chaptered audiobooks, optimized specifically for Audiobookshelf servers. It reads each chapter from an EPUB file, generates audio using a chosen text-to-speech backend, and outputs separate MP3 files with chapter titles preserved as metadata to make navigation easier. The project supports multiple TTS providers, including Microsoft Azure TTS, EdgeTTS, OpenAI TTS, local Piper, and Kokoro via an OpenAI-compatible endpoint, allowing users to choose between cloud and self-hosted voices. ...
    Downloads: 34 This Week
  • 8
    Intervention Image

    PHP Image Processing

    Intervention Image is a PHP image handling and manipulation library. It provides an easy-to-use interface for performing common image operations such as resizing, cropping, and applying filters. It supports a variety of image formats and can be integrated into Laravel projects or used independently in other PHP applications. The library is highly customizable, allowing for simple image manipulation tasks, or more advanced image processing workflows.
    Downloads: 4 This Week
  • 9
    lcd-image-converter

    Tool to create bitmaps and fonts for embedded applications.

    This program allows you to create bitmaps and fonts, and transform them to "C" source format for embedded applications. The transformation of the images to the source code is made by using templates. Therefore, by modifying the templates, you can change the format of the output within certain limits.
    Downloads: 571 This Week
  • 10
    Image To Text tools

    ITTT is a free tool designed to scan and extract text from images.

    Image To Text Tools is a 100% free, user-friendly tool designed to scan images and extract the text they contain into editable text formats. Whether you need to extract text from scanned documents, photographs, or other image files, Image To Text Tools provides accurate and reliable Optical Character Recognition (OCR) capabilities to meet your needs.
    Downloads: 25 This Week
  • 11
    Fooocus

    Focus on prompting and generating

    Fooocus is an open-source image generation software that simplifies the process of creating images from text prompts. Built on Gradio and leveraging Stable Diffusion XL, Fooocus eliminates the need for manual parameter tweaking, allowing users to focus solely on crafting prompts. It offers a user-friendly interface with minimal setup, making advanced image synthesis accessible to a broader audience.
    Downloads: 206 This Week
  • 12
    Tesseract OCR

    Open Source OCR Engine

    Tesseract is an open source OCR (optical character recognition) engine and command-line program. OCR is a technology that recognizes text characters within a digital image. The latest version of Tesseract places greater focus on line recognition; however, it still supports the legacy Tesseract OCR engine, which recognizes character patterns. Tesseract can recognize over 100 languages out of the box and can be trained to recognize other languages. It supports various output formats, including plain text, HTML, PDF and more. ...
    Downloads: 2,233 This Week
  • 13
    Tesseract.js

    A pure Javascript Multilingual OCR

    Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. The library supports more than 100 languages, automatic text orientation and script detection, and a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser or on a server with Node.js. It extracts words in almost any language from images. The main Tesseract.js functions (e.g. recognize, detect) take an image parameter, which should be something that is like an image. ...
    Downloads: 21 This Week
  • 14
    HunyuanImage-3.0

    A Powerful Native Multimodal Model for Image Generation

    HunyuanImage-3.0 is a powerful, native multimodal text-to-image generation model released by Tencent’s Hunyuan team. It unifies multimodal understanding and generation in a single autoregressive framework, combining text and image modalities seamlessly rather than relying on separate image-only diffusion components. It uses a Mixture-of-Experts (MoE) architecture with many expert subnetworks to scale efficiently, deploying only a subset of experts per token, which allows large parameter counts without linear inference cost explosion. ...
    Downloads: 10 This Week
  • 15
    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editability

    ...By combining text and structured image representations, it aims to facilitate tasks where both descriptive and structural understanding are important, such as detailed image QA, interactive image editing via prompt layers, and image-conditioned generation with structural control. The layered approach supports training signals that help the model learn how visual elements relate to each other and to textual context, rather than simply learning global image embeddings.
    Downloads: 2 This Week
  • 16
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 134 This Week
  • 17
    CLIP

    CLIP, Predict the most relevant text snippet given an image

    CLIP (Contrastive Language-Image Pretraining) is a neural model that links images and text in a shared embedding space, allowing zero-shot image classification, similarity search, and multimodal alignment. It was trained on large sets of (image, caption) pairs using a contrastive objective: images and their matching text are pulled together in embedding space, while mismatches are pushed apart.
    Downloads: 5 This Week
  • 18
    Django MarkdownX

    Comprehensive Markdown plugin built for Django

    Django MarkdownX is a comprehensive Markdown plugin built for Django, the renowned high-level Python web framework, with flexibility, extensibility, and ease-of-use at its core.
    Downloads: 5 This Week
  • 19
    Large Language Models (LLMs)

    Connect MATLAB to LLM APIs, including OpenAI® Chat Completions

    This repository enables MATLAB to connect with large language models (LLMs) such as OpenAI's ChatGPT, DALL-E, Azure OpenAI, and Ollama, integrating their natural language processing and image generation capabilities directly within MATLAB environments. It facilitates creating chatbots, summarizing text, and generating images, among other tasks.
    Downloads: 17 This Week
  • 20
    FLUX.2

    Official inference repo for FLUX.2 models

    FLUX.2 is a state-of-the-art open-weight image generation and editing model released by Black Forest Labs aimed at bridging the gap between research-grade capabilities and production-ready workflows. The model offers both text-to-image generation and powerful image editing, including editing of multiple reference images, with fidelity, consistency, and realism that push the limits of what open-source generative models have achieved.
    Downloads: 56 This Week
  • 21
    Diffusion Bee

    Diffusion Bee is the easiest way to run Stable Diffusion locally

    ...Users can generate images from text prompts, perform image-to-image transformations, and apply additional features like inpainting, outpainting, and model-based upscaling directly within a clean graphical interface. It’s optimized for Apple hardware performance and can automatically manage features like ControlNet, LoRA models, and advanced prompt options without exposing complexity to the user.
    Downloads: 14 This Week
  • 22
    Imagen - Pytorch

    Implementation of Imagen, Google's Text-to-Image Neural Network

    Implementation of Imagen, Google's Text-to-Image Neural Network that beats DALL-E2, in Pytorch. It is the new SOTA for text-to-image synthesis. Architecturally, it is actually much simpler than DALL-E2. It consists of a cascading DDPM conditioned on text embeddings from a large pre-trained T5 model (attention network). It also contains dynamic clipping for improved classifier-free guidance, noise level conditioning, and a memory-efficient U-Net design. ...
    Downloads: 7 This Week
  • 23
    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to process visual inputs as context for downstream tasks. ...
    Downloads: 5 This Week
  • 24
    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. Qwen-VL supports multilingual inputs and conversation (e.g. ...
    Downloads: 3 This Week
  • 25
    Easy Diffusion

    An easy 1-click way to create beautiful artwork on your PC using AI

    Easy Diffusion is a widely used community-driven repository offering a simple, one-click way to install and use Stable Diffusion-based generative AI on a personal computer without advanced technical skills or prior setup. It provides a browser-based user interface that runs locally, allowing users to type text prompts and immediately generate images directly within their web browser, democratizing access to powerful text-to-image models for artists and hobbyists alike. The project abstracts away environment setup, dependencies, and model installation — tasks that can be daunting to beginners — and instead lets users focus on creative experimentation with prompt phrasing, model parameters, and image output settings. ...
    Downloads: 14 This Week