Showing 52 open source projects for "pdf index"

  • 1
    Sphinx

    Main repository for the Sphinx documentation builder

    ...It was originally created for the Python documentation, and it has excellent facilities for the documentation of software projects in a range of languages. Of course, this site is also created from reStructuredText sources using Sphinx! Output formats: HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text. Semantic markup and automatic links for functions, classes, citations, glossary terms and similar pieces of information. Easy definition of a document tree, with automatic links to siblings, parents and children. General index as well as a language-specific module index. Automatic highlighting using the Pygments highlighter. ...
    Downloads: 26 This Week
  • 2
    Create Index from PDF

    PDF Indexing Script: Searches PDF for words, records page numbers

    This Python script helps automate the process of creating an index for a PDF document. It reads a list of words from a text file, searches through each page of the PDF, and records the page numbers where each word appears. The script accounts for the first 24 pages of the PDF that use Roman numerals (i-xxiv) and adjusts the page numbers accordingly. It is designed to be case-insensitive, ensuring that variations in capitalization do not affect the search results. ...
    Downloads: 0 This Week
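The indexing approach described for this script can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the text of each page has already been extracted into a list of strings (for example with a PDF library of your choice), and the names `to_roman`, `page_label`, and `build_index` are hypothetical. Whole-word matching is also an assumption; the description only says the search is case-insensitive.

```python
import re

FRONT_MATTER_PAGES = 24  # pages i-xxiv, as stated in the description


def to_roman(n: int) -> str:
    """Convert a small positive integer to a lowercase Roman numeral."""
    numerals = [(50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(physical_page: int, front_matter: int = FRONT_MATTER_PAGES) -> str:
    """Map a 1-based physical page number to the label printed on the page."""
    if physical_page <= front_matter:
        return to_roman(physical_page)
    return str(physical_page - front_matter)


def build_index(page_texts, words, front_matter=FRONT_MATTER_PAGES):
    """Return {word: [page labels]} using case-insensitive whole-word search."""
    index = {word: [] for word in words}
    for number, text in enumerate(page_texts, start=1):
        for word in words:
            # Assumption: match whole words only, ignoring capitalization.
            pattern = r"\b" + re.escape(word) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                index[word].append(page_label(number, front_matter))
    return index


# Usage with one front-matter page instead of 24, to keep the sample short.
pages = ["Index of Terms", "the KERNEL boots", "no match here"]
print(build_index(pages, ["kernel", "index"], front_matter=1))
```

Keeping the page-label arithmetic in its own function makes the Roman-numeral offset easy to change for documents with a different amount of front matter.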
  • 3
    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    ...The project includes example notebooks, scripts for tree generation and search, and support for multiple document formats including PDF and markdown, with tools designed to preserve context and semantic boundaries.
    Downloads: 2 This Week
  • 4
    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 11 This Week
  • 5
    Libros de Programación en Español

    List of free programming books in Spanish

    Libros de Programación en Español is a curated list of free programming books in Spanish, organized by topic and technology so learners can find high-quality materials without cost. The README is structured as an index with general programming books, followed by sections for specific languages such as JavaScript, TypeScript, Python, Ruby, Rust, PHP, Haskell, Go, Kotlin, Java, and R. Each entry includes the book title, author, and a link to the official or legal free version (PDF, HTML, eBook, etc.), focusing on resources that are legitimately available. ...
    Downloads: 0 This Week
  • 6
    Everything cURL

    The book documenting the curl project, the curl tool, libcurl

    Everything curl is an extensive, continuously maintained book that documents the entire curl ecosystem: the curl command-line tool, the libcurl library, the project’s history and development practices, and practical guidance for using and contributing to curl. The project is written as an open source book (CC-BY-4.0) and is available in multiple formats and locations, including an online website, PDF, and ePub so readers can pick the format that suits them. Content ranges from beginner-friendly tutorials and usage examples to deep dives into internals, protocols, bindings, build instructions, and advanced deployment scenarios, making the book useful for both casual users and experienced developers. The repository is large and actively maintained with many commits, organized chapters and helper scripts to build, index and publish the book; the site rendering is powered by mdBook and hosted for easy online reading.
    Downloads: 2 This Week
  • 7
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 4 This Week
  • 8
    AnyTXT Searcher

    A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

    AnyTXT Searcher is a powerful file full-text search engine, a desktop search application for fast document retrieval. Just like a local disk Google search engine, much faster than Windows Search, it is your ideal desktop file content full-text search engine. It has a powerful document parsing engine built in, which extracts the text of commonly used file formats without installing any other software, and combines the built-in high-speed indexing system to store the metadata of the...
    Downloads: 4,334 This Week
  • 9
    Papermerge

    Open Source Document Management System for Digital Archives

    ...Instead of having piles of paper documents all over your desk, office or drawers, you can quickly scan them and configure your scanner to upload directly to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full-text, tag and metadata-based search. Papermerge is free and open-source software, which means that transparency is the core value of our software development. Source code can be reviewed and improved by anyone from anywhere. Papermerge supports multiple users. ...
    Downloads: 22 This Week
  • 10
    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper for examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata, including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 5 This Week
  • 11
    CVPR 2025

    Collection of CVPR 2025 papers and open source projects

    ...It organizes entries by topic areas such as detection, segmentation, generative models, 3D vision, multi-modal learning, and efficiency, so you can navigate the year’s output efficiently. Each paper entry typically includes a title, author list, and links to the paper PDF and official or third-party code repositories. The list frequently highlights benchmarks, leaderboards, or notable results so readers can assess impact at a glance. Because conference content evolves rapidly, the repository is updated as authors release code or refine readme instructions, keeping the collection timely. For teams planning literature reviews, study groups, or rapid prototyping sprints, it acts as a central index to the year’s most relevant methods with working implementations.
    Downloads: 0 This Week
  • 12
    Lexifinder

    A tool to create the analytical index of a manuscript

    Lexifinder is a free and open source tool to automate the creation of an analytical index of a manuscript, based on a natural language processing model. First, convert your Docx or ODT file into a PDF. Choose the output text file, set the similarity index, and choose your desired keywords. Lexifinder will include in the index all words whose meaning resembles that of at least one keyword. The similarity index ranges from 1 to 100 and expresses the degree of resemblance required for a noun to be included.
    Downloads: 0 This Week
  • 13
    WA2L/WinTools

    End User Tools for Windows.

    Some end user utilities for the Windows operating system. The utilities can be called through the "Send To" context menu when right-clicking a file or directory in Explorer, or through the Windows "Start Menu". The package can be 'installed' as a portable application and does not need admin rights. ◆ UTILITIES - https://sourceforge.net/projects/wa2l-wintools/files/ → README ◆ FEATURES - https://wa2l-wintools.sourceforge.net/man1/wintools.1.html -...
    Downloads: 5 This Week
  • 14
    Chordii

    Easy lead sheets from text input

    ChordPro creates elegant, staffless lead sheets for musicians who need only chords and lyrics. It processes plain text input in ChordPro format and is a rewrite of the old but still popular Chord/Chordii programs.
    Downloads: 42 This Week
  • 15
    WIKINDX

    Virtual Research Environment / On-line Bibliography Manager

    Reference management, bibliography management, citations and a whole lot more. Designed by academics for academics, under continuous development since 2003, and used by both individuals and major research institutions worldwide, WIKINDX is a Virtual Research Environment (an enhanced on-line bibliography manager) storing searchable references, notes, files, citations, ideas, and more. An integrated WYSIWYG word processor exports formatted articles to RTF and HTML. Plugins include a...
    Downloads: 73 This Week
  • 16
    miRDeep*

    MiRDeep*

    Please cite: An, J., Lai, J., Lehman, M.L. and Nelson, C.C. (2013) miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data. Nucleic Acids Res, 41, 727-737. We will create an index for you if you tell us your species of interest (j.an@qut.edu.au). Download the command-line version "MDS_command_line_Vxx.zip" by clicking "Browse All Files". For plant miRNA prediction, see miRPlant on SourceForge.
    Downloads: 0 This Week
  • 17
    TextSeek

    Professional full-text desktop search tool

    TextSeek is a professional full-text desktop search tool. Unlike filename search tools such as Everything and Listary, TextSeek searches both filenames and file contents easily and quickly. It supports PDF, Word, Excel, PowerPoint, RTF and other formats. The software runs directly; no extra packages need to be installed.
    Downloads: 2 This Week
  • 18
    File System Crawler for Elasticsearch

    Elasticsearch File System Crawler (FS Crawler)

    This crawler helps to index binary documents such as PDF, Open Office, and MS Office files. It crawls a local file system (or a mounted drive), indexing new files, updating existing ones, and removing old ones. It can also crawl remote file systems over SSH/FTP, and it provides a REST interface that lets you “upload” your binary documents to Elasticsearch.
    Downloads: 0 This Week
  • 19
    IEC104-RTU-Simulator

    IEC104 RTU simulator

    ...It can simulate any number of RTUs or servers. Simulated RTUs can be connected to the same or different SCADA master stations. IO signals are indexed and grouped by index number. You can send IO signals from all RTUs to the connected SCADA master stations at once by using the index number. It is written in Python 3, and the code supports both Windows and Linux. The package contains the following files: iec104rs.py: the Python 3 source code. iec104rs.csv: the ini (configuration) file, in comma-separated values format. ...
    Downloads: 0 This Week
  • 20
    Paperless-ng

    A supercharged version of paperless: scan, index and archive docs

    Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have to...
    Downloads: 0 This Week
  • 21
    node-html-pdf

    HTML to PDF converter that uses phantomjs

    HTML to PDF converter that uses phantomjs. html-pdf can read the header or footer either out of the footer and header config object or out of the HTML source. You can either set a default header & footer or overwrite that by appending a page number (1 based index) to the id="pageHeader" attribute of an HTML tag. You can use any combination of those tags.
    Downloads: 0 This Week
  • 22
    OpenSearchServer Search Engine

    An open source search engine with RESTful API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you will be able to quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on...
    Downloads: 1 This Week
  • 23
    Docmenta

    Single Source Publishing Web-Application

    Docmenta is a Java web-application for single source publishing and help authoring. The application allows collaborative creation of documentation, e-books and online-help. Supported output formats are PDF, HTML, WebHelp, EPUB (eBook) and DocBook. For more information, visit: http://www.docmenta.org
    Downloads: 2 This Week
  • 24
    ms-small-basic-dev-guide

    Command reference for MSB (Microsoft Small Basic)

    Revised 2017.10.13. This is a "Developer Command Reference Guide" for MSB (Microsoft Small Basic), divided into 12 PDF sections: 11 subject areas plus 1 reference document containing the master command list and reference charts (color, ASCII, music, and math). 1) Includes master API and reference charts 2) 11 individual subject areas 3) Complete doc set merged for mobile users 4) 12-tab, 3-ring binder index page. This set of documents is in its **finished format**.
    Downloads: 2 This Week
  • 25
    jvqa

    Video Quality Assessment in Java

    jvqa 1.0-alpha-8 (March 3, 2015). Video Quality Assessment in Java. Built upon Java Native Access for Avisynth - jnavi (https://sourceforge.net/projects/jnavi). Based on the Fast Structural Similarity index proposed by Chen and Bovik: http://live.ece.utexas.edu/publications/2011/chen_rtip_2011.pdf Implements the original, variance-based SSIM, Multi-Scale SSIM, Fast SSIM, 2, 3 and 4-Component Weighted SSIM, and Gradient Magnitude Similarity Deviation. Indexes may be customized by selecting image structure statistic (variance, gradient, shifted gradient, 2/3/4-component gradient of variance), structure scale (full, downsampled to 256 or multiscaled), pooling window type (Gaussian, Box, Box with downsampling, or unfiltered) and size, index stabilization (logical or by constants) and luminance index (Gaussian, Box, Box with downsampling or no luminance index).
    Downloads: 0 This Week
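For readers unfamiliar with the variance-based SSIM mentioned in this entry, the sketch below computes the standard SSIM formula over a whole signal treated as a single window. It is illustrative only and is not jvqa's Java implementation: the function name `ssim_global` is hypothetical, there is no Gaussian or box pooling window, and stabilization uses the conventional constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 with L = 255, which is an assumption about typical 8-bit input.

```python
def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Variance-based SSIM of two equal-length sample sequences (one window)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    # Stabilization constants; they keep the ratio defined for flat signals.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variance and covariance over the whole window.
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    luminance_and_structure = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    normalizer = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return luminance_and_structure / normalizer


reference = [10, 20, 30, 40]
print(ssim_global(reference, reference))        # identical signals score 1
print(ssim_global(reference, [40, 30, 20, 10]))  # reversed signal scores lower
```

Practical implementations usually evaluate this expression in local windows and pool the results, which is where options such as the pooling window type come in.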