[go: up one dir, main page]

Showing 15 open source projects for "corpus"

View related business solutions
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    The database for AI-powered applications.

    MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.
    Start Free
  • New Relic provides the most powerful cloud-based observability platform built to help companies create more perfect software. Icon
    New Relic provides the most powerful cloud-based observability platform built to help companies create more perfect software.

    Get a live and in-depth view of your network, infrastructure, applications, end-user experience, machine learning models and more.

    Correlate issues across your stack. Debug and collaborate from your IDE. AI assistance at every step. All in one connected experience - not a maze of charts.
    Start for Free
  • 1
    Honggfuzz

    Honggfuzz

    Security oriented software fuzzer

    honggfuzz is a general-purpose, high-performance fuzzer that mixes coverage feedback with practical crash triage to uncover memory-safety and logic bugs. It supports multiple fuzzing modes—stdin, file, and networking—so targets can be exercised the same way they run in production. Instrumentation via compiler hooks or hardware/perf counters guides mutations toward previously unseen edges, while persistent mode keeps the target process alive to amortize startup costs. The tool integrates...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Software, information, data sets and documentation for the Web as Corpus community.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    American Fuzzy Lop

    American Fuzzy Lop

    American fuzzy lop - a security-oriented fuzzer

    ...Its workflow emphasizes quick start: point it at a target binary with compile-time instrumentation (or use QEMU-based mode when recompilation isn’t possible), seed it with a small corpus, and let it iterate. AFL is known for finding serious security issues in complex software due to its corpus minimization, queue management, and deterministic mutation stages that balance breadth and depth. It provides crash triage helpers and test case minimization so developers can reproduce and fix issues quickly. The design deliberately optimizes for robustness and speed on commodity hardware, which helped it become a standard part of many security testing pipelines.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    concordia

    concordia

    Powerful search library, best suited for computer-aided translation

    ...Concordance searcher - tool for translators who need their translations to "agree" with one standard. Concordia is a C++ library for fast text lookup in large corpora. It uses a RAM stored index, which takes up approximately 600MB of memory for a corpus of 2 million sentences. It is based on the idea of a suffix array, enhanced by the presence of other auxiliary data structures. The effects are stunning - Concordia is able to do simple substring lookup at the pace of 5000 queries per second (on personal PC) - a speed which can not be achieved by any other search library. Moreover, Concordia can perform its own "concordia search". ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Contract Management Software | Concord Icon
    Contract Management Software | Concord

    AI-powered contract management that helps businesses track spending, negotiate smarter, and never miss deadlines.

    Concord serves small and mid-sized businesses and Fortune 500 companies. This robust, web-based platform is used by human resource, sales, procurement, and legal teams, and virtually anyone who deals with contracts.
    Learn More
  • 5

    rcqp

    R interface to the Corpus Query Protocol

    Implements the Corpus Query Protocol as a package for the R statistical environment. It allows to query linguistic corpora and manipulate the data as native R objects. It is based on the CWB software.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    GloVe

    GloVe

    GloVe model for distributed word representation

    ...If the web datasets above don't match the semantics of your end use case, you can train word vectors on your own corpus. The demo.sh script downloads a small corpus, consisting of the first 100M characters of Wikipedia. It collects unigram counts, constructs and shuffles cooccurrence data, and trains a simple version of the GloVe model. It also runs a word analogy evaluation script in python to verify word vector quality.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7

    mwetoolkit

    THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/

    ...These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc. Even though it focuses on multiword expresisons, the framework is quite complete and can also be useful in any corpus-based study in computational linguistics. The mwetoolkit can be applied to virtually any text collection, language, and MWE type. It is a command-line tool written mostly in Python. Its development started in 2010 as a PhD thesis but the project keeps active (see the SVN logs). Up-to-date documentation and details about the tool can be found on the mwetoolkit website: http://mwetoolkit.sourceforge.net/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    This is a library to extract raw unicode text from any written documents (office documents such as PDF, Word, OpenOffice, ...). It should be useful to developpers of search engine, text processing, corpus analysis, ....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    This project aims to build a suite of Natural Language Processing tools. Modules will include corpus indexing and access tools, a part-of-speech tagger, tokenisers, text classification software, etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Taking the Paper Out of Work Icon
    Taking the Paper Out of Work

    For organizations that need powerful ECM and document automation software

    The Square 9 AI-powered intelligent document processing platform takes the paper out of work and makes it easier to get things done with digital workflows.
    Learn More
  • 10
    Supporting software for a school research paper to analyze a corpus for letter frequency and word properties.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Get1T is a tool for filtering through the massive quantity of data available in the Web 1T corpus and extracting only the counts you need - including for simple wildcard patterns.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    This project is supposed to list the Top R ranked terms that are of between M and N length. It is designed to extract these phrases from a given corpus in a input folder.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Samudra Manthan uses C and MPI for finding interesting n-grams(terms) in a large corpus of data. We use the GigaWord corpus to find top m interesting n-grams using TF*IDF measure.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    TaCo is a tasty Palm application that enables you to use the Tanaka Corpus on your handheld. The Tanaka Corpus is a collection of Japanese/English sentence pairs that a student of Japanese language can use as a source of example sentences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    reputron is a knowledge extraction engine platform that covers all aspect of text mining, relevance, indexing and querying on a corpus of text documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next