Java Linguistics Software

Browse free open source Java Linguistics Software and projects below. Use the toggles on the left to filter open source Java Linguistics Software by OS, license, language, programming language, and project status.

  • 1
    A system to perform analysis of large documents for the purpose of cataloging similar documents. Similarity is based upon contextual analysis of these documents done by identifying common words and proper nouns.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Leseratte is a Java parser for written German. It currently contains a German lexicon (based on Wiktionary), inflection rules, a grammar and a parser; a semantics component is planned. It is usable as a Java library and also provides a graphical UI.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3

    LexSub

    A Lexical Substitution Framework

    Lexical substitution framework for supervised all-words lexical substitution using delexicalized features. For a runnable (but GPL-licensed) version of LexSub, see LexSub-GPL (sf.net/p/lexsub/lexsub-gpl)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Live Transcribe Speech Engine

    Live Transcribe is an Android application

    Live Transcribe Speech Engine provides on-device speech recognition components that power real-time transcription for accessibility and everyday voice-first experiences. Its design prioritizes latency and robustness in noisy, far-field environments, enabling continuous transcription with low delay on mobile hardware. The engine manages audio front-end processing—such as noise suppression and voice activity detection—before feeding audio into compact, accurate acoustic and language models. Partial hypotheses stream as words are recognized, then stabilize with minimal jitter as confidence increases, which is crucial for usability. The code emphasizes efficient use of CPU and neural accelerators to balance battery life with responsiveness. Deployed in accessibility contexts, it aims for dependable behavior across accents, environments, and intermittent connectivity, with graceful degradation when resources are constrained.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 5
    Java program to create a (potentially multilingual) glossary of the unique words in any given Lojban text. Note that the SourceForge page for this project has been superseded by the Bitbucket repository (https://bitbucket.org/pretoriusjf/vlastezba/overview); any further updates will be made there.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Maui is a multi-purpose automatic topic indexing algorithm. Given a document, Maui automatically identifies its topics. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors or Wikipedia titles.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Mechaglot, Calculate Semantic Similarity

    Calculate semantic similarity for any human and human-like languages

    WARNING: There are too many false positives! This is an alpha release; expect many things to improve, including the algorithms. Please go to "Browse All Files" to read a full description. The goal of this project is simple: input two sentences in the same language and obtain a number (from 0 to 1) denoting the similarity between them according to semantic categories. This project follows the model of my previous project (https://sourceforge.net/projects/semantics/). The difference is that this project does not use a database and accepts arbitrary strings as input. Java was the language of choice because of the availability of modelling tools. This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/. Powered by WEKA, Classifier4J and SimMetrics.
    Downloads: 0 This Week
    Last Update:
    See Project
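The Mechaglot entry describes a function that takes two sentences and returns a score between 0 and 1. As a rough illustration of that input/output contract only, here is a minimal sketch that uses plain word overlap (the Jaccard index) instead of semantic categories. It is not Mechaglot's algorithm, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; NOT Mechaglot's algorithm or API.
class SentenceSimilaritySketch {

    // Splits a sentence into a set of lower-cased word tokens.
    static Set<String> tokens(String sentence) {
        Set<String> result = new HashSet<>();
        for (String t : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard index: size of the intersection divided by size of the union.
    // The result is always in the range [0, 1].
    static double similarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0; // assumption: two empty inputs count as identical
        }
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(similarity("The cat sat on the mat", "The cat lay on the mat"));
    }
}
```

Word overlap is a much cruder signal than the semantic categories the project mentions; it cannot see that two different words mean the same thing.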
  • 8
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Mitzuli

    The open, easy-to-use and powerful translator app for Android

    Mitzuli is an open source translator app for Android featuring a full offline mode, voice input (ASR), camera input (OCR), voice output (TTS), and more!
    Downloads: 0 This Week
    Last Update:
    See Project
  • 10
    ATTENTION! Morfologik is now at GitHub: https://github.com/morfologik/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    This project contains implementations of algorithms that integrate the output of different NLP tools (part-of-speech taggers, morphologies, parsers, etc.) in order to obtain more accurate, more robust and more fine-grained linguistic analyses. Note that the code is outdated but left here for documentation purposes. Its functionality may be reimplemented within the NLP2RDF project (http://code.google.com/p/nlp2rdf).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12

    Musaheb

    An Arabic collocation extraction tool

    “Musaheb” is an Arabic collocation extraction tool designed and implemented to overcome the limitations of existing collocation extraction tools. It can extract n-gram collocations up to 5-grams, and it can extract the collocates of nodes (the word types whose collocates are being sought) within a window of zero to 15 words. It also provides eight collocation statistics for calculating the strength of a collocation, and it permits various constraints during node selection and collocate extraction. Based on the user's preferences for node, concordance and collocate selection, the tool saves all nodes and their associated collocates in an XML file, allowing easy conversion to other formats.
    Downloads: 0 This Week
    Last Update:
    See Project
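The Musaheb entry mentions extracting the collocates of a node within a window of words. A minimal sketch of that idea follows, assuming a symmetric window (the same number of tokens on each side of the node) and already-tokenized input. The class and method names are hypothetical and are not Musaheb's API; the English sample text is only for readability.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; NOT Musaheb's implementation.
class CollocateWindowSketch {

    // Counts every token that occurs within `window` positions on either
    // side of each occurrence of `node`. The node position itself is skipped.
    static Map<String, Integer> collocates(List<String> tokens, String node, int window) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(node)) {
                continue;
            }
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    counts.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("strong tea and strong coffee with weak tea".split(" "));
        System.out.println(collocates(tokens, "tea", 2));
    }
}
```

Raw co-occurrence counts like these are only the input to collocation statistics; the eight measures the entry refers to are not reproduced here.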
  • 13
    Nasira is a Java library for reading text files with non-ASCII characters (e.g. documents in German, Swedish,...). To do so, it automatically determines the character encoding (iso-8859-1, utf-8) used to encode the file through user-provided hints.
    Downloads: 0 This Week
    Last Update:
    See Project
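The Nasira entry describes choosing between iso-8859-1 and utf-8 when reading a text file. Nasira itself relies on user-provided hints; the sketch below uses a different, hint-free heuristic that relies only on the Java standard library: try a strict UTF-8 decode and fall back to ISO-8859-1 if it fails. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; NOT Nasira's API or its hint-based method.
class EncodingGuessSketch {

    // Returns UTF-8 if the bytes are well-formed UTF-8, otherwise ISO-8859-1.
    static Charset guess(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Every byte sequence is valid ISO-8859-1, so this fallback cannot fail.
            return StandardCharsets.ISO_8859_1;
        }
    }

    // Decodes the bytes using the guessed charset.
    static String read(byte[] data) {
        return new String(data, guess(data));
    }

    public static void main(String[] args) {
        String word = "Gr\u00fc\u00dfe"; // German "Gruesse" written with u-umlaut and sharp s
        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);
        System.out.println(guess(latin1) + " " + read(latin1).equals(word));
        System.out.println(guess(utf8) + " " + read(utf8).equals(word));
    }
}
```

The heuristic can misjudge ISO-8859-1 text that happens to form valid UTF-8 sequences, which is one reason a hint-based design such as the one the entry describes can be preferable.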
  • 14

    NetBeans Dictionaries

    Additional dictionary files for the NetBeans spellchecker.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    The Neurpheus Morphological Analyser performs morphological analysis, stemming or word form generation tasks using sophisticated classification methods for an analysis of words unseen in a training dictionary.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16

    OPTIMA cidoc-crm Semantic Annotation

    Semantic annotation of archaeology reports with respect to CIDOC-CRM

    The semantic annotation system OPTIMA is the result of Andreas Vlachidis's PhD work (supervised by Prof. Douglas Tudhope, University of Glamorgan, UK). OPTIMA performs the NLP tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and SKOS terminological resources (English Heritage Thesauri and Glossaries). The resulting semantic annotations are associated with classes of the CIDOC Conceptual Reference Model (CRM, ISO 21127:2006) and its archaeological extension, CRM-EH. OPTIMA also targets the detection and recognition of contextual relations between CRM entities; such relations are modeled with respect to the CRM-EH archaeology extension. The pipeline targets the CIDOC-CRM entities E19.Physical_Object, E53.Place, E49.Time_Appellation and E57.Material; the CRM-EH entities EHE1001.Context_Event, EHE1002.Production_Event and EHE1004.Deposition_Event; and the P45.consists_of material property.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    The program creates OWL ontology files that describe relationships between entities. The relationships are based on definitions found by searching Wikipedia articles for specific lexico-syntactic patterns.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18

    PRDL Tools

    Privacy Rule Definition Language to write Enterprise Privacy Policies

    PRDL is one of the core components of the ENDORSE project. The language is intended to encompass clauses from data protection legislation and enterprise privacy policies (EPPs) so that, for example, data access decisions can be derived automatically from the EPPs. There have been many initiatives for expressing privacy rules and legal restrictions in a computable way; PRDL attempts to present a collaborative result towards a multi-stakeholder language. The goals were that PRDL should be sufficiently expressive to define EPPs for SMEs, should link the wording of the data privacy laws of different European countries, and should be represented in natural language and therefore be easy to understand. Additionally, it should be able to express the workflows that have to be conducted within the helping wizards. Finally, it should be automatically or semi-automatically executable by a rule engine.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Platform for Annotated Corpora in XML: an integrated tool for corpus linguists, built on Eclipse, Vex, Subversive, etc., for creating and editing transcriptions and annotations, querying, managing version-controlled data, and building a shippable corpus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    PatchCatcher

    Software for Patchwriting Detection

    PatchCatcher uses suffix arrays to detect common types of patchwriting among scientific papers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    A linguistic tool to aid in the study of Linguistics/Phonology, specifically distinctive features of possible language sounds. Comprised of both a Visual C++ .NET version as well as a Java based web applet version. The C++ version has all but been ab
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Phrasal

    Statistical phrase-based machine translation system

    Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java. At its core, it provides much the same functionality as the core of Moses. Distinctive features include an easy-to-use API for implementing new decoding model features, the ability to translate using phrases that include gaps (Galley et al. 2010), and conditional extraction of phrase tables and lexical reordering models. Developed by the Natural Language Processing Group at Stanford University, a team of faculty, postdocs, programmers and students who work together on algorithms that allow computers to process and understand human languages. Our work ranges from basic research in computational linguistics to key applications in human language technology, and covers areas such as sentence understanding, automatic question answering, machine translation, syntactic parsing and tagging, and sentiment analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    Porter Stemmer

    Java version of Porter's Stemming algorithm

    The Stemmer class transforms a word into its root form. The input word is provided through the add() methods. The stem() method returns the stem, as does toString() once stem() has been called. The clear() method wipes the Stemmer buffer so that a new word can be input. This version extends Martin Porter's original stemming algorithm by allowing capital letters in words. It can also be plugged in wherever the old algorithm is used, with few accommodations necessary. The code in this version is more readable (in my opinion) than the old version. There is a main method at the bottom that shows how to use the Stemmer.
    Downloads: 0 This Week
    Last Update:
    See Project
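The Porter Stemmer entry describes a calling pattern: add() a word, call stem(), read the result from stem() or toString(), then clear() before the next word. The sketch below illustrates that pattern with a stand-in class. Its method signatures are assumptions based on the description, not the project's actual code, and it implements only step 1a of Porter's algorithm (plural suffixes: SSES to SS, IES to I, SS unchanged, S removed). Handling capitals by lower-casing the input is also an assumption.

```java
import java.util.Locale;

// Stand-in using the method names from the description; NOT the project's class.
class StemmerSketch {
    private final StringBuilder buffer = new StringBuilder();
    private String result = "";

    // Appends a single character to the word being built.
    void add(char ch) {
        buffer.append(ch);
    }

    // Appends a whole word (or word fragment) to the buffer.
    void add(String word) {
        buffer.append(word);
    }

    // Lower-cases the buffered word and applies Porter step 1a only.
    String stem() {
        String w = buffer.toString().toLowerCase(Locale.ROOT);
        if (w.endsWith("sses")) {
            w = w.substring(0, w.length() - 2); // caresses -> caress
        } else if (w.endsWith("ies")) {
            w = w.substring(0, w.length() - 2); // ponies -> poni
        } else if (w.endsWith("ss")) {
            // caress -> caress (unchanged)
        } else if (w.endsWith("s")) {
            w = w.substring(0, w.length() - 1); // cats -> cat
        }
        result = w;
        return result;
    }

    // Empties the buffer so a new word can be added.
    void clear() {
        buffer.setLength(0);
        result = "";
    }

    // Returns the most recent stem (empty until stem() has been called).
    @Override
    public String toString() {
        return result;
    }

    public static void main(String[] args) {
        StemmerSketch stemmer = new StemmerSketch();
        for (String word : new String[] {"Caresses", "ponies", "cats"}) {
            stemmer.clear();
            stemmer.add(word);
            System.out.println(word + " -> " + stemmer.stem());
        }
    }
}
```

The full algorithm has several further steps that strip derivational suffixes; they are omitted here to keep the calling pattern in focus.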
  • 24

    RDRPOSTagger

    A Rule-based Part-of-Speech and Morphological Tagging Toolkit

    RDRPOSTagger is a robust, easy-to-use and language-independent rule-based toolkit for Part-of-Speech (POS) and morphological tagging. RDRPOSTagger is fast in both the learning and tagging processes, and its accuracy is very competitive with state-of-the-art results. RDRPOSTagger now supports pre-trained POS and morphological tagging models for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. Additionally, RDRPOSTagger supports pre-trained Universal POS tagging models for 40 languages. See the full usage of RDRPOSTagger at: http://rdrpostagger.sourceforge.net/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Reconcile is an open source research platform for coreference resolution. It combines a large number of open source NLP components and provides extension points for researchers to plug in additional features and techniques.
    Downloads: 0 This Week
    Last Update:
    See Project