Best Open Source Java Linguistics Software

Java Linguistics Software

Browse free open source Java Linguistics Software and projects below.

1

Large Document Search Engine

A system that analyzes large documents in order to catalog similar ones. Similarity is based on a contextual analysis of the documents that identifies common words and proper nouns.
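
The listing does not say how the project scores similarity, so the following is only a minimal sketch of one common baseline: Jaccard overlap between the sets of lowercased words in two documents. The class name `WordOverlap` is hypothetical, and treating proper nouns like any other word is an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class WordOverlap {
    // Split text into a set of lowercase word tokens (letters and digits only).
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) out.add(tok);
        }
        return out;
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B|, a value in [0, 1].
    static double jaccard(String a, String b) {
        Set<String> wa = words(a), wb = words(b);
        if (wa.isEmpty() && wb.isEmpty()) return 1.0; // two empty texts count as identical
        Set<String> inter = new HashSet<>(wa);
        inter.retainAll(wb);
        Set<String> union = new HashSet<>(wa);
        union.addAll(wb);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        // 4 shared words (met, bob, in, paris) out of 6 distinct words overall.
        System.out.println(jaccard("Alice met Bob in Paris", "Bob met Carol in Paris"));
    }
}
```

Documents whose score exceeds a chosen threshold could then be catalogued together; a real system would likely weight rare words and proper nouns more heavily than function words.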

Downloads: 0 This Week

Last Update: 2016-11-02
See Project
2

Leseratte

Leseratte is a Java parser for written German. It currently contains a German lexicon (based on Wiktionary), inflection rules, a grammar and a parser; a semantics component is planned. It can be used as a Java library and also provides a graphical UI.

Downloads: 0 This Week

Last Update: 2020-10-03
See Project
3

LexSub

A Lexical Substitution Framework

Lexical substitution framework for supervised all-words lexical substitution using delexicalized features. For a runnable (but GPL-licensed) version of LexSub, see LexSub-GPL (sf.net/p/lexsub/lexsub-gpl).

Downloads: 0 This Week

Last Update: 2015-04-01
See Project
4

Live Transcribe Speech Engine

Live Transcribe is an Android application

Live Transcribe Speech Engine provides on-device speech recognition components that power real-time transcription for accessibility and everyday voice-first experiences. Its design prioritizes low latency and robustness in noisy, far-field environments, enabling continuous transcription with little delay on mobile hardware. The engine handles audio front-end processing, such as noise suppression and voice activity detection, before feeding audio into compact, accurate acoustic and language models. Partial hypotheses stream as words are recognized, then stabilize with minimal jitter as confidence increases, which is crucial for usability. The code emphasizes efficient use of CPU and neural accelerators to balance battery life with responsiveness. Deployed in accessibility contexts, it aims for dependable behavior across accents, environments and intermittent connectivity, with graceful degradation when resources are constrained.

Downloads: 0 This Week

Last Update: 2025-10-10
See Project
5

Lojban Glossary Builder

Java program to create a (potentially multilingual) glossary of the unique words in any given Lojban text. Note that the SourceForge page for this project has been superseded by the Bitbucket repository: https://bitbucket.org/pretoriusjf/vlastezba/overview Any further updates will be made there.

Downloads: 0 This Week

Last Update: 2013-05-02
See Project
6

Maui Topic Indexer

Maui is a multi-purpose automatic topic indexing algorithm. Given a document, Maui automatically identifies its topics. Depending on the task, topics can be tags, keywords, keyphrases, vocabulary terms, descriptors or Wikipedia titles.
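
The listing does not describe how Maui scores candidate topics. As a rough illustration of one widely used keyphrase feature, here is a sketch that ranks candidate 1- and 2-word phrases by TF-IDF against a background corpus. It is not Maui's method, which combines several features with a learned model; the class name `TfIdfKeyphrases`, the tiny stopword list and the smoothed IDF formula are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class TfIdfKeyphrases {
    // Illustrative stopword list; a real system would use a much larger one.
    private static final Set<String> STOP =
        Set.of("a", "an", "the", "of", "in", "and", "is", "to", "for", "on");

    // Candidates: all 1- and 2-grams that neither start nor end with a stopword.
    static List<String> candidates(String doc) {
        String[] t = doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < t.length; i++) {
            if (t[i].isEmpty() || STOP.contains(t[i])) continue;
            out.add(t[i]);
            if (i + 1 < t.length && !STOP.contains(t[i + 1])) out.add(t[i] + " " + t[i + 1]);
        }
        return out;
    }

    // Rank the candidates of `doc` by TF * IDF, with document frequency over `corpus`.
    static List<String> topK(String doc, List<String> corpus, int k) {
        List<Set<String>> docSets = new ArrayList<>();
        for (String d : corpus) docSets.add(new HashSet<>(candidates(d)));
        Map<String, Integer> tf = new HashMap<>();
        for (String c : candidates(doc)) tf.merge(c, 1, Integer::sum);
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = 0;
            for (Set<String> s : docSets) if (s.contains(e.getKey())) df++;
            double idf = Math.log((1.0 + corpus.size()) / (1.0 + df)); // smoothed IDF
            score.put(e.getKey(), e.getValue() * idf);
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        // Highest score first; ties broken alphabetically for a stable order.
        ranked.sort(Comparator.comparingDouble((String s) -> -score.get(s))
                              .thenComparing(Comparator.naturalOrder()));
        return ranked.subList(0, Math.min(k, ranked.size()));
    }
}
```

For example, `topK("Neural networks learn. Neural networks generalize.", List.of("Cats sleep.", "Dogs learn tricks."), 3)` favors "networks", "neural" and "neural networks", while "learn" is demoted because it also occurs in the background corpus.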

Downloads: 0 This Week

Last Update: 2014-04-25
See Project
7

Mechaglot, Calculate Semantic Similarity

Calculate semantic similarity for any human and human-like languages

WARNING: There are too many false positives! This is an alpha release; expect many things to improve, including the algorithms. PLEASE GO TO BROWSE ALL FILES TO READ A FULL DESCRIPTION. The goal of this project is simple: input two sentences in the same language and obtain a number from 0 to 1 denoting how similar the sentences are, according to semantic categories. This project is modeled on my previous project: https://sourceforge.net/projects/semantics/ The difference is that this project does not use any database and accepts any strings as input. Java was the language of choice due to the availability of modelling tools. This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/. Powered by WEKA, Classifier4J and SimMetrics.

Downloads: 0 This Week

Last Update: 2014-10-07
See Project
8

Metalanguage And Analysis Toolkit

Downloads: 0 This Week

Last Update: 2015-05-09
See Project
9

Mitzuli

The open, easy-to-use and powerful translator app for Android

Mitzuli is an open source translator app for Android featuring a full offline mode, voice input (ASR), camera input (OCR), voice output (TTS), and more!

Downloads: 0 This Week

Last Update: 2015-03-02
See Project
10

Morfologik

ATTENTION! Morfologik is now at GitHub: https://github.com/morfologik/

1 Review

Downloads: 0 This Week

Last Update: 2015-09-10
See Project
11

Multiparse

This project contains implementations of algorithms that integrate the output of different NLP tools (part-of-speech taggers, morphologies, parsers, etc.) in order to obtain more accurate, more robust and more fine-grained linguistic analyses. Note that the code is outdated but has been left here for documentation purposes. Its functionality may be reimplemented within the NLP2RDF project (http://code.google.com/p/nlp2rdf).

Downloads: 0 This Week

Last Update: 2013-04-25
See Project
12

Musaheb

An Arabic collocation extraction tool

“Musaheb” is an Arabic collocation extraction tool designed and implemented to overcome the limitations of existing collocation extraction tools. “Musaheb” can extract n-gram collocations up to 5-grams, and can also extract the collocates of nodes (the word types whose collocates are being sought) within a window of zero to 15 words. Moreover, it provides eight collocation statistics to calculate the strength of a collocation, and allows various constraints to be specified during node selection and collocate extraction. Based on the user's preferences for node, concordance and collocate selection, the tool saves all nodes and their associated collocates in an XML file, allowing easy conversion to different formats.
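
Window-based collocate extraction can be sketched as follows: count every word type that occurs within `window` tokens on either side of each occurrence of the node, then score a node/collocate pair with an association measure. The listing does not name Musaheb's eight statistics, so pointwise mutual information (PMI) is used here only as one familiar example. The class `CollocateWindow` is hypothetical, and the input is assumed to be already tokenized.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CollocateWindow {
    // Count how often each word type occurs within `window` tokens of the node word.
    // A window of 0 yields no collocates.
    static Map<String, Integer> collocates(List<String> tokens, String node, int window) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(node)) continue;
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) counts.merge(tokens.get(j), 1, Integer::sum);
            }
        }
        return counts;
    }

    // PMI = log2( f(node, collocate) * N / (f(node) * f(collocate)) ), where N is the
    // corpus size in tokens. This simple form ignores any correction for window size.
    static double pmi(int pairCount, int nodeCount, int collocateCount, int corpusSize) {
        double ratio = (double) pairCount * corpusSize / ((double) nodeCount * collocateCount);
        return Math.log(ratio) / Math.log(2);
    }
}
```

With the tokens `a b c a d`, node `a` and a window of 1, each of `b`, `c` and `d` is counted once; a collocate that co-occurs with its node more often than chance predicts receives a positive PMI.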

Downloads: 0 This Week

Last Update: 2017-08-22
See Project
13

Nasira

Nasira is a Java library for reading text files containing non-ASCII characters (e.g. documents in German, Swedish, ...). To do so, it automatically determines the character encoding (ISO-8859-1 or UTF-8) used to encode the file, using user-provided hints.
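
Nasira relies on user-provided hints, which the listing does not describe. A simpler heuristic for the same two-way choice, shown here as a sketch and not as Nasira's method, is to attempt a strict UTF-8 decode with the standard `java.nio.charset` API and fall back to ISO-8859-1, which accepts every byte sequence. The class name `EncodingGuess` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingGuess {
    // Returns UTF-8 if the bytes are well-formed UTF-8, otherwise ISO-8859-1.
    // Pure ASCII input is reported as UTF-8, which decodes it identically.
    static Charset guess(byte[] bytes) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of substituting
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strictUtf8.decode(ByteBuffer.wrap(bytes));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return StandardCharsets.ISO_8859_1;
        }
    }

    // Decode the bytes with whichever charset guess() selects.
    static String decode(byte[] bytes) {
        return new String(bytes, guess(bytes));
    }
}
```

For instance, the German word "Größe" saved as ISO-8859-1 contains the byte 0xF6, which can never start a valid UTF-8 sequence, so the sketch falls back to ISO-8859-1 and still recovers the original text.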

Downloads: 0 This Week

Last Update: 2013-04-22
See Project
14

NetBeans Dictionaries

Additional dictionary files for the NetBeans spellchecker.

Downloads: 0 This Week

Last Update: 2013-03-16
See Project
15

NeurPheus Morphological Analyser

The Neurpheus Morphological Analyser performs morphological analysis, stemming and word-form generation, using sophisticated classification methods to analyze words not seen in the training dictionary.

Downloads: 0 This Week

Last Update: 2013-12-20
See Project
16

OPTIMA cidoc-crm Semantic Annotation

Semantic annotation of archaeology reports with respect to CIDOC-CRM

The semantic annotation system OPTIMA is the result of Andreas Vlachidis's PhD work (supervised by Prof. Douglas Tudhope, University of Glamorgan, UK). OPTIMA performs the NLP tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and SKOS terminological resources (English Heritage Thesauri and Glossaries). The resulting semantic annotations are associated with classes of the CIDOC Conceptual Reference Model (CRM) (ISO 21127:2006) and its archaeological extension, CRM-EH. OPTIMA also targets the detection and recognition of contextual relations between CRM entities; such relations are modeled with respect to the CRM-EH archaeology extension. The pipeline targets the CIDOC-CRM entities E19.Physical_Object, E53.Place, E49.Time_Appellation and E57.Material, the CRM-EH entities EHE1001.Context_Event, EHE1002.Production_Event and EHE1004.Deposition_Event, and the P45.consists_of material property.

Downloads: 0 This Week

Last Update: 2015-10-11
See Project
17

Ontology Creation

The program creates OWL ontology files that describe relationships between entities. These relationships are based on definitions found by searching Wikipedia articles for specific lexico-syntactic patterns.
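
The listing does not say which patterns the program uses. A classic example of a lexico-syntactic pattern for definitions is "X is a/an Y", which suggests that X is a subclass of Y. The sketch below matches that single pattern with `java.util.regex` and renders the result as an OWL functional-syntax `SubClassOf` axiom. The class name `IsAExtractor`, the pattern itself and the list of stop words that end the superclass phrase are all assumptions for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IsAExtractor {
    // "X is a/an Y ..." at the start of a definition sentence. Y ends at punctuation
    // or before a relative/participial word such as "that", "which" or "used".
    private static final Pattern IS_A = Pattern.compile(
        "^(?:(?:A|An|The)\\s+)?(.+?)\\s+(?:is|are)\\s+(?:a|an)\\s+([^,.;]+?)"
        + "(?:\\s+(?:that|which|who|used|of)\\b.*|[,.;].*)?$");

    // Returns {subclass, superclass}, or empty if the sentence does not match.
    static Optional<String[]> extract(String sentence) {
        Matcher m = IS_A.matcher(sentence.trim());
        if (!m.matches()) return Optional.empty();
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }

    // Render the pair as an OWL functional-syntax axiom, joining multi-word names with "_".
    static String toOwl(String[] pair) {
        return "SubClassOf(:" + pair[0].replace(' ', '_') + " :" + pair[1].replace(' ', '_') + ")";
    }
}
```

Given "A violin is a wooden string instrument.", the sketch yields the pair (violin, wooden string instrument); a sentence such as "Paris is the capital of France." does not match this pattern and produces nothing.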

Downloads: 0 This Week

Last Update: 2014-06-26
See Project
18

PRDL Tools

Privacy Rule Definition Language to write Enterprise Privacy Policies

PRDL is one of the core components of the ENDORSE project. The language is intended to encompass clauses from data protection legislation and enterprise privacy policies (EPPs) so that, for example, data access decisions can be derived automatically from those policies. There have been many initiatives for expressing privacy rules and legal restrictions in a computable way; PRDL attempts to present a collaborative result towards a multi-stakeholder language. The goals were that PRDL should be expressive enough to define EPPs for SMEs, should link the wording of the data privacy laws of different European countries, and should be represented in natural language so that it is easy to understand. Additionally, it should be able to express the workflows that have to be carried out within the helper wizards. Finally, it should be executable automatically or semi-automatically by a rule engine.

Downloads: 0 This Week

Last Update: 2012-07-09
See Project
19

Pacx

Platform for Annotated Corpora in XML

Integrated tool for corpus linguists, built on Eclipse, Vex, Subversive, etc., for creating and editing transcriptions and annotations, querying, managing version-controlled data, and building a shippable corpus.

Downloads: 0 This Week

Last Update: 2014-03-15
See Project
20

PatchCatcher

Software for Patchwriting Detection

PatchCatcher uses suffix arrays to detect common types of patchwriting among scientific papers.
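
To illustrate how a suffix array exposes shared passages, here is a sketch that finds the longest substring two texts have in common: build a suffix array over `a + separator + b`, then take the longest common prefix among adjacent suffixes that start in different texts. This is only the core idea under stated assumptions, not PatchCatcher's pipeline; it works on characters rather than words, uses a naive O(n² log n) construction, and assumes neither text contains U+0000. The class name `SharedPassage` is hypothetical.

```java
import java.util.Arrays;

final class SharedPassage {
    // Longest substring shared by a and b, found via a suffix array over a + '\0' + b.
    static String longestShared(String a, String b) {
        String s = a + '\u0000' + b; // separator; assumes neither text contains U+0000
        int n = s.length();
        Integer[] sa = new Integer[n];
        for (int i = 0; i < n; i++) sa[i] = i;
        // Naive construction: sort suffix start positions lexicographically.
        Arrays.sort(sa, (x, y) -> s.substring(x).compareTo(s.substring(y)));
        int bestLen = 0, bestStart = 0;
        for (int k = 1; k < n; k++) {
            int i = sa[k - 1], j = sa[k];
            // Only adjacent suffixes that start in different texts are compared.
            if ((i < a.length()) == (j < a.length())) continue;
            int l = lcp(s, i, j);
            if (l > bestLen) { bestLen = l; bestStart = i; }
        }
        return s.substring(bestStart, bestStart + bestLen);
    }

    // Length of the common prefix of the suffixes at i and j, never crossing the separator.
    private static int lcp(String s, int i, int j) {
        int l = 0;
        while (i + l < s.length() && j + l < s.length()
                && s.charAt(i + l) == s.charAt(j + l) && s.charAt(i + l) != '\u0000') {
            l++;
        }
        return l;
    }
}
```

For example, `longestShared("the quick brown fox", "a quick brown dog")` returns `" quick brown "`. A patchwriting detector would more plausibly index word tokens and report every shared run above a length threshold, not just the longest one.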

1 Review

Downloads: 0 This Week

Last Update: 2014-05-29
See Project
21

Phonology Charts

A linguistic tool to aid in the study of linguistics and phonology, specifically the distinctive features of possible language sounds. It comprises both a Visual C++ .NET version and a Java-based web applet version. The C++ version has all but been ab

Downloads: 0 This Week

Last Update: 2015-06-04
See Project
22

Phrasal

Statistical phrase-based machine translation system

Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java. At its core, it provides much the same functionality as the core of Moses. Distinctive features include an easy-to-use API for implementing new decoding model features, the ability to translate using phrases that include gaps (Galley et al. 2010), and conditional extraction of phrase tables and lexical reordering models. It is developed by the Natural Language Processing Group at Stanford University, a team of faculty, postdocs, programmers and students who work together on algorithms that allow computers to process and understand human languages. Our work ranges from basic research in computational linguistics to key applications in human language technology, and covers areas such as sentence understanding, automatic question answering, machine translation, syntactic parsing and tagging, and sentiment analysis.

Downloads: 0 This Week

Last Update: 2021-01-19
See Project
23

Porter Stemmer

Java version of Porter's Stemming algorithm

The Stemmer class transforms a word into its root form. The input word is supplied through the add() methods. The stem() method returns the stem, as does toString() once stem() has been called. The clear() method wipes the Stemmer buffer and allows a new word to be input. This version extends Martin Porter's original stemming algorithm by allowing words to contain capital letters. It should be possible to plug this version in wherever the old algorithm is used, with few accommodations. The code in this version is more readable (in my opinion) than the old version. A main() method at the bottom of the file shows how to use the Stemmer.
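
The description covers the class's interface but not its rules. As a point of reference, here is a sketch of only Step 1a of Porter's algorithm (plural suffixes), written as a hypothetical standalone helper `PorterStep1a.step1a`. It is not this project's code and omits Steps 1b through 5. To echo the capital-letter extension, suffix tests are case-insensitive and the original capitalization is kept; the sketch assumes that lowercasing does not change the word's length, which holds for ASCII.

```java
import java.util.Locale;

final class PorterStep1a {
    // Porter Step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (removed).
    // Rules are tried in that order and the first matching suffix wins.
    static String step1a(String word) {
        String w = word.toLowerCase(Locale.ROOT); // case-insensitive suffix test
        if (w.endsWith("sses")) return word.substring(0, word.length() - 2); // caresses -> caress
        if (w.endsWith("ies"))  return word.substring(0, word.length() - 2); // ponies -> poni
        if (w.endsWith("ss"))   return word;                                  // caress -> caress
        if (w.endsWith("s"))    return word.substring(0, word.length() - 1); // cats -> cat
        return word;
    }
}
```

The rule order matters: "caress" must be caught by the SS rule before the bare S rule would strip its final letter.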

Downloads: 0 This Week

Last Update: 2015-10-07
See Project
24

RDRPOSTagger

A Rule-based Part-of-Speech and Morphological Tagging Toolkit

RDRPOSTagger is a robust, easy-to-use and language-independent rule-based toolkit for part-of-speech (POS) and morphological tagging. It is fast in both learning and tagging, and achieves very competitive accuracy compared with state-of-the-art results. RDRPOSTagger now supports pre-trained POS and morphological tagging models for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. Additionally, it supports pre-trained Universal POS tagging models for 40 languages. See the full usage of RDRPOSTagger at: http://rdrpostagger.sourceforge.net/
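
As its name suggests, RDRPOSTagger is built on Ripple Down Rules. The general shape of such a tagger can be sketched as follows: an initial tagger assigns each word a tag from a lexicon, then a single-classification ripple-down rule tree revises it. A satisfied rule's conclusion replaces the current tag and its exception branch is tried next; an unsatisfied rule passes control to its "if-not" sibling. This illustrates the general technique only, not the toolkit's code or rule format. The classes `RippleDownTagger`, `Ctx` and `Node`, and the "NN" default for unknown words, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class RippleDownTagger {
    // What a rule may inspect: the sentence, its initial tags and the current position.
    static final class Ctx {
        final List<String> words;
        final List<String> tags;
        final int i;
        Ctx(List<String> words, List<String> tags, int i) { this.words = words; this.tags = tags; this.i = i; }
        String word(int off) { int k = i + off; return k >= 0 && k < words.size() ? words.get(k) : "<s>"; }
        String tag(int off)  { int k = i + off; return k >= 0 && k < tags.size() ? tags.get(k) : "<s>"; }
    }

    // One node of a single-classification ripple-down rule tree.
    static final class Node {
        final Predicate<Ctx> cond;
        final String conclusion;
        Node except; // tried next when cond holds (a more specific exception)
        Node ifNot;  // tried next when cond fails (an alternative rule)
        Node(Predicate<Ctx> cond, String conclusion) { this.cond = cond; this.conclusion = conclusion; }
    }

    // The implicit root rule "if true then keep the initial tag" starts the walk;
    // `first` is its first exception. The last satisfied node on the path wins.
    static String classify(Node first, Ctx c) {
        String result = c.tag(0);
        Node n = first;
        while (n != null) {
            if (n.cond.test(c)) { result = n.conclusion; n = n.except; }
            else { n = n.ifNot; }
        }
        return result;
    }

    // Initial tagger: lexicon lookup, with "NN" for unknown words.
    static List<String> tag(List<String> words, Map<String, String> lexicon, Node first) {
        List<String> initial = new ArrayList<>();
        for (String w : words) initial.add(lexicon.getOrDefault(w, "NN"));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) out.add(classify(first, new Ctx(words, initial, i)));
        return out;
    }
}
```

With a lexicon that tags "book" as NN, a single rule such as `new Node(c -> c.word(0).equals("book") && c.tag(-1).equals("TO"), "VB")` retags "book" as VB in "I want to book a flight" while leaving "a book" unchanged. Learning, which the sketch omits, consists of adding exception nodes wherever the current tree misclassifies a training example.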

2 Reviews

Downloads: 0 This Week

Last Update: 2017-05-24
See Project
25

Reconcile

Reconcile is an open source research platform for coreference resolution. It combines a large number of open source NLP components and provides extension points for researchers to plug in additional features and techniques.

Downloads: 0 This Week

Last Update: 2013-05-02
See Project