Search Results for "linguistic" - Page 2

Showing 124 open source projects for "linguistic"

1

Infinite Monkeys 5.0

Infinite Monkeys 5.26 (IM5.26) is a fully-featured, browser-based scripting engine for generative literature, experimental poetry, and procedural text creation. Built on a custom POS-driven language, IM5.26 enables users to generate complex poetic structures, recursive grammars, narrative fragments, and linguistic artifacts using a flexible, expressive instruction system. The engine includes a powerful Random Script Generator, giving creators instant access to dynamic, evolving script templates. (A complementary Script Forge, allowing users to design their own script-templates from scratch, is currently under development.) IM5.26 also incorporates robust features such as macro definitions, conditional logic, loops, string and array manipulation, phoneme-based operations, dictionary filtering, and semantics-aware word selection. ...

Downloads: 0 This Week

Last Update: 2025-12-17
See Project
2

Tokenized Text Aligner

Aligns tokens in two versions of a text with differing tokenization.

This tool performs token-by-token alignment of two versions of a text with differing tokenization by interpreting the results of a file diff (https://docs.python.org/3/library/difflib.html). It is intended for use in the preparation of annotated linguistic corpora, where differences in tokenization may arise (i) following corrections or modifications to the source text or (ii) through the creation of different layers of annotation (part-of-speech, treebank) requiring different tokenization. In its default implementation, it produces a human-readable CSV table associating tokens in text A with tokens in text B, and can also inject token-level annotation from text B to text A. ...
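The diff-based approach described above can be sketched with Python's standard `difflib` module. This is a minimal illustration, not the tool's actual implementation: the function name `align_tokens` and the output shape (pairs of token groups) are assumptions made for the example.

```python
from difflib import SequenceMatcher


def align_tokens(tokens_a, tokens_b):
    """Align two tokenizations of the same text.

    Returns a list of (group_a, group_b) pairs. Identical tokens are
    paired one-to-one; differing stretches are paired as whole groups.
    Hypothetical helper for illustration only.
    """
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching runs have equal length, so pair tokens one by one.
            for tok_a, tok_b in zip(tokens_a[i1:i2], tokens_b[j1:j2]):
                rows.append(([tok_a], [tok_b]))
        else:
            # replace / delete / insert: keep the differing spans together.
            rows.append((tokens_a[i1:i2], tokens_b[j1:j2]))
    return rows


if __name__ == "__main__":
    text_a = ["I", "can't", "go", "."]
    text_b = ["I", "ca", "n't", "go", "."]
    for group_a, group_b in align_tokens(text_a, text_b):
        print(" ".join(group_a), "<->", " ".join(group_b))
```

Pairing differing spans as groups keeps one-to-many splits (such as a contraction split into two tokens) on a single row, which is the kind of row a CSV alignment table would need.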

Downloads: 0 This Week

Last Update: 2026-02-06
See Project
3

Lingua-Go

The most accurate natural language detection library for Go

Lingua-Go is a Golang implementation of the Lingua language detection library, providing efficient and accurate language identification for Go-based applications. Its task is simple: It tells you which language some text is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages. Language detection is often done as part of large machine-learning frameworks or natural language processing applications. ...

Downloads: 0 This Week

Last Update: 2025-01-24
See Project
4

gadict

gadict is a small collection of EN to EN/RU/UK dictionaries.

gadict is a small collection of EN to EN/RU/UK dictionaries. The project also provides additional linguistic information about the English language. All materials are freely accessible (public domain).

Downloads: 0 This Week

Last Update: 2023-07-21
See Project
5

SentimentAnalysis-Rick&Morty

Rick & Morty Sentiment Analysis - End-of-Degree Project - UNIR

The remarkable progress in the field of Big Data has driven the development of new technologies in natural language processing and data analysis. Text mining is an application of data analysis that extracts relevant information from related writings in different linguistic contexts. Within natural language processing, sentiment analysis and classification stand out as a key application supported by text mining. Through the extraction of information from textual data, it becomes possible to identify and comprehend the sentiments and emotions conveyed. In this end-of-degree work, we analyze and classify the dialogue of characters in the English-language television series "Rick and Morty" using Python. ...

Downloads: 0 This Week

Last Update: 2023-07-12
See Project
6

Lingua

The most accurate natural language detection library for Java

Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.

Downloads: 0 This Week

Last Update: 2024-09-14
See Project
7

Linguistic Analyzer

The Linguistic Analyzer is a tool for corpus analysis and comparison

The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud Islamic University that can be used for corpus analysis and comparison in terms of several linguistic characteristics, such as frequency list generation, concordances, collocation extraction, the difference between two words, and keyword identification.

Downloads: 1 This Week

Last Update: 2022-04-16
See Project
8

AhoTTS - TTS for Basque and Spanish

Text-to-Speech for Basque and Spanish

Text-to-Speech converter for Basque and Spanish. It includes linguistic processing and built voices for both languages. Its acoustic engine is based on hts_engine and it uses a high quality vocoder called AhoCoder. Developed by Aholab Signal Processing Laboratory: https://aholab.ehu.es/aholab/ http://aholab.ehu.es/ahocoder/

1 Review

Downloads: 0 This Week

Last Update: 2022-05-03
See Project
9

Grade School Math

8.5K high quality grade school math problems

...These aren’t trivial exercises — many require multi-step reasoning, combining arithmetic operations, and handling intermediate steps (e.g. “If she sold half as many in May… how many in total?”). The problems are written by human authors (not automatically generated) to ensure linguistic variety and realism. The repository maintains strict formatting (e.g. JSONL) for problem + answer pairs, and is used broadly in research to benchmark model performance under “word problem” settings. Issues are tracked (people report incorrect problems, ambiguous statements), and contributions are possible for cleaning or expanding the set.
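As a rough sketch of how JSONL problem and answer pairs like those described above might be read, the snippet below parses one JSON object per line. The field names `question` and `answer`, the `####` final-answer marker, and the sample record are assumptions made for illustration; check the repository for the actual schema.

```python
import json


def load_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def final_answer(answer_text, marker="####"):
    """Return the text after the last marker, or None if it is absent.

    The marker convention is an assumption for this sketch.
    """
    if marker not in answer_text:
        return None
    return answer_text.rsplit(marker, 1)[1].strip()


if __name__ == "__main__":
    # Made-up record for illustration; not taken from the dataset.
    sample = json.dumps({
        "question": "Tom has 3 apples and buys 4 more. How many does he have?",
        "answer": "3 + 4 = 7\n#### 7",
    })
    for record in load_jsonl(sample):
        print(record["question"], "->", final_answer(record["answer"]))
```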

Downloads: 0 This Week

Last Update: 2025-10-03
See Project
10

XZVoice

Free and open source text-to-speech software

...Technically, multi-level rhythmic pauses are modeled to achieve a natural synthesis rhythm, and acoustic and linguistic parameters are combined to build multiple automatic prediction models based on deep learning. The pronunciation model is trained on massive audio data, so the synthetic voice sounds real, full, cadenced, and expressive, and the MOS score has reached the professional level in the industry.

Downloads: 0 This Week

Last Update: 2022-10-04
See Project
11

Apertium: Machine Translation Toolbox

The free and open-source rule-based machine translation platform

Apertium is a toolbox to build open-source shallow-transfer machine translation systems, especially suitable for related language pairs: it includes the engine, maintenance tools, and open linguistic data for several language pairs.

17 Reviews

Downloads: 8 This Week

Last Update: 2021-04-16
See Project
12

PseudonymizeSpeech

Praat script to pseudonymize speech.

A Praat script to pseudonymize speech. That is, Pseudonymize Speech tries to make it difficult to recognize a speaker while still retaining relevant (para-)linguistic features and intelligibility. There is a trade-off between the level of pseudonymization and the (para-)linguistic features retained. The approach is to manipulate the spectro-temporal structure of the speech to simulate a different length and structure of the vocal tract, as well as a different pitch and speaking rate. The method is deterministic, and partially reversible. ...

Downloads: 0 This Week

Last Update: 2025-07-04
See Project
13

SentEval

A python tool for evaluating the quality of sentence embeddings

...It defines a simple interface—provide an encoder function from sentences to vectors—and then runs consistent training/evaluation loops for tasks like sentiment, entailment, paraphrase, and semantic textual similarity. The suite also contains linguistic probing tasks that illuminate what properties embeddings capture, such as tense, word order, or syntactic structure. Datasets are wrapped with unified preprocessing and metrics so results are comparable across papers and implementations. Because the interface is minimal, researchers can plug in encoders from any framework or language model and obtain a broad evaluation with little glue code. ...
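To make the idea of "an encoder function from sentences to vectors" concrete, here is a toy encoder written in plain Python. It is not SentEval's API: the function name `encode`, the hashed bag-of-words scheme, and the `dim` parameter are all assumptions chosen so the example runs without any dependencies.

```python
import zlib


def encode(sentences, dim=8):
    """Map tokenized sentences to fixed-size count vectors.

    Each token is hashed into one of `dim` buckets (hashed bag-of-words).
    crc32 is used instead of hash() so results are stable across runs.
    Hypothetical stand-in for a real sentence encoder.
    """
    vectors = []
    for tokens in sentences:
        vec = [0.0] * dim
        for tok in tokens:
            bucket = zlib.crc32(tok.lower().encode("utf-8")) % dim
            vec[bucket] += 1.0
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    batch = [["The", "cat", "sat"], ["A", "dog", "barked", "loudly"]]
    for sentence, vector in zip(batch, encode(batch)):
        print(sentence, vector)
```

Any function with this shape (a batch of sentences in, one vector per sentence out) could in principle be swapped for a real model, which is what keeps the glue code small.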

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
14

POWLA

OWL/RDF representation for linguistic corpora

POWLA is a formalism that allows linguistic corpora to be represented in RDF. POWLA is an OWL/DL formalization of an abstract data model, PAULA (http://www.sfb632.uni-potsdam.de/d1/paula/doc), that has been developed to represent (a) any type of linguistic annotation applicable to textual data, and (b) any combination of annotation layers. For a detailed motivation of POWLA and its application to facilitate interoperability of annotated corpora, see Christian Chiarcos (to appear 2012), Interoperability of Corpora and Annotations, in: Christian Chiarcos, Sebastian Nordhoff and Sebastian Hellmann (eds.), Linked Data in Linguistics. ...

Downloads: 0 This Week

Last Update: 2020-06-08
See Project
15

OLiA

OWL/DL ontologies for linguistic annotations

MOVED TO https://github.com/acoli-repo/olia. The Ontologies of Linguistic Annotations (OLiA) provide an OWL/DL taxonomy of data categories as a reference for linguistic annotation (OLiA Reference Model), plus OWL/DL models for a large number of annotation schemes (OLiA Annotation Models) and their relationship to reference data categories (OLiA Linking Models). The OLiA Reference Model itself is linked to community-maintained repositories such as GOLD (http://linguistics-ontology.org/) and ISOcat (http://www.isocat.org). The OLiA ontologies were originally developed as part of an infrastructure for the sustainable maintenance of linguistic resources (http://www.sfb441.uni-tuebingen.de/c2/index-engl.html). Their fields of application include the formalization of annotation schemes, concept-based querying over heterogeneously annotated corpora, and the development of interoperable NLP pipelines.

Downloads: 0 This Week

Last Update: 2019-11-11
See Project
16

AhoTTS Multilingual, a Multilingual TTS

Text-to-Speech TTS for Basque, Spanish, Catalan, Galician and English

Text-to-Speech converter for Basque, Spanish, Catalan, Galician and English. It includes linguistic processing and built voices for all of these languages. Its acoustic engine is based on hts_engine and it uses a high quality vocoder called AhoCoder. Developed by Aholab Signal Processing Laboratory: https://aholab.ehu.es/aholab/ http://aholab.ehu.es/ahocoder/

1 Review

Downloads: 0 This Week

Last Update: 2019-11-29
See Project
17

TreeForm Syntax Tree Drawing Software

Syntax Tree Drawing Software (Linguistics)

TreeForm Syntax Tree Drawing Software is a linguistic syntax/semantics tree drawing editor, designed for graphical n-ary tree drawing. Mac users can install the software through the new package, but must grant permission through "System Preferences" > "Security & Privacy". Windows and Linux users can run the software through the JAR file directly. All users must have Java 8 or higher installed. https://java.com/en/download/

14 Reviews

Downloads: 70 This Week

Last Update: 2019-09-05
See Project
18

pangu.py

Paranoid text spacing in Python

...It’s designed to be pragmatic and lightweight, with sensible defaults that handle common edge cases found in websites, blogs, and multilingual technical docs. Because it targets clarity over heavy linguistic analysis, it’s easy to adopt and delivers immediate, visible improvements to mixed CJK/Latin text.

Downloads: 1 This Week

Last Update: 2025-10-18
See Project
19

MITIE

MITIE: library and tools for information extraction

...MITIE is built on top of dlib, a high-performance machine-learning library[1], and makes use of several state-of-the-art techniques, including distributional word embeddings[2] and Structural Support Vector Machines[3]. MITIE offers several pre-trained models providing varying levels of support for English, Spanish, and German, trained using a variety of linguistic resources (e.g., CoNLL 2003, ACE, Wikipedia, Freebase, and Gigaword). The core MITIE software is written in C++, but bindings for several other languages, including Python, R, Java, C, and MATLAB, allow users to quickly integrate MITIE into their own applications.

Downloads: 0 This Week

Last Update: 2023-08-04
See Project
20

TBXTools

A Python class for Terminology Extraction and Management

TBXTools allows easy and rapid Terminology Extraction and Management. This tool implements both statistical and linguistic methods, along with several utilities to create and manage terminological databases. It is written in Python and uses NLTK (Natural Language Toolkit). The project has moved to GitHub: https://github.com/aoliverg/TBXTools

Downloads: 1 This Week

Last Update: 2020-12-22
See Project
21

SLING

A natural language frame semantics parser

...We use frame semantics as a common representation for both knowledge representation and document annotation. The SLING parser can be trained to produce frame semantic representations of text directly without any explicit intervening linguistic representation. The SLING project is still a work in progress. We do not yet have a full system that can extract facts from arbitrary text, but we have built a number of the subsystems needed for such a system. The SLING frame store is our basic framework for building and manipulating frame semantic graph structures. The Wiki flow pipeline can take a raw dump of Wikidata and convert this into one big frame graph.

Downloads: 0 This Week

Last Update: 2024-08-13
See Project
22

pyhanlp

Chinese word segmentation

pyhanlp is a Python interface for HanLP (Han Language Processing) that lets you use a mature Java-based NLP toolkit from Python workflows without rebuilding the underlying algorithms. It is commonly used for Chinese-language NLP tasks where you want production-grade tokenization and linguistic analysis, but still want the convenience of Python scripting. The project focuses on making HanLP’s capabilities accessible through a Python-friendly API surface, so you can integrate NLP steps into data pipelines, notebooks, and downstream ML or information-extraction code. In practice, it serves as a bridge layer: Python calls are translated into the corresponding HanLP operations, so you can keep your application logic in Python while relying on HanLP’s implementations. ...

Downloads: 0 This Week

Last Update: 2026-01-22
See Project
23

KhmerText

Open data for a Khmer language corpus and lexicographic data that can be used for the development of free language tools for the Khmer language, such as automatic translators, dictionaries, linguistic analysis tools, etc.

4 Reviews

Downloads: 63 This Week

Last Update: 2018-05-17
See Project
24

rcqp

R interface to the Corpus Query Protocol

Implements the Corpus Query Protocol as a package for the R statistical environment. It allows linguistic corpora to be queried and the data manipulated as native R objects. It is based on the CWB software.

Downloads: 0 This Week

Last Update: 2018-03-13
See Project
25

Quick & Dirty Study Utility

An application designed for quick study from multiple choice questions. It was first developed for the quickest possible study of practice exams. It's designed to enhance multi-linguistic skills (though it was originally for the study of Cisco practice exams).

Downloads: 0 This Week

Last Update: 2017-12-14
See Project