Showing 26 open source projects for "arabic corpus"

  • 1
    Downloads: 1 This Week
  • 2

    Linguistic Analyzer

    The Linguistic Analyzer is a tool for corpus analysis and comparison

    The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud Islamic University. It can be used for corpus analysis and comparison in terms of several linguistic characteristics, such as frequency list generation, concordances, collocation extraction, the difference between two words, and keyword identification.
    Downloads: 1 This Week
  • 3
    43 queries on various topics for the Information Retrieval Collection. The corpus is created from the OSAC corpus of journalistic texts, consisting of 4763 articles retrieved from the Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/
    Downloads: 1 This Week
  • 4
    This corpus contains 10 essays comprising 752 sentences (4,160 words in total). The essays were selected from different collections of partially or fully diacritized Arabic texts, all of which are available in the Tashkeela corpus. Texts in this corpus have been used in the evaluation of the AGD checker. There are two types of texts in this corpus: 1) texts without inserted errors, to evaluate AGD at detecting and correcting errors that are not known before the checking process; 2) texts with errors, to evaluate AGD's ability to discover errors inserted into entirely correct essays.
    Downloads: 0 This Week
  • 5

    KSUCCA Corpus

    A 50-million-token corpus of Classical Arabic.

    King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of the words of the Quran.
    Downloads: 5 This Week
  • 6

    Arabic Corpus

    Text categorization, Arabic language processing, language modeling

    The Arabic Corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The corpus Khaleej-2004 contains 5690 documents divided into 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora are asked to cite the two main references: (1) For the Watan-2004 corpus: M. ...
    Downloads: 2 This Week
  • 7

    Queries for OSAC (Arabic) Corpus

    43 Queries for Arabic Information Retrieval Collection

    43 queries on various topics for the Information Retrieval Collection. The corpus is created from the OSAC corpus of journalistic texts, consisting of 4763 articles retrieved from the Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/
    Downloads: 1 This Week
  • 8

    Tashkeela: Arabic diacritization corpus

    Tashkeela: Arabic diacritization corpus (vocalized texts)

    Tashkeela: Arabic diacritization corpus and resource of Arabic vocalized texts (نصوص عربية مشكولة). Contains vocalized Arabic text in plain-text format; 75.6 million words. Please cite this resource as: T. Zerrouki, A. Balla, Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems, Data in Brief (2017), http://dx.doi.org/10.1016/j.dib.2017.01.011
    Downloads: 20 This Week
  • 9

    PADIC

    A multilingual Parallel Arabic DIalectal Corpus

    PADIC (Parallel Arabic DIalectal Corpus) is a multi-dialectal corpus built in the framework of the National Research Project "TORJMAN", led by the Scientific and Technical Research Center for the Development of Arabic Language and funded by the Algerian Ministry of Higher Education and Scientific Research. PADIC covers 6 dialects (two Algerian dialects, from the cities of Algiers and Annaba, plus Palestinian, Syrian, Tunisian, and Moroccan) and MSA.
    Downloads: 6 This Week
  • 10

    Arabic business corpora

    Arabic business and management corpus

    This corpus is made up of 3 sub-corpora: 1) Management corpus: 400 articles by chairmen and CEOs of Arabic companies in the Middle East. 2) Economics news: 400 news articles from different Arabic online newspapers. 3) Stock market news: 400 articles collected from investing.com. The main corpus contains 1200 articles. The articles have been tagged using the Stanford Arabic Part-of-Speech Tagger.
    Downloads: 1 This Week
  • 11

    Classical Arabic Corpus

    A corpus containing more than 1 M distinct Arabic words.

    This project has been developed as part of a master's thesis named "Edit Distance Adapted to Natural Language Words". The available project consists of three parts. First, the corpus, which gathers more than one million distinct Arabic words. Second, the text files of the Arabic resources. Third, the index file, which presents some information about these resources. Additional details about these parts are available in the README file.
    Downloads: 0 This Week
  • 12

    Osman Arabic Text Readability

    Open Source tool for Arabic text readability

    We present OSMAN (Open Source Metric for Measuring Arabic Narratives), a novel open source Arabic readability metric and tool. The open source Java tool allows users to calculate readability for Arabic text (with and without diacritics). The tool provides methods to split the text into words and sentences and to count syllables, Faseeh letters, and hard and complex words, in addition to adding diacritics (vocalising text). This makes the tool useful for researchers and educators working with Arabic text....
    Downloads: 2 This Week
  • 13
    The AFEWC corpus is a multilingual collection of comparable text articles in Arabic, French, and English. Each article triple is related to the same topic (aligned at article level). The AFEWC corpus is collected from Wikipedia and is available for free for research purposes only. It is composed of 40K aligned articles, 91.3M English words, 57.8M French words, 22M Arabic words, 2.8M English unique words, 1.9M French unique words, and 1.5M Arabic unique words. ...
    Downloads: 0 This Week
  • 14

    Arabic Named Entity Gazetteer

    Arabic Named Entity Gazetteer

    ...To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia", In Proceedings of IJCNLP, p392-400. Nagoya, Japan, October, 2013. Author URL: http://www.cs.bham.ac.uk/~fsa081/index.html http://fsalotaibi.kau.edu.sa Email: fsalotaibi {AT} kau.edu.sa fsa081 {AT} cs.bham.ac.uk
    Downloads: 0 This Week
  • 15
    “Arabic Wikipedia into Named Entity Taxonomy” is a dataset consisting of 4000 Arabic Wikipedia articles classified into a coarse-grained NE taxonomy. This dataset can be used in document classification tasks in relation to NER. To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Mapping Arabic Wikipedia into the Named Entities Taxonomy", In Proceedings of COLING 2012: Posters, p43-52, IIT, Mumbai, India, December 8-15. 2012. ...
    Downloads: 0 This Week
  • 16
    The Arabic corpus has been developed as part of a research project named "A New Approach of Semi-Indexing of Text Documents". This corpus consists of more than 460 Arabic books. The corpus can be used for the development of language engineering applications, information retrieval, and information extraction. The total corpus size is 137 MB. It contains 23,264,785 words and more than 128,584,458 letters.
    Downloads: 0 This Week
  • 17

    Fine-grained Arabic Named Entity Corpora

    Fine-grained Arabic Named Entity Corpora

    The gold-standard and automatically developed fine-grained Arabic named entity corpora are resources created by annotating named entities into 50 fine-grained classes. The annotation uses a two-level taxonomy in which each entity is annotated with a coarse-grained and a fine-grained class. A) Manual gold standard: 1) WikiFANE_Gold: Gold standard Wikipedia-based Fine-grained Arabic Named Entity Corpus, ~500K tokens and 2) NewsFANE_Gold: Gold standard Newswire-based Fine-grained Arabic Named Entity Corpus, ~170K tokens. ...
    Downloads: 0 This Week
  • 18

    InAra Plagiarism Detection Corpus

    A corpus for the Arabic Intrinsic Plagiarism Detection evaluation

    ARAbic INtrinsic plagiarism detection corpus (InAra Corpus 2013). The InAra corpus is the first corpus for the evaluation of Arabic intrinsic plagiarism detection. Intrinsic plagiarism detection consists of uncovering plagiarized passages on the basis of writing style inconsistencies within a given suspicious document. As opposed to the external approach, the intrinsic approach does not require any comparison of the suspicious document against the potential sources of plagiarism. ...
    Downloads: 0 This Week
  • 19

    KALIMAT Multipurpose Arabic Corpus

    A corpus that could be of help for researchers working on Arabic NLP

    KALIMAT, a Multipurpose Arabic Corpus. We are pleased to announce the immediate availability of KALIMAT 1.0. KALIMAT is an Arabic natural language resource that consists of: 1) 20,291 Arabic articles collected from the Omani newspaper Alwatan by (Abbas et al. 2011). 2) 20,291 Extractive Single-document system summaries. 3) 2,057 Extractive Multi-document system summaries. 4) 20,291 Named Entity Recognised articles. 5) 20,291 Part of Speech Tagged articles. 6) 20,291 Morphologically Analysed articles. ...
    Downloads: 7 This Week
  • 20

    EASC (Essex Arabic Summaries Corpus)

    Arabic natural language resources

    ...Available in two encoding formats: UTF-8 and ISO-8859-6 (Arabic). The Essex Arabic Summaries Corpus (EASC) uses copyright material. Users of the corpus are responsible for ensuring that they comply with the terms of the copyrights that apply to the source material and the derived works (summaries) and the terms of relevant copyright law. Any other original data that is distributed with this corpus is made available under the Creative Commons Attribution/Share Alike license (http://creativecommons.org/licenses/by-sa/3.0/). ...
    Downloads: 1 This Week
  • 21

    AADRTE

    Automatic Arabic Domain-Relevant Term Extraction

    In this research we propose a model for automatic domain-relevant term extraction from an Arabic text corpus. The proposed model uses a hybrid approach composed of linguistic and statistical methods to extract terms relevant to specific domains, based on a prevalence and tendency term-ranking mechanism. This increases precision and recall as measures of the relevance of extracted terms to a specific domain.
    Downloads: 0 This Week
  • 22

    Arabic Obsolete Words

    A list of obsolete words in the Buckwalter Morphological Analyser

    ...Then all the lemmas are queried in the Arabic Gigaword corpus (fourth edition), and if a lemma has a frequency of 10 or fewer occurrences, it is considered obsolete. Reference: Mohammed Attia, Pavel Pecina, Lamia Tounsi, Antonio Toral, Josef van Genabith. 2011. A Lexical Database for Modern Standard Arabic Interoperable with a Finite State Morphological Transducer.
    Downloads: 1 This Week
  • 23

    Arabic Multiword Expressions

    Multiword expression resources for Arabic, totalling 34,658 MWEs

    Multiword expression resources for Arabic, totalling 34,658 MWEs. These MWEs are extracted from the Arabic Wikipedia, from the Arabic Gigaword corpus (4th Edition), and from the English Princeton WordNet translated into Arabic.
    Downloads: 0 This Week
  • 24

    Arabic Broken Plurals

    List of Arabic Broken Plurals

    This is a list of Arabic broken plurals automatically extracted by Mohammed Attia from a large contemporary corpus, provided with morphological patterns for both the singular and the plural forms. It contains 2562 broken plural forms.
    Downloads: 0 This Week
  • 25
    A word count of Modern Standard Arabic from a 1 billion word corpus, sorted according to frequency counts
    Downloads: 0 This Week