King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts spanning the pre-Islamic era to the fourth Hijri century (equivalent to the seventh until the early eleventh century CE), the period of pure Classical Arabic. The corpus was built mainly to study the distributional lexical semantics of Quranic words, but it can also be used for other research purposes, such as:
• Arabic linguistics, which includes: lexical, morphological, syntactic, semantic and pragmatic research.
• Arabic computational linguistics, which includes: lexical, morphological, syntactic, semantic and pragmatic research, together with their various applications.
• Arabic language teaching for both Arabs and non-Arabs.
• Artificial intelligence.
• Natural language processing.
• Information retrieval.
• Question answering.
• Machine translation.

Features

  • An electronic corpus: allowing faster and more accurate investigation of written Arabic.
  • A synchronic corpus: including Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the seventh until the early eleventh century CE), the period of pure Classical Arabic.
  • A general corpus: covering a wide range of genres making it suitable for various research subjects.
  • A representative corpus: it can be used as the basis for generalizations concerning Classical Arabic.
  • A balanced corpus: the number of text samples taken from each genre is proportional to the size of that genre.
  • A monolingual corpus: containing written texts of Classical Arabic only.
  • An unvowelized corpus: only the words of the Holy Quran are vowelized.
  • A raw corpus: containing no tagging, lemmatization, or any other type of annotation; just plain text.
  • An automatically annotated version of the corpus, tagged with lemma, stem, POS, gender and number, is also available.

License

Creative Commons Attribution Non-Commercial License V2.0

Additional Project Details

Operating Systems

Android, Apple iPhone, Linux, Mac, Windows

Registered

2019-11-20