See https://sourceforge.net/projects/tesseracthindi/files/OCRHindi_using_VietOCR_and_Tesseract.pdf/download for instructions on using the VietOCR GUI with tesseract-ocr to OCR Hindi and Sanskrit texts.
*****
Please see the imagessan and imageshin repositories at https://github.com/Shreeshrii/ for newer box/tiff pairs, traineddata files, OCR evaluation statistics, and ground truth files with images for Sanskrit and Hindi.
*****
The following is OLD information, saved only for archival purposes.
Tesseract OCR 3.02 provides hin.traineddata for recognizing text in Devanagari script. However, the Hindi training texts, images, and box files are not provided, so it is difficult to improve accuracy by refining the traineddata further. Recognition is noted to be more accurate and faster when training is done with the same font as the text to be OCRed, or a similar one.
See https://sourceforge.net/p/tesseracthindi/wiki/OCR%20for%20Devanagari/ for more details.
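As a rough illustration of using a traineddata file from the command line, the sketch below wraps the standard `tesseract imagename outputbase -l lang` invocation in Python. It assumes the `tesseract` executable is on the PATH and that `hin.traineddata` is installed in its tessdata directory. The function names and the file name `page.tif` are hypothetical and used only for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="hin"):
    """Return the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="hin"):
    """Run Tesseract and return the recognized text (hypothetical helper)."""
    # Assumption: the tesseract binary is installed and reachable on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract writes its plain-text result to <outputbase>.txt.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # "page.tif" is a placeholder image name for illustration only.
    print(run_ocr("page.tif", "page", lang="hin"))
```

For Sanskrit, the same call could be made with a different `lang` value, assuming the corresponding traineddata file is installed under that name.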
Categories
OCR
User Reviews
-
The Sanskrit traineddata available here is very useful, but we need the same for Vedic Sanskrit text like अ॒ग्निमी॑ळे पु॒रोहि॑तम् । य॒ज्ञस्य॑ दे॒वमृ॒त्विज॑म्। होता॑रं रत्न॒धात॑मम्॥ Kindly let me know if such traineddata is available.
-
Much needed tools, thanks for working on it.