This repository hosts the Omniglot dataset for one-shot learning: handwritten characters across multiple alphabets, each provided both as an image and as time-ordered stroke coordinates ([x, y, t]). MATLAB and Python starter scripts (e.g. demo.m, demo.py) illustrate how to load the images and stroke sequences and run baseline experiments, such as classification by modified Hausdorff distance. The repository is intended as a benchmark dataset for few-shot / meta-learning research, not as a plug-and-play detection or classification engine.
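As a sketch of how the stroke data might be consumed, the snippet below parses stroke text into per-stroke lists of (x, y, t) triples. The exact file format here is an assumption: it treats any non-numeric line (e.g. a START or BREAK marker) as a stroke boundary and each numeric line as a comma-separated x,y,t triple; consult demo.py for the authoritative parsing logic.

```python
def parse_strokes(text):
    """Parse stroke text into a list of strokes, each a list of (x, y, t)
    triples. Assumed format: numeric lines are 'x,y,t'; any non-numeric
    line (e.g. a START/BREAK marker) delimits strokes."""
    strokes, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            x, y, t = (float(v) for v in line.split(","))
            current.append((x, y, t))
        except ValueError:
            # Marker line such as START or BREAK: close the current stroke.
            if current:
                strokes.append(current)
                current = []
    if current:
        strokes.append(current)
    return strokes
```

A character drawn with two pen-down segments would then come back as a list of two strokes, each ordered by its timestamps.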
Features
- Contains 1,623 distinct characters from 50 alphabets, each drawn by 20 different people
- Includes stroke data (time-sequenced coordinates) per sample
- Supplies MATLAB and Python demo scripts for usage
- Pre-split “background” and “evaluation” alphabets for standard benchmarking
- Support for “minimal” splits with fewer background alphabets
- Easily extensible / usable as a benchmark dataset for one-shot methods
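The modified Hausdorff distance baseline mentioned above can be sketched in a few lines of NumPy. The function names and the nearest-exemplar classifier are illustrative, not the repository's actual API; in practice the point sets would come from the inked pixels of the character images or from the stroke coordinates.

```python
import numpy as np

def modified_hausdorff(A, B):
    """Modified Hausdorff distance between two 2-D point sets of shape
    (N, 2) and (M, 2): the larger of the two mean nearest-neighbour
    distances, one in each direction."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Pairwise Euclidean distances via broadcasting, shape (N, M).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())

def classify_one_shot(query, exemplars):
    """Return the index of the exemplar point set nearest to the query."""
    dists = [modified_hausdorff(query, ex) for ex in exemplars]
    return int(np.argmin(dists))
```

Because the metric averages nearest-neighbour distances rather than taking the maximum, it is less sensitive to stray outlier points than the classical Hausdorff distance, which is why it makes a reasonable shape-matching baseline.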