https://www.nist.gov/itl/products-and-services/emnist-dataset

 

The EMNIST Dataset

What is it? The EMNIST dataset is a set of handwritten character digits derived from

www.nist.gov

The dataset is derived from the NIST Special Database 19 and converted to a 28x28 pixel image format and dataset structure that directly matches the MNIST dataset.

  • EMNIST ByClass: 814,255 characters. 62 unbalanced classes.
  • EMNIST ByMerge: 814,255 characters. 47 unbalanced classes.
  • EMNIST Balanced: 131,600 characters. 47 balanced classes.
  • EMNIST Letters: 145,600 characters. 26 balanced classes.
  • EMNIST Digits: 280,000 characters. 10 balanced classes.
  • EMNIST MNIST: 70,000 characters. 10 balanced classes.

The full complement of the NIST Special Database 19 is available in the ByClass and ByMerge splits. The EMNIST Balanced dataset contains a set of characters with an equal number of samples per class. The EMNIST Letters dataset merges a balanced set of the uppercase and lowercase letters into a single 26-class task. The EMNIST Digits and EMNIST MNIST datasets provide balanced handwritten digit datasets that are directly compatible with the original MNIST dataset.
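As a quick sanity check on the figures above, the per-class sample count of each balanced split can be worked out from the listed totals. This is only a sketch: the `EMNIST_SPLITS` table and the `samples_per_class` helper are illustrative names, not part of any EMNIST tooling, and it assumes "balanced" means exactly equal counts per class over the whole split (train and test combined).

```python
# Split name -> (total characters, number of classes, balanced?)
# Figures copied from the list above.
EMNIST_SPLITS = {
    "ByClass": (814_255, 62, False),
    "ByMerge": (814_255, 47, False),
    "Balanced": (131_600, 47, True),
    "Letters": (145_600, 26, True),
    "Digits": (280_000, 10, True),
    "MNIST": (70_000, 10, True),
}


def samples_per_class(name):
    """Return the per-class sample count, or None for an unbalanced split."""
    total, classes, balanced = EMNIST_SPLITS[name]
    if not balanced:
        return None  # per-class counts vary, so no single number exists
    per_class, remainder = divmod(total, classes)
    if remainder != 0:
        raise ValueError(f"{name}: {total} is not divisible by {classes}")
    return per_class


if __name__ == "__main__":
    for split in EMNIST_SPLITS:
        print(split, samples_per_class(split))
```

Under that assumption every balanced total divides evenly by its class count, for example 131,600 / 47 = 2,800 characters per class for EMNIST Balanced.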

 

https://arxiv.org/abs/1702.05373v1

 

EMNIST: an extension of MNIST to handwritten letters

The MNIST dataset has become a standard benchmark for learning, classification and computer vision systems. Contributing to its widespread adoption are the understandable and intuitive nature of the task, its relatively small size and storage requirements

arxiv.org

https://www.tensorflow.org/datasets/catalog/emnist

 

emnist  |  TensorFlow Datasets

The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 and converted to a 28x28 pixel image format and dataset structure that directly matches the MNIST dataset. Note: Like the original EMNIST data, images pro

www.tensorflow.org
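For reference, one way to load a split through TensorFlow Datasets might look like the sketch below. It assumes the `tensorflow-datasets` package is installed and that the catalog exposes the six splits as the configs `emnist/byclass`, `emnist/bymerge`, `emnist/balanced`, `emnist/letters`, `emnist/digits` and `emnist/mnist`. The helpers `tfds_name` and `load_emnist` are hypothetical names made up for this example.

```python
# Assumed TFDS config names for the six EMNIST splits.
EMNIST_CONFIGS = ("byclass", "bymerge", "balanced", "letters", "digits", "mnist")


def tfds_name(split):
    """Map a split name such as 'ByClass' to the TFDS name 'emnist/byclass'."""
    config = split.strip().lower()
    if config not in EMNIST_CONFIGS:
        raise ValueError(f"unknown EMNIST split: {split!r}")
    return f"emnist/{config}"


def load_emnist(split="Balanced"):
    """Return [train, test] datasets of (image, label) pairs.

    The data is downloaded on first use, so this needs network access.
    """
    # Imported lazily so tfds_name() works without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(tfds_name(split), split=["train", "test"], as_supervised=True)
```

Calling `load_emnist("Letters")` would then return the train and test portions of the 26-class letters task.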

 
