Data

The proposed dataset consists of a series of documents from the Bentham collection, which have been prepared in the tranScriptorium project. This dataset includes manuscripts written by Jeremy Bentham (1748-1832) himself over a period of sixty years, as well as fair copies written by Bentham's secretarial staff.

The dataset for this competition is composed of two sets of document images and queries. The first set is given to participants so that they can experiment with the baseline systems, and it also serves as validation data. The second set will be used at the end of the competition to evaluate the performance of each participant.

For Track II, an additional set of 423 document images, with their line segmentation and transcription, is provided for training purposes. Systems participating in Track I cannot use this data, since Track I is aimed at training-free approaches. Participants in Track II may use any amount of the given training data, but no external data may be used.

The data used for this contest is a subset of the data from the ICDAR2015 Competition HTRtS: Handwritten Text Recognition on the tranScriptorium Dataset.

Training

REMEMBER: You are only allowed to use training data in Track II.

Example of a segmented training line and its transcription.

6 . The evidence of the engagement , consigned to a portable
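
In the sample transcription above, every token, including punctuation, is separated by a space. The following is a minimal sketch of how such a line could be split into tokens and reduced to candidate keywords. The function names, the upper-casing, and the rule that drops tokens without letters are assumptions made for illustration only; they are not part of the competition specification.

```python
from typing import List


def tokenize(transcription: str) -> List[str]:
    # Split on whitespace; in the sample line, punctuation marks
    # already appear as separate space-delimited tokens.
    return transcription.split()


def candidate_keywords(transcription: str) -> List[str]:
    # Assumption: keep only tokens containing at least one letter and
    # upper-case them so that matching is case-insensitive.
    return [
        token.upper()
        for token in tokenize(transcription)
        if any(ch.isalpha() for ch in token)
    ]


line = "6 . The evidence of the engagement , consigned to a portable"
tokens = tokenize(line)
keywords = candidate_keywords(line)
```

Here `tokens` keeps the number and punctuation marks as individual entries, while `keywords` retains only the word-like tokens.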

Validation

Evaluation

Example of a document image (a), two query-by-example images (b) and their query-by-string equivalents (c): AFORESAID and PLACE.