The Bentham manuscripts are a large collection of documents written by the renowned English philosopher and reformer Jeremy Bentham (1748-1832). The collection is currently being transcribed by amateur volunteers participating in the award-winning crowdsourced initiative Transcribe Bentham. More than 6,000 documents have been transcribed so far using this public web platform. The Bentham dataset used in tranScriptorium is a subset of the transcribed documents.
This dataset is freely available for research purposes and is provided in two parts: the images and the ground truth (GT). For each image, the GT includes the layout and the line-level transcription in PAGE format. The two parts must be downloaded separately. Each part includes a detailed description of how the dataset is organised.
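As a rough illustration of what line-level ground truth in PAGE format looks like in practice, the following Python sketch reads the text lines of one PAGE XML document. It is a minimal example under stated assumptions, not the dataset's official tooling: the embedded sample document, its file name and its line identifiers are invented for illustration, and it assumes a PAGE schema version in which line coordinates are stored in a "points" attribute. The description shipped with each part of the dataset remains the authoritative reference for the actual file layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PAGE document used only for illustration.
# Real ground-truth files may use a different schema version or extra elements.
SAMPLE_PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="10,10 900,10 900,200 10,200"/>
      <TextLine id="r1l1">
        <Coords points="10,10 900,10 900,100 10,100"/>
        <TextEquiv><Unicode>first line of text</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="10,100 900,100 900,200 10,200"/>
        <TextEquiv><Unicode>second line of text</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code does not depend on one schema URI."""
    return tag.rsplit("}", 1)[-1]


def read_lines(xml_text):
    """Return a list of (line id, polygon points, transcription) tuples."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        points = None
        text = ""
        # Only direct children are inspected, so word-level transcriptions
        # nested inside a line (if present) are not mistaken for the line text.
        for child in elem:
            name = local_name(child.tag)
            if name == "Coords":
                points = child.get("points")
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append((elem.get("id"), points, text))
    return lines


if __name__ == "__main__":
    for line_id, points, text in read_lines(SAMPLE_PAGE):
        print(line_id, points, text, sep="\t")
```

To apply the same function to a downloaded ground-truth file, read the file contents and pass them to read_lines; matching on local element names is a deliberate choice so that the sketch tolerates different PAGE namespace versions.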
Images and Ground Truth