Baselines

Four baseline systems are provided to the participants, one for each track and assignment. Participants can experiment with these baseline systems, the validation data and the evaluation toolkit to gain a deeper insight into what is expected in each assignment.

The baseline systems will also be evaluated at the end of the contest and will compete against all participants, as explained in the evaluation section. That is, in order to win the competition, you must at least beat the baseline systems.

Track I: Training-free track

  1. Segmentation-based: The baseline uses MPEG-like descriptors known as the "Compact Shape Portrayal Descriptor" (CSPD) [1]. These descriptors represent both the query images and the segmented word images, and the similarity between a query and a word image is given by the weighted Minkowski L1 distance (see [1] and the sketch after this list).
  2. Segmentation-free: This method is based on block-based document image descriptors that are used in a template matching process which is invariant to translation, rotation and scaling. The computational cost is reduced by applying the matching process only to salient regions of the image (see [2]). In order to obtain the software, you will need to contact Dr. Basilis Gatos.
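
The ranking criterion of the segmentation-based baseline is this weighted Minkowski L1 distance between CSPD vectors. The following Python sketch only illustrates the distance computation, assuming the descriptors have already been extracted; the function names and weight handling are placeholders, and the actual descriptor and weights are those defined in [1].

```python
import numpy as np

def weighted_l1_distance(query_desc, word_desc, weights):
    """Weighted Minkowski L1 distance between two descriptor vectors.

    A lower distance means the segmented word image is considered
    more similar to the query image.
    """
    query_desc = np.asarray(query_desc, dtype=float)
    word_desc = np.asarray(word_desc, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * np.abs(query_desc - word_desc)))

def rank_word_images(query_desc, word_descs, weights):
    """Return word-image indices sorted from most to least similar."""
    distances = [weighted_l1_distance(query_desc, d, weights) for d in word_descs]
    return sorted(range(len(word_descs)), key=lambda i: distances[i])
```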

We are very grateful to Dr. Konstantinos Zagoris, Dr. Giorgos Sfikas and Dr. Basilis Gatos for their help, guidance and software in preparing the baseline systems for Track I.

Track II: Training-based track

  1. Query-by-String: See description below.
  2. Query-by-Example: See description below.

Both baselines for assignments A and B of Track II are based on the popular HMM-Filler approach to KWS [3]. This method was originally presented for line-based query-by-string KWS. First, an HMM is trained for each character using a training set of text line images and their transcriptions.
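
Once the character HMMs are trained, the keyword score for a text line is typically a frame-normalized log-likelihood ratio between a line model constrained to contain the keyword and an unconstrained "filler" model. The exact formulation used in the baseline is the one of [3]; the following sketch only conveys the idea, and the function and its arguments are illustrative rather than part of the released toolkit.

```python
def filler_score(log_lik_keyword, log_lik_filler, num_frames):
    """Frame-normalized log-likelihood ratio used as a keyword score.

    log_lik_keyword: log-likelihood of the line under the HMM constrained
                     to contain the keyword somewhere in the line.
    log_lik_filler:  log-likelihood of the line under the unconstrained
                     filler (any character sequence) model.
    Higher scores indicate more confident keyword occurrences.
    """
    return (log_lik_keyword - log_lik_filler) / num_frames
```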

In order to speed up the spotting of text queries, we use the character-lattice (CL) approximation to the HMM-Filler introduced in [4]: the document page images are automatically segmented into lines, the character lattice of each line is obtained, and these lattices are used to spot the text queries.
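
As a very rough illustration of how a text query can be traced in a character lattice, the sketch below performs a simplified best-path search over lattice edges. It ignores the filler normalization and all the refinements of [4], and the lattice representation it assumes (a list of scored, character-labelled edges between time-ordered nodes) is a simplification of real decoder output.

```python
from collections import defaultdict

def spot_in_lattice(edges, keyword):
    """Simplified character-lattice search (illustration only).

    edges: list of (start_node, end_node, char, log_prob) tuples for one
           text line, with start_node < end_node (time-ordered nodes).
    Returns the best (start_node, end_node, score) span whose edge labels
    spell out `keyword`, or None if no such path exists.
    """
    NEG_INF = float("-inf")
    # best[(node, k)]: best log-prob of a path matching keyword[:k] ending at node
    best = defaultdict(lambda: NEG_INF)
    back = {}
    # process edges in time order so partial matches are always extended forward
    for start, end, char, logp in sorted(edges):
        for k, target_char in enumerate(keyword):
            if char != target_char:
                continue
            prev = 0.0 if k == 0 else best[(start, k)]
            if prev == NEG_INF:
                continue
            cand = prev + logp
            if cand > best[(end, k + 1)]:
                best[(end, k + 1)] = cand
                back[(end, k + 1)] = (start, k)
    # keep only complete matches and pick the highest-scoring one
    complete = [(node, score) for (node, k), score in best.items()
                if k == len(keyword) and score > NEG_INF]
    if not complete:
        return None
    end_node, score = max(complete, key=lambda t: t[1])
    # trace the match back to its starting node
    node, k = end_node, len(keyword)
    while k > 0:
        node, k = back[(node, k)]
    return node, end_node, score
```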

Since queries in assignment II.B are presented in the form of images (i.e. query-by-example KWS), the keywords in the query images are first recognized using a character bi-gram language model trained on the keywords present in the training transcriptions. The recognized keyword is then searched with the CL-Filler KWS approach, and the bounding boxes are obtained from the line segmentation and the HMM segmentation information.
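
Putting both steps together, the query-by-example baseline is essentially a reduction to query-by-string, as in the minimal sketch below. Here `recognize_keyword` and `spot_keyword` stand for the HMM decoding with the character bi-gram language model and the CL-Filler search, respectively; they are hypothetical callables, not part of any released API, so they are passed in as arguments.

```python
def query_by_example(query_image, line_lattices, recognize_keyword, spot_keyword):
    """Query-by-example KWS reduced to query-by-string KWS.

    1. Transcribe the query word image with the character HMMs and the
       character bi-gram language model (recognize_keyword).
    2. Spot the recognized string in the character lattices of the
       segmented lines with the CL-Filler approach (spot_keyword), which
       returns scored bounding boxes derived from the line and HMM
       segmentation information.
    """
    keyword_text = recognize_keyword(query_image)
    return spot_keyword(keyword_text, line_lattices)
```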

References

  1. K. Zagoris, E. Kavallieratou and N. Papamarkos, "Image Retrieval Systems Based On Compact Shape Descriptor and Relevance Feedback Information", Journal of Visual Communication and Image Representation, Vol. 22, pp. 378-390, 2011.
  2. B. Gatos and I. Pratikakis, "Segmentation-free Word Spotting in Historical Printed Documents", ICDAR 2009.
  3. A. Fischer, A. Keller, V. Frinken and H. Bunke, "Lexicon-free handwritten word spotting using character HMMs", Pattern Recognition Letters, May 2012.
  4. A. H. Toselli and E. Vidal, "Fast HMM-Filler Approach for Key Word Spotting in Handwritten Documents", ICDAR 2013.