Training and Interaction

The classification step presupposes the existence of a library of already identified glyphs. In our approach, this library is potentially different for each language as well as each font (or book), and has to be built up from scratch for a new project (a project being determined by a unique font).

The idea behind training is roughly this: There is a current collection of glyphs in the library. This is stored on disk and is persistent across sessions (typically one directory would be devoted to a particular project). The library is initially empty. For the given input image, the segmentation steps outlined previously would produce a sequence of glyphs. Each glyph will be compared to the existing library. If sufficiently many matches are found and they all have the same symbol, then that symbol is the classified value. Otherwise (which is what should happen if the library doesn't have enough information to classify the glyph, and which would happen very frequently at the initial stages), the user is given the option of adding this particular glyph to the library with an associated ASCII symbol name.
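As a rough illustration of the persistent library, here is a minimal Python sketch. The class name GlyphLibrary, the file name glyphs.json and the choice of JSON as the storage format are all assumptions made for this example, not a description of the actual software; bitmaps are assumed to be lists of rows of 0/1 pixels.

```python
import json
from pathlib import Path


class GlyphLibrary:
    """Hypothetical on-disk glyph library: one JSON file per project directory."""

    def __init__(self, project_dir):
        # Assumed layout: <project_dir>/glyphs.json holds every labelled glyph.
        self.path = Path(project_dir) / "glyphs.json"
        # Each entry is {"symbol": ASCII name, "bitmap": list of rows of 0/1}.
        self.entries = []
        if self.path.exists():
            self.entries = json.loads(self.path.read_text())

    def add(self, symbol, bitmap):
        """Add one glyph with its associated ASCII symbol name."""
        self.entries.append({"symbol": symbol, "bitmap": bitmap})

    def save(self):
        """Write the library back to disk so it persists across sessions."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.entries))
```

A new project starts with an empty library (no file yet); each session would load it, add whatever glyphs the user labels, and save it again.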

Classification by parts

Before going into the details of training, we need to discuss what to do when segmentation fails. This can happen, for instance, if the maatra is not properly identified and removed. It should happen less often as the segmentation algorithm is improved, but currently it happens frequently enough that ignoring it is not an option. The only idea we have at present to deal with this is to perform a brute force search for each glyph in the library inside each candidate image (the situations in which this will be attempted are described below). This has all the drawbacks of direct XOR based matching, and alternatives would be most welcome.
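To make the brute force search concrete, here is a small Python sketch of sliding one library glyph over a candidate image and scoring each placement by the number of XOR mismatches. The function names and the max_mismatch_fraction tolerance are hypothetical, and both bitmaps are assumed to be lists of rows of 0/1 pixels.

```python
def xor_mismatch(window, glyph):
    """Count the pixels at which two equally sized binary bitmaps differ."""
    return sum(a != b
               for wrow, grow in zip(window, glyph)
               for a, b in zip(wrow, grow))


def brute_force_search(image, glyph, max_mismatch_fraction=0.1):
    """Try every placement of glyph inside image.

    Returns (row, col, mismatches) for the best placement if its mismatch
    count is within the (assumed) tolerance, otherwise None.
    """
    ih, iw = len(image), len(image[0])
    gh, gw = len(glyph), len(glyph[0])
    if gh > ih or gw > iw:
        return None  # the glyph cannot fit inside the image
    best = None
    for r in range(ih - gh + 1):
        for c in range(iw - gw + 1):
            window = [row[c:c + gw] for row in image[r:r + gh]]
            d = xor_mismatch(window, glyph)
            if best is None or d < best[2]:
                best = (r, c, d)
    if best[2] <= max_mismatch_fraction * gh * gw:
        return best
    return None
```

The cost is one full bitmap comparison per placement per library glyph, which is why this is only attempted when ordinary matching is not possible.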

Broad details of the User Interface

Here is what we envision as the ideal front-end interface to the software. Depending on the stage of a particular project, the classifier will have three distinct modes:

Full fledged training

For each segmented glyph (of reasonable dimensions), try to find library glyphs with correlation greater than a threshold, and pick the best 5 of those (or fewer if there are fewer than 5 matches). If at least 5 matches are found, and all have the same class, take that to mean that the glyph has been recognized, so ignore it and move on to the next glyph. Otherwise ask for the class and add the glyph to the library of glyphs. There would be an option of not classifying the glyph as anything, to prevent low quality or poorly segmented glyphs from polluting the glyph pool. (Note that the choice of the number 5 is arbitrary, and can instead be a user-settable parameter.)
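The training step for a single glyph might look like the following Python sketch. It assumes that all bitmaps have been scaled to a common size and that "correlation" means the Pearson correlation of the pixel values; the function names, the threshold of 0.9 and the ask_user callback are hypothetical. The library here is simply a list of (symbol, bitmap) pairs.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation of two equally sized bitmaps (lists of rows)."""
    x = [p for row in a for p in row]
    y = [p for row in b for p in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant bitmap carries no shape information
    return cov / sqrt(vx * vy)


def best_matches(glyph, library, threshold=0.9, k=5):
    """Symbols of the (at most) k best library glyphs above the threshold."""
    scored = [(correlation(glyph, bitmap), symbol) for symbol, bitmap in library]
    scored = [s for s in scored if s[0] > threshold]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [symbol for _, symbol in scored[:k]]


def train_on_glyph(glyph, library, ask_user, threshold=0.9, k=5):
    """Return the glyph's symbol, asking the user unless it is recognized."""
    symbols = best_matches(glyph, library, threshold, k)
    if len(symbols) >= k and len(set(symbols)) == 1:
        return symbols[0]  # recognized: nothing to add
    answer = ask_user(glyph)  # None means "do not classify this glyph"
    if answer is not None:
        library.append((answer, glyph))
    return answer
```

With k left at 5, the user is asked about the first five samples of every symbol, after which further clean samples of that symbol pass through silently.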

Classification, but with some interactive training

This is the mode recommended after some initial training. The idea here is to classify, but ask when in doubt. This is important because not all glyphs are equally common, and even with good segmentation, once in a while we will invariably come across a glyph with not enough training samples. We need a way to add them into the training set, and eventually, such occurrences should stop.

The problematic cases are the ones which are not segmented well enough (and really need to be classified 'by parts'). If they are 'too wide to be a glyph', then that's fine. But otherwise, we don't know whether it's a potential training set sample. For such cases, we really do have to interrupt the process.

So, for each segmented glyph, if it's too big, classify by parts (don't ask). Otherwise, find library glyphs with correlation greater than a threshold, and pick the best 5 of those.

If at least 5 matches are found, and all are the same, classify as that (this is the happy case, which we hope happens most of the time).

If there are no matches, classify by parts, show the result, and ask the user to accept it or supply a new value, especially if classification by parts also found no matches (remember that in this mode the glyph is still being classified). Have an option to add the glyph to the training set.

If there are some matches (fewer than 5, or 5 that are not all the same), classify by parts using only these matches. If the best result is good enough, classify as that. Again, ask for confirmation and optionally add the glyph to the training set (this last part is important for well-segmented but rare glyphs).

Batch mode classification with no training, no interaction

This is similar to the preceding mode, except that the asking part is skipped.
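Since the two classification modes differ only in whether the user is asked, both can be sketched as one Python routine. Everything here is an assumption made for illustration: the function name, the max_width test for "too big", and the three callbacks find_matches, by_parts and ask, which stand in for the matching, classification by parts and user interface described above.

```python
def classify_glyph(glyph, max_width, find_matches, by_parts, ask=None, k=5):
    """Decide a label for one segmented glyph.

    find_matches(glyph)         -> symbols of the best matches (at most k)
    by_parts(glyph, candidates) -> proposed label or None; candidates=None
                                   means "search the whole library"
    ask(glyph, proposal)        -> (label, add_to_training); passing
                                   ask=None selects batch mode
    Returns (label, add_to_training).
    """
    if len(glyph[0]) > max_width:
        # Too wide to be a single glyph: classify by parts, never ask.
        return by_parts(glyph, None), False

    symbols = find_matches(glyph)
    if len(symbols) >= k and len(set(symbols)) == 1:
        return symbols[0], False  # the happy case

    # No matches: search the whole library. Some matches: restrict to them.
    candidates = sorted(set(symbols)) if symbols else None
    proposal = by_parts(glyph, candidates)
    if ask is None:
        return proposal, False  # batch mode: accept the proposal as it is
    return ask(glyph, proposal)  # interactive mode: confirm or correct
```

In batch mode a glyph for which by_parts finds nothing simply comes back with the label None, and it is up to the caller to decide how to represent that in the output.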


If you have any questions or comments, contact me at deepayan at stat.wisc.edu