Short-Term Challenges
The parts of the algorithm that need the most work before good performance becomes realistic:
- Segmentation into lines (low priority)
- Segmentation into words (slightly higher priority)
- Segmentation into glyphs (extremely high priority); in particular, removal of the maatra and identification of below-base marks (u-kaar, Ri-kaar)
- Identifying matches (low priority; seems to work OK, at least for high-quality prints and scans)
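
To make the segmentation steps above concrete, here is a minimal sketch using projection profiles. This is a common baseline and not necessarily what the current algorithm does. It assumes the page is already binarized (1 = ink, 0 = background) and that "maatra" means the horizontal headline stroke joining the glyphs of a word. All function names are hypothetical. Below-base marks such as u-kaar and Ri-kaar are not handled; they stay attached to whichever column range they fall in.

```python
def ink_runs(profile, min_ink=1):
    """Return (start, end) pairs, end exclusive, for runs where profile >= min_ink."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of ink begins
        elif value < min_ink and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image):
    """Split a page into text lines using the per-row ink count."""
    return ink_runs([sum(row) for row in image])


def headline_row(image, top, bottom):
    """Guess the maatra row as the inkiest row in the band [top, bottom).

    Assumption: the headline is one pixel row thick and unbroken enough
    to dominate the row profile. Real scans need a thicker band.
    """
    return max(range(top, bottom), key=lambda r: sum(image[r]))


def segment_glyphs(image, top, bottom):
    """Split one text line into glyph column ranges, ignoring the headline row."""
    skip = headline_row(image, top, bottom)
    col_profile = [
        sum(image[r][c] for r in range(top, bottom) if r != skip)
        for c in range(len(image[0]))
    ]
    return ink_runs(col_profile)


# Toy page: one line with two glyphs under a shared headline, then a second line.
PAGE = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],  # headline (maatra)
    [1, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
]

lines = segment_lines(PAGE)                 # [(1, 4), (5, 6)]
glyphs = segment_glyphs(PAGE, *lines[0])    # [(0, 2), (4, 6)]
```

Without the headline removal step, the column profile of the first line has no gaps and the whole word comes back as a single glyph, which is one reason the glyph step carries the highest priority in the list.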