[Mageia-discuss] Reading payment forms with a scanner
Juergen Harms
juergen.harms at unige.ch
Fri Feb 15 17:47:04 CET 2013
> In the display I suggest the user gets presented the original _scanned image_
> and the end result.
Right, that is planned. For now, the entire display is a quick
hack. So far I only display the middle field (middle of 3) which, I am
sure, holds the reference data - the data most likely to be subject to
typos. I will probably rearrange the way things are displayed to
show all fields. Another option to envisage - if experience shows that
this is worthwhile - is to interactively help when parsing lines that
result from poor OCR conversion (get rid of garbage, separate lines into
fields); but: small is beautiful. I need to get a clearer understanding
of the syntax and semantics of the fields of scanned lines.
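
A minimal sketch of the clean-up step mentioned above (get rid of garbage, separate a line into fields). It is not the code of the actual tool. It assumes that each field is a run of digits terminated by '>' or '+', and that any other character in the OCR output is noise; the function name split_code_line is hypothetical.

```python
import re


def split_code_line(raw, expected_fields=3):
    """Return the digit fields of an OCR'd line, or None if the count is off.

    Assumption: fields are runs of digits terminated by '>' or '+'.
    """
    # Drop everything that is neither a digit nor an assumed delimiter,
    # including spaces that OCR may have inserted inside a field.
    cleaned = re.sub(r"[^0-9>+]", "", raw)
    # Split on the delimiters and discard empty pieces.
    fields = [f for f in re.split(r"[>+]", cleaned) if f]
    if len(fields) != expected_fields:
        return None  # line too garbled, needs manual help
    return fields


if __name__ == "__main__":
    fields = split_code_line("x~0100>  12 34+ 99>")
    if fields is not None:
        print("middle field:", fields[1])
```

Returning None instead of guessing keeps the decision about badly converted lines with the user, which fits the idea of interactive help.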
In the meantime I have focused on experimenting - and had a hard time
with the lack of reproducibility in the quality of results. But the
explanation has become clear: tiling the slips on the scanner tends
to produce bad angular registration, and tesseract (and, even
more, gocr) is very susceptible to this and gets confused if lines are
badly aligned, which is understandable. I hope I won't hit more problems
of that kind, but it is too early to be sure.
Interim results:
- no problem if I scan a single slip
- no problem if I take great care when tiling multiple slips on the
scanner (worthwhile since scanning is so slow, and simple because the
line meant for OCR is at the bottom and has much white space above and
below it) - maybe I will make a mechanical contraption to help get the
alignment right,
- overall: looks good, I decided to put in some more time,
- tesseract clearly provides better results than gocr,
- selection of parameters (resolution, resizing etc.) is important, but
what I have (partly the result of googling) is close to optimal,
- parameters for xsane are painful to handle (presently, my .sane
directory is a link that I switch between configurations for
straightforward scanning and for slip handling).
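
The link switching described in the last item could be scripted roughly as follows. This is only a sketch under assumptions: the directory names sane-plain and sane-slips and the function name switch_link are hypothetical, and it assumes a POSIX system where renaming a symlink over an existing symlink replaces it.

```python
import os


def switch_link(link_path, target_dir):
    """Point the symlink at link_path to target_dir, replacing an old link."""
    # Refuse to touch a real directory or file by accident.
    if os.path.lexists(link_path) and not os.path.islink(link_path):
        raise ValueError(f"{link_path} exists and is not a symlink")
    # Build the new link under a temporary name first.
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target_dir, tmp)
    # Rename it over the old link in one step.
    os.replace(tmp, link_path)


if __name__ == "__main__":
    home = os.path.expanduser("~")
    # Hypothetical layout: ~/sane-slips holds the slip-handling settings.
    target = os.path.join(home, "sane-slips")
    if os.path.isdir(target):
        switch_link(os.path.join(home, ".sane"), target)
```

Creating the link under a temporary name and renaming it means that .sane never ends up missing if the script is interrupted.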
I will re-post once I have reached some kind of "interim product" and
have confidence that it is solid (sorry, only for Swiss payment slips for
now - but keeping extensibility in mind - there won't be tons
of code).
Juergen