[Mageia-discuss] Reading payment forms with a scanner

Fri Feb 15 17:47:04 CET 2013

> In the display I suggest the user get presented the original _scanned image_
> and the end result.

Right, that is planned. For now, the entire display is a quick
hack. So far I only display the middle field (of three), which, I am
sure, holds the reference data - the data most likely to be subject to
typos. I will probably rearrange the way things are displayed to
show all fields. Another option to consider - if experience shows that
it is worthwhile - is to help interactively when parsing lines that
result from poor OCR conversion (get rid of garbage, separate lines into
fields); but: small is beautiful. I need to get a clearer understanding
of the syntax and semantics of the fields of the scanned lines.

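To illustrate the "get rid of garbage, separate lines into fields" idea, here is a minimal Python sketch. It assumes (this is not confirmed above) that the three fields of the OCR line are terminated by ">", "+" and ">" respectively, as on the orange Swiss ESR slips. The function name split_code_line and the sample line are made up for illustration.

```python
import re

def split_code_line(raw):
    """Split one OCR'd code line into its three fields.

    Assumed layout (hypothetical, modelled on ESR slips):
        <field1> '>' <field2> '+' <field3> '>'
    Returns a tuple of three digit strings, or None if the line
    does not fit that layout after cleaning.
    """
    # Drop everything that is neither a digit nor an assumed separator
    # (spaces, stray letters and punctuation produced by poor OCR).
    cleaned = re.sub(r"[^0-9>+]", "", raw)
    match = re.fullmatch(r"(\d+)>(\d+)\+(\d+)>", cleaned)
    if match is None:
        return None
    return match.groups()

# Made-up sample line, not taken from a real slip.
fields = split_code_line("0100003949753>120000000000234478943216899+ 010001628>")
if fields is not None:
    print("middle field:", fields[1])
```

Lines that come back as None would be the candidates for interactive help.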
In the meantime I have focused on experimenting - and had a hard time
with the lack of reproducibility in the quality of the results. But the
explanation has become clear: tiling the slips on the scanner tends to
produce bad angular registration, and tesseract (and, even more so,
gocr) is very sensitive to this and gets confused if lines are badly
aligned, which is understandable. I hope I won't hit more problems of
that kind, but it is too early to be sure.

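A rough back-of-the-envelope calculation shows why even a small rotation matters for a long, single text line. The sketch below only does the trigonometry; the line width and the angle are assumed values for illustration, not measurements from my scans.

```python
import math

def vertical_drift_mm(line_width_mm, skew_degrees):
    """Vertical offset between the two ends of a text line of the given
    width when the slip is rotated by skew_degrees on the scanner."""
    return line_width_mm * math.tan(math.radians(skew_degrees))

# Assumed example values: a 120 mm wide code line, slip rotated by 2 degrees.
print(round(vertical_drift_mm(120, 2), 2), "mm")
```

Comparing the result with the height of the printed characters gives a feeling for how far a tilted line wanders away from a horizontal baseline.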
Interim results:
- no problem if I scan a single slip,
- no problem if I take great care when tiling multiple slips on the
scanner (worthwhile since scanning is so slow, and simple because the
line meant for OCR is at the bottom and has plenty of white space above
and below) - maybe I will build a mechanical jig to help get the
alignment right,
- overall: looks good, so I decided to put in some more time,
- tesseract clearly provides better results than gocr,
- selection of parameters (resolution, resizing etc.) is important, but
what I have (partly the result of googling) is close to optimal,
- parameters for xsane are painful to handle (at present, my .sane
directory is a link that I switch between configurations for
straightforward scanning and for slip handling).

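The link switching from the last point can be scripted. This is a minimal sketch, assuming ~/.sane is a symbolic link and the two configurations live in sibling directories; the names ~/.sane-plain and ~/.sane-slips are hypothetical, and switch_config is just a helper name chosen here.

```python
import os

def switch_config(link_path, target_dir):
    """Make link_path a symlink to target_dir, replacing an existing symlink.

    The new link is created under a temporary name and then renamed over
    the old one, so there is no moment at which link_path is missing.
    This fails if link_path is a real directory rather than a symlink.
    """
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)          # leftover from an interrupted run
    os.symlink(target_dir, tmp_path)
    os.replace(tmp_path, link_path)  # rename over the old link

# Usage (hypothetical directory names):
# switch_config(os.path.expanduser("~/.sane"),
#               os.path.expanduser("~/.sane-slips"))
```

Run it once before slip handling and once afterwards with the other directory to get back to straightforward scanning.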
I will re-post once I have reached some kind of "interim product" and
have confidence that it is solid (sorry, for Swiss payment slips only
for now - but keeping extensibility in mind - there won't be tons of
code).

Juergen