From 1be510f9529cb082f802408b472a77d074b394c0 Mon Sep 17 00:00:00 2001
From: Nicolas Vigier
Date: Sun, 14 Apr 2013 13:46:12 +0000
Subject: Add zarb MLs html archives

---
 zarb-ml/mageia-discuss/2013-February/009193.html | 107 +++++++++++++++++++++++
 1 file changed, 107 insertions(+)
 create mode 100644 zarb-ml/mageia-discuss/2013-February/009193.html

(limited to 'zarb-ml/mageia-discuss/2013-February/009193.html')

diff --git a/zarb-ml/mageia-discuss/2013-February/009193.html b/zarb-ml/mageia-discuss/2013-February/009193.html
new file mode 100644
index 000000000..83fb68d8f
--- /dev/null
+++ b/zarb-ml/mageia-discuss/2013-February/009193.html
@@ -0,0 +1,107 @@

[Mageia-discuss] Reading payment forms with a scanner

+ Juergen Harms, juergen.harms at unige.ch
+ Fri Feb 15 17:47:04 CET 2013

+
+ +
> In the display I suggest the user be presented with the original _scanned image_
+> and the end result.
+
+Right, that is planned. For now, the entire display is a quick 
+hack. So far I only display the middle field (the second of 3) which, I am 
+sure, holds the reference data - the data most likely to be subject to 
+typos. I will probably rearrange how things are displayed so that 
+all fields are shown. Another option to consider - if experience shows that 
+it is worthwhile - is to help interactively when parsing lines that 
+result from poor OCR conversion (getting rid of garbage, separating lines into 
+fields); but: small is beautiful. I need to get a clearer understanding 
+of the syntax and semantics of the fields of scanned lines.
+
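A minimal sketch of the cleanup step described above (getting rid of garbage, separating a line into fields). It is not the code from this project. The function name `split_ocr_line` and the constant `FIELD_SEPARATORS` are hypothetical, and the choice of `>` and `+` as field delimiters is an assumption that has to be checked against real slips.

```python
import re

# Assumption: fields in the machine-readable line are delimited by '>' and '+'.
# Adjust this to whatever the real slips use.
FIELD_SEPARATORS = ">+"


def split_ocr_line(raw: str) -> list[str]:
    """Drop OCR garbage from one scanned line and split it into digit-only fields."""
    sep_class = re.escape(FIELD_SEPARATORS)
    # Keep only digits and separator characters; everything else
    # (spaces, stray punctuation, misread letters) is treated as noise.
    kept = re.sub(rf"[^0-9{sep_class}]", "", raw)
    # Split on runs of separators and discard empty pieces.
    parts = re.split(rf"[{sep_class}]+", kept)
    return [p for p in parts if p]


if __name__ == "__main__":
    fields = split_ocr_line(" 0100~12>  9876 5432+ 0101>")
    print(fields)
```

Note that a misread digit (for example a letter in place of a zero) is silently dropped here, which shortens the field; a length check on each field would be a natural next step once the field syntax is known.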
+In the meantime I have focused on experimenting - and had a hard time 
+with the lack of reproducibility in the quality of results. But the 
+explanation has become clear: tiling the slips on the scanner has a 
+tendency to produce bad angular registration, and tesseract (and, even 
+more so, gocr) is very sensitive to this and gets confused if lines are badly 
+aligned, which is understandable. I hope I won't hit more problems of 
+that kind, but it is too early to be sure.
+
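One way to notice bad angular registration before running OCR is to estimate how far the printed line is tilted. The sketch below is an illustration only, not something the post describes doing: it fits a least-squares line through the coordinates of dark pixels and returns the angle. The function name `estimate_skew_degrees` is hypothetical, and how the dark pixels are extracted from the scan is left out.

```python
import math


def estimate_skew_degrees(dark_pixels):
    """Least-squares slope of one text line, returned as an angle in degrees.

    dark_pixels: iterable of (x, y) coordinates of dark pixels that belong
    to a single printed line. With image coordinates (y growing downwards)
    a positive result means the line drops towards the right.
    """
    pts = list(dark_pixels)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    # Standard least-squares sums for the fit y = a + b * x.
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    if sxx == 0:
        raise ValueError("pixels are vertically stacked; cannot fit a line")
    # The slope is sxy / sxx; atan2 turns it into an angle.
    return math.degrees(math.atan2(sxy, sxx))


if __name__ == "__main__":
    slope = math.tan(math.radians(2.0))
    line = [(x, x * slope) for x in range(1000)]
    print(round(estimate_skew_degrees(line), 3))
```

A slip whose estimated angle exceeds some tolerance could then be rotated back or rescanned; the tolerance itself would have to be found by experiment.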
+Interim results:
+- no problem if I scan a single slip,
+- no problem if I take great care when tiling multiple slips on the 
+scanner (worthwhile since scanning is so slow, and simple because the 
+line meant for OCR is at the bottom and has plenty of white space above and 
+below it) - maybe I will make a mechanical contraption to help get the 
+alignment right,
+- overall: looks good, I decided to put in some more time,
+- tesseract clearly provides better results than gocr,
+- selection of parameters (resolution, resizing etc.) is important, but 
+what I have (partly the result of googling) is close to optimal,
+- parameters for xsane are painful to handle (at present, my .sane 
+directory is a link that I switch between configurations for 
+straightforward scanning and for slip handling).
+
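The link switching mentioned in the last item could be scripted roughly as follows. This is a sketch under assumptions: the function name `switch_config_link` and the directory names in the usage example are hypothetical, and it relies on POSIX rename semantics for replacing one symlink with another.

```python
import os


def switch_config_link(link_path: str, target_dir: str) -> None:
    """Make the symlink at link_path point to target_dir, replacing an old link."""
    # Refuse to touch a real directory or file that happens to sit at link_path.
    if os.path.lexists(link_path) and not os.path.islink(link_path):
        raise RuntimeError(f"{link_path} exists and is not a symlink")
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    # Create the new link under a temporary name, then rename it over the
    # old one so there is never a moment without a link in place.
    os.symlink(target_dir, tmp)
    os.replace(tmp, link_path)


if __name__ == "__main__":
    home = os.path.expanduser("~")
    # Hypothetical layout: two prepared configuration directories.
    switch_config_link(os.path.join(home, ".sane"),
                       os.path.join(home, ".sane-slips"))
```

Calling it again with the other configuration directory switches back to straightforward scanning.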
+I will post again once I have reached some kind of "interim product" and 
+am confident that it is solid (sorry, for Swiss payment slips only for now - 
+but with extensibility in mind - there won't be tons 
+of code).
+
+Juergen
+