Diffstat (limited to 'zarb-ml/mageia-discuss/2013-February/009193.html')
-rw-r--r-- | zarb-ml/mageia-discuss/2013-February/009193.html | 107 |
1 files changed, 107 insertions, 0 deletions
diff --git a/zarb-ml/mageia-discuss/2013-February/009193.html b/zarb-ml/mageia-discuss/2013-February/009193.html new file mode 100644 index 000000000..83fb68d8f --- /dev/null +++ b/zarb-ml/mageia-discuss/2013-February/009193.html @@ -0,0 +1,107 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> +<HTML> + <HEAD> + <TITLE> [Mageia-discuss] Reading payment forms with a scanner + </TITLE> + <LINK REL="Index" HREF="index.html" > + <LINK REL="made" HREF="mailto:mageia-discuss%40mageia.org?Subject=Re%3A%20%5BMageia-discuss%5D%20Reading%20payment%20forms%20with%20a%20scanner&In-Reply-To=%3C511E6688.90400%40unige.ch%3E"> + <META NAME="robots" CONTENT="index,nofollow"> + <META http-equiv="Content-Type" content="text/html; charset=us-ascii"> + <LINK REL="Previous" HREF="009192.html"> + <LINK REL="Next" HREF="009200.html"> + </HEAD> + <BODY BGCOLOR="#ffffff"> + <H1>[Mageia-discuss] Reading payment forms with a scanner</H1> + <B>Juergen Harms</B> + <A HREF="mailto:mageia-discuss%40mageia.org?Subject=Re%3A%20%5BMageia-discuss%5D%20Reading%20payment%20forms%20with%20a%20scanner&In-Reply-To=%3C511E6688.90400%40unige.ch%3E" + TITLE="[Mageia-discuss] Reading payment forms with a scanner">juergen.harms at unige.ch + </A><BR> + <I>Fri Feb 15 17:47:04 CET 2013</I> + <P><UL> + <LI>Previous message: <A HREF="009192.html">[Mageia-discuss] Reading payment forms with a scanner +</A></li> + <LI>Next message: <A HREF="009200.html">[Mageia-discuss] Reading payment forms with a scanner +</A></li> + <LI> <B>Messages sorted by:</B> + <a href="date.html#9193">[ date ]</a> + <a href="thread.html#9193">[ thread ]</a> + <a href="subject.html#9193">[ subject ]</a> + <a href="author.html#9193">[ author ]</a> + </LI> + </UL> + <HR> +<!--beginarticle--> +<PRE>><i> In the display I suggest the user be presented with the original _scanned image_ +</I>><i> and the end result. +</I> +Right, that is planned. For now, the entire display is a quick +hack. 
So far I display only the middle field (the second of three), which, I am +sure, holds the reference data - the data most likely to be subject to +typos. I will probably rearrange how things are displayed to +show all fields. Another option to envisage - if experience shows that +this is worthwhile - is to interactively help when parsing lines that +result from poor OCR conversion (get rid of garbage, separate lines into +fields); but: small is beautiful. I need to get a clearer understanding +of the syntax and semantics of the fields of scanned lines. + +In the meantime I have focused on experimenting - and had a hard time +with poor reproducibility of the quality of the results. But the +explanation has become clear: tiling the slips on the scanner has a +tendency to produce bad angular registration, and tesseract (and, even +more so, gocr) are very sensitive to this and get confused if lines are badly +aligned, which is understandable. I hope I won't hit more problems of +that kind, but it is too early to be sure. + +Interim results: +- no problem if I scan a single slip, +- no problem if I take great care when tiling multiple slips on the +scanner (worthwhile since scanning is so slow, and simple because the +line meant for OCR is at the bottom and has plenty of white space above and +below) - maybe I will make a mechanical contraption to help get the +alignment right, +- overall: looks good, I decided to put in some more time, +- tesseract clearly provides better results than gocr, +- selection of parameters (resolution, resizing etc.) is important, but +what I have (partly the result of googling) is close to optimal, +- parameters for xsane are painful to handle (presently, my .sane +directory is a link that I switch between configurations for +straightforward scanning and for slip handling). 
+ +I will re-post once I have reached some kind of "interim product" and +have confidence that it is solid (sorry, for Swiss payment slips only for now - +but keeping extensibility in mind - there won't be tons +of code). + +Juergen +</PRE> + + + + + + + + + + + +<!--endarticle--> + <HR> + <P><UL> + <!--threads--> + <LI>Previous message: <A HREF="009192.html">[Mageia-discuss] Reading payment forms with a scanner +</A></li> + <LI>Next message: <A HREF="009200.html">[Mageia-discuss] Reading payment forms with a scanner +</A></li> + <LI> <B>Messages sorted by:</B> + <a href="date.html#9193">[ date ]</a> + <a href="thread.html#9193">[ thread ]</a> + <a href="subject.html#9193">[ subject ]</a> + <a href="author.html#9193">[ author ]</a> + </LI> + </UL> + +<hr> +<a href="https://www.mageia.org/mailman/listinfo/mageia-discuss">More information about the Mageia-discuss +mailing list</a><br> +</body></html>