diff --git a/zarb-ml/mageia-discuss/2013-February/009193.html b/zarb-ml/mageia-discuss/2013-February/009193.html
new file mode 100644
index 000000000..83fb68d8f
--- /dev/null
+++ b/zarb-ml/mageia-discuss/2013-February/009193.html
@@ -0,0 +1,107 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
+<HTML>
+ <HEAD>
+ <TITLE> [Mageia-discuss] Reading payment forms with a scanner
+ </TITLE>
+ <LINK REL="Index" HREF="index.html" >
+ <LINK REL="made" HREF="mailto:mageia-discuss%40mageia.org?Subject=Re%3A%20%5BMageia-discuss%5D%20Reading%20payment%20forms%20with%20a%20scanner&In-Reply-To=%3C511E6688.90400%40unige.ch%3E">
+ <META NAME="robots" CONTENT="index,nofollow">
+ <META http-equiv="Content-Type" content="text/html; charset=us-ascii">
+ <LINK REL="Previous" HREF="009192.html">
+ <LINK REL="Next" HREF="009200.html">
+ </HEAD>
+ <BODY BGCOLOR="#ffffff">
+ <H1>[Mageia-discuss] Reading payment forms with a scanner</H1>
+ <B>Juergen Harms</B>
+ <A HREF="mailto:mageia-discuss%40mageia.org?Subject=Re%3A%20%5BMageia-discuss%5D%20Reading%20payment%20forms%20with%20a%20scanner&In-Reply-To=%3C511E6688.90400%40unige.ch%3E"
+ TITLE="[Mageia-discuss] Reading payment forms with a scanner">juergen.harms at unige.ch
+ </A><BR>
+ <I>Fri Feb 15 17:47:04 CET 2013</I>
+ <P><UL>
+ <LI>Previous message: <A HREF="009192.html">[Mageia-discuss] Reading payment forms with a scanner
+</A></li>
+ <LI>Next message: <A HREF="009200.html">[Mageia-discuss] Reading payment forms with a scanner
+</A></li>
+ <LI> <B>Messages sorted by:</B>
+ <a href="date.html#9193">[ date ]</a>
+ <a href="thread.html#9193">[ thread ]</a>
+ <a href="subject.html#9193">[ subject ]</a>
+ <a href="author.html#9193">[ author ]</a>
+ </LI>
+ </UL>
+ <HR>
+<!--beginarticle-->
+<PRE>&gt;<i> In the display I suggest the user be presented with the original _scanned image_
+</I>&gt;<i> and the end result.
+</I>
+Right, that is planned. For now, the entire display is a quick
+hack. So far I only display the middle field (the second of three),
+which, I am sure, holds the reference data, the data most likely to be
+subject to typos. I will probably rearrange the display to show all
+fields. Another option to consider, if experience shows that it is
+worthwhile, is to help interactively when parsing lines that result
+from poor OCR conversion (removing garbage, separating lines into
+fields); but small is beautiful. First I need a clearer understanding
+of the syntax and semantics of the fields in the scanned lines.
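
A minimal sketch of the field separation and of a typo check, in Python.
It assumes the orange Swiss ESR/ISR slip, whose coding line holds three
fields: 13 digits (slip type, amount, check digit), a 27-digit reference
and a 9-digit participant number, each closed by a marker character. The
last digit of each field is assumed to be a "modulo 10 recursive" check
digit, which catches any single misread digit. The function names are
hypothetical, and the sample line in the note below is made up so that
its check digits are consistent.

```python
import re

# Lookup table of the Swiss "modulo 10 recursive" check digit scheme
# (assumed to be the one used on ESR/ISR coding lines).
_MOD10_TABLE = [0, 9, 4, 6, 8, 2, 7, 1, 3, 5]

# Assumed layout after cleaning: 13 digits, marker, 27 digits, marker,
# 9 digits, marker.
_CODING_LINE = re.compile(r"(\d{13})>(\d{27})\+(\d{9})>")


def mod10_recursive(digits):
    """Return the check digit for a string of decimal digits."""
    carry = 0
    for ch in digits:
        carry = _MOD10_TABLE[(carry + int(ch)) % 10]
    return (10 - carry) % 10


def is_valid(field):
    """True if the last digit of `field` is the check digit of the rest."""
    return mod10_recursive(field[:-1]) == int(field[-1])


def parse_coding_line(ocr_text):
    """Split OCR output into (amount_block, reference, account) or None.

    OCR garbage is dropped by keeping only digits and the two marker
    characters; spaces inside the line are dropped as well.
    """
    cleaned = re.sub(r"[^0-9>+]", "", ocr_text)
    match = _CODING_LINE.search(cleaned)
    return match.groups() if match else None
```

For example, parse_coding_line("0100003949753>120000000000234478943216899+ 010001628>")
returns the three fields, and is_valid() is True for each of them; changing
any single digit of the reference makes is_valid() return False. Slip
variants with a different first field (for example, without a printed
amount) are not covered by this pattern.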
+
+In the meantime I have focused on experimenting, and had a hard time
+because the quality of the results was not reproducible. But the
+explanation has become clear: tiling several slips on the scanner
+tends to produce poor angular registration (the slips end up slightly
+rotated), and tesseract (and gocr even more so) is very sensitive to
+this and gets confused when lines are badly aligned, which is
+understandable. I hope I won't hit more problems of that kind, but it
+is too early to be sure.
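
If the skew cannot always be avoided mechanically, it can at least be
measured before OCR, so that badly aligned slips are flagged or rotated.
A minimal Python sketch, assuming the coding line has already been
cropped and binarised into a list of rows of 0/1 pixels (1 = ink); it
fits a least-squares line through the vertical centre of the ink in each
column. This simple baseline fit is an illustrative choice, not what
tesseract does internally, and estimate_skew_degrees is a hypothetical
name.

```python
import math


def estimate_skew_degrees(bitmap):
    """Estimate the skew angle of one text line in a binary bitmap.

    bitmap: list of rows, each a list of 0/1 values (1 = ink).
    Returns degrees; positive means the line drops towards the right,
    because row indices grow downwards in image coordinates.
    """
    width = len(bitmap[0]) if bitmap else 0
    xs, ys = [], []
    for x in range(width):
        ink_rows = [y for y, row in enumerate(bitmap) if row[x]]
        if ink_rows:
            xs.append(x)
            # Vertical centre of the ink in this column.
            ys.append(sum(ink_rows) / len(ink_rows))
    if not xs:
        return 0.0  # blank image: nothing to measure
    # Ordinary least-squares slope of ys against xs.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # ink in a single column only: slope undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return math.degrees(math.atan(sxy / sxx))
```

A line that drops one pixel every 20 columns should come out at roughly
atan(0.05), about 2.86 degrees; a perfectly level line gives 0.0.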
+
+Interim results:
+- no problem if I scan a single slip,
+- no problem if I take great care when tiling multiple slips on the
+scanner (worthwhile since scanning is so slow, and easy because the
+line meant for OCR is at the bottom with plenty of white space above
+and below it); maybe I will build a mechanical contraption to help get
+the alignment right,
+- overall it looks good, so I decided to put in some more time,
+- tesseract clearly gives better results than gocr,
+- the choice of parameters (resolution, resizing, etc.) is important,
+but what I have now (partly the result of googling) seems close to
+optimal,
+- parameters for xsane are painful to handle (at present my .sane
+directory is a symbolic link that I switch between a configuration for
+straightforward scanning and one for slip handling).
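
The configuration switch in the last item can be scripted. A minimal
Python sketch, assuming ~/.sane is already a symbolic link and that the
two configurations live in separate directories; switch_config,
current_config and the directory names in the note below are
hypothetical. A temporary link is created and then renamed over the old
one, so on Linux the link is replaced in a single step and is never
missing.

```python
import os


def switch_config(link_path, target_dir):
    """Make the symbolic link `link_path` point at `target_dir`.

    A temporary link is created next to `link_path` and renamed over it;
    os.replace maps to rename(2) on Linux, which swaps the link in one step.
    """
    parent = os.path.dirname(os.path.abspath(link_path))
    tmp_link = os.path.join(parent, ".switch-tmp-%d" % os.getpid())
    os.symlink(target_dir, tmp_link)
    os.replace(tmp_link, link_path)


def current_config(link_path):
    """Return the directory the link currently points to."""
    return os.readlink(link_path)
```

For example, switch_config(os.path.expanduser("~/.sane"),
os.path.expanduser("~/.sane-slips")) before starting xsane for slips, and
the same call with "~/.sane-plain" afterwards. If ~/.sane is still a real
directory, os.replace raises an error, so it has to be moved into one of
the configuration directories first.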
+
+I will post again once I have reached some kind of &quot;interim product&quot; and
+am confident that it is solid (sorry, it handles only Swiss payment slips
+for now, but I am keeping extensibility in mind; there won't be tons of
+code).
+
+Juergen
+</PRE>
+<!--endarticle-->
+<hr>
+<a href="https://www.mageia.org/mailman/listinfo/mageia-discuss">More information about the Mageia-discuss
+mailing list</a><br>
+</body></html>