On “wrapping the PDF interface properly for non-interactive data”

Hi there,

it's me again. We are looking for a way

to “unpaper” official letters (hp-scan --adf, ./OCRmyPDF.sh).

This will go into a tutorial at some point.

- First we do it from bash, following a tutorial.

- Then we trigger it from a Tryton module, using proteus, out of GNU Health.


The following packages are required[1]

imagemagick # for conversion?
parallel # what does this do?
poppler-utils # what does this do?
pdftk # is there nothing better?
unpaper # aha?
tesseract-ocr (plus the desired language packs)
python-reportlab # aha?
python-lxml # aha?
python-imaging # aha?


apt-get install imagemagick parallel poppler-utils pdftk unpaper tesseract-ocr python-reportlab python-lxml python-imaging ghostscript

The following NEW packages will be installed:
liblept4 libtesseract3 parallel tesseract-ocr tesseract-ocr-deu
tesseract-ocr-eng tesseract-ocr-equ tesseract-ocr-osd unpaper
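
Since this is meant to run unattended later, a script could check up front that the command line tools are present. The following is a minimal sketch and not part of OCRmyPDF: the helper name `missing_packages` and the mapping from package to binary are my own assumptions, and the three Python packages (reportlab, lxml, imaging) are not covered.

```python
import shutil

# Assumed mapping from Debian/Ubuntu package to one binary it ships.
PACKAGE_BINARIES = {
    "imagemagick": "convert",
    "parallel": "parallel",
    "poppler-utils": "pdftotext",
    "pdftk": "pdftk",
    "unpaper": "unpaper",
    "tesseract-ocr": "tesseract",
    "ghostscript": "gs",
}


def missing_packages(which=shutil.which):
    """Return the packages whose binary cannot be found on the PATH."""
    return sorted(
        pkg for pkg, binary in PACKAGE_BINARIES.items() if which(binary) is None
    )
```

An empty list from `missing_packages()` means every binary was found. The `which` parameter exists only so the lookup can be replaced in a test.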

And now a test object:


hp-scan --adf

./OCRmyPDF.sh -f -l deu /home/kubuntu/hpscan002.pdf /home/kubuntu/hpscan2.pdf

pdftotext /home/kubuntu/hpscan2.pdf -|grep e|wc



edit: okular seems to be a better choice

du -sh /home/kubuntu/hpscan2.pdf
468K    /home/kubuntu/hpscan2.pdf

du -sh /home/kubuntu/hpscan002.pdf
2,7M    /home/kubuntu/hpscan002.pdf

# Aha, it makes the PDF smaller.


pdftotext /home/kubuntu/hpscan2.pdf -|grep e|wc

This seems plausible. Hurray. Hello World.
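
For use from a script instead of a terminal, the same sanity check can be sketched in Python. This is only an illustration under assumptions: the names `extract_text` and `grep_wc` are made up here, `extract_text` assumes `pdftotext` from poppler-utils is on the PATH, and the third number counts characters where `wc` counts bytes.

```python
import subprocess


def extract_text(pdf_path):
    """Return the text layer of a PDF via `pdftotext <pdf> -` (poppler-utils)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def grep_wc(text, needle="e"):
    """Roughly mimic `grep <needle> | wc`: (lines, words, characters)."""
    hits = [line for line in text.splitlines() if needle in line]
    words = sum(len(line.split()) for line in hits)
    # Like wc, count the newline that terminates each matching line.
    chars = sum(len(line) + 1 for line in hits)
    return len(hits), words, chars
```

A result of `(0, 0, 0)` from `grep_wc(extract_text(path))` on the OCR output would suggest that no usable text layer was written.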


Findings:

- I could not intervene to pause it or to add anything.

- No progress indicator.
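
For the later non-interactive use, one possible approach is to start the script from Python with stdin closed and relay its output line by line. This is a sketch under assumptions, not a tested integration: `build_ocr_command` and `run_streaming` are hypothetical helper names, the flags simply mirror the call used above, and it only relays whatever the script happens to print, which may or may not be enough as a progress indicator.

```python
import subprocess
import sys


def build_ocr_command(script, src, dst, lang="deu"):
    """Build the same call as used above: OCRmyPDF.sh -f -l <lang> <in> <out>."""
    return [script, "-f", "-l", lang, src, dst]


def run_streaming(cmd, on_line=None):
    """Run cmd without a terminal and hand each output line to on_line.

    Returns the exit code of the process.
    """
    if on_line is None:
        # Default: relay the lines to stderr of the calling process.
        def on_line(line):
            print(line, file=sys.stderr)

    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,   # never wait for keyboard input
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge errors into the same stream
        text=True,
    )
    for line in proc.stdout:
        on_line(line.rstrip("\n"))
    return proc.wait()
```

A call could look like `run_streaming(build_ocr_command("./OCRmyPDF.sh", "in.pdf", "out.pdf"))`, where a non-zero return value signals failure. Pausing a running job is not covered by this sketch.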

Open question:

- What are good test suites?

pdfinfo /home/kubuntu/hpscan2.pdf >test

pdfinfo /home/kubuntu/hpscan002.pdf >test002

diff test test002

< Producer:       GPL Ghostscript 9.10
< CreationDate:   Tue Jul  7 20:56:20 2015
< ModDate:        Tue Jul  7 20:56:20 2015
---
> Producer:       ReportLab PDF Library - http://www.reportlab.com
> CreationDate:   Tue Jul  7 20:41:25 2015

< Page size:      612 x 1008 pts
---
> Page size:      611.961 x 1007.94 pts

< File size:      476939 bytes
---
> File size:      2797505 bytes

< PDF version:    1.4
---
> PDF version:    1.3
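
As a first step towards a test suite, the pdfinfo comparison could be done on parsed fields instead of a raw diff. The sketch below assumes that pdfinfo prints one "Key: value" pair per line, as in the output above. The function names `parse_pdfinfo` and `diff_info` are made up for this example.

```python
def parse_pdfinfo(output):
    """Turn `pdfinfo` output ("Key:   value" lines) into a dict."""
    info = {}
    for line in output.splitlines():
        # Split at the first colon only, so dates and URLs stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def diff_info(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

A key that exists on one side only, such as ModDate above, shows up with `None` on the other side.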

pdffonts /home/kubuntu/hpscan002.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Helvetica                            Type 1            WinAnsi          no  no  no       2  0

kubuntu@kubuntu:/opt/OCRmyPDF-2.2-stable$ pdffonts /home/kubuntu/hpscan2.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
WKJADM+Helvetica                     Type 1C           WinAnsi          yes yes no      10  0
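
The visible difference here is the emb column: the font is embedded in one file and not in the other. A check for that could be sketched as below. It assumes the fixed-width table layout that pdffonts prints, with a row of dashes under the header, and the names `parse_pdffonts` and `not_embedded` are hypothetical. Very long font names can shift the columns, which this sketch does not handle.

```python
import re


def parse_pdffonts(output):
    """Parse the fixed-width table printed by `pdffonts` into a list of dicts.

    Column boundaries are taken from the row of dashes under the header.
    """
    lines = output.splitlines()
    sep_idx = None
    for i in range(1, len(lines)):
        # The separator row consists of dashes and spaces only.
        if lines[i].strip() and set(lines[i]) <= {"-", " "}:
            sep_idx = i
            break
    if sep_idx is None:
        return []
    spans = [m.span() for m in re.finditer(r"-+", lines[sep_idx])]
    header = lines[sep_idx - 1]
    names = [header[start:end].strip() for start, end in spans]
    fonts = []
    for line in lines[sep_idx + 1:]:
        if line.strip():
            fonts.append(
                {n: line[s:e].strip() for n, (s, e) in zip(names, spans)}
            )
    return fonts


def not_embedded(fonts):
    """Names of fonts whose `emb` column is not "yes"."""
    return [f["name"] for f in fonts if f["emb"] != "yes"]
```

An empty list from `not_embedded(parse_pdffonts(output))` means every listed font is embedded.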

chromium-browser file:///home/kubuntu/hpscan2.pdf # seems ok


http://stackoverflow.com/questions/11529715/how-do-i-find-the-orientation-of-a-pdf-using-php-or-a-linux-script # command lines for pdfinfo and identify


jhove /home/kubuntu/hpscan2.pdf |less

