
Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<u6ff1v$fdr5$1@dont-email.me>

https://news.novabbs.org/tech/article-flat.php?id=14225&group=rec.photo.digital#14225
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: occassionally-confused@nospam.co.uk (Peter)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)
Date: Thu, 15 Jun 2023 17:43:12 +0100
Organization: -
Lines: 7
Message-ID: <u6ff1v$fdr5$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 15 Jun 2023 16:42:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6d90d765b4cc8ef325624d7bbb4d7fb0";
logging-data="505701"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1//UEGl5g/cX0o0P/4f9dNF"
Cancel-Lock: sha1:rKiDJNRblDlpW/NAj+Mye0fm6i8=
X-No-Archive: yes
X-Newsreader: Forte Agent 3.3/32.846
 by: Peter - Thu, 15 Jun 2023 16:43 UTC

Is there a Windows program to OCR one PDF which is an IMAGE (text isn't
selectable).

It's about 200 pages but it's not worth buying OCR software for just one
file.

Is there a way to upload the PDF to the net for others to see what it is?

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<u6ffsb$ff7q$1@dont-email.me>

https://news.novabbs.org/tech/article-flat.php?id=14226&group=rec.photo.digital#14226
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: keith_nuttle@yahoo.com (knuttle)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text
isn't selectable)
Date: Thu, 15 Jun 2023 12:56:41 -0400
Organization: A noiseless patient Spider
Lines: 19
Message-ID: <u6ffsb$ff7q$1@dont-email.me>
References: <u6ff1v$fdr5$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 15 Jun 2023 16:56:43 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ddb16b1c116fc47f637a56ef3f84ff77";
logging-data="507130"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+EMRE+CtuW7hsoYQyFmJLb"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:KTQsK6Xl8NAOcNAfXM0+Yhfi4bw=
In-Reply-To: <u6ff1v$fdr5$1@dont-email.me>
Content-Language: en-US
 by: knuttle - Thu, 15 Jun 2023 16:56 UTC

On 06/15/2023 12:43 PM, Peter wrote:
> Is there a Windows program to OCR one PDF which is an IMAGE (text isn't
> selectable).
>
> It's about 200 pages but it's not worth buying OCR software for just one
> file.
>
> Is there a way to upload the PDF to the net for others to see what it is?

I know nothing about it, but you might try:

https://pdf.wondershare.net/ad/pdf-editor/ocr.html?gad=1&gclid=EAIaIQobChMIqrzm8N3F_wIVzmxMCh0WdA7aEAAYASAAEgIT1_D_BwE

In the past, I have used the camera function of the Adobe Reader, pasted
the selection into Irfanview, and used the Irfanview Plugin to OCR the
information.

https://www.irfanview.info/plugins/kadmos/

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<u6fpkh$go00$1@dont-email.me>

https://news.novabbs.org/tech/article-flat.php?id=14227&group=rec.photo.digital#14227
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: occassionally-confused@nospam.co.uk (Peter)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)
Date: Thu, 15 Jun 2023 20:43:45 +0100
Organization: -
Lines: 139
Message-ID: <u6fpkh$go00$1@dont-email.me>
References: <u6ff1v$fdr5$1@dont-email.me> <u6ffsb$ff7q$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 15 Jun 2023 19:43:13 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6d90d765b4cc8ef325624d7bbb4d7fb0";
logging-data="548864"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ZKI4lNSorij3eEt3bU5l8"
Cancel-Lock: sha1:rgsYudzDPMzryZuhwfBHQD7+0J4=
X-Newsreader: Forte Agent 3.3/32.846
X-No-Archive: yes
 by: Peter - Thu, 15 Jun 2023 19:43 UTC

knuttle <keith_nuttle@yahoo.com> wrote:
>> Is there a way to upload the PDF to the net for others to see what it is?
> I know nothing about it, but you may try
>
> https://pdf.wondershare.net/ad/pdf-editor/ocr.html

Thank you for the suggestion. A free tool that converts bitmap PDFs into
text using Optical Character Recognition, or that converts a regular PDF
into a Microsoft Office document (which Wondershare also does), would be
valuable to anyone on these newsgroups.

After spending about an hour on that tool, my suggestion is that it's not
worth installing unless you're willing to create an account & pay for it.

This is the link I downloaded it from today.
https://download.wondershare.net/inst/pdfelement-pro_setup_full5261.exe

This is the online installer, which also creates an offline installer file.
Name: pdfelement-pro_setup_full5261.exe
Size: 2119160 bytes (2069 KiB)
SHA256: 394407574DFCDC76744AF69D6FAAB8DCFD4255B2DFDE73234060C6E024CABD77

This is the offline installer file that the installer above created.
Name: pdfelement-pro_64bit_full5261.exe
Size: 156604880 bytes (149 MiB)
SHA256: 61DD463B27792D5EF880A1E5B5C86FB7784A35A9F72D8FECC4DA5B6866A2C956
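The name, size, and SHA256 lines above let anyone check that their own download matches this one. A minimal sketch of that check, assuming GNU `sha256sum` is available (for example under Cygwin or the Win10 Bash shell); the function name `verify_sha256` is made up for illustration:

```shell
# Compare a file's SHA-256 digest with an expected hex string.
# The expected value may be upper or lower case, as in the listings above.
verify_sha256() {
    local file=$1 expected actual
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')    # sha256sum prints lowercase
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)    # first field is the digest
    [ "$actual" = "$expected" ]
}

# Usage:
#   verify_sha256 pdfelement-pro_setup_full5261.exe \
#       394407574DFCDC76744AF69D6FAAB8DCFD4255B2DFDE73234060C6E024CABD77 \
#       && echo "match" || echo "MISMATCH"
```

On plain Windows without a Unix shell, `certutil -hashfile <file> SHA256` prints the digest to compare by eye.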

I deleted the first online installation and tried the offline installer
(with the Ethernet cord unplugged) & it worked the same either way.

Both installers installed Wondershare PDFelement (Version 9.5.1) onto
Windows 10, and both tried to phone home (with your machine ID in the URL).
https://pdf.wondershare.net/ad/pdf-editor/ocr.html?gad=1&gclid=<longnumber>

When installing it tries to be the default pdf editor and it tries to add
things to your context menu (but you can uncheck those checked boxes).

After installing you will likely want to go into preferences to turn off
the "autostart" for the Wondershare screenshot tool & batch tools (whatever
they are).

In the preferences you can't turn off updates, but you can reduce the check to once a quarter.

The installer put tons of crap in places it just shouldn't be touching.
C:\Users\you\AppData\Local\Temp\Wondershare
C:\Users\you\AppData\Roaming\Wondershare
HKCU\Software\Wondershare
HKLM\Software\PEPrinter
HKLM\Software\Wondershare
HKLM\Software\Wow6432Node\Wondershare

When you first try to OCR a bitmap PDF using "PDFElement > Quick Tools > OCR",
it says it requires a 391.76 MB additional download and asks to fetch it.

But after that additional OCR download, when you try to OCR a bitmap PDF
it stops you right there telling you that you need an account & to pay.
"Trial version only supports previewing OCR effect."

Same with conversion to Microsoft Word (which I was trying to sneak by).
"Trial version only supports PDF-to-Word of 3 pages."

Both require a Wondershare ID which shouldn't be needed just to save files.

It phones home when you uninstall.
https://cbs.wondershare.com/go.php?pid=5261&m=u&product_version=9.5.1&client_sign=<longnumber>

All in all, it looked like a slick application if you wanted to do more
than one or two OCR or PDF-to-MSOFFICE conversions. But I don't.

> In the past, I have used the camera function of the Adobe Reader, pasted
> the selection into Irfanview, and used the Irfanview Plugin to OCR the
> information.
>
> https://www.irfanview.info/plugins/kadmos/

I had Irfanview 64 so I uninstalled that & installed Irfanview 32 first.
https://www.irfanview.com/
https://www.fosshub.com/IrfanView.html?dwl=iview462_setup.exe
Name: iview462_setup.exe
Size: 3259352 bytes (3182 KiB)
SHA256: 37CDB372C4B6053356ECA2C40AA44F4FB8CD30681C28CDA54E80601D6C7B565A

Then I extracted the 32-bit plugins zip file into the Plugins folder.
Name: iview462_plugins.zip
Size: 16823082 bytes (16 MiB)
SHA256: B85B1220E785F094611EB4BDD9DE17252FA023BB604FDF548CB278878E690780

At first the Irfanview "Options > Start OCR Plugin" item was grayed out;
I had to open a PDF file first to get that menu item to ungray itself.

Then an F9 said "Irfanview Can't load Plugin 'OCR_KADMOS.DLL'!
Please install or update Plugins from IrfanView homepage
and/or enable the Plugin in 'Help -> Installed Plugins' menu.
http://www.irfanview.info/plugins/kadmos/"

When I went to Irfanview "Help > Installed Plugins", it provided a long
alphabetical list of plugins (most of which were checked already) but there
was nothing starting with the letter "K" in that list.

I needed to add the Kadmos plugin which wasn't part of the default package.
https://www.irfanview.info/plugins/kadmos/
https://www.irfanview.info/plugins/kadmos/setup_kadmos_irfanview_us.exe
Name: setup_kadmos_irfanview_us.exe
Size: 6630790 bytes (6475 KiB)
SHA256: 82253452ED26CEA5F81CC8E13A0A3EA600B4F607D0CA5F3D1D058D97D403236F

That installer knew where to put itself in the Irfanview32 plugins folder.

Back in Irfanview 32-bit with the Kadmos OCR plugin added, I opened a
multi-page PDF and scrolled to a full page of text (since Kadmos also tries
to find text inside images). Again I ran the "Options > Start OCR Plugin"
menu selection (set to the F9 hotkey), where I set the language to
"English (UK)" (it was the only option). It highlighted the entire page of
text in bright yellow in an additional fullscreen window.

After a few seconds of wondering what to do next I realized I'm supposed to
click my mouse button on a start point or sweep out the full page, which I
did and which instantly created another popup of the text ready to save.

That popup contained "KADMOS recognition results" which were only for that
one page of text (there was no way I could find to select the entire book).

The "KADMOS recognition results" editing window allowed edit corrections
(which were needed) & then the "File" menu allowed these choices
Write ASCII text to file
Copy ASCII text to the clipboard
Write UNICODE text to file
Copy UNICODE text to the clipboard

I don't offhand know the difference between ASCII & UNICODE so I saved the
ASCII text to a file and opened it up in an editor and it worked fine.

Page by page it worked well enough, especially for a free program. It was
fast enough, and although it made errors, it wasn't all that bad at
recognizing the text.

And I'm aware you can break a 200 page PDF into 200 separate PDF files.
But is there something else free like that which can OCR a 200 page book?

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<u6g9kk$iq9l$1@dont-email.me>

https://news.novabbs.org/tech/article-flat.php?id=14230&group=rec.photo.digital#14230
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Paul@Houston.Texas (Paul in Houston TX)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text
isn't selectable)
Date: Thu, 15 Jun 2023 19:16:07 -0500
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <u6g9kk$iq9l$1@dont-email.me>
References: <u6ff1v$fdr5$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 16 Jun 2023 00:16:20 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a05aa9b1d237cac03911fb7277d52a9f";
logging-data="616757"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+4PBmKgxvPtkYBAknilqss"
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.8
Cancel-Lock: sha1:PuJLuZBnFqSQiZVsow15jZw0q+o=
In-Reply-To: <u6ff1v$fdr5$1@dont-email.me>
 by: Paul in Houston TX - Fri, 16 Jun 2023 00:16 UTC

Peter wrote:
> Is there a Windows program to OCR one PDF which is an IMAGE (text isn't
> selectable).
>
> It's about 200 pages but it's not worth buying OCR software for just one
> file.
>
> Is there a way to upload the PDF to the net for others to see what it is?

I like Free OCR. It works reasonably well. Better than the other 10-15
that I have tried.
http://www.paperfile.net/

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<MPG.3ef568e53d85901e99013d@news.individual.net>

https://news.novabbs.org/tech/article-flat.php?id=14232&group=rec.photo.digital#14232
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!news2.arglkargh.de!news.karotte.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: the_stan_brown@fastmail.fm (Stan Brown)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)
Date: Thu, 15 Jun 2023 19:29:34 -0700
Organization: Oak Road Systems
Lines: 23
Message-ID: <MPG.3ef568e53d85901e99013d@news.individual.net>
References: <u6ff1v$fdr5$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Trace: individual.net uzj5Z0YIVzFOmm/7E2Um5wkj2CFHBnvnDIZnLqV3ckJLBB3LQE
Cancel-Lock: sha1:tWf3qdcmx0E9F5Spw6Clf+RW5nM=
User-Agent: MicroPlanet-Gravity/3.0.11 (GRC)
 by: Stan Brown - Fri, 16 Jun 2023 02:29 UTC

On Thu, 15 Jun 2023 17:43:12 +0100, Peter wrote:
> Is there a Windows program to OCR one PDF which is an IMAGE (text isn't
> selectable).
>
> It's about 200 pages

If it's 200 pages, don't you mean it's 200 images rather than one
image?

But that's a quibble. OneNote, part of the MS Office suite, can OCR
an image, and it does a fairly good job if the image is fairly clear.
Paste the image from clipboard into OneNote, then right-click on it
and select Copy Text from Picture. Then paste the text from clipboard
to whatever program you wish.

If you don't have Office, google for free OCR sites. There are quite
a few, but I've never used one because I use OneNote. Caution: If
what you're OCRing is sensitive, you wouldn't want to upload it to
some possibly sketchy website.

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<u6h7n8$pen9$1@dont-email.me>

https://news.novabbs.org/tech/article-flat.php?id=14234&group=rec.photo.digital#14234
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: mz721@gmx.com (mz721)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text
isn't selectable)
Date: Fri, 16 Jun 2023 18:49:45 +1000
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <u6h7n8$pen9$1@dont-email.me>
References: <u6ff1v$fdr5$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 16 Jun 2023 08:49:44 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="fd0d4ccb9689c3f7d56f180421726a64";
logging-data="834281"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/2lfyh/5AFAAtXt/lTbLLi"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.10.1
Cancel-Lock: sha1:wsuRj/KtMXldV07DgpGAz7LTDvM=
Content-Language: en-AU
In-Reply-To: <u6ff1v$fdr5$1@dont-email.me>
 by: mz721 - Fri, 16 Jun 2023 08:49 UTC

On 16/06/2023 2:43 am, Peter wrote:
> Is there a Windows program to OCR one PDF which is an IMAGE (text isn't
> selectable).
>
> It's about 200 pages but it's not worth buying OCR software for just one
> file.
>
> Is there a way to upload the PDF to the net for others to see what it is?

Yes, and it works extremely well.

Tesseract

I use it via Cygwin. You can, for example, use some utility to convert
your PDF to a series of images (png, ppm...) and then OCR them. I find
it gives pretty good results, but it works best (for me) using simple
scripts on the command line. For example, you can convert to files
page-001.png etc then loop over them with a simple bash script (Cygwin
uses bash as its shell).

There might be other ways to use it. I am not sure it is the easiest
thing to use, but it does a damned good job for a completely free tool.
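A minimal sketch of the loop described above, assuming the page images already exist as page-001.png, page-002.png, and so on. One way to make them is `pdftoppm -r 300 -png input.pdf page` from poppler-utils, which is an assumption here since the post does not name a converter. The `ocr_pages` function and the `OCR_CMD` override are hypothetical names for illustration:

```shell
# OCR every page image in a directory and join the text in page order.
# OCR_CMD defaults to the tesseract CLI, which is called as
#   tesseract <image> <output-base>
# and writes its result to <output-base>.txt.
OCR_CMD=${OCR_CMD:-tesseract}

ocr_pages() {
    local dir=$1 out=$2 img base
    : > "$out"                          # start with an empty output file
    for img in "$dir"/page-*.png; do    # zero-padded names expand in page order
        [ -e "$img" ] || continue       # no matches: skip the literal pattern
        base=${img%.png}
        "$OCR_CMD" "$img" "$base" >/dev/null 2>&1 || {
            echo "OCR failed on $img" >&2
            return 1
        }
        cat "$base.txt" >> "$out"       # append this page's text
    done
}

# Usage: ocr_pages ./pages book.txt
```

Keeping one text file per page next to each image makes it easy to re-run a single bad page without redoing the whole book.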

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<u6he7j$q95d$1@dont-email.me>

https://news.novabbs.org/tech/article-flat.php?id=14235&group=rec.photo.digital#14235
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nospam@needed.invalid (Paul)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text
isn't selectable)
Date: Fri, 16 Jun 2023 06:40:50 -0400
Organization: A noiseless patient Spider
Lines: 38
Message-ID: <u6he7j$q95d$1@dont-email.me>
References: <u6ff1v$fdr5$1@dont-email.me>
<MPG.3ef568e53d85901e99013d@news.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 16 Jun 2023 10:40:52 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e9e635255937d9c5b290d92db16de619";
logging-data="861357"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18wTgYC/leqpbODYuDyLuBRm1mPjVXjYTU="
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
Cancel-Lock: sha1:bxXOlNs62bLHIbNv+lICJfX/OvY=
Content-Language: en-US
In-Reply-To: <MPG.3ef568e53d85901e99013d@news.individual.net>
 by: Paul - Fri, 16 Jun 2023 10:40 UTC

On 6/15/2023 10:29 PM, Stan Brown wrote:
> On Thu, 15 Jun 2023 17:43:12 +0100, Peter wrote:
>> Is there a Windows program to OCR one PDF which is an IMAGE (text isn't
>> selectable).
>>
>> It's about 200 pages
>
> If it's 200 pages, don't you mean it's 200 images rather than one
> image?
>
> But that's a quibble. OneNote, part of the MS Office suite, can OCR
> an image, and it does a fairly good job if the image is fairly clear.
> Paste the image from clipboard into OneNote, then right-click on it
> and select Copy Text from Picture. Then paste the text from clipboard
> to whatever program you wish.
>
> If you don't have Office, google for free OCR sites. There are quite
> a few, but I've never used one because I use OneNote. Caution: If
> what you're OCRing is sensitive, you wouldn't want to upload it to
> some possibly sketchy website.
>

When you feed a 200 page stack of sheets through a scan-to-PDF scanner,
the scan head collects that as 200 images, and those become 200 image
"objects" in the 200 page PDF file.

These can then be extracted with mutool.exe if you want
to examine the images. For example, the 10 page service manual
I downloaded as a PDF had ten images in it, which mutool.exe
dumped for me.

When you run "overlay OCR" on that 200 page scanner document,
each page is an OCR run. All the characters in one image are
"recognized", then PDF lines-of-text in a particular font,
are added to the PDF code for that page. Each page is handled
individually.

Paul

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<MPG.3ef61940aded30ee99013f@news.individual.net>

https://news.novabbs.org/tech/article-flat.php?id=14237&group=rec.photo.digital#14237
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!lilly.ping.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: the_stan_brown@fastmail.fm (Stan Brown)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)
Date: Fri, 16 Jun 2023 08:01:56 -0700
Organization: Oak Road Systems
Lines: 14
Message-ID: <MPG.3ef61940aded30ee99013f@news.individual.net>
References: <u6ff1v$fdr5$1@dont-email.me> <MPG.3ef568e53d85901e99013d@news.individual.net> <u6he7j$q95d$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Trace: individual.net sUEGa+KaLE2QAX9HyBTglAm2fx1m3eoAi0Ivayi3Fl956uerGO
Cancel-Lock: sha1:N7kFCRwHdHbUn78KfCAiNPedduE=
User-Agent: MicroPlanet-Gravity/3.0.11 (GRC)
 by: Stan Brown - Fri, 16 Jun 2023 15:01 UTC

On Fri, 16 Jun 2023 06:40:50 -0400, Paul wrote:
>
> When you run "overlay OCR" on that 200 page scanner document,
> each page is an OCR run. All the characters in one image are
> "recognized", then PDF lines-of-text in a particular font,
> are added to the PDF code for that page. Each page is handled
> individually.

What do you use to make the OCR overlay?

--
Stan Brown, Tehachapi, California, USA
https://BrownMath.com/
Shikata ga nai...

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<u6ibm4$th8f$1@dont-email.me>

https://news.novabbs.org/tech/article-flat.php?id=14238&group=rec.photo.digital#14238
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nospam@needed.invalid (Paul)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text
isn't selectable)
Date: Fri, 16 Jun 2023 15:03:30 -0400
Organization: A noiseless patient Spider
Lines: 80
Message-ID: <u6ibm4$th8f$1@dont-email.me>
References: <u6ff1v$fdr5$1@dont-email.me>
<MPG.3ef568e53d85901e99013d@news.individual.net>
<u6he7j$q95d$1@dont-email.me>
<MPG.3ef61940aded30ee99013f@news.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 16 Jun 2023 19:03:32 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e9e635255937d9c5b290d92db16de619";
logging-data="967951"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18P9kMOr1lx4uuoJ6NyvAmhCXn+Pp+yg6o="
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
Cancel-Lock: sha1:7ikzHsd68NbjOmx1js3eKsws+54=
In-Reply-To: <MPG.3ef61940aded30ee99013f@news.individual.net>
Content-Language: en-US
 by: Paul - Fri, 16 Jun 2023 19:03 UTC

On 6/16/2023 11:01 AM, Stan Brown wrote:
> On Fri, 16 Jun 2023 06:40:50 -0400, Paul wrote:
>>
>> When you run "overlay OCR" on that 200 page scanner document,
>> each page is an OCR run. All the characters in one image are
>> "recognized", then PDF lines-of-text in a particular font,
>> are added to the PDF code for that page. Each page is handled
>> individually.
>
> What do you use to make the OCR overlay?
>

Since Linux is more likely to have a current Tesseract, I used
Win10 Bash shell and a Ubuntu distro.

apt search ocrmypdf

sudo apt install ocrmypdf

You don't really need to do this step, but for test purposes,
I just wanted to run it on a single page. I fed it the image from
page 8.

mutool extract sony_srs-t1_t1pc_sm.pdf # collect image files for pages

Then, in Bash shell on Windows, I did (using the installed ocrmypdf)
for a PNG input to PDF output:

ocrmypdf -l eng --image-dpi 400 --output-type pdf image-0044.png image-0044.pdf

INFO - Input file is not a PDF, checking if it is an image...
INFO - Input file is an image
INFO - Image seems valid. Try converting to PDF...
INFO - Successfully converted to PDF, processing...
Scan: 100% 1/1 [00:00<00:00, 625.83page/s]
INFO - Using Tesseract OpenMP thread limit 3
OCR: 100% 1.0/1.0 [00:07<00:00, 7.01s/page]
JPEGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
INFO - Optimize ratio: 1.00 savings: 0.0%

To do the whole document, you'd likely need less than that, as some
metadata is already inside the PDF. Something like this maybe.

ocrmypdf --output-type pdf input.pdf output.pdf

The output from my Page 8 image, made this standalone PDF. The DPI
declaration, helped it pick a weird page size for the output.

image-0044.pdf

Wiping over that gives text to copy.
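The link between the DPI declaration and the page size is plain arithmetic: a PDF page is measured in points (1/72 inch), so the page dimension is pixels * 72 / DPI. A small sketch with made-up numbers (the 3400 pixel width is illustrative, not taken from the service manual image), and with function names invented here:

```shell
# Page dimension in PDF points (1 pt = 1/72 inch) for a scan that is
# $1 pixels across and declared at $2 dots per inch. Integer math, rounds down.
page_points() { echo $(( $1 * 72 / $2 )); }

# Effective DPI of an image $1 pixels across placed on a page $2 points across.
effective_dpi() { echo $(( $1 * 72 / $2 )); }

# Range check for the 200 to 400 DPI window mentioned for the Adobe tool.
dpi_in_range() { [ "$1" -ge 200 ] && [ "$1" -le 400 ]; }

page_points 3400 400    # prints 612, i.e. 8.5 inches, US Letter width
page_points 3400 72     # prints 3400, i.e. about 47 inches wide
```

A wrong or missing DPI value does not change the pixels, only how large a page the PDF claims they cover.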

I didn't do quality analysis, or refine the command to do a better job.

I should be able to feed it the entire 10 page PDF intact, and
have it output a 10 page PDF with text overlay. Again, not tested.

It's normal for these processes, to not be able to overlay text
exactly on top of the bitmap character underneath. The Adobe OCR
in their paid tool, does do an exact job. Many other "hobby projects",
do not.

For a start, I was just happy to see Tesseract not fall over.

The Adobe tool (in the Acrobat editor in their distiller package),
first does layout analysis. On a three-column magazine layout,
it correctly removes the image content from consideration,
then it OCR-processes each column and precisely lays the text on top.

As has been previously described in this thread, if there is even
a bit of font&text in the document already, the OCR does not like that
and it bails. It expects "pristine" cut-sheet scan images to work on
and no fonts declared in the PDF. In the case of Adobe, it also expects
the scan to be done at 200DPI to 400DPI (based on page size declaration
and such). Many times, I was thwarted in Adobe by a "this image needs
to be between 200DPI and 400DPI" type of message. And then it takes
half the day to arrange a strict diet of noodles for the stupid thing :-)

Paul
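For the whole-document case, a hedged sketch built on the untested `ocrmypdf --output-type pdf input.pdf output.pdf` idea above. It adds two OCRmyPDF options that are not in the post: `--skip-text`, which leaves pages that already contain text alone instead of bailing, and `--sidecar`, which also writes the recognized text to a plain file. Check `ocrmypdf --help` for your installed version. The function name and the `OCRMYPDF` override are hypothetical:

```shell
# OCRMYPDF lets the command name be overridden; it defaults to ocrmypdf.
OCRMYPDF=${OCRMYPDF:-ocrmypdf}

# OCR a whole scanned PDF: $1 is the input, $2 the output with a text overlay.
# The recognized text is also saved next to the output as <name>.txt.
ocr_whole_pdf() {
    local in=$1 out=$2
    "$OCRMYPDF" -l eng --skip-text --output-type pdf \
        --sidecar "${out%.pdf}.txt" "$in" "$out"
}

# Usage: ocr_whole_pdf scan.pdf scan-ocr.pdf
```

The sidecar file is the quickest route to plain text for a 200 page book, since nothing has to be copied out of a PDF viewer page by page.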

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<MPG.3ef6f1812024514c990141@news.individual.net>

https://news.novabbs.org/tech/article-flat.php?id=14239&group=rec.photo.digital#14239
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!news2.arglkargh.de!news.karotte.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: the_stan_brown@fastmail.fm (Stan Brown)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)
Date: Fri, 16 Jun 2023 23:24:39 -0700
Organization: Oak Road Systems
Lines: 93
Message-ID: <MPG.3ef6f1812024514c990141@news.individual.net>
References: <u6ff1v$fdr5$1@dont-email.me> <MPG.3ef568e53d85901e99013d@news.individual.net> <u6he7j$q95d$1@dont-email.me> <MPG.3ef61940aded30ee99013f@news.individual.net> <u6ibm4$th8f$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Trace: individual.net gZyWpW7Tohs8hAF/xk9ZqAN3PO/J3V+T7DA40GaOEygVra9zv/
Cancel-Lock: sha1:lpfYVMpuSJUOM6uo13ScrXdMSCg=
User-Agent: MicroPlanet-Gravity/3.0.11 (GRC)
 by: Stan Brown - Sat, 17 Jun 2023 06:24 UTC

On Fri, 16 Jun 2023 15:03:30 -0400, Paul wrote:
>
> On 6/16/2023 11:01 AM, Stan Brown wrote:
> > On Fri, 16 Jun 2023 06:40:50 -0400, Paul wrote:
> >>
> >> When you run "overlay OCR" on that 200 page scanner document,
> >> each page is an OCR run. All the characters in one image are
> >> "recognized", then PDF lines-of-text in a particular font,
> >> are added to the PDF code for that page. Each page is handled
> >> individually.
> >
> > What do you use to make the OCR overlay?
> >
>
> Since Linux is more likely to have a current Tesseract, I used
> Win10 Bash shell and a Ubuntu distro.

Thanks, Paul, for the detailed explanation. One eye-opener for me
was that the Win10 Bash shell can run actual Linux programs.

> apt search ocrmypdf
>
> sudo apt install ocrmypdf
>
> You don't really need to do this step, but for test purposes,
> I just wanted to run it on a single page. I fed it the image from
> page 8.
>
> mutool extract sony_srs-t1_t1pc_sm.pdf # collect image files for pages
>
> Then, in Bash shell on Windows, I did (using the installed ocrmypdf)
> for a PNG input to PDF output:
>
> ocrmypdf -l eng --image-dpi 400 --output-type pdf image-0044.png image-0044.pdf
>
> INFO - Input file is not a PDF, checking if it is an image...
> INFO - Input file is an image
> INFO - Image seems valid. Try converting to PDF...
> INFO - Successfully converted to PDF, processing...
> Scan: 100% 1/1 [00:00<00:00, 625.83page/s]
> INFO - Using Tesseract OpenMP thread limit 3
> OCR: 100% 1.0/1.0 [00:07<00:00, 7.01s/page]
> JPEGs: 0image [00:00, ?image/s]
> JBIG2: 0item [00:00, ?item/s]
> INFO - Optimize ratio: 1.00 savings: 0.0%
>
> To do the whole document, you'd likely need less than that, as some
> metadata is already inside the PDF. Something like this maybe.
>
> ocrmypdf --output-type pdf input.pdf output.pdf
>
> The output from my Page 8 image, made this standalone PDF. The DPI
> declaration, helped it pick a weird page size for the output.
>
> image-0044.pdf
>
> Wiping over that gives text to copy.
>
> I didn't do quality analysis, or refine the command to do a better job.
>
> I should be able to feed it the entire 10 page PDF intact, and
> have it output a 10 page PDF with text overlay. Again, not tested.
>
> It's normal for these processes, to not be able to overlay text
> exactly on top of the bitmap character underneath. The Adobe OCR
> in their paid tool, does do an exact job. Many other "hobby projects",
> do not.
>
> For a start, I was just happy to see Tesseract not fall over.
>
> The Adobe tool (in the Acrobat editor in their distiller package),
> first does layout analysis. On a three-column magazine layout,
> it correctly removes the image content from consideration,
> then it OCR-processes each column and precisely lays the text on top.
>
> And has been previously described in this thread, if there is even
> a bit of font&text in the document already, the OCR does not like that
> and it bails. It expects "pristine" cut-sheet scan images to work on
> and no fonts declared in the PDF. In the case of Adobe, it also expects
> the scan to be done at 200DPI to 400DPI (based on page size declaration
> and such). Many times, I was thwarted in Adobe by a "this image needs
> to be between 200DPI and 400DPI" type of message. And then it takes
> half the day to arrange a strict diet of noodles for the stupid thing :-)
>
> Paul
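
The whole-document command quoted above, and the "bails if there is already text" case, can be sketched as a dry-run helper. This is only a sketch under assumptions: the function name ocr_cmd and the file names are made up for illustration, and it prints the command instead of running it. --skip-text and --force-ocr are real ocrmypdf options; with neither, ocrmypdf stops when it finds a page that already has text.

```shell
# Hypothetical dry-run helper: prints an ocrmypdf command for a whole PDF.
# Usage: ocr_cmd MODE input.pdf output.pdf
#   MODE=skip   leave pages that already contain text untouched
#   MODE=force  rasterize every page and OCR it again
#   MODE=plain  no extra flag (ocrmypdf aborts if a page already has text)
ocr_cmd() {
    case "$1" in
        skip)  flag="--skip-text" ;;
        force) flag="--force-ocr" ;;
        *)     flag="" ;;
    esac
    # $flag is left unquoted on purpose so an empty value adds no argument.
    echo ocrmypdf -l eng $flag --output-type pdf "$2" "$3"
}

ocr_cmd skip input.pdf output.pdf
# prints: ocrmypdf -l eng --skip-text --output-type pdf input.pdf output.pdf
```

Once the printed command looks right, remove the echo to actually run it.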

--
Stan Brown, Tehachapi, California, USA
https://BrownMath.com/
Shikata ga nai...

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<u6k2h1$175o9$1@dont-email.me>


https://news.novabbs.org/tech/article-flat.php?id=14240&group=rec.photo.digital#14240

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nospam@needed.invalid (Paul)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)
Date: Sat, 17 Jun 2023 06:39:27 -0400
Organization: A noiseless patient Spider
Lines: 58
Message-ID: <u6k2h1$175o9$1@dont-email.me>
References: <u6ff1v$fdr5$1@dont-email.me>
<MPG.3ef568e53d85901e99013d@news.individual.net>
<u6he7j$q95d$1@dont-email.me>
<MPG.3ef61940aded30ee99013f@news.individual.net>
<u6ibm4$th8f$1@dont-email.me>
<MPG.3ef6f1812024514c990141@news.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 17 Jun 2023 10:39:29 -0000 (UTC)
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
Cancel-Lock: sha1:UOqJcE874TcOZvDGchXWQn5b7rM=
In-Reply-To: <MPG.3ef6f1812024514c990141@news.individual.net>
Content-Language: en-US
 by: Paul - Sat, 17 Jun 2023 10:39 UTC

On 6/17/2023 2:24 AM, Stan Brown wrote:

> Thanks, Paul, for the detailed explanation. One eye-
> opener for me was that the Win10 Bash shell can run
> actual Linux programs.
>
>> apt search ocrmypdf
>>
>> sudo apt install ocrmypdf

If you're lucky, it can even do graphics. As far as
I know, WSLg was released for Win10. And while one
announcement gave the impression that "mere humans
can install this stuff", it turned out that there were
no improvements at all to installation-usability. There
are still "boxes to tick, hair loss". They gave the impression
that "installing from the Microsoft Store... done", no,
not true by a country mile. You will have to get out
your Ouija board and consult with the spirit world, to
figure out what step you missed.

I run Linux Firefox via Bash shell.

The $HOME directory is inside a .vhdx container, and
would be

/home/username

whereas the current working directory points to C: like this

/mnt/c/Users/username/Downloads

and you can interact with your favorite Windows directory
and that is outside of the Linux container as such.

When you want to work on some Linux setting, it might be in

/home/username/.config

So the Linuxy stuff is stored away from your C: stuff and
you would not find the cache2 Firefox folder mixed in with your
regular Windows.
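
The mapping between a Windows folder and its /mnt/c view can be sketched as below. This is a minimal sketch, assuming the default WSL automount under /mnt and a plain drive-letter path; the function name win_to_wsl is made up for illustration. Inside WSL itself, the stock wslpath utility does the same job more robustly.

```shell
# Hypothetical helper: translate a Windows drive path (C:\Users\...) into
# the path the Bash shell sees for the same folder (/mnt/c/Users/...).
# Assumes the default automount root /mnt; UNC paths are not handled.
win_to_wsl() {
    # First character is the drive letter; WSL mounts it in lower case.
    drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
    # Everything after "C:" with backslashes turned into forward slashes.
    rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl 'C:\Users\username\Downloads'
# prints: /mnt/c/Users/username/Downloads
```

The single quotes around the Windows path matter; without them the shell would eat the backslashes.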

When you're finished with it, you type "exit" to exit the Bash
shell. Then "wsl --shutdown" and that stops the Linux kernel
and closes the container up.

The Linux program windows that open, are rootless and the technology
in the last hop is something like Terminal Server. The shading
around the edge of the windows is "suspect" and resizing a window
can be annoying to very annoying at times. And that's a measure of
just how many "layers" the graphics have gone through.

[Picture]

https://i.postimg.cc/QdYs5W49/bash-shell-WSLg.gif

Paul

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<f7c023f40eca5a7c0a2e581c79bed43e@dizum.com>


https://news.novabbs.org/tech/article-flat.php?id=14241&group=rec.photo.digital#14241

From: nobody@dizum.com (Nomen Nescio)
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)
References: <u6ff1v$fdr5$1@dont-email.me>
<MPG.3ef568e53d85901e99013d@news.individual.net>
<u6he7j$q95d$1@dont-email.me>
<MPG.3ef61940aded30ee99013f@news.individual.net>
<u6ibm4$th8f$1@dont-email.me>
<MPG.3ef6f1812024514c990141@news.individual.net>
Message-ID: <f7c023f40eca5a7c0a2e581c79bed43e@dizum.com>
Date: Sun, 18 Jun 2023 05:19:56 +0200 (CEST)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!sewer!news.dizum.net!not-for-mail
Organization: dizum.com - The Internet Problem Provider
X-Abuse: abuse@dizum.com
Injection-Info: sewer.dizum.com - 2001::1/128
 by: Nomen Nescio - Sun, 18 Jun 2023 03:19 UTC

In article <MPG.3ef6f1812024514c990141@news.individual.net>
Stan Brown <the_stan_brown@fastmail.fm> wrote:
>
> On Fri, 16 Jun 2023 15:03:30 -0400, Paul wrote:
> >
> > On 6/16/2023 11:01 AM, Stan Brown wrote:
> > > On Fri, 16 Jun 2023 06:40:50 -0400, Paul wrote:
> > >>
> > >> When you run "overlay OCR" on that 200 page scanner document,
> > >> each page is an OCR run. All the characters in one image are
> > >> "recognized", then PDF lines-of-text in a particular font,
> > >> are added to the PDF code for that page. Each page is handled
> > >> individually.
> > >
> > > What do you use to make the OCR overlay?
> > >
> >
> > Since Linux is more likely to have a current Tesseract, I used
> > Win10 Bash shell and a Ubuntu distro.
>
> Thanks, Paul, for the detailed explanation. One eye-
> opener for me was that the Win10 Bash shell can run
> actual Linux programs.
>
> > apt search ocrmypdf
> >
> > sudo apt install ocrmypdf
> >
> > You don't really need to do this step, but for test purposes,
> > I just wanted to run it on a single page. I fed it the image from
> > page 8.
> >
> > mutool extract sony_srs-t1_t1pc_sm.pdf # collect image files for pages
> >
> > Then, in Bash shell on Windows, I did (using the installed ocrmypdf)
> > for a PNG input to PDF output:
> >
> > ocrmypdf -l eng --image-dpi 400 --output-type pdf image-0044.png image-0044.pdf
> >
> > INFO - Input file is not a PDF, checking if it is an image...
> > INFO - Input file is an image
> > INFO - Image seems valid. Try converting to PDF...
> > INFO - Successfully converted to PDF, processing...
> > Scan: 100% 1/1 [00:00<00:00, 625.83page/s]
> > INFO - Using Tesseract OpenMP thread limit 3
> > OCR: 100% 1.0/1.0 [00:07<00:00, 7.01s/page]
> > JPEGs: 0image [00:00, ?image/s]
> > JBIG2: 0item [00:00, ?item/s]
> > INFO - Optimize ratio: 1.00 savings: 0.0%
> >
> > To do the whole document, you'd likely need less than that, as some
> > metadata is already inside the PDF. Something like this maybe.
> >
> > ocrmypdf --output-type pdf input.pdf output.pdf
> >
> > The output from my Page 8 image, made this standalone PDF. The DPI
> > declaration, helped it pick a weird page size for the output.
> >
> > image-0044.pdf
> >
> > Wiping over that gives text to copy.
> >
> > I didn't do quality analysis, or refine the command to do a better job.
> >
> > I should be able to feed it the entire 10 page PDF intact, and
> > have it output a 10 page PDF with text overlay. Again, not tested.
> >
> > It's normal for these processes, to not be able to overlay text
> > exactly on top of the bitmap character underneath. The Adobe OCR
> > in their paid tool, does do an exact job. Many other "hobby projects",
> > do not.
> >
> > For a start, I was just happy to see Tesseract not fall over.
> >
> > The Adobe tool (in the Acrobat editor in their distiller package),
> > first does layout analysis. On a three-column magazine layout,
> > it correctly removes the image content from consideration,
> > then it OCR-processes each column and precisely lays the text on top.
> >
> > As has been previously described in this thread, if there is even
> > a bit of font&text in the document already, the OCR does not like that
> > and it bails. It expects "pristine" cut-sheet scan images to work on
> > and no fonts declared in the PDF. In the case of Adobe, it also expects
> > the scan to be done at 200DPI to 400DPI (based on page size declaration
> > and such). Many times, I was thwarted in Adobe by a "this image needs
> > to be between 200DPI and 400DPI" type of message. And then it takes
> > half the day to arrange a strict diet of noodles for the stupid thing :-)
> >
> > Paul
>
>
>
> --
> Stan Brown, Tehachapi, California, USA
> https://BrownMath.com/
> Shikata ga nai...

All that crap to watch a 'restricted' video?

Phishing guys love users like the OP. He keeps them in business.

Stay away from Google and their like. They're smarter than you are.

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<0001HW.2A3FCAA601218A3370000288C38F@news.supernews.com>


https://news.novabbs.org/tech/article-flat.php?id=14242&group=rec.photo.digital#14242

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!border-1.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!nntp.supernews.com!news.supernews.com.POSTED!not-for-mail
NNTP-Posting-Date: Sun, 18 Jun 2023 23:28:38 +0000
Date: Sun, 18 Jun 2023 19:28:38 -0400
From: akwolffan@zoho.com (WolfFan)
Organization: the pack
Mime-Version: 1.0
User-Agent: Hogwasher/5.24
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Message-ID: <0001HW.2A3FCAA601218A3370000288C38F@news.supernews.com>
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)
Newsgroups: rec.photo.digital, comp.text.pdf, alt.comp.os.windows-10
Reply-To: akwolffan@zoho.com
References: <u6ff1v$fdr5$1@dont-email.me>
Lines: 26
 by: WolfFan - Sun, 18 Jun 2023 23:28 UTC

On Jun 15, 2023, Peter wrote
(in article <u6ff1v$fdr5$1@dont-email.me>):

> Is there a Windows program to OCR one PDF which is an IMAGE (text isn't
> selectable).
>
> It's about 200 pages but it's not worth buying OCR software for just one
> file.
>
> Is there a way to upload the PDF to the net for others to see what it is?

1. go to your fav site which can give you a free email address

2. get one

3. go to Adobe’s site, look for Acrobat Reader, download the free trial of
the full Acrobat

4. use the free email address to sign up

5. fire up Acrobat, do your OCR

6. delete Acrobat and kill the free email address.

Adobe deserves to get thumped. Thump them.

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)

<u7a4jv$ire7$2@dont-email.me>


https://news.novabbs.org/tech/article-flat.php?id=14263&group=rec.photo.digital#14263

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: kelown@privacy.invalid (kelown)
Newsgroups: rec.photo.digital,comp.text.pdf,alt.comp.os.windows-10
Subject: Re: Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable)
Date: Sun, 25 Jun 2023 14:30:06 -0500
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <u7a4jv$ire7$2@dont-email.me>
References: <u6ff1v$fdr5$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 25 Jun 2023 19:30:07 -0000 (UTC)
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:5.0) Aura/20220509 Interlink/52.9.8165
Cancel-Lock: sha1:go+fSB17NB3TJAF337Zjn4Zzds0=
Content-Language: en-US
In-Reply-To: <u6ff1v$fdr5$1@dont-email.me>
 by: kelown - Sun, 25 Jun 2023 19:30 UTC

> Is there a Windows program to OCR one PDF which is an IMAGE (text isn't
> selectable).
>
> It's about 200 pages but it's not worth buying OCR software for just one
> file.

PDF-XChange Editor Portable (free)
https://portableapps.com/apps/office/pdf-xchange-editor-portable

Convert -> OCR Page(s) -> All
