Oddball text question

Posted: Sun Feb 18, 2024 11:33 am
by doobes
So for reasons I cannot explain, I've wound up with paper copies of some VBA code that I want to use.

I've scanned the pages and run them through OCR (all on a Linux box, BTW), which works pretty well...

But, and there is always a but:

Some of the text is obviously OCR'ed in a non-standard text encoding.

I've imported it into the VBA IDE and the odd line simply will not run.

If I type the exact same line, character by character, it will run, thus the encoding conclusion.

Any thoughts as to a process to run through to ensure the text encoding is the one that the VBA IDE wants to see?

I tried changing the encoding to ANSI in Notepad++ but that had no effect.

Thanks for any thoughts.

Re: Oddball text question

Posted: Sun Feb 18, 2024 11:40 am
by gupta9665
I have seen issues with double quotes after a similar process, so I retype them by hand and then all is good. Check whether that is the case with your line.
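If retyping every quote by hand gets tedious, the same fix can be scripted. Below is a minimal Python sketch, assuming the OCR output is Unicode text and that the culprits are typographic ("curly") quotes and similar look-alikes. The `OCR_TO_ASCII` table and the `normalize_quotes` name are made up for illustration, and the table is a guess at likely offenders, not a complete list.

```python
# Hypothetical lookup table: typographic characters that OCR output may
# contain, mapped to the plain ASCII characters a VBA editor expects.
# This list is an assumption; extend it with whatever your files contain.
OCR_TO_ASCII = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u00a0": " ",  # no-break space
}

# Build the translation table once so it can be reused for every file.
_TABLE = str.maketrans(OCR_TO_ASCII)


def normalize_quotes(text: str) -> str:
    """Return text with the characters listed in OCR_TO_ASCII replaced."""
    return text.translate(_TABLE)
```

To use it, read the OCR output into a string (for example with `open(path, encoding="utf-8")`, assuming the file is UTF-8), pass it through `normalize_quotes`, and write the result back out before importing it into the VBA IDE. Characters not in the table are left untouched, so it is worth checking the result for anything else unusual.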

Re: Oddball text question

Posted: Sun Feb 18, 2024 1:13 pm
by ryan-feeley
I expect @gupta9665 is on to something with the quotations.

Be careful with ANSI (8 bits) vs. ASCII (7 bits). They overlap on the first 128 characters, but ANSI has another 100 or so odd-balls. These include a few variations on quotation marks and other stuff that isn't on a typical keyboard. I could easily see your scanner picking up something obscure. I'd try again with ASCII encoding and see if that works.
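One way to see exactly which characters fall outside the 7-bit ASCII range is to list them with their positions. This is a rough Python sketch, assuming the OCR output has already been read into a string; the `find_non_ascii` name is made up for illustration.

```python
import unicodedata


def find_non_ascii(text):
    """Return a list of (line, column, char, codepoint, name) tuples,
    one for every character outside the 7-bit ASCII range.
    Line and column numbers are 1-based."""
    hits = []
    # Split on "\n" only, so unusual Unicode line separators are reported
    # as characters rather than silently treated as line breaks.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col_no, ch, f"U+{ord(ch):04X}", name))
    return hits
```

Running this over a misbehaving module and printing each tuple should point straight at the lines the VBA IDE refuses to run. An empty list means the text is pure ASCII, and the problem lies elsewhere.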

Re: Oddball text question

Posted: Mon Feb 19, 2024 8:16 am
by JSculley
What OCR software are you using on the Linux machine?

Also, can you upload one of the misbehaving files?

Re: Oddball text question

Posted: Mon Feb 19, 2024 10:54 am
by doobes
ryan-feeley wrote: Sun Feb 18, 2024 1:13 pm
I expect @gupta9665 is on to something with the quotations.

Be careful with ANSI (8 bits) vs. ASCII (7 bits). They overlap on the first 128 characters, but ANSI has another 100 or so odd-balls. These include a few variations on quotation marks and other stuff that isn't on a typical keyboard. I could easily see your scanner picking up something obscure. I'd try again with ASCII encoding and see if that works.

Bingo

I'm using OCRFeeder - https://wiki.gnome.org/Apps/OCRFeeder

Switching to ANSI via Notepad++ converted some of the quotation marks into weird characters.

Thank you!