NO: Saturday night and I am curious......
Stephen Miller
stephenmiller1958 at gmail.com
Mon Mar 13 04:41:03 UTC 2023
Thanks for the feedback.
The reason I want to do this is twofold.
Firstly I want to analyse all my records with my bank for the complete
history of my relationship with the bank. One account goes back to 1993!
and the other three commenced in 2001! So I have a lot of pdfs to deal
with. (More than 1500!!!!)
Essentially, like a lot of people I guess, there are a lot of inter account
transfers as well as the usual cheques (checks) in the earlier years,
monthly periodical payments etc.
In other words it will be great personally and maybe of some value
historically. (I don't imagine this has ever been done before).
Thanks to accidentally being with a great bank, the St George, I am able to
access online the pdfs back to 2012. These load fine into Acrobat Pro or
Nitro and the OCR is perfect and saving as Excel compatible does a great
job of identifying the transaction table in the statements. So extracting
the transaction lines for these and loading them into a database is simple.
The OCR became problematic for the earlier years. Remember, we are talking
about doing these in bulk, not one page at a time on a home scanner.
That led to my research online and I discovered two amazing products.
The first is from a Chinese Australian guy with a small startup who is living
in Hong Kong and is doing this for a living. His OCR and code are
written for bank statements and are very accurate. The other is from a
California-based company which is using machine learning with a
user-controllable environment to teach the system how to interpret a bank
statement, or any other document for that matter.
So I am excited to get some hands-on experience with real AI systems but in
both cases I am not interested in my personal details going to India in one
case and Hong Kong in the other.
So that led me to the obfuscation problem and the possibility of a software
product as a result as I would not be the only one with these concerns.
So into the rabbit hole.
On the internet there is much complaint about Adobe not having a global
search and replace. Their answer is correct in my view: PDF is a
global document interchange format and they are not pretending to be MS
Word, which is commercially a very wise decision.
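
A rough sketch of one reason the hex editor idea from the original post (quoted below) tends not to work: PDF producers commonly store page content streams Flate-compressed, so the text bytes are not visible in the raw file. The name and the content stream fragment here are made up for illustration, and real statements may also use subset fonts with their own encodings, which this sketch ignores.

```python
import zlib

# The bytes a hex editor would search for, using the Windows-1252 table.
# The name is a made-up placeholder, not taken from any real statement.
needle = "JANE EXAMPLE".encode("cp1252")

# A simplified, hypothetical content stream: select a font, position the
# cursor, and show a text string with the Tj operator.
content = b"BT /F1 10 Tf 72 700 Td (JANE EXAMPLE) Tj ET\n" * 3

# What typically lands in the file: the same stream after Flate (zlib)
# compression.
stored = zlib.compress(content)

found_in_plain_stream = needle in content
found_in_stored_bytes = needle in stored

print("found in uncompressed stream:", found_in_plain_stream)
print("found in stored (compressed) bytes:", found_in_stored_bytes)

# Decompressing recovers the searchable text, which is what a PDF-aware
# tool does before it can find or replace anything.
recovered = zlib.decompress(stored)
```

Even where a stream is stored uncompressed, a replacement of a different length would shift the byte offsets recorded in the file's cross-reference table, so a PDF-aware tool is the safer route.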
So to test these two online products I will need to create a list of items
to be obfuscated, then do a bulk convert to Excel (XML) using Nitro, then use
TextEdit Pro to bulk search and replace, then write an Omnis app to exactly
reproduce the statements, then print them as PDFs, and then test these online
systems with those. (Of course, if I just wanted the data I could just load
and insert from the XML.)
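
The bulk search and replace step could also be scripted rather than done in an editor. The following is only a sketch under assumptions: the exported files are taken to be UTF-8 text with an .xml extension, and the mapping, folder names, and function names are all hypothetical.

```python
import re
from pathlib import Path

# Hypothetical list of items to obfuscate: real value -> stand-in.
REPLACEMENTS = {
    "Jane Example": "Alex Sample",
    "123456789": "000000001",
}


def obfuscate_text(text, mapping):
    """Replace every occurrence of each key in mapping with its stand-in."""
    if not mapping:
        return text
    # Longest keys first, so a longer item wins over a shorter one that
    # happens to be its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def obfuscate_folder(src, dst, mapping):
    """Write an obfuscated copy of every .xml file in src into dst."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src).glob("*.xml")):
        text = path.read_text(encoding="utf-8")
        (dst / path.name).write_text(
            obfuscate_text(text, mapping), encoding="utf-8"
        )
        count += 1
    return count


sample = "Transfer from Jane Example to account 123456789"
result = obfuscate_text(sample, REPLACEMENTS)
```

Doing all replacements in a single pass means a stand-in value is never itself replaced by a later rule, which matters once the list of items gets long.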
Still, the little bit of extra work seems worth it to test these online
systems, especially the meaty machine learning AI one.
Any more suggestions gratefully received.
On Sat, 11 Mar 2023 at 22:46, Mike Matthews - Omnis via omnisdev-en <
omnisdev-en at lists.omnis-dev.com> wrote:
> Interesting. Does Acrobat Pro help with maybe a scripting dictionary?
>
> Some good prices on TVs right now, clearing out Christmas stock :)
>
> Mike Matthews
>
> Lineal Software Solutions
> Commercial House, The Strand, Barnstaple,
> Devon, EX31 1EU
>
> omnis at lineal.co.uk
>
> www.lineal.co.uk
>
> www.sqlworks.co.uk
>
>
>
> On 11 Mar 2023, at 08:22, Stephen Miller <stephenmiller1958 at gmail.com> wrote:
>
> Hi All
>
> My challenge for this Saturday night is the following...
>
> I have a whole lot, 100's, of pdfs that all use Times New Roman.
>
> It appears that the pdf standard does not store an ASCII value for the letter
> "A", or for a string using the same font and size like "ABC"; it stores a pointer
> to the specific WinAnsi font table position for each character.
>
> Now I know why Adobe only lets you replace one at a time, no replace all,
> as it depends on the font.
>
> Now in my case they are all Times New Roman so that hopefully, as I know
> the size and style of the Font from looking in edit mode in Acrobat Pro, I
> should be able to use a Hex Editor such as Neo to do a global find and
> replace???
>
> For those that have some idea what I am talking about, I think the Wikipedia
> page on "Windows-1252", and the table on that page, is useful. Correct?
>
> Please note all the documents are the same, bank statements, but there are
> hundreds of them and I want to obfuscate the identity data of the account
> holder and all accounts this person has transactions with.
>
> Possible or should I take up drinking again and buy a television?
>
> Kind Regards,
>
> Stephen Miller
>
> 0455461581
> _____________________________________________________________
> Manage your list subscriptions at https://lists.omnis-dev.com
> Start a new message -> mailto:omnisdev-en at lists.omnis-dev.com
>