Omnis Datafile to PostgreSQL
Lou Picciano
loupicciano at comcast.net
Fri Dec 17 05:31:25 EST 2010
Hi Nick!
I really wanted to weigh in on this a bit sooner, but I'm glad you've got it handled. Your findings hit on exactly the point I wanted to make about statement preparation (below).
But first, as David, Alex, (others?) have suggested, the PG COPY command is by far the winner in terms of speed. We use it commonly, and preferentially, on datasets in the hundreds of thousands of records.
I'd make a couple of points:
The Good news: COPY handles everything as one transaction.
The Bad News: COPY handles everything as one transaction.
This point can bite the intrepid Omnis developer right in his dot notation, as it were: even one erroneously formatted character rolls back the entire load, and you get only a single error message for it (we fiddle a lot with Unicode, so you can imagine...).
On the glass-half-full side of that: The COPY command becomes a teeny data verification process for you (!)
Another point on COPY: Once you start handling very large import files, you will start learning more than you may care to about the various memory-intensive options of PG. The PG Docs will become your friend!
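For the archives, here's the basic shape of it (the table, columns and file path are placeholders, of course; adjust for your own schema and delimiter):

    -- Bulk-load a tab-delimited export in a single statement/transaction.
    -- Note: COPY FROM a file path runs on the *server* and needs superuser;
    -- from a client, psql's \copy does the same thing client-side.
    COPY people (lname, fname, country)
        FROM '/srv/import/people_export.tsv'
        WITH DELIMITER E'\t';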
Last - VERY important: Friends don't let friends index while importing...
Import data _without_ indexes. Build the indexes after the fact. The impact of this will be even more pronounced in a sequential step-through-list approach from Omnis (see below).
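Concretely, something like the following once the load completes (the index name and memory setting are only examples):

    -- One big sort after the fact beats millions of incremental b-tree updates.
    SET maintenance_work_mem = '512MB';  -- example value; tune for your server
    CREATE INDEX people_lname_idx ON people (lname);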
(As someone has already mentioned: in a production setting, you will benefit from chopping up your import source files into bite-sized morsels. On the other hand, we've been having some 'fun with PostgreSQL' lately, testing really large imports; we intentionally push the limits of our environments to stay facile with all the ins and outs. Try it! We've been impressed at how rock-solid PostgreSQL is.)
If you are going to use a step-through approach from the Omnis list, though:
(Ostensibly) in order to keep Omnis' list functions neatly compartmentalized, safe and 'scope'-able, the default functions on the list - e.g., the $insert() on a row variable - execute the full, normal sequence of code, which in PG terms comprises three steps: PREPARE, INSERT and DEALLOCATE. The INSERT - the one you ultimately care most about, after all - is typically the fastest of the three. This all makes sense, in terms of tight control of the environment, if you are working in a 'traditional' user-interface environment with relatively small lists.
However, the PREPARE and DEALLOCATE steps are very cycle-and-memory-expensive, and the $insert() on row does these for _each_ row...! On the other hand, for something as simple as a straight-up data import, you may not need all that 'extra sauce' in there.
In pseudocode:
Step through list
    PREPARE
    INSERT
    DEALLOCATE
Until end of list
Cannot begin to compete against:
PREPARE
Step through list (or data records)
    INSERT
Until end of list
DEALLOCATE
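In actual PG terms, that second shape is simply this (table and column names are illustrative):

    -- Pay the PREPARE once per session...
    PREPARE people_ins (text, text, text) AS
        INSERT INTO people (lname, fname, country) VALUES ($1, $2, $3);
    -- ...then issue one cheap EXECUTE per row as you step through the list:
    EXECUTE people_ins('Renders', 'Nick', 'BE');
    EXECUTE people_ins('Picciano', 'Lou', 'US');
    -- ...and clean up once, at the end:
    DEALLOCATE people_ins;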
Your finding - 'It turns out the INSERT statement with Bind Variables is much faster than using the standard Omnis $insert() on a Row variable.' - is a spot-on illustration of the above. And COPY is faster still.
You start to pay this processing cost as soon as you use the list's 'native' functions for writing data. These costs are of much less consequence in a typical production setting, but your experiment has multiplied their impact and brought the list functions' 'fixed costs', if you will, into sharp focus.
If your individual record inserts also pull values from corresponding SEQUENCEs, these costs are only multiplied. There's a lot of memory handling, hash-table building, etc. going on in there for each insert...
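One way to sidestep that in a bulk load - assuming, say, a serial 'id' column backed by people_id_seq - is to load the ids explicitly and then reset the sequence once, rather than calling nextval() three million times:

    -- After COPYing explicit ids in, point the sequence past the highest one:
    SELECT setval('people_id_seq', (SELECT max(id) FROM people));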
Along these lines: We're really encouraging our Cloud collaborators to 'get friendly' with PG Prepared Statements, for use in virtually any re-used query. Will you allow me a brief digression?
Consider the following (silly example, but it makes the point):
At the open of your application's 'people' module:
PREPARE NicksFavoriteStatement (text, text) AS SELECT lname, fname FROM people WHERE lname = $1 AND country = $2;
During the session, repeated (n times):
EXECUTE NicksFavoriteStatement(variable1, variable2);
All done? DEALLOCATE (to recover server resources, if necessary). Or, since Prepared Statements are session-specific, let your log-off deallocate for you.
Why bother with all this?
Simply: because Prepared Statements, used repeatedly within a session, will get faster and faster - especially for more-complex queries. (You won't see a great advantage with a simple query like the example above.)
How so? The PG query planner is tightly integrated with the server's statistics collector. Once a query is prepared, the planner optimizes its use, based on these statistics, with _each_ call. That is, for complex queries, PG has a mechanism for actually adapting the execution of your query.
IMPORTANT: Omnis' 'default' behavior in the list completely abrogates this advantage, as soon as it DEALLOCATEs!
Couple the above with PG's EXPLAIN ANALYZE features for optimizing your queries in the first place, and you've got some powerful tools at your disposal.
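Even against the silly example above, it's one line (and remember that EXPLAIN ANALYZE actually executes the statement, so mind your INSERTs and UPDATEs):

    -- Reports the actual plan chosen for the prepared statement, with real timings:
    EXPLAIN ANALYZE EXECUTE NicksFavoriteStatement('Renders', 'BE');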
From this concept also flows a lot of discussion about offloading other functionality to much-faster, server-side processing (topic for another day...). But think: PG window functions, server-side cursors, memory-resident temporary tables, etc.
Regards, Lou Picciano
----- Original Message -----
From: "Nick Renders" <omnis1 at arcict.com>
To: "OmnisDev List - English" <omnisdev-en at lists.omnis-dev.com>
Sent: Friday, December 10, 2010 10:02:02 AM
Subject: Omnis Datafile to PostgreSQL
Hi List,
I was wondering if anyone had some tips about exporting data from an
Omnis Datafile into a PostgreSQL database?
I used drag&drop in the Omnis Studio SQL browser for most tables, but
there is one table with over 3 million records.
After 3 days, the transfer was still only at 20%. I am hoping there
is a faster way.
Regards,
Nick Renders
ARC - your ICT service partner
H. D. Saviolaan 8
1700 Dilbeek
T: (00 32) (0)2 466 50 00
F: (00 32) (0)2 466 88 33
http://www.arcict.com