Just to follow up on the discussion of loading data into SQL engines, the use of keys, and the differences between normalized and unnormalized databases...

First, normalized vs. unnormalized. Luke covered the topic quite well. Normalization is the act of breaking data up into categorical groups and assigning identifiers to the individual records of each group. Take, for example, an address book. Any individual you place in an address book may have more than one email address, snail mail address, phone number, or URL. For maximum flexibility and maximum reuse of data, you break these elements out into their respective tables. First you create a "header" table for the individual, which holds the fairly static information: name, birthdate (if you want it), nickname. A unique identifier is assigned to the individual so that you can relate this record to the other tables in the database. You then make tables for each of the other categories: address, phone, email, etc. Now, when the individual gains a new address, you simply add a record to the address table and associate it with the individual. If you had implemented this address book as one large table containing all of these fields, what is referred to as a "wide table", you would have to edit the individual's whole record just to change an address. (There's a rough SQL sketch of the normalized version at the end of this message.)

There are other advantages to a normalized database over a "wide" one. Changes to a normalized database have less impact on the performance of the database as a whole. Indexing, maintenance, and data locking can be confined to a smaller portion of the database, making such activities less intrusive. To change an address in a wide database, you may have to lock a table that others are trying to access at the same time... problem. That is, unless you have more advanced locking mechanisms, such as "row-level" locking or concurrent versioning. Yes, I know: lots of buzzwords. Just bear with me. Check out the documentation on http://www.postgres.org/ if you want to know what I'm talking about.

Anyway, there are disadvantages to a normalized database as well: the queries you write are more complex, and maintaining referential integrity takes more work (making sure you didn't delete someone from the database while leaving their address or other data just floating around). Queries may actually take longer, depending on how many tables you're trying to join together for your query result set. (I want to see all of the addresses, phone numbers, and email addresses for Ben Kochie. I will have to join four tables to see all of that data at once.) There are techniques for making these kinds of "reports" quicker, such as maintaining a "report table" or using views. (Views are pretty cool... basically a virtual table.)

So the real challenge in writing a database schema is deciding whether normalization is right for the application functionality you're trying to provide. For ultimate flexibility, normalization is the key. For ultimate speed, either wide tables or a combination of normalization and wide "report" tables may be the answer.

What does all this have to do with Tim's problem? Not much. He doesn't get to design the database schema, unless he wants to transform the data he gets into a normalized form himself (if it isn't already).

Well, I've written enough for now... Didn't touch on keys and such, but that's a discussion for another day.
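P.S. Here's the rough sketch I promised of what a normalized address book might look like in SQL. The table and column names are made up purely for illustration; the exact types and syntax will vary a bit from one database to another.

    -- A hypothetical normalized address book; names invented for illustration.
    CREATE TABLE person (
        person_id  SERIAL PRIMARY KEY,   -- the unique identifier
        name       TEXT NOT NULL,
        nickname   TEXT,
        birthdate  DATE
    );

    CREATE TABLE address (
        address_id  SERIAL PRIMARY KEY,
        person_id   INTEGER NOT NULL REFERENCES person (person_id),
        street      TEXT,
        city        TEXT,
        state       TEXT,
        zip         TEXT
    );

    CREATE TABLE phone (
        phone_id   SERIAL PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES person (person_id),
        number     TEXT
    );

    CREATE TABLE email (
        email_id   SERIAL PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES person (person_id),
        addr       TEXT
    );

    -- Giving someone a new address is just an INSERT into one small table;
    -- the person record itself never changes.  (Assume person_id 1 exists.)
    INSERT INTO address (person_id, street, city, state, zip)
        VALUES (1, '123 Some St', 'Minneapolis', 'MN', '55401');

    -- The four-table join mentioned above: pull everything for one person.
    SELECT p.name, a.city, ph.number, e.addr
      FROM person p, address a, phone ph, email e
     WHERE a.person_id  = p.person_id
       AND ph.person_id = p.person_id
       AND e.person_id  = p.person_id
       AND p.name = 'Ben Kochie';

    -- A view makes that join look like one "virtual table":
    CREATE VIEW contact_report AS
        SELECT p.name, a.city, ph.number, e.addr
          FROM person p, address a, phone ph, email e
         WHERE a.person_id  = p.person_id
           AND ph.person_id = p.person_id
           AND e.person_id  = p.person_id;

The REFERENCES clauses are what keep the referential integrity problem I mentioned in check: the database won't let an address, phone, or email row point at a person who doesn't exist.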
-- Chad "^chewie, gunnarr" Walstrom <chewie at wookimus.net> http://wookimus.net/chewie