?

Log in

No account? Create an account
entries friends calendar profile Previous Previous Next Next
Working on the SQL - shadows of echoes of memories of songs — LiveJournal
j4
j4
Working on the SQL
Read 13 | Write
Comments
rysmiel From: rysmiel Date: August 18th, 2008 07:04 pm (UTC) (Link)
I have been doing this for a living for nineteen years OMG how did I get so damned old, and I only don't have a good reference text to point you at because the one I wrote a chapter for last year is taking forever to grind through the presses, so, summary of the summary of the summary:

The point from which you start is, what questions are people going to be asking it most often ? Any design that needs horribly mangled SQL to do things people are doing with any regularity is broken.

If you will have something like a regular web interface, with specific questions people will ask with some regularity, it helps a lot to organise the set of data that match each standard answer format into one table, so that the query that produces the answer you want is simple; helps with both speed and resilience.

If your incoming data is in a different format, regular or no, somewhere in the process there will be a rearrangement step. Insofar as is humanly possible this should be done between one table and another under your control, rather than when your user queries things. That way if something breaks it breaks in a situation under your control.

The number one bad habit I have to train people working with me on this out of is that they are still being taught from textbooks which say "duplicating data BAD" and they want to make complex cross-queries and so forth, which, once your dataset reaches any reasonable size is a net loss. Almost all the time these days, disk space is less of an issue than speed of response; so build the tables that your ussers will query in order to make the questions they will ask simple queries. [ The more control you exercise over what your users can do, the better. Ideally one convinces them that this was their idea. ]

I can probably provide you with something more concrete, actually; expect mail shortly.
From: vatine Date: August 18th, 2008 07:29 pm (UTC) (Link)
I've always interpreted "duplicating data BAD" as meaning that it's too damned easy ending up with inconsistent data sets if you have data duplicated everywhere. If you need to duplicate data to get sufficient speed, by all means do this, but...
rysmiel From: rysmiel Date: August 18th, 2008 07:51 pm (UTC) (Link)
This is what table constraints, integrity checks, and well-defined population procedures are for.
mobbsy From: mobbsy Date: August 18th, 2008 07:55 pm (UTC) (Link)

The number one bad habit I have to train people working with me on this out of is that they are still being taught from textbooks which say "duplicating data BAD" and they want to make complex cross-queries and so forth, which, once your dataset reaches any reasonable size is a net loss. Almost all the time these days, disk space is less of an issue than speed of response; so build the tables that your ussers will query in order to make the questions they will ask simple queries. [ The more control you exercise over what your users can do, the better. Ideally one convinces them that this was their idea. ]


Gak! Normalised database design isn't about saving disk space, it's about data integrity. With data duplicated all across the schema, you're asking for inconsistency. If you need to denormalise for performance, do so in materialized views not in the basic schema.
From: (Anonymous) Date: August 18th, 2008 08:07 pm (UTC) (Link)
Yes! Please listen to Mobbsy here. Database schemas should represent the data and its existing relationships, not be built 'in order to make the questions they will ask simple queries'. You exercise control over users by building them applications (or at the very least procedures and views) so they don't query and manipulate tables directly.
Read 13 | Write