Quantcast

DBAs - normalized vs relational DBs

jacksonpt

Turbo Monkey
Jul 22, 2002
6,791
59
Vestal, NY
I'm not really a DBA, but I do play one at work sometimes (I'm the only IT guy in my department, so I wear a lot of hats). I'm in the midst of revamping some of our apps, and with that I'm re-evaluating some things.

To this point we've used a flat or normalized DB, but I'm wondering if a relational DB might be more effective, especially in managing duplicate records.

Can someone give me some pros/cons of relational DBs vs normalized DBs?

I can give you some background info on the application if that would help.
 

binary visions

The voice of reason
Jun 13, 2002
22,149
1,250
NC
I'm wondering if a relational DB might be more effective, especially in managing duplicate records.
Hmm? The point of a normalized database is that you never have to worry about duplicate records because duplicate data is not stored in multiple places. A relational database makes it harder to manage duplicate records, doesn't it? Am I not understanding what you're saying?

Relational databases can be faster since not everything has to be looked up in a hundred different tables, but the actual speed increase depends primarily on how normalized your database is (I don't think I've ever seen a database that was - what do they call it, level 5 normalization? Most databases have some redundant data), and also the overall speed of your database. The danger in a relational database is that people need to be very careful in updating information in the database, ensuring there are no conflicting records created. If there are a select few people changing the tables, then sometimes this can be effective because those people can be disciplined in how they modify data.

Relational databases also consume more drive space, but drive space is cheap.

Why are you considering this move? What advantages do you believe it will bring you? Or, conversely, what limits are you running into now using a normalized database?

I'm not a DBA either, but I did stay at a Holiday Inn Express last night.
 

jacksonpt

Turbo Monkey
Jul 22, 2002
6,791
59
Vestal, NY
OK, here's some background info...

I work at a college in the admissions offer. We have daily campus tours, and specialized open house programs in the spring and fall. Students can register for these events on-line. Because of security concerns, we do not give students the ability to update their registration if they need to change the number of people coming, update their e-mail address, etc. As such, many students simply reregister. We are not going to change this policy, so duplicate data is going to be an issue regardless.

We bascially keep 3 "types" of data:
  • personal data - name, phone number, address, etc.
  • event data - date of visit, number of people coming, etc
  • processing data - all kinds of stuff that we need for internal use

Just to clerify, when I talk about managing duplicate records, I mean manually managing them, as in I query the db, review the apparent duplicates, then delete as necessary.

As it stands, all this data (54 fields, aproaching 10,000 records) is stored in one table in one database. I was thinking that if I went with 3 tables (person, event, processing) based on the 3 types of data we collect, it would keep data better organized, and thus easier to manually review/manage.

Though I could be very very wrong in that thought...
 

binary visions

The voice of reason
Jun 13, 2002
22,149
1,250
NC
Hm. I think one of us is confusing our terms :)

You have a flat database, but it doesn't sound like it's normalized. Normalization actually implies splitting data into multiple tables to avoid duplication. As it is, I'm understanding that you have something like this:


So if John Doe comes to visit twice or changes his data, his name and phone number and any other fixed data is stored twice. That is not a normalized database, since a normalized database would pull John Doe into a separate table with a phone number, address and any other fixed data associated with him.

That's my understanding anyway.

I see your problem and from my database time I would think that if you are consistently querying the database to manually delete duplicate entries, it seems to me that a flat database like you have would be the absolute easiest way to do it. You only have to delete records out of one table. If you split it up, you'll have to delete John Doe out of the personal data table, delete his event out of the event table...
 

jacksonpt

Turbo Monkey
Jul 22, 2002
6,791
59
Vestal, NY
yes, it does sound like the terms are confusing us.

But yes, your example is correct. If the DB were relational, I would have one entry in the person table for John Doe (that contained his name/address/phone, for example), then multiple entries in the event table for all of his registrations. I wouldn't want to delete John Doe from the person table, only the duplicates from the event table. So when managing duplicates, I wouldn't even have to query the person table, only the event table. When viewing event table records, it doesn't matter who in the person table they are associated with, only that they are associated with the same person.

The kicker in all of this, which I didn't think of until just now, is that all the fields in my db are fixed length (it's a technical requirement, don't ask), so tables get big in a hurry due to a lot of extraneous space being stored. Thus, queries take a long time. I'm thinking/hoping that breaking up the data into 3 tables will make for faster queries as only 1/3 of the data is being queried at a time.
 

binary visions

The voice of reason
Jun 13, 2002
22,149
1,250
NC
Ahh, okay I gotcha.

You are actually trying to make your database more normalized by adding the relations to it. The "normalized vs relational" thing threw me.

http://en.wikipedia.org/wiki/Database_normalization

What you're suggesting makes a lot of sense. I tended to take the normalization thing to an extreme when I was designing tables, moreso than I should (compulsive tendencies), but most databases should have a certain amount of normalization.