Bummer! This is just a preview. You need to be signed in with a Basic account to view the entire video.
Start a free Basic trial
to watch this video
Normalization eliminates the potential for update anomalies and the enforcement of data integrity. But what does that mean?
This link is a step by step guide to normalization.
Que: Step-by-step Guide to Publishing Your Data
As with most IT topics, there are many books available on the subject. This is a good book for beginning database designers:
Beginning Database Design Solutions, by Rod Stephens
-
0:00
After being introduced to the process of database normalization,
-
0:04
you might be wondering why go to all that trouble?
-
0:08
Think about a spreadsheet with millions of rows, maybe sales data.
-
0:12
Each of those rows contain some sort of descriptive value,
-
0:16
maybe it's the product name.
-
0:18
All the storage space used to store the text of those product names over and
-
0:22
over again can really add up.
-
0:25
If you were to condense the list of product names down to a discrete
-
0:29
distinct list and store it separately from the main dataset,
-
0:34
you would drastically reduce the amount of disk storage space required.
-
0:39
Instead of storing the product names over and
-
0:41
over again, you can store the products IDs.
-
0:45
Now, imagine that inefficiency repeated over and over again for
-
0:49
the other columns in the spreadsheet.
-
0:52
Such as the product category, store location and so on.
-
0:55
Normalizing each of these lists of text becomes crucial to
-
0:59
reducing disk space usage.
-
1:01
In the 1970s and 80s when companies were fast investing in the technology to
-
1:06
computerize their business systems, this forward thinking design
-
1:10
approach literally saved millions of dollars in hardware costs.
-
1:15
This storage cost have come down dramatically since then, but
-
1:18
we're storing more and more data than ever before.
-
1:21
So normalization is still a requirement.
-
1:24
Let's think about our original large spreadsheet again.
-
1:28
We see all the products listed out along with the rest of the cells data,
-
1:33
let's say one of the product name's changes.
-
1:36
We need to change it.
-
1:37
Look up all the other places would need to make a change too.
-
1:40
We hope we'd better not miss one.
-
1:42
What happens if someone queries this dataset before
-
1:46
we are completed with our updates?
-
1:48
They could possibly see two different spellings for the same product.
-
1:52
The data is inconsistent which is never good.
-
1:55
In our normalized state, we can just change it in one place,
-
2:00
in the products table.
-
2:02
Changing one row of a thousands is,
-
2:05
as you can imagine, a more optimal database operation.
-
2:09
The problem with poorly structured data that I just described
-
2:13
is called an update anomaly.
-
2:15
This is a huge plus to normalization, eliminating the potential for
-
2:19
update anomalies and the enforcements of data integrity.
-
2:24
It's also more efficient for an applications to make an update to one row
-
2:28
in a table opposed to millions of rows.
-
2:31
In the world of split second application response times
-
2:35
that time difference is critical to provide a positive user experience.
-
2:40
By now, you should be starting to see the importance of good database normalization.
-
2:46
There is a lot of great information online regarding normalization, be sure to check
-
2:50
the teacher's notes for some links and suggested reading to learn more.
You need to sign up for Treehouse in order to download course files.
Sign up