Glossary
- Time Complexity - A measure of the running time of an algorithm
- Space Complexity - A measure of the storage taken up by an algorithm
Now that we know how to generally define an algorithm, let's talk about what it means to have a good algorithm. An important thing to keep in mind is that there's no one single way to measure whether an algorithm is the right solution, because it is all about context. Earlier we touched on two concepts, correctness and efficiency. Let's define correctness more clearly, because before we can evaluate an algorithm on efficiency, we need to ensure its correctness.
Before we define our algorithms, we start by defining our problem. In the definition of that problem we have a clearly defined input, satisfying any preconditions, and a clearly defined output. An algorithm is deemed correct if, on every run against all possible input values, we always get the output we expect. Part of correctness also means that for any possible input, the algorithm should always terminate, or end. If either of these conditions does not hold, then our algorithm isn't correct.
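As a small sketch (not from the video), here is one informal way to check both parts of correctness for a simple algorithm: exhaustively run it over a small, well-defined input domain, confirm it terminates, and compare every output against the expected one. The `linear_search` function and the domain chosen here are our own illustration, not the course's code.

```python
from itertools import product

def linear_search(values, target):
    """Return the index of target in values, or -1 if absent."""
    for i, value in enumerate(values):
        if value == target:
            return i
    return -1  # the loop always ends, so the algorithm terminates

# Check every list of length 0..3 drawn from the values 0..2,
# against Python's built-in index() as the expected output.
for length in range(4):
    for values in product(range(3), repeat=length):
        for target in range(3):
            expected = values.index(target) if target in values else -1
            assert linear_search(list(values), target) == expected
```

If any input produced the wrong output, or the algorithm failed to terminate, the check would fail, and the algorithm would not be correct by the definition above.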
If you were to pick up an algorithms textbook and look up correctness, you would run into a lot of mathematical theory. This is because traditionally, algorithm correctness is proved by mathematical induction, a form of reasoning used in mathematics to verify that a statement is correct. This approach involves writing what is called a specification and a correctness proof. We won't be going into that in this course. Proof through induction is an important part of designing algorithms, but we're confident that you can understand algorithms, both in terms of how and when to use them, without getting into the math. So if you pick up a textbook and feel daunted, don't worry, I do, too. But we can still figure things out without it.
All right, so once we have a correct algorithm, we can start to talk about how efficient an algorithm is. Remember that this efficiency ultimately matters because it helps us solve problems faster and deliver a better end-user experience in a variety of fields. For example, algorithms are used in the sequencing of DNA, and more efficient sequencing algorithms allow us to research and understand diseases better and faster. But let's not get ahead of ourselves. We'll start simple, by evaluating John's linear search algorithm in terms of its efficiency.
First, what do we mean by efficiency? There are two measures of efficiency when it comes to algorithms: time and space. Sounds really cool and very sci-fi, right? Efficiency measured by time, something you'll hear called time complexity, is a measure of how long it takes the algorithm to run. Time complexity can be understood generally, outside the context of code and computers, because how long it takes to complete a job is a universal measure of efficiency: the less time you take, the more efficient you are. The second measure of efficiency is called space complexity, and this is pretty computer-specific. It deals with the amount of memory taken up on the computer. Good algorithms need to balance between these two measures to be useful.
For example, you can have a blazingly fast algorithm, but it might not matter if the algorithm consumes more memory than you have available. Both of these concepts, time and space complexity, are measured using the same metric. But it is a very technical sounding metric, so let's build up to it slowly and start simple. A few videos ago, I played a game with Britney and John where they tried to guess the number I was thinking of. Effectively, they were searching for a value.
So how do we figure out how efficient each algorithm is, and which algorithm was more suited to our purposes? If we consider the number of tries they took to guess, or search for, the value as an indicator of the time they take to run through the exercise, this is a good indicator of how long the algorithm runs for a given set of values. This measurement is called the running time of an algorithm, and we'll use it to define time complexity.

In the game we played four rounds. Let's recap those here, focusing on John's performance. In round 1, we had 10 values, the target was 3, and John took 3 tries. In round 2, we had 10 values, the target was 10, and John took 10 tries. In round 3, we had 100 values, the target was 5, and John took 5 tries. And finally, in round 4, given 100 values and a target of 100, John took 100 tries.
On paper, it's hard to gauge anything about this performance. When it comes to anything with numbers though, I like to put it up on a graph and compare visually. On the vertical, or y, axis, let's measure the number of tries it took John to guess the answer, or the running time of the algorithm. On the horizontal, or x, axis, what do we put?

For each turn we have a number of values as well as a target value. We could plot the target value on the horizontal axis, but that leaves some context and meaning behind. It's far more impressive that John took 5 tries when the range went up to 100 than when he took 3 tries for a maximum of 10 values. We could plot the maximum range of values, but then we're leaving out the other half of the picture. There are data points, however, that satisfy both requirements. If we only plot the values where the target, the number John was looking for, was the same as the maximum range of values, we have a data point that includes both the size of the dataset as well as his effort. There's an additional benefit to this approach as well.
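The selection described above can be sketched in a few lines (this is our own illustration, not the course's code; the round data comes from the recap earlier in the video):

```python
rounds = [
    # (number of values n, target, tries John took)
    (10, 3, 3),
    (10, 10, 10),
    (100, 5, 5),
    (100, 100, 100),
]

# Keep only the rounds where the target equals the maximum of the
# range, so each point captures both dataset size and effort.
points = [(n, tries) for n, target, tries in rounds if target == n]
print(points)  # [(10, 10), (100, 100)]
```

These are the points we'd plot: n on the horizontal axis, tries on the vertical axis.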
There are three ways we can measure how well John does or, in general, how well any algorithm does. First, we can check how well John does in the best case, or good scenarios from the perspective of his strategy. In the range of 100 values, the answer being a low number, like 3, at the start of the range is a good scenario; he can guess it fairly quickly. A target of 1 is his best-case scenario. Or we could check how well he does on average. We could run this game a bunch of times and average out the running time. This would give us a much better picture of John's performance over time, but our estimates would be too high if the value he was searching for was at the start of the range, or far too low if it was at the end of the range.
Let's imagine a scenario where Facebook naively implements linear search when finding friends. They looked at the latest US census, saw that 50% of names start with the letters A through J, which is the first 40% of the alphabet, and thought, okay, on average, linear search serves us well. But what about the rest of the people, whose names start with a letter after J in the alphabet? Searching for my name would take longer than the average, and much longer for someone whose name starts with the letter Z. So while measuring the running time of an algorithm on average might seem like a good strategy, it won't necessarily provide an accurate picture.

By picking the maximum in the range, we're measuring how our algorithm does in the worst-case scenario. Analyzing the worst-case scenario is quite useful because it indicates that the algorithm will never perform worse than we expect. There's no room for surprises.
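The three cases can be made concrete with a small sketch (not from the video): count the guesses a linear scan needs for every possible target in a range of n values, then take the minimum, mean, and maximum. The function name is our own.

```python
def tries_to_find(n, target):
    """Guesses a linear scan needs to reach target in the range 1..n."""
    tries = 0
    for guess in range(1, n + 1):
        tries += 1
        if guess == target:
            return tries

n = 100
all_tries = [tries_to_find(n, target) for target in range(1, n + 1)]

best = min(all_tries)                      # target at the start of the range
average = sum(all_tries) / len(all_tries)  # roughly n / 2
worst = max(all_tries)                     # target at the end of the range
print(best, average, worst)  # 1 50.5 100
```

Note how the average (50.5) badly overestimates a target near the start and badly underestimates one near the end, which is exactly why the worst case (100) is the safer measure.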
Back to our graph: we're going to plot the number of tries, a proxy for the running time of the algorithm, against the number of values in the range, which we'll shorten to n. Here, n also represents John's worst-case scenario. When n is 10, he takes 10 tries. When n is 100, he takes 100 tries. But these two values alone are insufficient to really get any sort of visual understanding. Moreover, it's not realistic. John may take a long time to work through 100 numbers, but a computer can do that in no time. To evaluate the performance of linear search in the context of a computer, we should probably throw some harder and larger ranges of values at it.

The nice thing is, by evaluating a worst-case scenario, we don't actually have to do that work. We know what the result will be: for a given value of n, linear search will take n tries to find the value in the worst-case scenario. So let's add a few values in here to build out this graph.
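Because we already know the worst case, we can generate those extra graph points without simulating anything. A minimal sketch (our own, not the course's code):

```python
def worst_case_running_time(n):
    """Worst-case tries for linear search over n values: exactly n,
    when the target is the last value checked."""
    return n

# Points for the graph: (n, worst-case running time).
for n in [10, 100, 1_000, 10_000, 100_000]:
    print(n, worst_case_running_time(n))
```

Each point lands on a straight line through the origin, which is the picture the graph is building toward.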
Okay, so we have a good picture of what this is starting to look like. As the values get really large, the running time of the algorithm gets large as well. Well, we sort of already knew that. Before we dig into this running time any deeper, let's switch tracks and evaluate Britney's work. By having something to compare against, it should become easier to build a mental model around time complexity.