Even with secure, encrypted email and internet traffic, information about your online activity can still say a lot about you. Learn the aspects of your online presence that can identify you as uniquely as your fingerprint.
New Terms:
- Metadata -- Data about data. The additional information associated with a message or communication besides its direct content.
- IP Address -- Internet Protocol Address. A unique (enough) identifier that allows the internet to route traffic to and from the right places.
- Anonymized Dataset -- A collection of information about people where the personally identifiable information has been stripped.
- De-anonymization -- A strategy in data-mining where an anonymized dataset is cross-referenced with other available data to re-identify the sources.
In the last video,
0:00
we explored how common internet traffic
is exposed in different scenarios.
0:01
Our previous example was the fairly
innocuous movie showtimes.
0:05
But, what if it was about a health
condition, or a political organization?
0:09
Even with Google's HTTPS search,
an eavesdropper on an open
0:14
Wi-Fi can see that someone went to
Google.com, just not the content.
0:18
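To make the point about HTTPS concrete, here is a minimal sketch of which parts of a URL a passive observer on the same network can typically see and which parts travel encrypted. The function name `visible_to_eavesdropper` is hypothetical, and the split is a simplification: it assumes an `https` URL and that the hostname leaks through the DNS lookup or the TLS handshake, which is common but not guaranteed.

```python
from urllib.parse import urlsplit


def visible_to_eavesdropper(url: str) -> dict:
    """Roughly separate what an observer can see from what HTTPS encrypts.

    Assumption: the URL uses https and the hostname is exposed by the
    DNS lookup or TLS handshake. Path and query string are encrypted.
    """
    parts = urlsplit(url)
    return {
        "visible": {"hostname": parts.hostname, "port": parts.port or 443},
        "encrypted": {"path": parts.path, "query": parts.query},
    }


report = visible_to_eavesdropper("https://www.google.com/search?q=movie+showtimes")
print("Observer sees:", report["visible"])
print("Observer does not see:", report["encrypted"])
```

The search terms stay hidden, but the destination does not, which is why the site itself can be revealing.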
It may be somewhat anonymous,
0:23
as the traffic doesn't say the name of the
person, but it is certainly not private.
0:24
In a very similar manner,
going to Google.com is fairly generic, but
0:29
what if it was plannedparenthood.com or
a substance abuse support website?
0:33
What if you were the only other
person in the coffee shop?
0:38
Metadata is essentially data about data.
0:42
In the previous video,
we showed a search for
0:45
movie showtimes from a coffee
shop in Portland, Oregon, and
0:47
the results provided by Google were for
movies in the Portland area.
0:51
Google was able to find out more
information about that search based on
0:55
metadata attached to the request.
0:59
This includes the IP address,
or internet protocol address,
1:01
which essentially lets the Wi-Fi network,
ISP, and Google know the source
1:05
of the search request, to be able to send
the results back to the right place.
1:10
Other metadata could include date,
time, location, the browser used,
1:15
the device used, its operating system,
the network used, etc.
1:20
It's important to consider that even
though none of this information contains
1:26
your personal information specifically,
like your name or
1:29
your home address, it really doesn't
have to in order to track you.
1:32
In fact, there's so much metadata
attached to almost all internet traffic,
1:37
it can often identify you as
uniquely as your own fingerprint.
1:42
This is another website demo that
I encourage you to try yourself,
1:46
it's called Panopticlick.
1:50
And it's from the Electronic Frontier
Foundation, a great non-profit
1:51
organization dedicated to protecting
our security and privacy rights.
1:54
Just click one button, and this site will
collect as much information from you
1:59
as it has available, which is as much
as almost any site has available.
2:02
It will detect things like the device's
operating system, the browser and
2:07
version, and
even the fonts installed on the device.
2:10
These are all meant to assist the browser
in rendering web pages, things like
2:14
showing the right screen size, whether
you're on a cell phone or a laptop.
2:18
But because these details can be
aggregated with other metadata,
2:22
including your IP address, which is
specific to certain regions of the world,
2:25
the various combinations of them all end
up being fairly statistically unique.
2:29
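A rough sketch of why combined attributes become "statistically unique": each attribute alone is shared by many visitors, but the combination often is not. The visitor list below is made up for illustration, and the helper names `unique_share` and `bits_of_identifying_info` are hypothetical. This is not how Panopticlick itself is implemented, only the same idea in miniature.

```python
import math
from collections import Counter

# Hypothetical visitors: (operating system, browser, screen size, region from IP)
visitors = [
    ("Windows", "Chrome", "1920x1080", "Portland"),
    ("Windows", "Chrome", "1920x1080", "Portland"),
    ("Windows", "Chrome", "1920x1080", "Seattle"),
    ("Windows", "Firefox", "1920x1080", "Portland"),
    ("macOS", "Safari", "1440x900", "Portland"),
    ("Android", "Chrome", "412x915", "Portland"),
    ("Windows", "Chrome", "1366x768", "Portland"),
    ("Windows", "Chrome", "1920x1080", "Portland"),
]


def unique_share(population, fields):
    """Fraction of visitors whose values for the chosen fields are unique."""
    projected = [tuple(v[i] for i in fields) for v in population]
    counts = Counter(projected)
    return sum(1 for p in projected if counts[p] == 1) / len(projected)


def bits_of_identifying_info(fingerprint, population):
    """Surprisal in bits: -log2 of the share of visitors with this fingerprint."""
    share = Counter(population)[fingerprint] / len(population)
    return -math.log2(share)


print("Unique by operating system alone:", unique_share(visitors, (0,)))
print("Unique by all four attributes:", unique_share(visitors, (0, 1, 2, 3)))
print("Bits for the macOS visitor:", bits_of_identifying_info(visitors[4], visitors))
```

In this toy population, adding attributes raises the share of uniquely identifiable visitors from a quarter to more than half. Real browsers expose far more attributes than four.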
Going back to the coffee shop example,
let's say you were not the only one
2:35
online while someone was sniffing, or
eavesdropping on the internet traffic.
2:39
Without this metadata associated
with everyone's traffic,
2:44
you'd be anonymous within
the coffee shop crowd.
2:47
But what if you were the only
one using a Mac laptop, or
2:50
what about a recognizable
model of Android phone?
2:53
This is effectively a form
of deanonymization, and
2:56
it's a real threat to your
ability to stay private.
2:59
Another form of deanonymization can occur
when organizations publish datasets that
3:03
they have anonymized, by removing personal
information like name and demographics.
3:07
Unfortunately, when combined
with other public data,
3:13
these datasets can reveal
surprisingly specific information.
3:16
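The cross-referencing step can be sketched as a simple join on the fields that survive anonymization. Everything below is invented for illustration: the records, the names, the choice of `birth_date`, `gender`, and `zip` as matching fields, and the `reidentify` helper. It is a minimal sketch of the idea, not a description of any real attack.

```python
# Hypothetical "anonymized" records: names removed, other fields kept.
anonymized = [
    {"birth_date": "1980-03-14", "gender": "F", "zip": "97201", "diagnosis": "asthma"},
    {"birth_date": "1975-11-02", "gender": "M", "zip": "97201", "diagnosis": "diabetes"},
    {"birth_date": "1990-07-30", "gender": "F", "zip": "97209", "diagnosis": "migraine"},
]

# Hypothetical public list that includes names alongside the same fields.
public = [
    {"name": "Alice Example", "birth_date": "1980-03-14", "gender": "F", "zip": "97201"},
    {"name": "Bob Example", "birth_date": "1975-11-02", "gender": "M", "zip": "97201"},
    {"name": "Carol Example", "birth_date": "1990-07-30", "gender": "F", "zip": "97209"},
    {"name": "Dana Example", "birth_date": "1990-07-30", "gender": "F", "zip": "97209"},
]

KEYS = ("birth_date", "gender", "zip")


def reidentify(anonymized, public, keys=KEYS):
    """Map the index of each anonymized record to a name when exactly one
    person in the public list shares all of the matching fields."""
    by_key = {}
    for person in public:
        by_key.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for i, record in enumerate(anonymized):
        names = by_key.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # skip ambiguous matches
            matches[i] = names[0]
    return matches


print(reidentify(anonymized, public))
```

Two of the three records link back to a single name. The third stays ambiguous only because two people in the public list happen to share the same three fields.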
Examples of Anonymization Failures.
3:20
An academic paper, Simple Demographics
Often Identify People Uniquely,
3:23
showed that a birth date, gender, and zip
code are enough to identify most people.
3:28
Uber published, and since removed, a blog
post demonstrating how they could detect
3:33
when riders had had one-night stands.
3:38
An anonymized New York City taxi
dataset was cross-referenced
3:41
with publicly available photos
from news and tabloid publications
3:45
to reveal the home addresses of
celebrities, the clubs they visited,
3:49
and even which of them tipped well.
3:53
Metadata equals surveillance,
it's that simple.
3:55
Personally, I find it pretty
convenient that my search for
3:59
showtimes, or a search for library hours
would show results local to my area.
4:03
This is a tradeoff with my security and
privacy that I accept.
4:07
Other metadata collection,
I strongly oppose.
4:12
The important thing is to understand what
metadata is, when it's being collected,
4:15
and to make that choice for yourself.
4:19