Mind Your Metadata 4:22 with Greg Stromire
Even with secure, encrypted email and internet traffic, information about your online activity can still say a lot about you. Learn the aspects of your online presence that can identify you as uniquely as your fingerprint.
- Metadata -- Data about data. The additional information associated with a message or communication besides its direct content.
- IP Address -- Internet Protocol Address. A unique (enough) identifier that allows the internet to route traffic to and from the right places.
- Anonymized Dataset -- A collection of information about people where the personally identifiable information has been stripped.
- De-anonymization -- A strategy in data-mining where an anonymized dataset is cross-referenced with other available data to re-identify the sources.
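The cross-referencing described in the De-anonymization entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every record, name, and field below is invented, and the choice of zip code, birth date, and gender as the linking fields is an assumption made for the example, not a description of any particular real dataset.

```python
# Hypothetical toy data: every record, name, and value here is invented.
anonymized_records = [
    {"zip": "97201", "birth_date": "1984-03-12", "gender": "F", "diagnosis": "asthma"},
    {"zip": "97201", "birth_date": "1990-07-01", "gender": "M", "diagnosis": "diabetes"},
    {"zip": "97035", "birth_date": "1984-03-12", "gender": "F", "diagnosis": "migraine"},
]

public_records = [
    {"name": "Alice Example", "zip": "97201", "birth_date": "1984-03-12", "gender": "F"},
    {"name": "Bob Example", "zip": "97201", "birth_date": "1990-07-01", "gender": "M"},
    {"name": "Carol Example", "zip": "97210", "birth_date": "1975-11-30", "gender": "F"},
]

# Assumed linking fields: not names, but present in both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")


def quasi_key(record):
    """Build the tuple of linking fields for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized, public):
    """Return (name, diagnosis) pairs where the linking fields match exactly one person."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(quasi_key(person), []).append(person["name"])

    matches = []
    for row in anonymized:
        names = names_by_key.get(quasi_key(row), [])
        if len(names) == 1:  # keep only unambiguous matches
            matches.append((names[0], row["diagnosis"]))
    return matches


for name, diagnosis in reidentify(anonymized_records, public_records):
    print(f"{name} -> {diagnosis}")
```

With this toy data, two of the three "anonymized" rows link back to a single named person, even though the anonymized dataset contains no names at all.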
In the last video, we explored how common internet traffic is exposed in different scenarios. Our previous example was the fairly innocuous movie showtimes. But what if it was about a health condition, or a political organization? Even with Google's HTTPS search, an eavesdropper on an open Wi-Fi network can see that someone went to Google.com, just not the content. It may be somewhat anonymous, as the traffic doesn't say the name of the person, but it is certainly not private. In a very similar manner, going to Google.com is fairly generic, but what if it was plannedparenthood.com or a substance abuse support website? What if you were the only other person in the coffee shop?
Metadata is essentially data about data. In the previous video, we showed a search for movie showtimes from a coffee shop in Portland, Oregon, and the results provided by Google were for movies in the Portland area. Google was able to find out more information about that search based on metadata attached to the request. This includes the IP address, or internet protocol address, which essentially lets the Wi-Fi network, ISP, and Google know the source of the search request, to be able to send the results back to the right place. Other metadata could include date, time, location, the browser used, the device used, its operating system, the network used, etc.
It's important to consider that even though none of this information contains your personal information specifically, like your name or your home address, it really doesn't have to in order to track you. In fact, there's so much metadata attached to almost all internet traffic that it can often identify you as uniquely as your own fingerprint. This is another website demo that I encourage you to try yourself; it's called Panopticlick.
And it's from the Electronic Frontier Foundation, a great non-profit organization dedicated to protecting our security and privacy rights. Just click one button, and this site will collect as much information from you as it has available, which is as much as almost any site has available. It will detect things like the device's operating system, the browser and version, and even the fonts installed on the device. These are all meant to assist the browser in rendering web pages, things like showing the right screen size, whether you're on a cell phone or a laptop. But because these details can be aggregated with other metadata, including your IP address, which is specific to certain regions of the world, the various combinations of them all end up being fairly statistically unique.
Going back to the coffee shop example, let's say you were not the only one online while someone was sniffing, or eavesdropping on, the internet traffic. Without this metadata associated with everyone's traffic, you'd be anonymous within the coffee shop crowd. But what if you were the only one using a Mac laptop, or what about a recognizable model of Android phone? This is effectively a form of de-anonymization, and it's a real threat to your ability to stay private.
Another form of de-anonymization can occur when organizations publish datasets that they have anonymized by removing personal information like name and demographics. Unfortunately, when combined with other public data, these datasets can reveal surprisingly specific information.
Examples of De-anonymization Failures. An academic paper, Simple Demographics Often Identify People Uniquely, showed that a birth date, gender, and zip code are enough to identify most people. Uber published, and since removed, a blog post demonstrating how they could detect when riders had had one-night stands.
An anonymized New York City taxi dataset was cross-referenced with publicly available photos from news and tabloid publications to reveal the home addresses of celebrities, the clubs they visited, and even which of them tipped well.
Metadata equals surveillance; it's that simple. Personally, I find it pretty convenient that my search for showtimes, or a search for library hours, would show results local to my area. This is a tradeoff with my security and privacy that I accept. Other metadata collection, I strongly oppose. The important thing is to understand what metadata is, when it's being collected, and to make that choice for yourself.
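The idea from the Panopticlick demo, that individually harmless details become "fairly statistically unique" in combination, can be illustrated with a small Python sketch. The visitor list and field names below are hypothetical and invented for this example; a real site sees far more attributes and far more visitors. The "bits" figure is the base-2 logarithm of how rare a combination is in this toy sample, used here as an assumed measure of how identifying it is.

```python
import math
from collections import Counter

# Hypothetical coffee-shop crowd: all values are invented for illustration.
visitors = [
    {"os": "Windows", "browser": "Chrome", "screen": "1920x1080", "region": "Portland"},
    {"os": "Windows", "browser": "Chrome", "screen": "1920x1080", "region": "Portland"},
    {"os": "Windows", "browser": "Firefox", "screen": "1920x1080", "region": "Portland"},
    {"os": "macOS", "browser": "Safari", "screen": "1440x900", "region": "Portland"},
    {"os": "Android", "browser": "Chrome", "screen": "412x915", "region": "Portland"},
    {"os": "Windows", "browser": "Chrome", "screen": "1366x768", "region": "Portland"},
    {"os": "Windows", "browser": "Chrome", "screen": "1920x1080", "region": "Portland"},
    {"os": "Windows", "browser": "Firefox", "screen": "1920x1080", "region": "Portland"},
]

ALL_FIELDS = ("os", "browser", "screen", "region")


def fingerprint(visitor, fields):
    """Combine the chosen metadata fields into one hashable fingerprint."""
    return tuple(visitor[field] for field in fields)


def anonymity_report(people, fields):
    """Map each fingerprint to (how many people share it, bits of identifying information)."""
    counts = Counter(fingerprint(person, fields) for person in people)
    total = len(people)
    return {fp: (count, math.log2(total / count)) for fp, count in counts.items()}


def unique_count(people, fields):
    """Count the people whose fingerprint nobody else in the sample shares."""
    counts = Counter(fingerprint(person, fields) for person in people)
    return sum(1 for count in counts.values() if count == 1)


print("Unique by OS alone:", unique_count(visitors, ("os",)))
print("Unique by all fields:", unique_count(visitors, ALL_FIELDS))
for fp, (count, bits) in anonymity_report(visitors, ALL_FIELDS).items():
    print(fp, "shared by", count, "-> about", round(bits, 2), "bits")
```

In this toy sample, the operating system alone singles out two of the eight visitors (the only Mac and the only Android phone), and combining all four fields singles out three. Adding more fields can only split groups further, which is why long lists of metadata tend toward one fingerprint per person.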