Center for Democracy and Technology
Working for Democratic Values in a Digital Age
PolicyBeta - Digital Policy in Process

Netflix Needs to Put “Privacy Risks” in Their Queue

September 30th, 2009 by Erica Newland

Netflix recently announced the winners of the one-million-dollar “Netflix Prize” and its plans for a new competition, creatively dubbed “Netflix Prize 2.” Although details of this second contest have not yet been made public, the New York Times has reported that competitors will be challenged to “model individuals’ ‘taste profiles’,” based on a dataset that will hold “demographic and behavioral data,” including members’ ages, gender, ZIP codes, and genre preferences, as well as their rental histories and movie ratings.

The announcement of Netflix Prize 2 shows that companies still wear blinders about how easily individuals listed in supposedly anonymized datasets can be identified. The ostensibly anonymized dataset released to the public for the first Netflix Prize was limited to the video rental histories of 480,000 Netflix subscribers, the ratings (1-5 stars or “no rating”) that subscribers gave each movie, and the date on which subscribers rated each movie. As law professor and CDT Academic Fellow Paul Ohm has pointed out, between 2006 and 2008 researchers Arvind Narayanan and Vitaly Shmatikov showed that if you know just a little about a friend’s (or enemy’s) movie-viewing habits, you can likely use the Netflix Prize database to uncover every movie your friend ordered through Netflix and how she rated it.
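To make the idea concrete, here is a toy sketch of that kind of matching. It is not the researchers' actual algorithm, which (among other things) gives more weight to rarely rated movies; it simply counts how many of a few roughly known ratings line up with each pseudonymous record. Every name, record, and tolerance value below is invented for illustration.

```python
from datetime import date

# Toy "anonymized" release: pseudonymous ID -> {title: (stars, date rated)}.
# All records are made up; none come from the real Netflix Prize dataset.
RELEASED = {
    "user_0017": {
        "Movie A": (5, date(2005, 3, 1)),
        "Movie B": (2, date(2005, 3, 9)),
        "Movie D": (1, date(2005, 7, 20)),
    },
    "user_0042": {
        "Movie A": (5, date(2005, 3, 2)),
        "Movie C": (3, date(2005, 4, 11)),
        "Movie E": (4, date(2005, 5, 5)),
    },
    "user_0099": {
        "Movie A": (4, date(2005, 2, 27)),
        "Movie B": (5, date(2005, 8, 1)),
        "Movie C": (3, date(2005, 4, 15)),
    },
}

# What an acquaintance might roughly remember: approximate stars and dates.
KNOWN = {
    "Movie A": (5, date(2005, 3, 3)),
    "Movie B": (3, date(2005, 3, 12)),
    "Movie D": (1, date(2005, 7, 25)),
}


def match_score(record, known, star_tolerance=1, day_tolerance=14):
    """Count known ratings that approximately agree with a released record."""
    score = 0
    for title, (stars, when) in known.items():
        if title not in record:
            continue
        released_stars, released_when = record[title]
        close_stars = abs(released_stars - stars) <= star_tolerance
        close_date = abs((released_when - when).days) <= day_tolerance
        if close_stars and close_date:
            score += 1
    return score


def best_candidates(released, known):
    """Return the pseudonymous IDs with the highest score, plus that score."""
    scores = {uid: match_score(rec, known) for uid, rec in released.items()}
    top = max(scores.values())
    return sorted(uid for uid, s in scores.items() if s == top), top


if __name__ == "__main__":
    ids, top = best_candidates(RELEASED, KNOWN)
    print(ids, top)
```

If a single ID comes out on top, the rest of that record (every other title, rating, and date) is now attached to a person you know, even though the remembered ratings and dates were only approximate.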

Add demographic and behavioral data to the mix, and how can anyone take seriously claims that the data has been, and will remain, de-identified?

Netflix is far from the only company to release easily de-anonymized data under the pretense that the contents cannot be tied to individuals. From the user identifications that came out of AOL’s infamous data release to Latanya Sweeney’s use of Massachusetts residents’ ZIP codes, birth dates, and gender (all found in public voter rolls) to identify individuals whose “anonymized” hospital records had been publicly released, the pretense of anonymization is time and again revealed as false. Companies need to own up to the privacy risks inherent in releasing such data and find more comprehensive and robust methods for truly de-identifying their data and protecting their customers.
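The voter-roll example boils down to a join on a handful of shared fields. The sketch below is a minimal illustration of such a linkage, assuming both datasets carry ZIP code, birth date, and gender in the same format; the field names, people, and diagnoses are all hypothetical and are not drawn from Sweeney's data.

```python
from collections import defaultdict

# Fields present in both the public list and the "anonymized" release.
QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")

# Invented public voter roll: names alongside the quasi-identifiers.
VOTER_ROLL = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1950-07-31", "gender": "F"},
    {"name": "Bob Sample", "zip": "02139", "birth_date": "1962-01-15", "gender": "M"},
    {"name": "Carol Sample", "zip": "02139", "birth_date": "1962-01-15", "gender": "F"},
    {"name": "Dan Placeholder", "zip": "02140", "birth_date": "1971-11-02", "gender": "M"},
    {"name": "Ed Placeholder", "zip": "02140", "birth_date": "1971-11-02", "gender": "M"},
]

# Invented "anonymized" hospital records: names removed, the rest left intact.
HOSPITAL_RECORDS = [
    {"zip": "02138", "birth_date": "1950-07-31", "gender": "F", "diagnosis": "condition X"},
    {"zip": "02140", "birth_date": "1971-11-02", "gender": "M", "diagnosis": "condition Y"},
    {"zip": "02141", "birth_date": "1980-05-20", "gender": "F", "diagnosis": "condition Z"},
]


def quasi_key(record):
    """Project a record onto the fields shared by both datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized, public):
    """Return (name, diagnosis) pairs where exactly one public record matches."""
    index = defaultdict(list)
    for person in public:
        index[quasi_key(person)].append(person["name"])

    hits = []
    for record in anonymized:
        names = index.get(quasi_key(record), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            hits.append((names[0], record["diagnosis"]))
    return hits


if __name__ == "__main__":
    for name, diagnosis in reidentify(HOSPITAL_RECORDS, VOTER_ROLL):
        print(name, "->", diagnosis)
```

In this toy data only the first hospital record is re-identified: the second matches two voters and stays ambiguous, and the third matches no one. The more fields a release includes, the fewer people share any given combination, and the more records fall into that first, uniquely matched group.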


This entry was posted on Wednesday, September 30th, 2009 at 12:18 pm and is filed under CDT, Consumer Privacy.


