Trust, but Verify - The Ethics of Data Collection

August 7, 2014

In this era, there is little that cannot be tracked in our online lives. Apps now tap the GPS receivers, microphones, and other sensors in our phones and computers to collect data on users. Google is eavesdropping, Snapchats are going undeleted, and Apple tracks the movements of individual iPhone users through an unencrypted file stored in iTunes backups. While this kind of service might prove attractive to those interested in tapping into mobile social networks, it also strikes many people as creepy.

Companies are steadily gaining new ways to capture information about us. They now have the technology to make sense of massive amounts of unstructured data, using natural language processing, machine learning, and software architectures such as Hadoop, which spreads the storage and processing of huge data sets across clusters of ordinary machines. Data of this kind is now the target of data mining, as is the information generated by social networks: user profiles and posts. A report from the market intelligence firm IDC estimates that in 2009 stored information totaled 0.8 zettabytes, the equivalent of 800 billion gigabytes. IDC predicts that by 2020, 35 zettabytes of information will be stored globally, and that much of it will be customer information. As the store of data grows, the analytics available to draw inferences from it will only become more sophisticated.
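
To make that concrete, here is a minimal sketch of the map-and-reduce pattern Hadoop popularized, written in plain Python rather than on Hadoop itself. The two-phase word count is the classic toy example; the corpus and function names are illustrative, not any company's pipeline.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# A toy corpus standing in for a pile of unstructured text.
documents = [
    "users like targeted ads",
    "users dislike hidden tracking",
]
pairs = (pair for doc in documents for pair in map_phase(doc))
print(reduce_phase(pairs))  # {'users': 2, 'like': 1, ...}
```

In Hadoop proper, the same two phases run in parallel across a cluster, which is what lets the pattern scale to zettabyte-era volumes.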

For all the privacy concerns, the online economy creates enormous value by using customer information. In 2009, according to an ad industry study cited by the Wall Street Journal, the average price of an untargeted ad online was $1.98 per thousand views; the average price of a targeted ad was $4.12 per thousand. Retailers, along with big players such as Facebook and Yahoo, are using startups' technology to sort through the behavioral information they've compiled over years.
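
The arithmetic behind those figures is worth a moment. Using a hypothetical campaign size (the impression count below is made up; only the two per-thousand prices come from the study):

```python
# CPM = price per thousand ad impressions (the 2009 figures above).
UNTARGETED_CPM = 1.98
TARGETED_CPM = 4.12

impressions = 10_000_000  # hypothetical campaign size

untargeted = impressions / 1000 * UNTARGETED_CPM
targeted = impressions / 1000 * TARGETED_CPM

print(f"Untargeted revenue: ${untargeted:,.2f}")  # $19,800.00
print(f"Targeted revenue:   ${targeted:,.2f}")    # $41,200.00
print(f"Targeting premium:  {TARGETED_CPM / UNTARGETED_CPM:.2f}x")  # 2.08x
```

Knowing the customer roughly doubles the price of the same ad slot, which is the economic engine behind the tracking described above.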

Facebook is one example of how extensive this kind of tracking can be. Its "like" button has become ubiquitous online. Click "like" and you can instantly share something that pleases you with your friends. But visit a page with a "like" button on it while you're logged in to Facebook, and Facebook can log your visit, whether or not you click. The potential dark side of data collection and use suggests the need for a code of ethical principles. But how should they be structured?
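
Why can an embedded button see your visits at all? A simplified sketch (this is not Facebook's code; the server and port are invented for illustration): every page that embeds the widget makes the browser fetch it from the widget host, and that request carries the visitor's cookie plus a Referer header naming the page being viewed.

```python
# A toy widget host (not Facebook's code). Each page embedding the
# button triggers a request here, carrying the visitor's cookie and
# the Referer header that names the page being viewed.
from http.server import BaseHTTPRequestHandler, HTTPServer

class WidgetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        visitor = self.headers.get("Cookie", "anonymous")
        page = self.headers.get("Referer", "unknown page")
        # The host now knows which logged-in user viewed which page,
        # click or no click.
        print(f"{visitor} visited {page}")
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"like button")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), WidgetHandler).serve_forever()
```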

Clarity on Practices: When data is being collected, let users know about it, perhaps even in real time. This addresses the issue of hidden files and unauthorized tracking, and giving users access to what a company knows about them could go a long way toward building trust. For example, if you want to know what Google knows about you, go to www.google.com/ads/preferences, where you can see both the data it has collected and the inferences it, and third parties, have drawn from what you've done.
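
One hypothetical way to implement that kind of clarity: log every data point at the moment of collection, tagged with when and why, and show users the very same record. A minimal sketch (the class and fields are invented for illustration):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CollectionLedger:
    """Hypothetical per-user record of everything a service collects."""
    entries: list = field(default_factory=list)

    def record(self, category: str, value: str, purpose: str):
        # Log the data point as it is collected, with a reason attached.
        self.entries.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "category": category,
            "value": value,
            "purpose": purpose,
        })

    def show_user(self):
        # The user sees exactly what the company recorded, and why.
        for e in self.entries:
            print(f"{e['when']}: {e['category']}={e['value']} ({e['purpose']})")

ledger = CollectionLedger()
ledger.record("location", "40.7,-74.0", "local search results")
ledger.record("query", "coffee near me", "ad targeting")
ledger.show_user()
```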

Simplicity of Settings: Give users a real chance to figure out for themselves what level of privacy they want, with a few comprehensible choices rather than dozens of buried toggles.
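
One hypothetical shape such settings could take: a handful of plain-language presets, each expanding to concrete collection flags behind the scenes (the preset names and flags below are invented):

```python
# Invented presets: a few plain-language choices, each mapping to the
# concrete collection flags a service would actually enforce.
PRESETS = {
    "private":  {"location": False, "history": False, "targeted_ads": False},
    "balanced": {"location": True,  "history": True,  "targeted_ads": False},
    "open":     {"location": True,  "history": True,  "targeted_ads": True},
}

def apply_preset(name: str) -> dict:
    """Return the collection flags for a user's chosen preset."""
    return PRESETS[name]

print(apply_preset("balanced"))
# {'location': True, 'history': True, 'targeted_ads': False}
```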

Privacy by Design: Some might argue that neither clarity nor simplicity is sufficient. In that case, perhaps organizations should build privacy protections into everything they do: even as they continue to collect customer information, customer privacy becomes a guiding principle rather than an afterthought.
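
One concrete technique in that spirit, offered as an illustration rather than a standard: pseudonymize identifiers at the moment of collection, so raw identities never enter the analytics pipeline at all.

```python
import hashlib
import hmac

# Server-side secret: with it, the same user always hashes to the same
# token, but the token alone cannot be reversed to an email address.
SECRET_KEY = b"rotate-me-regularly"  # illustrative value

def pseudonymize(user_email: str) -> str:
    """Replace a raw identifier with a keyed hash before storage."""
    return hmac.new(SECRET_KEY, user_email.encode(), hashlib.sha256).hexdigest()

# Analytics sees a stable token, never the email itself.
event = {"user": pseudonymize("alice@example.com"), "action": "viewed_product"}
print(event)
```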

Exchange of Value: Walk into your favorite coffee shop and you'll feel flattered when the barista remembers your name and drink of choice. Arguably, something similar applies online: the more a service provider knows about you, the greater the chance that you'll like the service. Transparency could make it easier for online businesses to show customers what they will get in exchange for sharing their personal information. For example, Netflix uses viewers' movie-watching histories to provide targeted, and thus more useful, recommendations.
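
This is not Netflix's actual system, but the underlying idea of collaborative filtering can be sketched in a few lines: recommend titles favored by users whose viewing histories overlap with yours (the users and titles below are made up).

```python
# A toy user-based collaborative filter, not Netflix's algorithm.
histories = {
    "alice": {"Drama A", "Thriller B", "Comedy C"},
    "bob":   {"Drama A", "Thriller B", "Sci-Fi D"},
    "carol": {"Comedy C", "Romance E"},
}

def recommend(user: str) -> list:
    mine = histories[user]
    scores = {}
    for other, theirs in histories.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # shared titles = similarity weight
        for title in theirs - mine:
            scores[title] = scores.get(title, 0) + overlap
    # Highest-weighted unseen titles first.
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # ['Sci-Fi D', 'Romance E']
```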

Until greater clarity, simplicity, or privacy by design becomes a mandate, we'll just have to watch our own Web use, and watch how the Web uses us.
