The revelation earlier this year that hundreds of thousands of Facebook users were unknowingly subjects in a 2012 psychology experiment caused a widespread negative reaction. According to this WSJ article, “Researchers from Facebook and Cornell University manipulated the news feed of nearly 700,000 Facebook users for a week in 2012 to gauge whether emotions spread on social media.” Another interesting read comes from Doug Henschen of InformationWeek, titled “Mining WiFi Data: Retail Privacy Pitfalls”. In this article Doug speaks to the value that retailers can realize by mining WiFi data, but also to the potential pitfalls of being able to track and store the minute behaviors of individuals.
Of course, Facebook is not the only organization with a burgeoning wealth of personal customer data. Every business looking to gain an edge in its industry wants to store every piece of data it generates (including data on every single customer interaction) and at some point gain valuable insight from it. Every business with a Big Data initiative therefore needs to carefully consider the data privacy and security ramifications. Beyond the ethical decisions around the use of data, there is the question of how technology supports the governance of that data: how is access to data limited and tracked, how do you know what personal data you are storing, and how do you mask it?
The critical importance of governance to the success of a Big Data initiative is something IBM recognized very early and has invested in heavily for its BigInsights Hadoop offering. I wanted to use a few posts to take a closer look at the governance capabilities included in BigInsights: where they come from, how they work and the business problems they address.