There are three main risks of Big Data:
- The paralysis of privacy
- Punishment through propensity
- Fetishization of and dictatorship through data
Here I continue my summary of Mayer-Schönberger and Cukier (2017) 'Big Data: The Essential Guide to Work, Life and Learning in the Age of Insight'.
Three Risks of Big Data
Firstly, because so much data is collected on individuals – not only via state surveillance but also by Amazon, Google, Facebook and Twitter – protecting privacy becomes more difficult, especially when so much of that data is sold on to be analysed for other purposes.
Secondly, there is the possibility of penalties based on propensities – the possibility of punishing people even before they have done anything wrong.
Finally, we have the possibility of a dictatorship of data – whereby information becomes an instrument of the powerful and a tool of repression.
The value of big data lies in its reuse, quite possibly in ways that had not been imagined at the time of collection. In terms of personal information, if we re-purpose people's personal data then they cannot give informed consent in any meaningful sense of the phrase – because in order to do so you need to know what data a company is collecting and what use it will be put to.
The only way big data can work is for companies to ask customers to agree to have their data collected ‘for any purpose’, which undermines the concept of informed consent.
There are still possible ways to protect privacy – for example opting out and anonymisation.
Opting out is simply where some individuals choose not to have their data collected – however, opting out can itself reveal things about the users. For example, when certain people opted out of Google's Street View and their houses were blurred, they were still noticeable as people who had 'opted out' (and thus maybe had more valuable stuff to steal!).
Anonymisation is where all personal identifiers are stripped from data – such as national insurance number, date of birth and so on – but here people can still be identified. When AOL released its data set of 20 million search queries from over 650K users in 2006, researchers were able to pick individual people out: simply by looking at the content of searches they could deduce that someone was single, female, lived in a certain area, purchased certain things – then it's just a matter of cross-referencing to find the particular individual.
In 2006 Netflix released over 100 million rental records of half a million users – again anonymised, and again researchers managed to identify one specific individual – a lesbian living in a conservative area – by comparing the dates of movies she had rented with her ratings on IMDb.
Big data, it appears, aids de-anonymisation because we collect more data and we combine more data.
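The cross-referencing trick behind both the AOL and Netflix cases can be sketched in a few lines of Python. The records, names and field choices below are invented for illustration; the point is that joining an 'anonymised' release with a public dataset on shared quasi-identifiers (postcode, gender, birth year) can still single an individual out.

```python
# De-anonymisation by linkage: all data here is hypothetical.
# The "anonymised" release has names stripped, but keeps quasi-identifiers.
anonymised = [
    {"zip": "90210", "gender": "F", "birth_year": 1975, "searches": ["divorce lawyer"]},
    {"zip": "10001", "gender": "M", "birth_year": 1980, "searches": ["golf clubs"]},
]

# A public dataset (e.g. a voter roll) links the same quasi-identifiers to names.
public = [
    {"name": "Alice Smith", "zip": "90210", "gender": "F", "birth_year": 1975},
    {"name": "Bob Jones", "zip": "10001", "gender": "M", "birth_year": 1980},
]

def reidentify(anon_rows, public_rows):
    """Join the two datasets on shared quasi-identifiers; a unique match
    re-attaches a name to a supposedly anonymous record."""
    matches = []
    for a in anon_rows:
        candidates = [p for p in public_rows
                      if (p["zip"], p["gender"], p["birth_year"])
                      == (a["zip"], a["gender"], a["birth_year"])]
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], a["searches"]))
    return matches

print(reidentify(anonymised, public))
```

The more datasets that exist about the same people, the more quasi-identifier combinations become unique – which is why combining data aids de-anonymisation.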
Of course it's not just private companies collecting data – it's the government too. The U.S. collects enormous, almost unthinkably large, amounts of data, and today it is possible to tell a lot about people simply by looking at how they are connected to others.
Probability and Punishment
This section starts with a summary of the opening scene of Minority Report…
We already see the seeds of this type of pre-crime control through big data:
Parole boards in more than half the states of the US use big data predictions to inform their parole decisions.
A growing number of police precincts use 'predictive policing' – big data analysis to select which streets to patrol and which individuals to harass.
A research project called FAST – Future Attribute Screening Technology – tries to identify potential terrorists by monitoring people’s vital signs.
Cukier now outlines the argument for big-data profiling – mainly pointing out that we've taken steps to prevent future risks for years (e.g. seat-belts) and that we've profiled for years with small data (insurance!). The argument for big data profiling is that it allows us to be more granular than previously – we can make our profiling more individualised. Thus there's no reason to stop every Arab man under 30 with a one-way ticket from boarding a plane, but if that man has also done a–e, then there is a reason.
However, there is a fundamental problem with punishing people based on big data: it undermines the very foundations of justice – individual choice and responsibility – by denying people choice. Big data predictions about parole re-offending are accurate 75% of the time, which means that if we rely on the profiling for every decision, we wrongly punish 1 in 4 people.
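The arithmetic behind that last claim can be made explicit. The only figure below taken from the text is the 75% accuracy rate; the number of decisions is a hypothetical round number for illustration.

```python
# If parole predictions are right 75% of the time, then relying on them
# for every decision gets 1 in 4 decisions wrong.
accuracy = 0.75
decisions = 1000  # hypothetical number of parole decisions

wrong = int(decisions * (1 - accuracy))
print(wrong)                      # 250 wrong decisions out of 1000
print(f"{wrong / decisions:.0%}")  # 25%, i.e. 1 in 4
```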
Dictatorship of Data
The problem with relying on data to inform policy decisions is that the underlying quality of data can be poor – it can be biased, mis-analysed or used misleadingly. It can also fail to capture what it is actually supposed to measure!
Education is a good example of a sector which is governed by endless testing – which only measures a sliver of intelligence: the ability to demonstrate knowledge (predetermined by a curriculum) and to show analytical and evaluative skills as an individual, in written form, under timed conditions.
Google, believe it or not, is an example of a company that in the past has been paralysed by data – in 2009 its top designer, Douglas Bowman, resigned because he was required to prove, with data, whether a border should be 3, 4, or 5 pixels wide. He argued that such a dictatorship of data stifled any sense of creativity.
The problem with the above, in Steve Jobs' words: 'it isn't the consumers' job to know what they want'.
In his book Seeing Like a State, the anthropologist James C. Scott documents the way in which governments make people's lives a misery by fetishizing quantitative data: they use maps to reorganise communities rather than asking the people on the ground, for example.
The problem we face in the future is how to harness the utility of big data without becoming overly reliant on its predictions.