Big data - ReviseSociology

Knowing Capitalism and Lively Data

Nigel Thrift (2005) developed the concept of ‘knowing capitalism’ to denote a new form of global economy which depends not only on technologies which generate large amounts of digital data, but also on the commodification of that data: a big data economy in which power operates through modes of communication.

Digital data have become especially valuable as forms of knowledge, particularly when they are aggregated into big data sets, and are seen as having huge potential to offer new insights into a range of human behaviours, and to disrupt various industries, from health care to education.

One key change in the age of ‘knowing capitalism’ is that there has been a shift from commodifying workers’ physical labour to profiting from information collected on people’s preferences – which online users willingly give when they create and upload digital content online, download and use geolocation apps, shop online, and like various content.

In this digital age, prosumption is the new norm – people simultaneously consume and generate online content. In commercial circles, the user of online technologies is ‘the product’, because the information they give off when online is so valuable.

This is why so many applications, such as Facebook, are free to use – because they are really just platforms to harvest valuable data (why charge?). The four big tech companies exercise huge power by virtue of the sheer amount of big data they already hold, and continue to collect, on their users.

Central to portrayals of the digital data economy is the idea that digital data are lively, mutable, and hybrid. Metaphors of liquidity are very commonly used:

Flows
Streams
Rivers
Floods
Tsunamis

In the digital data economy, flows of information are generated and engage in non-linear movement and, according to Thrift (2014), new hybrid beings emerge from the mixture of data, objects and bodies, while bodies and identities are fragmented and reassembled through a process of reconfiguration.

Furthermore, digital data and the algorithmic analytics used to interpret them are beginning to have determining effects on people’s lives, influencing their life chances and opportunities.

There is a mobile dimension to how we interact with data too.

Data can become stuck, for example when a company hoards it, or when people do not know how to use it!

Data materialisations constitute an important dimension of knowing capitalism – data is lively and in flux, but it needs to be frozen to be used – in 2D (infographics) or in 3D (through printers).

Where 2D data visualisations are concerned, a lot of emphasis is placed on their aesthetic quality, and on how the meaning of the data is structured. Behind this process lie decisions about what to include and what to exclude, and limitations on what can be shown due to the software used. Thus there are many contingencies framing the way we understand big data in knowing capitalism!

Sources

Summarised from:

Lupton, Deborah (2017) The Quantified Self, Polity

Big Data: Controlling its Use

Changes in the way we interact and communicate lead to changes in the way we govern ourselves. Just as the invention of the printing press resulted in the evolution of copyright and libel laws, so the emergence of big data will result in new laws to govern the new ways in which this information is collected, analysed and utilized.

In this final chapter of the main section of Viktor Mayer-Schonberger and Kenneth Cukier’s (2017) ‘Big Data: The Essential Guide to Work, Life and Learning in the Age of Insight’, the authors suggest four ways in which we might control the use of Big Data in the coming years.

Firstly, Cukier suggests we will need to move from ‘privacy by consent’ to ‘privacy by accountability’. Because old consent-based privacy laws don’t work in the big data age (see here for why), we will effectively have to trust companies to make informed judgments about the risks of re-purposing the data they hold. If they deem there to be an element of risk of harm to people, they may have to administer a second round of ‘consent of use’; if the risk is very small, they can just go ahead and use it.

It is also possible to deliberately blur data so that it becomes fuzzy and you cannot see individuals in it – so you can set analytical programmes to return aggregate results only – an approach known as differential privacy.
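The ‘aggregate results only’ idea can be illustrated with a small sketch. The book does not give an implementation; what follows assumes the widely used Laplace mechanism for a counting query, and the function names, the toy data and the epsilon value are all invented for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with the
    same mean is Laplace-distributed, so no third-party library is needed.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records matching `predicate`.

    A counting query changes by at most 1 when one person is added or
    removed (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: ages of the people in a data set.
ages = [23, 35, 41, 29, 62, 57, 33, 48, 19, 71]
rng = random.Random(0)
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

A smaller epsilon adds more noise and so hides any one individual better, at the cost of a less accurate aggregate.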

Comment: NB – this sounds dubious – we just trust companies more… the problem here being that we can only really trust them to do one thing – put their profits before everything else, including people’s privacy rights.

Secondly, we will also need to ensure that we do not judge people based on propensity by aggregate. In the big data era of justice, we need to hold people to account for their individual actions – i.e. for what they have actually done as individuals, rather than what the big data says people like them are likely to do.

Comment: NB – all he seems to be saying here is that we carry on doing what we already do (in most cases at least!)

Thirdly, big data can be something of a ‘black box’: the number of variables which go into making up predictions, and the algorithms which calculate them, defy ordinary human understanding. We will therefore need a new series of experts called algorithmists to be on hand to analyse big data findings if and when individuals feel wronged by them. Cukier argues that these will take a ‘vow of impartiality’ in monitoring and reviewing the accuracy of big data predictions, and sees a role for both internal and external algorithmists.

Comment: this doesn’t half sound like something Auguste Comte, the founding father of Positivism, would say!

Cukier argues this is just the same as new specialists emerging in law, medicine and computer security as these fields developed in complexity.

Fourthly and finally, Cukier suggests we will need to develop some sort of new anti-trust laws to ensure that one company does not come to have a monopoly on data.

Comment: Fair enough!

Overall Comment

I detect a distinct pro-market tone in the authors’ analysis of big data – basically we trust companies to use it (but avoid monopoly power), but we mistrust governments – precisely what you’d expect from the Silicon Valley set!

The Risks of Big Data

There are three main risks of Big Data:
the paralysis of privacy,
punishment through propensity,
the fetishization of and dictatorship through data

Here I continue my summary of Mayer-Schonberger and Cukier (2017) ‘Big Data: The Essential Guide to Work, Life and Learning in the Age of Insight’.

Three Risks of Big Data

Firstly, simply because so much data is collected on individuals – not only via state surveillance but also via Amazon, Google, Facebook and Twitter – protecting privacy is more difficult, especially when so much of that data is sold on to be analysed for other purposes.

Secondly, there is the possibility of penalties based on propensities – the possibility of punishing people even before they have done anything wrong.

Finally, we have the possibility of a dictatorship of data – whereby information becomes an instrument of the powerful and a tool of repression.

Paralyzing Privacy

The value of big data lies in its reuse, quite possibly in ways that have not been imagined at the time of collecting it. In terms of personal information, if we are to re-purpose people’s personal data then they cannot give informed consent in any meaningful sense of the phrase – because in order to do so you need to know what data a company is collecting and what use they are going to put it to.

The only way big data can work is for companies to ask customers to agree to have their data collected ‘for any purpose’, which undermines the concept of informed consent.

There are still possible ways to protect privacy – for example opting out and anonymisation.

Opting out is simply where some individuals choose not to have their data collected. However, opting out can itself reveal certain things about the users – for example, when certain people opted out of Google’s Street View and their houses were blurred, they were still noticeable as people who had ‘opted out’ (and thus maybe had more valuable stuff to steal!).

Anonymisation is where all personal identifiers are stripped from data – such as national insurance number, date of birth and so on – but here people can still be identified. When AOL released its data set of 20 million search queries from over 650K users in 2006, researchers were able to pick individual people out – simply by looking at the content of searches they could deduce that someone was single, female, lived in a certain area and purchased certain things – then it’s just a matter of cross-referencing to find the particular individual.

In 2006 Netflix released over 100 million rental records of half a million users – again anonymised, and again researchers managed to identify one specific lesbian living in a conservative area by comparing the dates of movies rented with her entries on IMDb.

Big data, it appears, aids de-anonymisation because we collect more data and we combine more data.
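The cross-referencing step described in the AOL and Netflix examples can be sketched in a few lines. This is a deliberately simplified illustration, not the researchers’ actual method: all records, names and the `min_matches` threshold are hypothetical, and it assumes exact matches on film and date.

```python
# Hypothetical "anonymised" rental log: user IDs replaced by random tokens.
rentals = [
    {"token": "u1", "film": "Film A", "date": "2005-03-01"},
    {"token": "u1", "film": "Film B", "date": "2005-03-09"},
    {"token": "u1", "film": "Film C", "date": "2005-04-02"},
    {"token": "u2", "film": "Film A", "date": "2005-03-01"},
    {"token": "u2", "film": "Film D", "date": "2005-05-11"},
]

# Hypothetical public reviews posted under real names on another site.
reviews = [
    {"name": "Alice", "film": "Film A", "date": "2005-03-01"},
    {"name": "Alice", "film": "Film B", "date": "2005-03-09"},
    {"name": "Alice", "film": "Film C", "date": "2005-04-02"},
    {"name": "Bob", "film": "Film D", "date": "2005-05-11"},
]


def link(rentals, reviews, min_matches=2):
    """Pair each token with names sharing at least `min_matches` (film, date) events."""
    events_by_token = {}
    for row in rentals:
        events_by_token.setdefault(row["token"], set()).add((row["film"], row["date"]))
    events_by_name = {}
    for row in reviews:
        events_by_name.setdefault(row["name"], set()).add((row["film"], row["date"]))

    matches = {}
    for token, rented in events_by_token.items():
        for name, reviewed in events_by_name.items():
            overlap = len(rented & reviewed)
            if overlap >= min_matches:
                matches.setdefault(token, []).append((name, overlap))
    return matches


print(link(rentals, reviews))
```

Neither data set identifies anyone on its own; it is the combination of the two that attaches a real name to the token `u1`.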

Of course it’s not just private companies collecting data… it’s the government too. The U.S. collects an enormous amount of data – amounts that are unthinkably large – and today it is possible to tell a lot about people by looking at how they are connected to others.

Probability and Punishment

This section starts with a summary of the introductory scene of the film Minority Report…

We already see the seeds of this type of pre-crime control through big data:

Parole boards in more than half the states of the US use big data predictions to inform their parole decisions.

A growing number of precincts use ‘Predictive Policing’ – using big data analysis to select which streets to patrol and which individuals to harass.

A research project called FAST – Future Attribute Screening Technology – tries to identify potential terrorists by monitoring people’s vital signs.

Cukier now outlines the argument for big-data profiling – mainly pointing out that we’ve taken steps to prevent future risks for years (e.g. seat-belts) and we’ve profiled for years with small data (insurance!) – the argument for big data profiling is that it allows us to be more granular than previously – we can make our profiling more individualised – thus there’s no reason to stop every Arab man under 30 with a one way ticket from boarding a plane, but if that man has done a-e also, then there is a reason.

However, there is a fundamental problem with punishing people based on big data: it undermines the very foundations of justice – individual choice and responsibility – by denying people that choice. Big data predictions about parole re-offending are accurate 75% of the time, which means that if we use the profiling 100% of the time we are wrongly punishing 1 in 4 people.
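The ‘1 in 4’ figure is simple arithmetic, sketched here for a hypothetical caseload. The 75% accuracy comes from the text; the number of hearings is invented, and the sketch follows the text in treating every wrong prediction as a wrongful punishment rather than separating out the different kinds of error.

```python
def expected_wrong_predictions(decisions, accuracy):
    """Expected number of incorrect predictions among `decisions` cases."""
    return decisions * (1 - accuracy)


accuracy = 0.75    # figure quoted in the text
decisions = 1000   # hypothetical number of parole hearings
wrong = expected_wrong_predictions(decisions, accuracy)
print(f"{wrong:.0f} of {decisions} predictions expected to be wrong")
```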

Dictatorship of Data

The problem with relying on data to inform policy decisions is that the underlying quality of data can be poor – it can be biased, mis-analysed or used misleadingly. It can also fail to capture what it is actually supposed to measure!

Education is a good example of a sector which is governed by endless testing, which only measures a sliver of intelligence – the ability to demonstrate knowledge (predetermined by a curriculum) and show analytical and evaluative skills as an individual, in written form, all under timed conditions.

Google, believe it or not, is an example of a company that in the past has been paralysed by data – in 2009 its top designer, Douglas Bowman, resigned because he had to prove whether a border should be 3, 4, or 5 pixels wide, using data to back up his view. He argued that such a dictatorship by data stifled any sense of creativity.

The problem with the above, in Steve Jobs’ words: ‘it isn’t the consumers’ job to know what they want’.

In his book Seeing Like a State, the anthropologist James Scott documents the way in which governments make people’s lives a misery by fetishizing quantitative data: they use maps to reorganise communities rather than asking people on the ground, for example.

The problem we face in the future is how to harness the utility of big data without becoming overly reliant on its predictions.

The Big Data Value Chain

There are three types of company in the big-data value chain: the companies who collect the data, data-analytics companies, and data-ideas companies. This new ‘organisational landscape’ will change the power-relations between businesses enormously, at least according to Viktor Mayer-Schonberger and Kenneth Cukier (2017) in ‘Big Data: The Essential Guide to Work, Life and Learning in the Age of Insight’.

‘Pure’ data companies are those which have the data, or at least access to it, but do not necessarily have the right skills to extract the value from the data. A good example of such a company is Twitter, which has masses of data but licences it out to independent firms to use.

Data analytics companies are those with the statistical, programming, and communication skills necessary to mine insights from data – Teradata is a good example of such a company.

Finally there are those companies with the ‘big-data mindset’ whose founders and employees have unique ideas about how to unlock and combine data to find new forms of value – for example, Pete Warden, the co-founder of Jetpac, which makes travel recommendations based on the photos users upload to the site.

Data analytics has recently been touted as being in the ‘prime position’ in the big-data value chain: there has been a lot of recent talk of the shortage of ‘data scientists’ in an age of ever-increasing amounts of data. The McKinsey Global Institute has talked about this, for example, and Google’s chief economist Hal Varian famously called statistician the ‘sexiest job around’.

We have been given the impression that we are wallowing in data, but lack sufficient people with the skills to mine this data.

Cukier, however, thinks such claims are exaggerated because it is likely that this skills gap will close. Interestingly, in a recent talk on big data science, this view also seemed to be the consensus.

He predicts that what is more likely to happen is that firms controlling access to the data will start to charge more for it, and big data innovators will be where the real money is…

Hybrid Data Companies

Companies such as Google and Amazon stretch across all three links in the data value chain. Google collects data like search-query typos, uses it to create a spell-checker and employs people in-house to do the analytics. Such vertical integration is no doubt precisely why Google is today one of the world’s largest companies.

The New Data Intermediaries

Cukier also predicts that there are certain business sectors which will benefit from giving their data to third parties, because keeping it in-house will not be as beneficial to them as sharing their data and combining it with others – third parties are needed to facilitate trust – for example, travel firms will benefit from such an arrangement, not to mention the banking and finance sectors – where more data is better.

The Demise of the Expert

Cukier also predicts that big data analytics will see specialists in different fields being replaced with those with data-science skills able to manage whatever field based on data. He argues that ‘mathematics, statistics, perhaps with a sprinkling of programming and network science, will be as foundational to the modern workplace as numeracy was a century ago and literacy before that’.

Big Winners, Medium Sized Losers

Large data companies such as Google and Amazon will continue to soar, but big data presents a challenge to the victors of small-world data such as Walmart, Nestle and Boeing. How these will adapt remains to be seen.

There are, of course, opportunities for ‘smart and nimble start-ups’, but also individuals might start to sell their own data, possibly through new third party firms.

Problems with the fusion of big data and education

The first problem is that it will be more difficult for us to forget and escape our past….

While we as individuals grow, evolve and change, comprehensive educational data collected through the years remains unchanged – as the amount of data collected on us through our formative years increases, there is a risk that we will be judged in the future by this historic data – creating a kind of ‘permanence of the past’.

Our historic data record might show a future employer that we were enrolled in a remedial math class in our first year of university, and this fact alone might put them off calling us for interview, even if our maths has evolved in the intervening years, which means we might not get credit for how we have evolved in our later years.

The problem with data is that it is unlikely to tell anyone about the context in which it takes place – if test scores are low during particular years, for example, the data alone is unlikely to tell us what was going on more broadly in our lives at that time – unlike today, when we can effectively forget low-periods in our lives, in the forthcoming age of big data, they will always be on display for anyone to scrutinise, without access to the more in-depth context.

Employers already track Facebook posts; if there is more educational data, then they might well delve into that too.

A second problem is that our big data record might fix our future…

Today schools make predictions based on ‘small data’, yet students can argue against the paths suggested by such small data (GCSEs etc) because it is precisely that, small, collected at only a few points in time, clearly not telling the whole story.

In the big data age, however, predictions based on more data may become so accurate that they lock students into educational tiers or particular programmes of study. Some universities are already experimenting with ‘e-advisors’ – since the University of Arizona implemented such a system in 2007, the proportion of students moving on from one year to the next has increased from 77% to 84%. In future these systems may evolve to advise, or prevent, students from undertaking particular courses of study deemed to be too difficult for them.

This may lock-in students to pre-determined study and career paths, which may have a detrimental effect on equality of opportunity.

A third problem, largely dismissed by Cukier, is that the fusion between big data and educational institutions will only work if students and parents consent to tech companies having access to their children’s private data. For some reason he cannot see the problems with this, which suggests more than anything else he’s an industry-insider.

How will Big Data Change Social Research?

Big data will change the nature of social research – more data will do away with the need for sampling (and eradicate the biases that emerge with sampling); big data analysis will be messier, but this will lead to more insights and allow for greater depth of analysis; and finally it will move us away from a limiting hypothesis-led search for causality, to non-causal analysis based on correlation.

At least according to Mayer-Schonberger and Cukier (2017) ‘Big Data: The Essential Guide to Work, Life and Learning in the Age of Insight’.

Big Data Research — A third of social science researchers are already working with Big Data

Below I summarise how Cukier thinks big data will change social research:

You might like to read my summary of the introduction to ‘Big Data’ first

More Data

The ability to collect and analyse large amounts of data in real time has many advantages:

It does away with the need for sampling, and all the problems that can emerge with biased sampling.

More data enables us to make accurate predictions down to smaller levels – as with the case of Google’s flu predictions being able to predict the spread of flu on a city by city basis across the USA.

It enables us to use outliers to spot interesting trends – for example credit card companies can use it to detect fraud if too many transactions for a particular type of card originate in one particular area.

When we use all the data, we are more likely to find things which we never expected to find…

Cukier uses Steven Levitt’s analysis of all the data from 11 years’ worth of sumo bouts as a good example of the interesting insights to be gained through big data analysis.

A suitable analogy for big data may be the Lytro camera, which captures not just a single plane of light, as with conventional cameras, but rays from the entire light field… the photographer decides later on which element of light to focus on in the digital file…. And he can reuse the same information in different ways.

One of the areas that is most dramatically being shaken up by big data is the social sciences, which have traditionally made use of sampling techniques. This monopoly is likely to be broken by big data firms and the old biases associated with sampling should disappear.

Albert-Laszlo Barabasi examined social networks using logs of mobile phones from about one fifth of an unidentified European country’s population – which was the first analysis done on networks at the societal level using a dataset in the spirit of n = all. They found something unusual – if one removes people with lots of close links in the local area the societal network remains intact, but if one removes people with links outside their community, the social network degrades.

Messier

All other things being equal, big data is ‘messier’ than small data – because the more data you collect, the higher the chance that some of it will be inaccurate. However, the aggregate of all the data should provide more breadth and frequency of data than smaller data sets.

Cukier uses the analogy of measuring temperature in a vineyard to illustrate this – if we have just one temperature gauge, we have to make sure it is working perfectly, but if we have a thousand, we will have more errors, but a much wider breadth of data, and if we take measurements with greater frequency, we will have a more sensitive measurement of changes over time.
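The vineyard analogy can be simulated in a few lines. Everything numeric here is an assumption made for illustration (the true temperature, the size of the sensor error, the share of faulty readings), and taking the median as the aggregate is one reasonable choice rather than anything the book specifies.

```python
import random
import statistics

rng = random.Random(42)
TRUE_TEMP = 20.0  # hypothetical true temperature in degrees Celsius


def cheap_reading(rng):
    """One reading from a cheap sensor: noisy, and occasionally wildly wrong."""
    if rng.random() < 0.02:              # assume 2% of readings are faulty
        return rng.uniform(-10.0, 50.0)
    return rng.gauss(TRUE_TEMP, 1.5)     # assume a typical error of 1.5 degrees


readings = [cheap_reading(rng) for _ in range(1000)]
worst_single_error = max(abs(r - TRUE_TEMP) for r in readings)
aggregate_error = abs(statistics.median(readings) - TRUE_TEMP)

print(f"worst single reading is off by {worst_single_error:.1f} degrees")
print(f"median of all readings is off by {aggregate_error:.2f} degrees")
```

Individual readings can be badly wrong, yet the aggregate of a thousand messy readings stays close to the true value.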

When using big data, analysts are generally happy sacrificing some accuracy for knowing the general trend – in the big data world, it is OK if 2+2 = 3.9.

More data is sometimes all we need for 100% accuracy, for example chess games with fewer than 6 pieces on the board have all been mapped out in their entirety, thus a human will never be able to beat a computer again once this point has been reached.

The fact that messiness doesn’t matter that much is evidenced in Google’s success with its translation software – Google employed a relatively simple algorithm but fed it trillions of words from across the internet – all of the messy data it could find – which suggests that simple models and lots of data trump smart models and less data.

We see messiness in action all over the internet – it lies in ‘tagging’ and likes being rounded up – none of this is precise, but it works, it provides us with usable information.

Ultimately big data means we are going to have to become happier with uncertainty.

Correlation

It might be hard to fathom today, but when Amazon started up it actually employed book critics and editors to write reviews of books and make recommendations to customers.

Then the CEO Jeff Bezos had the idea of making specific recommendations to customers based on their individual shopping preferences and employed someone called Greg Linden to develop a recommendation system – in 1998 he and his colleagues applied for a patent on ‘item to item’ collaborative filtering – which allowed Amazon to look for relationships between products.

As a result, Amazon’s sales shot up, they sacked the human advisors, and today about a third of all its sales are based on its recommendation systems. Amazon was an early adopter of big data analytics to drive up sales, and today many other companies such as Netflix also use it as one of the primary methods to keep profits rolling in.

These companies don’t need to know why consumers like the products that they do, knowing that there’s a relationship between the products people like is enough to drive up sales.
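To give a flavour of how relationships between products can be found without knowing why people like them, here is a minimal sketch of item-to-item similarity. It is not Amazon’s patented method: the baskets are invented, and cosine similarity over shared buyers is an assumed, commonly used way of scoring how often two items are bought together.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase histories: customer -> set of items bought.
baskets = {
    "c1": {"novel", "cookbook", "atlas"},
    "c2": {"novel", "cookbook"},
    "c3": {"novel", "atlas"},
    "c4": {"cookbook", "apron"},
}


def item_similarities(baskets):
    """Cosine similarity between items, based on which customers bought both."""
    buyers = defaultdict(set)
    for customer, items in baskets.items():
        for item in items:
            buyers[item].add(customer)

    sims = defaultdict(dict)
    for a, b in combinations(sorted(buyers), 2):
        shared = len(buyers[a] & buyers[b])
        if shared:
            score = shared / sqrt(len(buyers[a]) * len(buyers[b]))
            sims[a][b] = score
            sims[b][a] = score
    return sims


def recommend(item, sims, top_n=2):
    """Items most strongly associated with `item`, best match first."""
    ranked = sorted(sims[item].items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]


sims = item_similarities(baskets)
print(recommend("novel", sims))
```

The code never looks at what the items are, only at who bought them together, which is the point the text makes about correlation being enough to drive sales.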

Predictions and Predilections

In the big data world, correlations really shine – we can use them to gain more insights extremely rapidly.

At its core, a correlation quantifies the statistical relationship between two data values. A strong correlation means that when one of the data values changes, the other is highly likely to change as well.
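As a concrete illustration of ‘quantifying the statistical relationship’, the sketch below computes the Pearson correlation coefficient, one common measure that ranges from -1 to 1. The book does not name a specific measure, and the weekly figures are invented purely for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly figures, invented for illustration only.
flu_searches = [120, 150, 200, 260, 310, 280]
clinic_visits = [30, 36, 52, 61, 80, 70]

print(f"r = {pearson(flu_searches, clinic_visits):.2f}")
```

A value close to 1 means the two series rise and fall together; it says nothing about whether one causes the other.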

Correlations let us analyse a phenomenon not by shedding light on its inner workings, but by identifying a useful proxy for it.

In the small data age, researchers needed to use hypotheses to select one or a handful of proxies to analyse, and hence hard statistical evidence on the relationship between variables was collected quite slowly; with the increase in computational power we don’t need hypothesis-driven analysis, we can simply analyse billions of data points and ‘stumble upon’ correlations.

In the big-data age we can use a data-driven approach to collecting data, and our results should be less biased and more accurate, and we should also be able to get them faster.

One example of where this data-driven approach has been applied, and strong big data correlations found, is the case of Google’s flu predictions. We didn’t need to know in advance which flu search terms were the best proxy for ‘people with flu symptoms’; the data simply showed us which search terms were the best proxies.

With correlations there is no certainty, only probability, but this can still provide us with actionable data, as with the case of Amazon above, and there are many other examples of where data driven big data analytics are changing our lives. (p56)

We can use correlations to predict the future – for example, Wal-Mart noticed a correlation between hurricanes and sales not only of flashlights but also of Pop-Tarts, so when a hurricane is predicted, it moves the Pop-Tarts to the front of the store and further boosts its sales.

Probably the most notorious use of big data correlations to make predictions is the American discount retailer, Target, who use their data on the products women buy as a proxy for pregnancy – women tend to buy non scented body lotions around the third month of pregnancy and then various vitamin supplements around the 6 month mark – big data even allows predictions about the approximate birth date to be made!

Finding proxies in social contexts is only one way that big-data techniques are being employed – another use is through ‘predictive analytics’, which aims to foresee events before they happen.

One example of predictive analytics is the shipping company UPS using them to monitor its fleet of 10s of 1000s of vehicles – to replace parts just before they wear out, saving them millions of dollars.

Another use is in health care – one piece of research by Dr Carolyn McGregor, with IBM, used 16 different data streams to track the stats of premature babies – and found that there was a correlation between certain stats and an infection occurring 24 hours later. Interestingly this research found that an infant’s stability was a predictor of a forthcoming infection, which flew in the face of convention – again we don’t know why this is, but the correlation was there.

Illusions and Illuminations

Big data also makes it easier to find more complex, non-linear relationships than when working within a hypothesis-limiting small data paradigm.

One example of a non-linear relationship uncovered by big data analysis is that between income and happiness – happiness increases with income up until about $30K per year, but then it levels out – once we have ‘enough’, adding on more money doesn’t make us any happier…

Big data also opens up more possibilities for exploring networks – by analyzing how ideas spread through the nodes of networks such as Facebook, for example.

In network analysis, it is very difficult to attribute causality, because everything is connected to everything else, and big data analysis is typically non-causal, just looking for correlations not ‘causation’.

Does big data mean the end of theory?

In 2008 Wired magazine’s chief editor argued that in the ‘Petabyte age’ we would be able to do away with theory – that correlation would be enough for us to understand reality – citing as examples Google’s search engine and gene sequencing – where simply huge amounts of data and applied mathematics replace every other tool that might be brought to bear.

However, this view is problematic because big data is itself founded on theory – it employs mathematical and statistical theories for example, and humans still select data, or at least the tools which select data, which in turn are often driven by convenience and economic concerns.

Having said that, Big Data does potentially move us away from theory and closer to empiricism than in the small data age.

How will Big Data Change Education?

Big Data will make Feedback more focussed on effective teaching rather than student progress, it will make learning more individualised, and it will enable us to make probabilistic predictions about what programmes are best for different students.

This is according to Big Data enthusiasts Mayer-Schonberger and Cukier in their (2017) reprint of their 2013 original ‘Big Data: The Essential Guide to Work, Life and Learning in the Age of Insight’.

This post is a summary of the section at the back of this book, which focuses on big data and education (introduction to this section is here).

An excellent counterpoint to the outrageous, almost entirely speculative and sweepingly general claims made in this book is Neil Selwyn’s ‘Is Technology Good for Education?’ – the latter is based on stacks of peer-reviewed evidence, the former on speculation only.

How will big data change feedback in education?

In the small data age, data collection in schools was largely limited to test scores and attendance, focussing on collecting standardised data on student performance, with feedback being almost exclusively in one direction – from the teachers to the schools to the kids and their parents – what is not measured is how well we teach our kids, or how effective different teaching techniques are in facilitating student progress.

Big data changes this by datafying the learning process – for example, e-books allow us to track how students read books, what they take notes on, at what point they give up reading, what sections they go back and check – thus we can measure how effective different books, or different passages within books, are at helping students to understand knowledge, which can be used as a basis for immediate and differentiated intervention by teachers.

We could also use e-books in conjunction with testing to measure the relationship between different textual materials and the ‘decay curve’ – the rate at which students forget knowledge, which might be useful in improving test scores.

Companies such as Pearson and Kaplan are very involved in producing e-books, but at time of writing (2017) even in America only 5% of school textbooks are digital.

Individualisation

In schools, the education which we are exposed to is standardised into a one size fits all package, tailored to a mythical average student. Learning has barely evolved from the industrial era – the materials students are given are identical, and the learning process still works essentially like an assembly line, with all students being paced through a syllabus at the same rate and learning benchmarked against a series of standardised tests.

All of this is tailored towards the needs of the teachers and the system, not the needs of the students.

However, in the Big Data age, following the American economist Tyler Cowen, ‘average is over’, and following Khan Academy founder Sal Khan ‘one size fits few’. The problem with the current, industrial era education system is that very few people actually benefit from it – the bright student is bored, while the weaker understands nothing. What we need is a means of flexibly adapting the pace and content of teaching to better fit the needs of individual students.

Tailoring education to each student has long been the aim of adaptive-learning software – an example of this is Carnegie Learning’s ‘Cognitive Tutor’ for school mathematics which decides which math questions to ask based on how students answered previous questions. This way it can identify problem areas and drill them, rather than try to cover everything but miss holes in their knowledge, as happens with the traditional system.

Another example is New York City’s ‘School of One’, a maths programme in which an algorithm gives each student their own personalised daily ‘playlist’ of maths problems suited to their needs.

Such individualised learning systems are dynamic: the learning materials change and adapt as more data is collected, analysed and transformed into feedback. More advanced material is only provided once students have mastered the fundamentals.
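To make the idea concrete, here is a minimal sketch of how such a system might pick the next topic to drill. It is not how Cognitive Tutor or School of One actually work: the class name, the 80% mastery threshold and the three-answer minimum are all assumptions made purely for illustration.

```python
from collections import defaultdict

MASTERY_THRESHOLD = 0.8  # hypothetical: share of correct answers that counts as "mastered"
MIN_ATTEMPTS = 3         # hypothetical: answers needed before a skill can count as mastered


class AdaptiveTutor:
    """Toy question picker: drills the first skill in the syllabus not yet mastered."""

    def __init__(self, syllabus):
        # The syllabus is an ordered list of skills, fundamentals first.
        self.syllabus = list(syllabus)
        self.attempts = defaultdict(int)
        self.correct = defaultdict(int)

    def record_answer(self, skill, was_correct):
        """Store one answer so that later choices adapt to it."""
        self.attempts[skill] += 1
        if was_correct:
            self.correct[skill] += 1

    def is_mastered(self, skill):
        tries = self.attempts[skill]
        if tries < MIN_ATTEMPTS:
            return False
        return self.correct[skill] / tries >= MASTERY_THRESHOLD

    def next_skill(self):
        """Return the earliest unmastered skill, or None if all are mastered."""
        for skill in self.syllabus:
            if not self.is_mastered(skill):
                return skill
        return None


tutor = AdaptiveTutor(["fractions", "ratios", "algebra"])
for outcome in (True, True, True):
    tutor.record_answer("fractions", outcome)
for outcome in (True, False, False):
    tutor.record_answer("ratios", outcome)

print(tutor.next_skill())  # ratios
```

The student here has mastered fractions but is struggling with ratios, so the system keeps drilling ratios and holds back algebra until the fundamentals are in place.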

All of this is based on the idea of the ‘student as consumer/ client’ – one argument is that ‘if we can rip our favourite music and burn it into our own playlist’, why can’t we do this with education? A second argument is that in any other field of business, consumers provide feedback on products and the manufacturers improve (and increasingly personalise) the products to meet the demands of diverse consumers…. Adaptive learning should transform education into something which is more responsive to the needs of students/ consumers, rather than it being led by unresponsive systems and teachers.

Supporting evidence for adaptive learning:

In a trial of 400 high school freshmen in Oklahoma, the Cognitive Tutor system helped them achieve the same level of math proficiency in 12% less time than students learning math in the traditional way.

According to Bill Gates, talking in 2013, students on remedial education courses using adaptive software outperformed students in conventional courses and colleges benefitted from a 28% reduction in the cost per student.

Probabilistic Predictions

Big data will provide us with insights into how people in aggregate learn, but more importantly, into how each of us individually acquires knowledge. These insights are not perfect – they do not give us cause and effect relationships – Big data insights are probabilistic:

For example, we may spot that teaching materials of a certain sort are 95% likely to improve a particular person’s test scores, but if we make a recommendation based on this, it will not work in 5% of cases.

This is something we are going to have to learn to live with, and parents and students are going to have to bear the risk – for example, all Big Data can do is to tell ‘clients’ that if they study this particular course, then there is a 70-80% chance they’ll see ‘x’ amount of improvement.

However, some probabilities will be more certain than others, and so for at least some specific recommendations, we can act with reasonable certainty.
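The arithmetic behind this risk is simple enough to sketch. The function name and the cohort size of 200 below are made up for illustration; the only figure taken from the text is the 95% success rate.

```python
def expected_outcomes(num_students, success_probability):
    """Return (expected successes, expected failures) for one recommendation."""
    if not 0.0 <= success_probability <= 1.0:
        raise ValueError("success_probability must be between 0 and 1")
    successes = num_students * success_probability
    return successes, num_students - successes


# Hypothetical cohort of 200 students given a recommendation that works 95% of the time.
worked, failed = expected_outcomes(200, 0.95)
print(f"{worked:.0f} students expected to benefit, {failed:.0f} not")
```

Even a very reliable recommendation leaves a handful of students in every cohort for whom it does not work, and nothing in the probability tells us in advance who they will be.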

We are going to have to get over seeing the world through the lens of cause and effect…

Criticisms of Mayer-Schonberger and Cukier’s views on how Big Data will transform education

Personally, as a teacher myself, I’m sceptical when non-experts start making sweeping predictions about the future of education based on speculation. One of the claims made for Big Data is that it provides empirical insights, so such speculation is hypocritical, precisely because it’s not based on any actual data!

The idea that transnational technology companies are going to help everyone in education is nonsense: they are profit driven. The fact that profit comes first, and that this will be a limiting factor in how data is used in the future, is not even mentioned.

They see ‘teachers as the enemy’, as a barrier to Big Data. This is highly dismissive of a group of people who have gone into a job to benefit children, whereas I doubt that people working for tech companies have this as their primary motive. Also see below for an alternative explanation of their criticism of ‘teachers as a barrier’ to ed tech companies playing more of a role in education.

The ‘one size fits all’ model might be dominant in education because, with a teacher-student ratio of 1 to 100 (in colleges), teachers literally cannot meet the individual needs of individual students. There simply isn’t time for this, along with the need for teachers to keep on top of the knowledge themselves, keep up to date with technological changes and institutional-legal requirements, and do all of the (still necessary) marking of students’ work.

Related to the above point, on making teachers analogous to other professionals with clients: I don’t believe there’s any other field of work where professionals are expected to deal with 100 clients at a time and personally interact with each of them every single day in a meaningful way, while dealing with diverse and complex knowledge (rather than specialising in one particular thing, e.g. a haircut or financial advice). While it might be fair to expect teachers to respond to ‘clients’ demands, 1 teacher cannot do this with 100 students. The ratio needs altering (1 to 10 maybe?).

The authors cite very few examples of peer-reviewed evidence to back up their claims.

Further Reading…

Problems of the role of technology companies in education

China’s Social Credit System: Big Data meets Big Brother

Most of us are used to having our daily activities constantly monitored and evaluated – what we buy, how much tax we pay (or not), what television programmes we watch, what websites we visit, where we go, how ‘active’ we are, who our friends are and how we interact with them – such monitoring is now done routinely via Amazon, Facebook, and Google.

Now, imagine if all of that ‘big data’ was fed to the central government, and mashed into a single number which would be our ‘citizen score’ which in turn would measure the value of our contribution to our nation and which would inform everyone of how patriotic, politically sound and trustworthy we are as a person.

And imagine further if that ‘citizen score’ determined our eligibility for certain jobs, our creditworthiness, where our children could go to school, or even our chances of getting a date.

This isn’t fantasy: China is in the process of developing such a Social Credit System, which will be mandatory by 2020. Presently, the Chinese government is liaising with various big data companies and trialling schemes in order to figure out what kinds of data to collect, and what algorithms to use to determine an individual’s final ‘citizen score’.

The Trial Run…

One company which is set to be a major player in running China’s social credit system is Alibaba, which is currently trialling a ‘credit ranking scheme’ which people can voluntarily sign up to.

The scheme gives people a score of between 350 and 950, based on data collected from five major categories…

Credit history – does the person pay their bills on time?
Ability to fulfill contractual obligations on time
Personal information – mobile phone number, address
Behaviour and preference – such as what products someone buys. People who buy nappies are given a higher score, because parents tend to be more responsible; people who spend 10 hours a day playing video games are given a lower score.
Interpersonal relationships – who your friends are and what you say on social media. Those who ‘big up the Chinese economy’ get a higher score, for example.

It’s the fourth and fifth categories above which are the most interesting. The first three are pretty standard (insurance companies in most countries will use these to assess premiums), but the last two involve turning personal comments into social and political capital… they really politicize the personal!
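As a rough illustration of how five categories of data might be mashed into a single number between 350 and 950, here is a simple weighted-average sketch. The weights, the category keys and the 0 to 1 rating scale are entirely hypothetical: the sources do not say how Alibaba actually combines the categories.

```python
SCORE_MIN, SCORE_MAX = 350, 950  # score range given for the trial scheme

# Hypothetical weights (they sum to 1); the real weighting is not given in the sources.
WEIGHTS = {
    "credit_history": 0.35,
    "fulfilment_capacity": 0.25,
    "personal_information": 0.15,
    "behaviour_and_preference": 0.15,
    "interpersonal_relationships": 0.10,
}


def credit_score(ratings):
    """Combine per-category ratings (each from 0.0 to 1.0) into a 350-950 score."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name} must be between 0 and 1")
    weighted = sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
    return round(SCORE_MIN + weighted * (SCORE_MAX - SCORE_MIN))


# Made-up ratings for an imaginary volunteer.
volunteer = {
    "credit_history": 0.9,
    "fulfilment_capacity": 0.8,
    "personal_information": 1.0,
    "behaviour_and_preference": 0.4,
    "interpersonal_relationships": 0.2,
}
print(credit_score(volunteer))
```

The point of the sketch is that once behaviour and friendships are just two more inputs, what you buy and who you know move the final number in exactly the same way as paying your bills does.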

When China’s social credit system ‘goes live’ in 2020, private companies will essentially be spying for the Chinese government, and negative tweets about Tiananmen Square, for example, will hurt your social credit score.

And if your friends post negative tweets about Tiananmen Square, well, that will also make your score go down!

Rewards and Punishments

Volunteers who are currently signed up to Alibaba’s trial get rewards if they get a high credit score – preferential access to loans if they get a score above 600, and if they get to 650 they get faster check-ins at hotels and airports.

When the system eventually goes live in 2020, people with lower citizen scores will be punished, for example with slower internet speeds, restricted access to restaurants and the loss of the right to travel freely abroad.

As the government states, the social credit system will ‘allow the trustworthy to roam everywhere under heaven while making it hard for the discredited to take a single step’.

Is it that different to what we’ve got in the West?

While this may look like a horrific meeting between George Orwell’s 1984 and Pavlov’s dogs, maybe this isn’t that different to western big data management systems?

We’ve had credit scoring for 70 years now, whereas it doesn’t exist in China yet, so this could just be a rapid development of what has evolved by stealth here.

And as to using personalized data… individuals already rate restaurants, movies, books and each other, and various companies routinely scrutinize big data. Maybe we are also getting closer to the Chinese concept of ‘life scoring’ as our real and online worlds merge.

Sources

Modified from The Week, November 2017

Book – Rachel Botsman (2017) Who Can You Trust: How Technology Brought Us Together – and Why it Could Drive Us Apart

What is Big Data?

Big data refers to things one can do at a large scale that cannot be done at a smaller scale. Big data analysis typically uses all available information and billions of data points to identify correlations which reveal new insights about human behaviour which are simply not available when using smaller data sets.

Big data has emerged with the widespread digitisation of information which has made it easier to store and process the increasing volume of information available to us.

Big data is also dependent on the emergence of new data processing tools such as Hadoop, which are not based on the rigid hierarchies of the ‘analogue’ age, in which data was typically collected with specific purposes in mind. The rise of big data is likely to continue given that society is increasingly engaged in a process of ‘datafication’ – there is an ongoing process of companies collecting data about all things under the sun.

Big data is also fundamentally related to the rise of large information technology companies, most obviously Google, Facebook and Amazon, who collect huge volumes of data and see that data as having an economic value.

A good example of ‘big data analysis’ is Google’s use of its search data to predict the spread of the H1N1 flu virus in 2009, based on the billions of search queries which it receives every day. They took 50 million of the most common search terms and compared them with CDC (Centers for Disease Control and Prevention) data, and found 45 search terms which were correlated with the official figures on the spread of flu.

As a result, Google was able to tell how the H1N1 virus was spreading in real time in 2009 without relying on the reporting-lag which came with CDC data, which is based on people visiting doctors to report flu, a method which can only tell us about the spread of flu some days after it has already spread.
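The basic idea of hunting for search terms that track the official figures can be sketched in a few lines. This is only a toy version: Google’s actual method was far more elaborate, and the function names and all the weekly figures below are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series cannot be correlated with anything
    return cov / sqrt(var_x * var_y)


def best_terms(search_counts, official_cases, top_n):
    """Return the top_n search terms whose weekly counts best track official cases."""
    ranked = sorted(
        search_counts,
        key=lambda term: pearson(search_counts[term], official_cases),
        reverse=True,
    )
    return ranked[:top_n]


# Made-up weekly figures, for illustration only.
official_cases = [10, 20, 40, 80, 60]
search_counts = {
    "flu symptoms": [100, 210, 390, 820, 590],
    "cough medicine": [50, 90, 210, 400, 310],
    "football scores": [300, 280, 310, 290, 305],
}
print(best_terms(search_counts, official_cases, top_n=2))
```

Note that nothing here explains why people search for these terms: the method simply keeps whichever terms rise and fall with the official figures, which is exactly the ‘correlation, not causation’ approach.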

A second useful example is Oren Etzioni’s company ‘Farecast’, which grew to use 200 billion flight-price records to predict the best time for consumers to buy plane tickets. The technology he developed to crunch the data today forms the basis of sites such as Expedia.

There are three shifts in information analysis that occur with Big Data

Big data analysts seek to use all available data rather than relying on sampling. This is especially useful for gaining insights into niche subcategories.
Big data analysts give up on exactitude at the micro level to gain insight at the macro level – they look for the general direction rather than measuring exactly down to the single penny or inch.
Big data analysis looks for correlations, not causation – it can tell us that something is happening rather than why it is happening.
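The first of these shifts, using all the data rather than a sample, can be illustrated with a toy example. The data set below is invented: 100,000 records of which 1 in 1,000 belongs to a ‘niche’ subcategory, compared against a random sample of 500.

```python
import random


def count_matching(records, predicate):
    """Count the records for which predicate(record) is true."""
    return sum(1 for record in records if predicate(record))


def is_niche(record):
    return record["niche"]


random.seed(0)  # fixed seed so the run is repeatable

# Invented population: every 1,000th record belongs to the niche subcategory.
population = [{"id": i, "niche": i % 1000 == 0} for i in range(100_000)]
sample = random.sample(population, 500)

full = count_matching(population, is_niche)
in_sample = count_matching(sample, is_niche)
print(f"niche records in full data: {full}, in a sample of 500: {in_sample}")
```

With the full data set there are 100 niche records to analyse; a sample of 500 will on average contain less than one, which is why sampling tells us so little about small subcategories.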

Cukier uses two analogies to emphasise the differences of working with big data compared to the ‘sampled data’ approach of the analogue age.

Firstly, he likens it to the shift from painting as a form of representation to movies – the latter is fundamentally different to a still painting.

Secondly, he likens it to the fact that at the subatomic level materials act differently to how they do at the atomic level – a whole new system of laws seems to work at the micro level.

Big Data – don’t forget to be sceptical!

This post is only intended to provide a simple, starting point definition of big data, and the above summary is taken from a best selling book on big data (source below). This book is extremely biased, being overwhelmingly in favour of big data, so if you buy it and read it, keep this in mind! Big data also has its critics, but more of that later.

Sources

Based on chapter 1 of Mayer-Schonberger and Cukier (2017) ‘Big Data: The Essential Guide to Work, Life and Learning in the Age of Insight’.