A recent MIT study led by Sinan Aral, published in the journal Science in March 2018, found that ‘false news’ spreads much more quickly than true news, and that it is humans, more than bots, who are responsible for the imbalance.
Fake political news stories spread the fastest, but the findings also applied to stories on urban legends, business, terrorism, science, entertainment, and natural disasters.
Aral’s team of researchers looked at a sample of 4.5 million tweets created by about 3 million people over an 11-year period. Together these tweets formed 126,000 “cascades” of news stories, or uninterrupted retweet chains. The researchers compared the spread of false and true news stories, which they verified using fact-checking sites such as factcheck.org.
The main findings:

- False stories were 70% more likely to be retweeted than true ones.
- While true stories never spread past a ‘cascade depth’ of 10, false stories reached depths of 19.
- False stories reached a cascade depth of 10 about 20 times faster than true ones.
- True stories took about six times as long as false ones to reach 1,500 people.
- “False political news traveled deeper and more broadly, reached more people, and was more viral than any other category of false information.”
- Humans were more likely than bots to spread the false news.
- Fake news tended to be associated with fear, disgust, and surprise, whereas true stories triggered anticipation, sadness, joy, and trust.
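A ‘cascade depth’ is simply the longest unbroken retweet-of-a-retweet chain leading back to the original tweet. The sketch below is my own illustration of that idea, not the study’s code, and the tweet ids are made up:

```python
# Hypothetical sketch: compute the "depth" of a retweet cascade, i.e. the
# longest chain of retweets-of-retweets from the original tweet.
# (Illustrative only; not the MIT study's actual code.)

def cascade_depth(parents, root):
    """parents maps each retweet id to the id of the tweet it retweeted."""
    children = {}
    for tweet, parent in parents.items():
        children.setdefault(parent, []).append(tweet)

    def depth(node):
        kids = children.get(node, [])
        if not kids:
            return 0
        return 1 + max(depth(k) for k in kids)

    return depth(root)

# A tiny cascade: t0 -> t1 -> t2, and separately t0 -> t3
example = {"t1": "t0", "t2": "t1", "t3": "t0"}
print(cascade_depth(example, "t0"))  # prints 2
```

On this measure, a story retweeted once by thousands of people is broad but shallow; a depth of 19 means a chain of 19 successive resharers.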
Why do people spread fake news?
The authors of the study offer a ‘neutral’ explanation: fake news is simply more novel, novelty attracts more human attention, and ‘novel news’ is more valuable. Individuals gain more status for being the ones who share novelty (or at least people think they will), and novel information tends to be more useful in helping us decide how to act in society.
Ironically, spreading false news tends to have the opposite effect: it makes the individuals who spread it look stupid, and it may lead to us taking fewer risks and to a misallocation of resources as we attempt to mitigate these (non-real) risks.
Relevance to A-level Sociology?
This is a great example of hyperreality… to paraphrase Baudrillard, false news never happened, but it has real consequences.
It’s worth noting the limits of the study too… it’s limited to Twitter and doesn’t really help us to understand where fake news comes from, for example.
The fact that it’s humans, not bots spreading false news means that interventions will be more difficult and more complicated, because it’s unlikely that we’ll be able to find a technological fix for the problem.
I could imagine that Gomm and Gouldner would criticise this study as being ‘too neutral’… it could have looked more at the ideological bias of the political fake news stories, and the profiles of those spreading fake news, for example.
OK OK I know this is a shameless ‘plug my online constructed self page’, but I’ve just spent two hours consolidating my social media profiles, so that’s it for today folks!
It’s probably worth noting that this blog is the main ‘hub site’ I use to post stuff: most of the other sites are just what I use to publicize what I post on this blog – so cycling through all the above sites will give you the most wonderful feeling of an inward-looking cycle of self-referentiality.
Also this is something of an experiment with the ‘contact details’ part of my C.V. which I’m currently trying to reinvent for the 21st century, and that is honestly about as much fun as it sounds!
I just typed in ‘how many likes does it take to be satisfied’ into Google and got the responses below (second picture) – although just as interesting are the auto-complete options which cropped up.
I guess we live in a virtual world where many more people are asking themselves how to get more likes, without asking themselves whether this will make them satisfied or not?
I KNOW there are plenty more ways you can phrase the question, and of course the above responses may be different because of my own search history (although why the Ask Men link came up is beyond me), but intuitively this seems to be an obvious limit to reflexivity in an online age: asking how to get more likes is more common than reflecting on whether this is a worthy goal in the first place.
For some reason I’m reminded of Habermas’ theory of communicative action and those three basic types of question we can ask of each other (and ourselves): (1) is something effective, (2) is something true (i.e. what does it actually mean), and (3) is something good? When it comes to the economy of likes, I guess most people are stuck in the pragmatic domain: how many people stop to reflect on what a ‘like’ actually means and whether seeking more of them is a worthwhile act in itself, and how many more people have just unconsciously based part of their self-esteem on gathering likes and limit themselves to the pragmatic question of how to get more of them?
Now there’s a research agenda to stick in your pipe and smoke!
And of course I do appreciate the irony of the media here.
The middle classes, and especially those in the creative industries, are more likely to be on Twitter, but finding this out is more difficult than you might think, at least according to some recent research:
Who Tweets?: Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data
This post is a brief summary of the methods and findings of the above.
Introduction/ Context/ Big Data
90% of the world’s data has been generated in the past two years, and the trend is apparently exponential. The key challenges of harnessing this data (known as the 5Vs: volume, veracity, velocity, variety and value) are not so easily overcome.
The primary criticism of such data is that it exists to be collected and analysed before any research question is asked; because of this, the data required to answer a given question may not be available, with important information such as demographic characteristics being absent.
The sheer volume of data and its constant, flowing, locomotive nature provides an opportunity to take the ‘pulse of the world’ every second of the day, rather than relying on punctiform and time-consuming terrestrial methods such as surveys. Just 1% of Twitter users in the UK amounts to around 150,000 users, so even a tiny kernel of ‘useful’ data can still amount to a sample bigger than some of the UK’s largest sample surveys.
However, social media data sources are often considered to be ‘data-light’ as there is a paucity of demographic information on individual content producers.
Yet, as Savage and Burrows argue, sociology needs to respond to the emergence of these new data sources and investigate the ways in which they inform us about the social world. One response to this has been the development of ‘signatures’ in social media as proxies for real-world events and individual characteristics.
This paper builds on work conducted at the Collaborative Online Social Media Observatory (COSMOS) by proposing methods and processes for estimating two demographic variables: age and occupation (with associated social class).
How Do Twitter Users Vary by Occupation and Social Class – Methods
The researchers used a sample of 32,032 Twitter profiles collected by COSMOS, relying on the entry in the ‘profile’ box to uncover occupation and class background.
They took the occupation term with the greatest number of words as the primary occupation and, where multiple occupations were listed, took the first as the primary occupation.
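A rough sketch of that selection rule might look like the following. This is a hypothetical reconstruction for illustration only, not the actual COSMOS tool; the toy occupation list and example bios are invented:

```python
# Hypothetical sketch of the occupation-selection rule described above:
# match occupation terms against a profile bio, prefer the match with the
# most words, and break ties by whichever appears first in the bio.
# (My own illustration, not the COSMOS code; the occupation list is a toy.)

OCCUPATIONS = ["software engineer", "dancer", "teacher", "artist"]

def primary_occupation(bio):
    bio_lower = bio.lower()
    matches = []
    for occ in OCCUPATIONS:
        pos = bio_lower.find(occ)
        if pos != -1:
            # sort key: most words first, then earliest position in the bio
            matches.append((-len(occ.split()), pos, occ))
    if not matches:
        return None
    matches.sort()
    return matches[0][2]

print(primary_occupation("Dancer trapped in a software engineer's body"))
# prints "software engineer" - the two-word term outranks "dancer",
# illustrating exactly the kind of ambiguous bio the coders flagged
```

Purely keyword-based rules like this have no way of reading context, which is why the human validation step described next turned out to matter so much.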
They then randomly selected 1,000 of the 32,032 cases to which an occupation had been assigned, and three expert coders visually inspected these profiles to check for inaccuracies and errors.
They found that 241 (around 24%) had been misclassified, with a high level of inter-rater reliability between the coders.
The main problems of identification stemmed from the multiple meanings of many words related to occupations, from hobbies, and from obscure occupations. For example, people might refer to themselves as a ‘Doctor Who fan’ or a ‘Dancer trapped in a software engineer’s body’.
So what is the class background of twitter users?
The table below shows three different data sets: the class backgrounds as automatically derived from the entire COSMOS sample of profiles, the class backgrounds of the 32,032 sample the researchers used, and the class backgrounds of the 1,000 profiles that were visually verified by the three expert coders (for comments on the differences see ‘validity problems’ below).
There is a clear over-representation of NS-SEC 2 occupations in the data compared with the general UK population, which may be explained by confusion between occupations and hobbies and/or by the use of Twitter to promote oneself or one’s work. NS-SEC 2 is where occupations such as ‘artist’, ‘singer’, ‘coach’, ‘dancer’ and ‘actor’ are located, and the difficulty of identifying occupation for this group is compounded by the fact that it is by far the most populous group of Twitter users, exceeding the largest group in the general UK population by 10 percentage points. Alternatively, if the occupation of these individuals has been correctly classified, then we can observe that they are over-represented on Twitter by a factor of two when using Census data as a baseline measure.
Occupations such as ‘teacher’, ‘manager’ and ‘councillor’ are not likely to be hobbies but there is an unusually high representation of creative occupations which could also be pursued as leisure interests with 4% of people in the dataset claiming to be an ‘actor’, 3.5% an ‘artist’ and 3.5% a ‘writer’. An alternative explanation is that Twitter is used by people who work in the creative industries as a promotional tool.
Validity problems with the social-class demographics of twitter data
Interestingly, the researchers rejected the idea that people would just outright lie about their occupations, noting that ‘previous research [has] indicated that identity-play and the adoption of alternative personas was often short-lived, with “real” users’ identities becoming dominant in prolonged interactions. The exponential uptake of the Internet, beyond this particular group of early adopters, was accompanied by a shift in the presentation of self online, resulting in a reduction in online identity-play’.
The COSMOS engine does automatically identify occupation, but it identifies occupation inaccurately – and the degree of inaccuracy varies with social class background. The researchers note:
‘[The] unmodified occupation identification tool appears to be effective and accurate for NS-SEC groups in which occupational titles are unambiguous, such as professions and skilled trades (NS-SEC 1, 3, 4 and 5). Where job titles are less clear or are synonymous with alternative activities (NS-SEC 2, 6 and 7), the requirement for human validation becomes apparent as the context of the occupational term must be taken into account, such as the difference between “I’m a dancer in a ballet company” and “I’m a dancer trapped in the body of a software engineer”.’
The researchers note that the next step is to further validate their methodology through establishing the ground-truth via ascertaining the occupation of tweeters through alternative means, such as social surveys (an on-going programme of work for the authors).
In some ways the findings are not surprising: middle-class professionals and the self-employed are over-represented on Twitter. But if we are honest, we don’t know by how much, because of the factors mentioned above. It seems fairly likely that many of the people self-identifying on Twitter as ‘actors’ and so on don’t do this as their main job, but we just can’t establish this via Twitter alone.
Thus this research is a reminder that hyperreality is not more real than actual reality. In hyperreality these people are actors; in actual reality, they are frustrated actors. This is an important distinction, and it alone could go some way to explaining why virtual worlds can be so much meaner than real worlds.
This research also serves as a refreshing reminder of how traditional ‘terrestrial’ methods such as surveys are still required to ascertain the truth about the occupations and social class backgrounds of Twitter users. As it stands, if we left it to algorithms we’d end up with around 25% of people being incorrectly identified, which is a huge margin of error. If we leave these questions up to Twitter, then we are left with a very misleading picture of ‘who tweets’ by social class background.
Having said this, it is quite possible for further rules to be developed and applied to algorithms which could increase the accuracy of automatic demographic data-mining.
‘Who Tweets’ is an interesting piece of recent research which attempts to determine some basic demographic characteristics of Twitter users, relying on nothing but the data provided by the users themselves in their twitter profiles.
Based on a sample of 1,470 Twitter profiles* in which users clearly stated** their age, the authors of ‘Who Tweets’ found that 93.9% of Twitter users were aged 35 or under. The full age profile of Twitter users (according to the ‘Who Tweets’/COSMOS data) compared to the actual age profile taken from the UK Census is below:
Compare this to the Ipsos MORI Tech Tracker report for the third quarter of 2014 (which the above research draws on) which used face to face interviews based on a quota sample of 1000 people.
Clearly this shows that only 67% of Twitter users are under the age of 35, quite a discrepancy with the user-defined data!
The researchers note that:
‘We might… hypothesise that young people are more likely to profess their age in their profile data and that this would lead to an overestimation of the ‘youthfulness’ of the UK Twitter population. As this is a new and developing field we have no evidence to support this claim, but the following discussion and estimations should be treated cautiously.
Looking again at the results from the Technology Tracker study conducted by Ipsos MORI, nearly two thirds of Twitter users were under 35 years of age in Q3 of 2014 whereas our study clearly identifies 93.9% as being 35 or younger. There are two possible reasons for this. The first is that the older population is less likely to state their age on Twitter. The second is that the age distribution in the survey data is a function of sample bias (i.e. participants over the age of 35 in the survey were particularly tech-savvy). This discrepancy between elicited (traditional) and naturally occurring (new) forms of social data warrants further investigation…’
This comparison clearly shows how we get some very different data on a very basic question (‘what is the age distribution of Twitter users?’) depending on the methods we use, but which is more valid? The Ipsos face-to-face poll is done every quarter, and it persistently yields results which are nothing like COSMOS’s, and it’s unlikely that you’re going to get a persistent ‘tech-savvy’ selection bias in every sample of over-35-year-olds, so does that mean it’s a more accurate reflection of the age profile of Twitter users?
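As a very rough back-of-the-envelope check (my own sketch, not from either study), a standard two-proportion z-test on the figures quoted above (93.9% of roughly 1,470 profile-stated ages versus 67% of roughly 1,000 survey respondents) shows the gap is far too large to be sampling noise, so the explanation must lie in how each dataset was generated:

```python
# Back-of-envelope check: is the gap between the COSMOS estimate (93.9%
# aged 35 or under, n ~ 1,470) and the Ipsos MORI estimate (67%, n ~ 1,000)
# plausibly just sampling error? My own sketch, not from either study.
import math

def two_prop_z(p1, n1, p2, n2):
    """Two-proportion z-test statistic using a pooled standard error."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_prop_z(0.939, 1470, 0.67, 1000)
print(z)  # a z-score far beyond ~2, so chance alone cannot explain the gap
```

With a z-score well into double digits, the discrepancy has to come from how each dataset was produced (who states their age on Twitter, who agrees to a survey), not from random sampling variation.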
Interestingly, the Ipsos data shows a definite drift towards older users over time; it’d be interesting to know whether more recent COSMOS data reflects this. More interestingly, the whole point of COSMOS is to provide us with more up-to-date, ‘live’ information – so where is it?!? It’s sort of ironic that the latest public reporting is already 12 months behind good old Ipsos.
At the end of the day, I’m not going to be too harsh about the above ‘Who Tweets’ study, it is experimental, and many of the above projects are looking at the methodological limitations of this data. It would just be nice if they, err, got on with it a bit… come on Sociology, catch up!
One thing I am reasonably certain about is that the above comparison certainly shows the continued importance of terrestrial methods if we want demographic data.
Of course, one simple way of checking the accuracy of the COSMOS data is simply to do a face-to-face survey and ask people what their age is and whether they state this in their Twitter profiles; then again, I’m sure they’ve thought of that… maybe in 2018 we’ll get a report?