twitter - ReviseSociology

New social norms revealed by Twitter data?

It is possible to analyse qualitative social media data to reveal social trends in attitudes.

Twitter recently released an analysis of the content of 4 billion tweets made over the past three years, from users based in the United States. (Source)

The fastest growing theme which Twitter users are talking about is ‘creator culture’, with people tweeting about products they create in order to sell to make a living…

They claim that the content of tweets reveals that the U.S. population has become increasingly interested in six major cultural themes over the last 4 years (from 2016 to 2020):

Tweets about ‘Creator Culture’ are up 462% – which includes tweets about creative currency, ‘hustle life’ and connecting through video.
One Planet tweets are up 285% – includes tweets on the themes of the ethical self, sustainability, and clean corporations.
Tweets about Well Being are up 225% – tweets about digital monitoring, holistic health and being well together.
Tech Life tweets are up 166% – on the topics of blended realities, future tech and ‘tech angst’.
‘My Identity’ tweets are up 167% – fandom, gender redefined and ‘representing me’ are the main themes here.
Tweets about ‘Everyday Wonders’ are up 161% – a theme which includes DIY spirituality, awe of nature and cosmic fascination.

Sustainability is another large twitter conversation growth area.

The 2020 report by twitter (here) was produced for marketing purposes, but nonetheless reveals what twitter users are becoming increasingly interested in, and there are no real surprises here.

The report is broken down into several sections, which include the nice infographics I’ve put up in this post; there are many more available in the report.

Intuitively, I’m not surprised to see any of the above trends emerging from this analysis – I’m sure that, as a population, we are generally more interested in all of the above in 2020 than we were in 2016.

The limitations of using Twitter data to reveal cultural trends

There may be a lot of data, but there are possible problems with representativeness – twitter users tend to be younger and more educated than the wider population. (Source).

There’s also a problem with the motivations behind the data being collected – this was done for marketing purposes, to be useful to companies wishing to advertise on Twitter – so this analysis wouldn’t show any more negative trends which may have been tweeted about.

A limitation of the way this data is published is that we’re not told the raw numbers – so we know how much more a particular trend is being tweeted about in percentages, but we don’t know the actual numbers. Some of these may have started from a very low base in 2016, in which case a 250% increase in 4 years still wouldn’t be that significant!
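To make the point about low bases concrete, here is a minimal sketch of the arithmetic. The two 2016 baseline counts are entirely hypothetical (the report does not publish raw numbers), and the function name is just illustrative.

```python
def count_after_increase(base_count, percent_increase):
    """Return the count reached after a percentage increase from a base count."""
    return base_count * (1 + percent_increase / 100)


# Hypothetical 2016 baselines: the report gives percentages only, not raw counts.
niche_theme_2020 = count_after_increase(1_000, 250)
popular_theme_2020 = count_after_increase(1_000_000, 250)

# The same 250% rise gives very different absolute numbers of tweets.
print(niche_theme_2020, popular_theme_2020)
```

The same percentage growth could therefore describe a few thousand tweets or several million, which is why the missing raw numbers matter.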

This analysis paints Twitter as a wholly positive place where people are full of wonder and fascination, and are creative and positive. In reality we all know there’s a darker side to Twitter!

Relevance to A-level Sociology?

Twitter data is a source of secondary qualitative data (public rather than private data) and so is relevant to the research methods part of the course.

Students really should be considering how valid, reliable and representative twitter data is in terms of what it can tell us about broader cultural, political and economic values.

You may well decide that it’s NOT a valid data source at all, but that’s fine as Twitter gives you something to be critical of, and being critical is all part of A-level sociology!

Censorship on Facebook and Twitter – supporting the dominant ideology?

Pluralists would argue that social media platforms such as Facebook and Twitter are just neutral sites through which anyone is free to express their opinion. Marxists, however, would suggest that they work with governments and corporations to suppress views which go against the dominant ideology by censoring anything which challenges the mainstream capitalist world view.

The recent banning of the Anti-Media group from Facebook and Twitter seems to have gone further than just banning hate groups and fake-news creators, and suggests support for the Marxist view of the media.

The banning of Anti-Media from Facebook and Twitter

Anti-Media is an alternative news outlet that offers information that runs counter to the often pro-government narratives of traditional media outlets.

Their content is heavily critical of the current political system which they believe has heavily indebted ordinary people, and increasingly infringed on individual rights while expanding its reach and power.

They focus mainly on challenging unjust government corruption, oppression, and authority – criticising both ‘right’ and ‘left’ wing governments and part of their stated agenda is to awaken people from their passive subservience to big government and corporatism.

At their peak in 2016, Anti-media were reaching tens of millions of people per week, offering an alternative to the mainstream news, but their reach then declined, according to them, due to algorithmic changes following Trump coming to power.

Then in October 2018, the Anti-Media Facebook page was unpublished altogether, and its Twitter feed followed shortly afterwards. A number of its employees’ Twitter accounts were also suspended, including that of Carey Wedler, whose video about the issue, published on the censorship-resistant site @dtube, is well worth a watch.

NB her personal Twitter account was suspended without warning, and despite appealing this four months ago, she still hasn’t received a legitimate explanation of why her account was suspended.

The official Facebook and Twitter line was that they removed Anti-Media as part of a wider purge of “spam” and “fake accounts” that targeted users with the intent of misleading them, for example by driving them to ad farms for profit.

HOWEVER, neither Anti-Media nor their employees did any of this; they were dedicated to evidence-based factual reporting without sensationalism.

Along with Anti-Media, dozens of pro-freedom libertarian pages were also deleted during the Facebook and Twitter purge back in October 2018, including:

Police the Police
Hemp
The Free Thought Project

Some interesting analysis by Rolling Stone’s Matt Taibbi, who investigated these purges, suggests that this isn’t about purging left or right political views, as both types of page and account were purged, but rather it was about censoring the following themes which are touched on by both ends of the political spectrum:

Anti-war content
Focus on police brutality and misuse of state power
Disinterest in two party politics

Relevance to A-level sociology

This seems to be a straightforward example of Facebook and Twitter censoring media content for ideological reasons – anything which upsets the status quo by challenging the state too vociferously (whether from left or right ends of the political spectrum) and/or draws attention to state violence is more likely to get censored.

In short, this suggests support for any perspective which argues that mainstream media companies collude to support the dominant ideology by gatekeeping out views which are hyper-critical of those in power.

If you don’t study the media, this is still further supporting evidence for Marxism in general, and relevant to Theory and Methods!

Sources

LA Times article

Rolling Stone article
Carey Wedler – The Purge is Here, on Minds

The Internet as an Object of Content Analysis

Websites, social media posts and similar virtual documents are all forms of secondary data, and thus amenable to both quantitative and qualitative content analysis.

The sheer number of internet users creating online documents makes researching them a challenge

There are, however, many difficulties in using web sites as sources of content analysis. Following Scott’s (1990) four criteria of assessing the quality of documents, we need to consider why a web site is constructed in the first place, whether it is there for commercial purposes, and whether it has a political motive.

In addition, we also need to consider the following potential problems of researching web sites:

Finding websites will probably require a search engine, and search engines only ever provide a selection of available web sites on a topic, and the sample they provide will be biased according to the algorithm the engine uses to find its websites. It follows that use of more than one search engine is advisable.
Related to the above point, a search is only as good as the key words the researcher inputs into the search engines, and it could be time consuming to try out all possible words and combinations.
New web sites are continually appearing while old ones disappear. This means that by the time research is published, it may be based on web sites which no longer exist and may not be applicable to the new ones which have emerged.
Similar to the above point, existing web sites are continually being updated.
The analysis of web sites is a new field which is very much in flux. New approaches are being developed at a rapid rate. Some draw on traditional ways of interpreting documents such as discourse analysis and qualitative content analysis; others have been developed specifically in relation to the Web, such as the examination of hyperlinks between websites and their significance.

Most researchers who use documents accept the fact that it can be difficult to determine the population from which they are sampling, and when researching documents online, the speed of development and change of the Web accentuate this problem. The experience of researching documents online can be like trying to hit a moving target that not only moves, but is in a constant state of metamorphosis.

Three examples of content analysis of documents online

Boepple and Thompson (2014) conducted quantitative analysis of 21 ‘healthy living blogs’. Their sampling frame was only blogs which had received an award, and from those, they selected the blogs with the largest number of page views.

They found that content emphasised appearance and disordered messages about food/nutrition, with five bloggers using very negative language about being fat or overweight and four invoking admiration for being thin. They concluded that these blogs spread messages that are ‘potentially problematic’ for anyone changing their behaviour on the basis of advice contained in them.

Davis et al (2015) conducted an analysis of postings that followed a blog post concerning a cyberbullying suicide by a 15-year-old named Amanda Todd. There were 1094 comments of which 482 contained stories about being bullied, 12% about cyberbullying, 75% about traditional bullying, the rest a mixture of both.

The research found that the main reason victims of bullying are targeted is because they do not conform in one way or another to society’s mainstream norms and values, with the most common specific reason for bullying being a victim’s physical appearance.

Humphries et al (2014) conducted content analysis on the kinds of personal information disclosed on Twitter. The authors collected an initial sample of users and then searched the friends of this initial sample. In total they collected 101,069 tweets and took a random sample of 2100 tweets from this.

One of their findings was that Twitter users not only share information about themselves, they frequently share information about others too.
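The sampling step described for the Twitter study can be sketched as a simple random sample. This is only an illustration, not the authors' actual procedure: the tweet IDs are stand-in integers, the collection size is read as 101,069, and the function name and seed are hypothetical.

```python
import random


def draw_simple_random_sample(items, sample_size, seed=None):
    """Draw a simple random sample without replacement from a list of items."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(items, sample_size)


# Stand-in tweet IDs; real research would use the collected tweets themselves.
tweet_ids = list(range(101_069))
sample = draw_simple_random_sample(tweet_ids, 2_100, seed=42)

# Every tweet has the same chance of selection and none is picked twice.
print(len(sample), len(set(sample)))
```

Sampling without replacement means each collected tweet has an equal chance of being coded, which is what lets the 2100 tweets stand in for the full collection.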

Concluding Thoughts

Researching documents online may be challenging, but it is difficult to see how sociologists can avoid it as more and more of our lives are lived out online, so researching documents such as web sites, and especially blogs and social media postings is, I think, very much set to become a growth area in social research.

Twitter Users by Occupation and Social Class

The middle classes and especially those in creative industries are more likely to be on twitter, but finding this out is more difficult than you might think, at least according to some recent research:

Who Tweets?: Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data

This post is a brief summary of the methods and findings of the above.

Introduction/ Context/ Big Data

90% of the world’s data has been generated in the past 2 years and the trend is apparently exponential, but the key challenges of harnessing this data (known as the 5Vs: volume, veracity, velocity, variety and value) are not so easily overcome.

The primary criticism of such data is that it is there to be collected and analysed before the question is asked and, because of this, the data required to answer the research question may not be available with important information such as demographic characteristics being absent.

The sheer volume of data and its constant, flowing, locomotive nature provides an opportunity to take the ‘pulse of the world’ every second of the day rather than relying on punctiform and time-consuming terrestrial methods such as surveys. Just 1% of Twitter users in the UK amounts to around 150,000 users. Even a tiny kernel of ‘useful’ data can still amount to a sample bigger than some of the UK’s largest sample surveys.

However, social media data sources are often considered to be ‘data-light’ as there is a paucity of demographic information on individual content producers.

Yet, as Savage and Burrows argue, sociology needs to respond to the emergence of these new data sources and investigate the ways in which they inform us of the social world. One response to this has been the development of using ‘signatures’ in social media as proxies for real world events and individual characteristics.

This paper builds on this work conducted at the Collaborative Online Social Media Observatory (COSMOS), through proposing methods and processes for estimating two demographic variables: age and occupation (with associated class).

How Do Twitter Users Vary by Occupation and Social Class – Methods

The researchers used a sample of 32,032 Twitter profiles collected by COSMOS, relying on the entry in the ‘profile’ box to uncover occupation and class background.

They took the occupation with the greatest number of words as the primary occupation and, if multiple occupations were listed, they took the first occupation as the primary occupation.

They then randomly selected 1,000 of the 32,032 cases to which an occupation had been assigned, and three expert coders visually inspected these 1,000 profiles for inaccuracies and errors.

They found that 241 (so 24%) had been misclassified, with a high level of inter-rater reliability.
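As a rough illustration of how much uncertainty sits around that figure, the sketch below computes the misclassification rate from the 241 out of 1,000 verified profiles and adds a normal-approximation 95% interval. The interval is my own addition for illustration, not something the researchers report, and the function name is hypothetical.

```python
import math


def proportion_with_interval(successes, n, z=1.96):
    """Return a proportion with an approximate (normal) confidence interval."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p, p - z * standard_error, p + z * standard_error


# 241 of the 1,000 manually checked profiles were misclassified.
rate, low, high = proportion_with_interval(241, 1000)
print(f"{rate:.1%} misclassified (approx. 95% interval {low:.1%} to {high:.1%})")
```

On these assumptions the true error rate for the automatic tool plausibly lies somewhere between roughly a fifth and a little over a quarter of profiles.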

The main problems of identification stemmed from the multiple meanings of many words related to occupations, from confusion with hobbies, and from obscure occupations. For example, people might refer to themselves as a ‘Doctor Who fan’ or a ‘Dancer trapped in a software engineer’s body’.
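The matching rule and its weaknesses can be sketched in a few lines. This is a minimal illustration, not the COSMOS tool itself: the five-term lexicon is hypothetical, and breaking ties between equally long terms by taking the first one in the profile is my reading of the rule described above.

```python
import re

# Hypothetical mini-lexicon; a real tool would match a full occupational classification.
OCCUPATIONS = ["doctor", "dancer", "software engineer", "teacher", "actor"]


def find_occupations(profile):
    """Return (position, term) pairs for each lexicon term found in the profile."""
    text = profile.lower()
    hits = []
    for term in OCCUPATIONS:
        match = re.search(r"\b" + re.escape(term) + r"\b", text)
        if match:
            hits.append((match.start(), term))
    return sorted(hits)


def primary_occupation(profile):
    """Pick the term with the most words; on a tie, the one appearing first (assumed)."""
    hits = find_occupations(profile)
    if not hits:
        return None
    position, term = min(hits, key=lambda hit: (-len(hit[1].split()), hit[0]))
    return term


# A naive match cannot tell a job from a fandom or a figure of speech.
print(primary_occupation("Doctor Who fan"))
print(primary_occupation("Dancer trapped in a software engineer's body"))
```

The first profile is matched as a ‘doctor’ even though no occupation is being claimed, which is exactly the kind of error that needs a human coder to spot.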

So what is the class background of twitter users?

The table below shows you three different data sets – the class backgrounds as automatically derived from the entire COSMOS sample of profiles, the class backgrounds of the 32,032 sample the researchers used and the class backgrounds of the 1,000 that were visually verified by the three expert coders (for comments on the differences see ‘validity problems’ below).

There is a clear over representation of NS-SEC 2 occupations in the data compared with the general UK population which may be explained by the confusion between occupations and hobbies and/or the use of Twitter to promote oneself or one’s work. NS-SEC 2 is where occupations such as ‘artist’, ‘singer’, ‘coach’, ‘dancer’ and ‘actor’ are located and the utility of the tool for identifying occupation for this group is further exacerbated by the fact that this is by far the most populous group for Twitter users and the largest group in the general UK population by 10% points. Alternatively, if the occupation of these individuals has been correctly classified then we can observe that they are over represented on Twitter by a factor of two when using Census data as a baseline measure.

Occupations such as ‘teacher’, ‘manager’ and ‘councillor’ are not likely to be hobbies but there is an unusually high representation of creative occupations which could also be pursued as leisure interests with 4% of people in the dataset claiming to be an ‘actor’, 3.5% an ‘artist’ and 3.5% a ‘writer’. An alternative explanation is that Twitter is used by people who work in the creative industries as a promotional tool.

Validity problems with the social-class demographics of twitter data

Interestingly, the researchers rejected the idea that people would just outright lie about their occupations, noting that ‘previous research [has] indicated that identity-play and the adoption of alternative personas was often short-lived, with ‘real’ users’ identities becoming dominant in prolonged interactions. The exponential uptake of the Internet, beyond this particular group of early adopters, was accompanied with a shift in the presentation of self online resulting in a reduction in online identity-play’.

The COSMOS engine does automatically identify occupation, but it identifies occupation inaccurately – and the degree of inaccuracy varies with social class background. The researchers note:

‘unmodified occupation identification tool appears to be effective and accurate for NS-SEC groups in which occupational titles are unambiguous such as professions and skilled trades (NS-SEC 1, 3, 4 and 5). Where job titles are less clear or are synonymous with alternative activities (NS-SEC 2, 6 and 7) the requirement for human validation becomes apparent as the context of the occupational term must be taken into account such as the difference between “I’m a dancer in a ballet company” and “I’m a dancer trapped in the body of a software engineer”’.

The researchers note that the next step is to further validate their methodology through establishing the ground-truth via ascertaining the occupation of tweeters through alternative means, such as social surveys (an on-going programme of work for the authors).

Comments

In some ways the findings are not surprising – that middle class professionals and the self-employed are over-represented on Twitter – but if we are honest, we don’t know by how much, because of the factors mentioned above. It seems fairly likely that many of the people self-identifying on Twitter as ‘actors’ and so on don’t do this as their main job, but we just can’t assess this using Twitter data alone.

Thus this research is a reminder that hyper reality is not more real than actual reality. In hyper-reality these people are actors, in actual reality, they are frustrated actors. This is an important distinction, and this alone could go some way to explaining why virtual worlds can be so much meaner than real-worlds.

This research also serves as a refreshing reminder of how traditional ‘terrestrial’ methods such as surveys are still required to ascertain the truth of the occupations and social class backgrounds of Twitter users. As it stands, if we left it to algorithms we’d end up with roughly a quarter of people being incorrectly identified, which is a huge margin of error. If we leave these questions up to Twitter, then we are left with a very misleading picture of ‘who tweets’ by social class background.

Having said this, it is quite possible for further rules to be developed and applied to algorithms which could increase the accuracy of automatic demographic data-mining.

How Old are Twitter Users?

‘Who Tweets’ is an interesting piece of recent research which attempts to determine some basic demographic characteristics of Twitter users, relying on nothing but the data provided by the users themselves in their twitter profiles.

Based on a sample of 1470 twitter profiles* in which users clearly stated** their age, the authors of ‘Who Tweets’ found that 93.9% of twitter users were under the age of 35. The full age-profile of twitter users (according to the ‘Who Tweets’/ COSMOS data) compared to the actual age profile taken from the UK Census is below:

The age profiles of Twitter users – really?

Compare this to the Ipsos MORI Tech Tracker report for the third quarter of 2014 (which the above research draws on) which used face to face interviews based on a quota sample of 1000 people.

Ages of twitter users according to a face to face Mori Poll

Clearly this shows that only 67% of Twitter users are under the age of 35, quite a discrepancy with the user-defined data!

The researchers note that:

‘We might… hypothesise that young people are more likely to profess their age in their profile data and that this would lead to an overestimation of the ‘youthfulness’ of the UK Twitter population. As this is a new and developing field we have no evidence to support this claim, but the following discussion and estimations should be treated cautiously.

Looking again at the results from the Technology Tracker study conducted by Ipsos MORI, nearly two thirds of Twitter users were under 35 years of age in Q3 of 2014 whereas our study clearly identifies 93.9% as being 35 or younger. There are two possible reasons for this. The first is that the older population is less likely to state their age on Twitter. The second is that the age distribution in the survey data is a function of sample bias (i.e. participants over the age of 35 in the survey were particularly tech-savvy). This discrepancy between elicited (traditional) and naturally occurring (new) forms of social data warrants further investigation…’
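The first explanation (younger users being more willing to state their age) can be put into numbers with a small sketch. It assumes, purely for illustration, that the Ipsos MORI figure of 67% is the true share of under-35s and that the only distortion is a difference in how often each age group states its age; the function names are hypothetical.

```python
def observed_share_under_35(true_share, p_state_young, p_state_old):
    """Share of under-35s among users who state an age in their profile."""
    young_stating = true_share * p_state_young
    old_stating = (1 - true_share) * p_state_old
    return young_stating / (young_stating + old_stating)


def required_disclosure_ratio(true_share, observed_share):
    """How many times likelier under-35s must be to state their age."""
    observed_odds = observed_share / (1 - observed_share)
    true_odds = true_share / (1 - true_share)
    return observed_odds / true_odds


# Assumption: 67% (survey) is the true share; 93.9% is the share seen in profiles.
ratio = required_disclosure_ratio(0.67, 0.939)
print(round(ratio, 2))
```

Under these assumptions, under-35s would need to be several times more likely than older users to state their age for the two figures to be reconciled, which gives a sense of how strong the self-selection effect would have to be.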

Comment

This comparison clearly shows how we get some very different data on a very basic question (‘what is the age distribution of Twitter users?’) depending on the methods we use, but which is more valid? The Ipsos face-to-face poll is done every quarter, and it persistently yields results which are nothing like COSMOS, and it’s unlikely that you’re going to get a persistent ‘tech-savvy’ selection bias in every sample of over-35-year-olds, so does that mean it’s a more accurate reflection of the age profile of Twitter users?

Interestingly, the Ipsos data shows a definite drift to older users over time; it’d be interesting to know if more recent COSMOS data reflects this. More interestingly, the whole point of COSMOS is to provide us with more up-to-date, ‘live’ information – so where is it?!? Sort of ironic that the latest public reporting is already 12 months behind good old Ipsos:

Age profiles of Twitter users in final quarter of 2015 according to MORI

At the end of the day, I’m not going to be too harsh about the above ‘Who Tweets’ study, it is experimental, and many of the above projects are looking at the methodological limitations of this data. It would just be nice if they, err, got on with it a bit… come on Sociology, catch up!

One thing I am reasonably certain about is that the above comparison certainly shows the continued importance of terrestrial methods if we want demographic data.

Of course, one simple way of checking the accuracy of the COSMOS data is simply to do a face-to-face survey and ask people what their age is and whether they state this in their Twitter profiles. Then again, I’m sure they’ve thought of that… maybe in 2018 we’ll get a report?

*drawn from the Collaborative Online Social Media Observatory (COSMOS)

**there’s an interesting discussion of the rules applied to determine this in the ‘Who Tweets’ article.