Donald Trump’s recent retweets of inflammatory anti-Muslim videos posted by the far-right group ‘Britain First’ sparked outrage last week. The row intensified when Theresa May said it was wrong for him to do so, which in turn prompted a Twitter rebuke from Trump in which he suggested she should be focusing on the destructive Radical Islam in the UK rather than criticizing him.
The videos purported to show Muslims pushing a boy off a roof, destroying a statue of the Virgin Mary, and beating a boy on crutches.
Trump’s tweets prompted The Guardian to suggest that his proposed state visit should be cancelled, because it would be inappropriate to extend such a formal welcome to such a racist bigot.
However, the Daily Mail points out that state visits have been extended to all sorts of immoral characters in the past – such as Nicolae Ceausescu of Romania and Robert Mugabe of Zimbabwe.
NB: IMO the above point from the Daily Mail is a great example of something which isn’t (technically) an argument. The fact that we have a tradition of setting a low ethical bar for the people invited on past state visits isn’t a rational reason for not changing current policy; it’s an irrational appeal to tradition/emotion, thus not logical, and thus not an argument.
According to Max Hastings in the Daily Mail, Trump’s tweets also reveal something about the ‘special relationship’ between Britain and the USA – namely that the UK likes to flatter itself that there is one, but the reality is that this special relationship never actually amounts to much in terms of the USA doing anything for the UK… This might be a warning about not relying on the USA as one of our post-Brexit saviors.
As to why Trump posted those tweets, besides the fact that he is an impetuous racist, there may have been a self-interested political motive: these tweets may have been aimed at his own far-right American audience, whose support he needs for his ‘Mexican Wall’ project.
So, all in all, as shocking as Trump’s tweets were in revealing his horrible racism, the deeper reality behind them is even worse…
OK, OK, I know this is a shameless ‘plug my online constructed self page’, but I’ve just spent two hours consolidating my social media profiles, so that’s it for today, folks!
It’s probably worth noting that this blog is the main ‘hub site’ I use to post stuff, and most of the other sites are just what I use to publicize what I post on this blog, so cycling through all the above sites will give you the most wonderful feeling of an inward-looking cycle of self-referentiality.
Also this is something of an experiment with the ‘contact details’ part of my C.V. which I’m currently trying to reinvent for the 21st century, and that is honestly about as much fun as it sounds!
Websites, social media posts and similar virtual documents are all forms of secondary data, and thus amenable to both quantitative and qualitative content analysis.
There are, however, many difficulties in using web sites as sources for content analysis. Following Scott’s (1990) four criteria for assessing the quality of documents, we need to consider why a web site was constructed in the first place, whether it is there for commercial purposes, and whether it has a political motive.
In addition, we also need to consider the following potential problems of researching web sites:
Finding websites will probably require a search engine, but search engines only ever provide a selection of the available web sites on a topic, and the sample they provide will be biased by the algorithm each engine uses to find websites. It follows that using more than one search engine is advisable.
Related to the above point, a search is only as good as the keywords the researcher enters into the search engine, and it could be time-consuming to try out all possible words and combinations.
New web sites are continually appearing while old ones disappear. This means that by the time research is published, its findings may be based on web sites which no longer exist, and may not apply to the new ones which have emerged.
Similar to the above point, existing web sites are continually being updated.
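To get a feel for why trying out every combination of keywords is so time-consuming, here is a minimal Python sketch. The search terms are made up purely for illustration. The point is that n candidate keywords give 2^n - 1 possible non-empty combinations, and that is before you consider word order, synonyms, or running every query on more than one search engine.

```python
from itertools import combinations


def keyword_queries(keywords):
    """Every non-empty combination of the candidate keywords, one query string per combination."""
    queries = []
    for size in range(1, len(keywords) + 1):
        for combo in combinations(keywords, size):
            queries.append(" ".join(combo))
    return queries


# Hypothetical search terms for a study of 'healthy living' blogs
terms = ["healthy", "living", "blog", "fitness", "diet"]
print(len(keyword_queries(terms)))  # 31, i.e. 2**5 - 1

# The number of combinations doubles with every extra keyword
for n in (5, 10, 20):
    print(n, "keywords ->", 2**n - 1, "combinations")
```

Twenty keywords already give over a million combinations, which is why researchers in practice settle for a justified subset of search terms rather than an exhaustive search.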
The analysis of web sites is a new field which is very much in flux, and new approaches are being developed at a rapid rate. Some draw on traditional ways of interpreting documents, such as discourse analysis and qualitative content analysis, while others have been developed specifically in relation to the Web, such as the examination of hyperlinks between websites and their significance.
Most researchers who use documents accept that it can be difficult to determine the population from which they are sampling, and when researching documents online, the speed at which the Web develops and changes accentuates this problem. Researching documents online can be like trying to hit a target that not only moves, but is in a constant state of metamorphosis.
Three examples of content analysis of documents online
Boepple and Thompson (2014) conducted quantitative analysis of 21 ‘healthy living blogs’. Their sampling frame was only blogs which had received an award, and from those, they selected the blogs with the largest number of page views.
They found that content emphasised appearance and conveyed disordered messages about food/nutrition, with five bloggers using very negative language about being fat or overweight and four invoking admiration for being thin. They concluded that these blogs spread messages that are ‘potentially problematic’ for anyone changing their behaviour on the basis of advice contained in them.
Davis et al (2015) analysed the comments posted in response to a blog post about the suicide of Amanda Todd, a 15-year-old who had been cyberbullied. There were 1,094 comments, of which 482 contained stories about being bullied: 12% of these stories were about cyberbullying, 75% about traditional bullying, and the rest a mixture of both.
The research found that the main reason victims of bullying are targeted is because they do not conform in one way or another to society’s mainstream norms and values, with the most common specific reason for bullying being a victim’s physical appearance.
Humphries et al (2014) conducted a content analysis of the kinds of personal information disclosed on Twitter. The authors collected an initial sample of users and then searched the friends of this initial sample. In total they collected 101,069 tweets, from which they took a random sample of 2,100 tweets.
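The post doesn't say exactly how the random sample was drawn, so here is a minimal Python sketch of a simple random sample (every tweet has the same chance of selection, no tweet picked twice), assuming Python's standard `random.sample`. The tweet IDs are stand-ins, not the study's data. Note that only this final step is random: the initial users-then-friends collection step is not.

```python
import random


def draw_random_sample(items, k, seed=None):
    """Simple random sample without replacement: each item is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(items, k)


# Stand-in IDs for the 101,069 collected tweets (hypothetical placeholders)
tweet_ids = list(range(101_069))
sample = draw_random_sample(tweet_ids, 2100, seed=42)

print(len(sample), len(set(sample)))  # 2100 2100
print(f"sampling fraction: {2100 / 101_069:.2%}")  # 2.08%
```

Recording the seed is a cheap way of letting someone else reproduce exactly which tweets ended up in the sample.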
One of their findings was that Twitter users not only share information about themselves, they frequently share information about others too.
Researching documents online may be challenging, but it is difficult to see how sociologists can avoid it as more and more of our lives are lived out online. Researching documents such as web sites, and especially blogs and social media postings, is, I think, very much set to become a growth area in social research.
The middle classes, and especially those in the creative industries, are more likely to be on Twitter, but finding this out is more difficult than you might think, at least according to some recent research:
Who Tweets?: Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data
This post is a brief summary of the methods and findings of the above.
Introduction/ Context/ Big Data
90% of the world’s data has been generated in the past two years and the trend is apparently exponential, but the key challenges of harnessing this data (known as the 5Vs: volume, veracity, velocity, variety and value) are not so easily overcome.
The primary criticism of such data is that it is there to be collected and analysed before the question is asked, and because of this, the data required to answer a research question may not be available, with important information such as demographic characteristics often absent.
The sheer volume of data and its constant, flowing, locomotive nature provide an opportunity to take the ‘pulse of the world’ every second of the day rather than relying on punctiform and time-consuming terrestrial methods such as surveys. Just 1% of UK Twitter users amounts to around 150,000 people, so even a tiny kernel of ‘useful’ data can still amount to a sample bigger than some of the UK’s largest sample surveys.
However, social media data sources are often considered to be ‘data-light’ as there is a paucity of demographic information on individual content producers.
Yet, as Savage and Burrows argue, sociology needs to respond to the emergence of these new data sources and investigate the ways in which they inform us about the social world. One response to this has been the use of ‘signatures’ in social media as proxies for real-world events and individual characteristics.
This paper builds on work conducted at the Collaborative Online Social Media Observatory (COSMOS) by proposing methods and processes for estimating two demographic variables: age and occupation (with associated class).
How Do Twitter Users Vary by Occupation and Social Class – Methods
The researchers used a sample of 32,032 Twitter profiles collected by COSMOS, relying on the entry in the ‘profile’ box to uncover occupation and class background.
They took the occupation with the most words as the primary occupation and, if multiple occupations were listed, they took the first one as the primary occupation.
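The rule above is a little ambiguous, so here is one possible reading of it as a minimal Python sketch: where one matched title sits inside a longer one ('engineer' inside 'software engineer'), the title with more words wins, and where a profile lists several distinct occupations, the first one counts. The occupation list and function names are hypothetical; this is not the COSMOS code.

```python
from __future__ import annotations

import re

# Hypothetical mini-dictionary of occupational titles; the real tool matched
# against a much larger list of job titles (assumption for this sketch)
OCCUPATIONS = ["software engineer", "engineer", "teacher", "dancer", "doctor", "actor"]


def find_matches(profile: str) -> list[tuple[int, int, str]]:
    """Return (start, end, title) for every whole-word occupation match in a profile."""
    text = profile.lower()
    matches = []
    for title in OCCUPATIONS:
        for m in re.finditer(r"\b" + re.escape(title) + r"\b", text):
            matches.append((m.start(), m.end(), title))
    return matches


def primary_occupation(profile: str) -> str | None:
    """Longer (more-word) titles beat titles nested inside them; of the distinct
    occupations left, the first one listed in the profile wins."""
    matches = find_matches(profile)

    def nested_in_longer(match: tuple[int, int, str]) -> bool:
        start, end, title = match
        return any(
            s <= start and end <= e and len(t.split()) > len(title.split())
            for (s, e, t) in matches
        )

    kept = [m for m in matches if not nested_in_longer(m)]
    if not kept:
        return None
    # min() on (start, end, title) tuples picks the earliest match in the profile
    return min(kept)[2]


print(primary_occupation("Software engineer, London"))                # software engineer
print(primary_occupation("Teacher and part-time software engineer"))  # teacher
print(primary_occupation("Doctor Who fan"))                           # doctor
```

The last example shows the weakness of pure keyword matching: it happily assigns 'doctor' to a Doctor Who fan, which is exactly the kind of hobby/occupation confusion the researchers ran into.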
They then randomly selected 1,000 of the 32,032 profiles to which an occupation had been assigned, and three expert coders visually inspected these profiles to check for inaccuracies and errors.
They found that 241 (24%) had been misclassified, with a high level of inter-rater reliability between the coders.
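The post only says there was "a high level of inter-rater reliability" and doesn't say which statistic was used. Fleiss' kappa is one common measure of agreement when three or more coders sort items into categories, so here is a small Python sketch assuming that statistic. The ratings are invented toy data, not the study's.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of coders
    who placed item i in category j. Every item must have the same number of coders."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    # Overall proportion of ratings falling in each category
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    # Agreement on each item: the share of coder pairs who agree
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_chance = sum(p * p for p in p_cat)
    # Undefined (division by zero) if every rating falls in a single category
    return (p_observed - p_chance) / (1 - p_chance)


# Toy ratings from three coders; categories: [correctly classified, misclassified]
ratings = [[3, 0], [3, 0], [0, 3], [3, 0], [2, 1], [0, 3]]
print(round(fleiss_kappa(ratings), 3))  # 0.766
```

A kappa of 1 means perfect agreement and 0 means no more agreement than chance; what counts as "high" is a matter of convention rather than a fixed threshold.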
The main identification problems stemmed from the multiple meanings of many occupational words, from hobbies, and from obscure occupations. For example, people might refer to themselves as a ‘Doctor Who fan’ or a ‘Dancer trapped in a software engineer’s body’.
So what is the class background of twitter users?
The table below shows three different data sets: the class backgrounds as automatically derived from the entire COSMOS sample of profiles, the class backgrounds of the 32,032-profile sample the researchers used, and the class backgrounds of the 1,000 profiles that were visually verified by the three expert coders (for comments on the differences see ‘Validity problems’ below).
There is a clear over-representation of NS-SEC 2 occupations in the data compared with the general UK population, which may be explained by confusion between occupations and hobbies and/or the use of Twitter to promote oneself or one’s work. NS-SEC 2 is where occupations such as ‘artist’, ‘singer’, ‘coach’, ‘dancer’ and ‘actor’ are located, and the difficulty of using the tool to identify occupation for this group is further exacerbated by the fact that this is by far the most populous group among Twitter users, and the largest group in the general UK population by 10 percentage points. Alternatively, if the occupations of these individuals have been correctly classified, then we can observe that they are over-represented on Twitter by a factor of two when using Census data as a baseline.
Occupations such as ‘teacher’, ‘manager’ and ‘councillor’ are not likely to be hobbies, but there is an unusually high representation of creative occupations which could also be pursued as leisure interests, with 4% of people in the dataset claiming to be an ‘actor’, 3.5% an ‘artist’ and 3.5% a ‘writer’. An alternative explanation is that Twitter is used as a promotional tool by people who work in the creative industries.
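"Over-represented by a factor of two" just means that a group's share among Twitter users, divided by its share in the Census baseline, comes out at about 2. Here is a minimal Python sketch of that calculation. The counts are invented purely for illustration (the table's actual figures aren't reproduced here), and the ratio is only meaningful if the occupations were classified correctly in the first place.

```python
def representation_ratios(twitter_counts, census_counts):
    """For each group: (share among Twitter users) / (share in the Census baseline).
    Above 1 means over-represented on Twitter, below 1 under-represented."""
    tw_total = sum(twitter_counts.values())
    ce_total = sum(census_counts.values())
    return {
        group: (twitter_counts[group] / tw_total) / (census_counts[group] / ce_total)
        for group in census_counts
    }


# Invented counts for illustration only, not figures from 'Who Tweets'
twitter = {"NS-SEC 2": 400, "all other groups": 600}
census = {"NS-SEC 2": 200, "all other groups": 800}

for group, ratio in representation_ratios(twitter, census).items():
    print(f"{group}: {ratio:.2f}")
```

With these made-up numbers NS-SEC 2 comes out at 2.00 and everyone else at 0.75: over-representing one group necessarily means under-representing the rest.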
Validity problems with the social-class demographics of twitter data
Interestingly, the researchers rejected the idea that people would just outright lie about their occupations, noting that ‘previous research [has] indicated that identity-play and the adoption of alternative personas was often short-lived, with ‘real’ users’ identities becoming dominant in prolonged interactions. The exponential uptake of the Internet, beyond this particular group of early adopters, was accompanied with a shift in the presentation of self online resulting in a reduction in online identity-play’.
The COSMOS engine does identify occupation automatically, but not always accurately, and the degree of inaccuracy varies with social class background. The researchers note:
‘unmodified occupation identification tool appears to be effective and accurate for NS-SEC groups in which occupational titles are unambiguous such as professions and skilled trades (NS-SEC 1, 3, 4 and 5). Where job titles are less clear or are synonymous with alternative activities (NS-SEC 2, 6 and 7) the requirement for human validation becomes apparent as the context of the occupational term must be taken into account such as the difference between “I’m a dancer in a ballet company” and “I’m a dancer trapped in the body of a software engineer”’.
The researchers note that the next step is to validate their methodology further by establishing the ground truth, ascertaining the occupations of tweeters through alternative means such as social surveys (an ongoing programme of work for the authors).
In some ways the findings are not surprising: middle-class professionals and the self-employed are over-represented on Twitter. But if we are honest, we don’t know by how much, because of the factors mentioned above. It seems fairly likely that many of the people self-identifying on Twitter as ‘actors’ and so on don’t do this as their main job, but we just can’t assess this using Twitter alone.
Thus this research is a reminder that hyper-reality is not more real than actual reality. In hyper-reality these people are actors; in actual reality, they are frustrated actors. This is an important distinction, and this alone could go some way to explaining why virtual worlds can be so much meaner than real worlds.
This research also serves as a refreshing reminder that traditional ‘terrestrial’ methods such as surveys are still required to ascertain the truth about the occupations and social class backgrounds of Twitter users. As it stands, if we left it to algorithms we’d end up with nearly a quarter (24%) of people being incorrectly identified, which is a huge margin of error. If we leave these questions up to Twitter, then we are left with a very misleading picture of ‘who tweets’ by social class background.
Having said this, it is quite possible for further rules to be developed and applied to algorithms which could increase the accuracy of automatic demographic data-mining.
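As a purely hypothetical illustration of the kind of extra rule that could be layered on top of keyword matching, here is a minimal Python sketch that flags profiles containing cue phrases suggesting an occupational word is a hobby, an aspiration or a fandom rather than a job. The cue list is an invented assumption, not something from the paper.

```python
import re

# Hypothetical cue phrases suggesting an occupational word is not a real job.
# These rules are illustrative assumptions, not taken from 'Who Tweets'.
NON_JOB_CUES = [
    r"\btrapped in\b",
    r"\bwannabe\b",
    r"\baspiring\b",
    r"\bfan\b",
    r"\bin my spare time\b",
]


def needs_human_review(profile: str) -> bool:
    """True if the profile contains any cue that makes an automatic occupation label doubtful."""
    text = profile.lower()
    return any(re.search(cue, text) for cue in NON_JOB_CUES)


print(needs_human_review("I'm a dancer in a ballet company"))                         # False
print(needs_human_review("I'm a dancer trapped in the body of a software engineer"))  # True
print(needs_human_review("Doctor Who fan"))                                           # True
```

The sketch flags profiles for a human coder rather than silently relabelling them, in line with the researchers' point that context needs human validation. Rules like these also make their own mistakes (a 'ceiling fan installer' would be flagged), so they would need validating in turn.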
‘Who Tweets’ is an interesting piece of recent research which attempts to determine some basic demographic characteristics of Twitter users, relying on nothing but the data provided by the users themselves in their twitter profiles.
Based on a sample of 1,470 Twitter profiles* in which users clearly stated** their age, the authors of ‘Who Tweets’ found that 93.9% of Twitter users were under the age of 35. The full age profile of Twitter users (according to the ‘Who Tweets’/COSMOS data) compared to the actual age profile taken from the UK Census is below:
Compare this to the Ipsos MORI Tech Tracker report for the third quarter of 2014 (which the above research draws on), which used face-to-face interviews based on a quota sample of 1,000 people.
Clearly this shows that only 67% of Twitter users are under the age of 35: quite a discrepancy with the user-defined data!
The researchers note that:
‘We might… hypothesise that young people are more likely to profess their age in their profile data and that this would lead to an overestimation of the ‘youthfulness’ of the UK Twitter population. As this is a new and developing field we have no evidence to support this claim, but the following discussion and estimations should be treated cautiously.
Looking again at the results from the Technology Tracker study conducted by Ipsos MORI, nearly two thirds of Twitter users were under 35 years of age in Q3 of 2014 whereas our study clearly identifies 93.9% as being 35 or younger. There are two possible reasons for this. The first is that the older population is less likely to state their age on Twitter. The second is that the age distribution in the survey data is a function of sample bias (i.e. participants over the age of 35 in the survey were particularly tech-savvy). This discrepancy between elicited (traditional) and naturally occurring (new) forms of social data warrants further investigation…’
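The first explanation in that quote (older users being less likely to state their age) can be made concrete with a back-of-envelope calculation. If a true share p of users are under 35, and young and older users state their age with probabilities d_young and d_old, then the under-35 share among age-stating profiles is p·d_young / (p·d_young + (1 − p)·d_old). The minimal Python sketch below takes the Ipsos figure of 67% as the 'true' share purely for the sake of argument and asks how much more likely young users would need to be to state their age to produce the COSMOS figure of 93.9%. It is an illustration, not a finding of either study, and it says nothing about the second explanation (sample bias in the survey).

```python
def observed_young_share(p_young, d_young, d_old):
    """Share of age-stating profiles that are under 35, given the true under-35 share
    and the probabilities that young and older users state their age."""
    young = p_young * d_young
    old = (1 - p_young) * d_old
    return young / (young + old)


def required_disclosure_ratio(p_young, observed):
    """How many times more likely young users must be to state their age for a
    true share p_young to appear as `observed` in profile data (a ratio of odds)."""
    return (observed / (1 - observed)) / (p_young / (1 - p_young))


ratio = required_disclosure_ratio(0.67, 0.939)
print(round(ratio, 1))  # 7.6
```

On these assumptions, under-35s would need to be roughly seven and a half times as likely as older users to put their age in their profile. That is a big gap, but not an absurd one, which is why the question can only really be settled by asking people directly.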
This comparison clearly shows how we get very different data on a very basic question (‘what is the age distribution of Twitter users?’) depending on the methods we use, but which is more valid? The Ipsos face-to-face poll is done every quarter, and it persistently yields results which are nothing like COSMOS’s, and it’s unlikely that you’re going to get a persistent ‘tech-savvy’ selection bias in every sample of over-35-year-olds, so does that mean it’s a more accurate reflection of the age profile of Twitter users?
Interestingly, the Ipsos data shows a definite drift to older users over time, and it’d be interesting to know if more recent COSMOS data reflects this. More interestingly, the whole point of COSMOS is to provide us with more up-to-date, ‘live’ information, so where is it?!? It’s sort of ironic that the latest public reporting is already 12 months behind good old Ipsos.
At the end of the day, I’m not going to be too harsh about the above ‘Who Tweets’ study: it is experimental, and many of the above projects are looking at the methodological limitations of this data. It would just be nice if they, err, got on with it a bit… come on Sociology, catch up!
One thing I am reasonably certain about is that the above comparison shows the continued importance of terrestrial methods if we want demographic data.
Of course, one simple way of checking the accuracy of the COSMOS data is to do a face-to-face survey and ask people what their age is and whether they state this in their Twitter profiles. Then again, I’m sure they’ve thought of that… maybe in 2018 we’ll get a report?