There are, however, many difficulties in using web sites as sources for content analysis. Following Scott’s (1990) four criteria for assessing the quality of documents, we need to consider why a web site was constructed in the first place, whether it is there for commercial purposes, and whether it has a political motive.
In addition, we also need to consider the following potential problems of researching web sites:
- Finding web sites will probably require a search engine, and search engines only ever provide a selection of the available web sites on a topic. The sample they provide will be biased according to the algorithm the engine uses to find its web sites. It follows that use of more than one search engine is advisable.
- Related to the above point, a search is only as good as the key words the researcher inputs into the search engine, and it could be time-consuming to try out all possible words and combinations.
- New web sites are continually appearing while old ones disappear. This means that by the time research is published, it may be based on web sites which no longer exist and may not be applicable to the new ones which have emerged.
- Similar to the above point, existing web sites are continually being updated.
- The analysis of web sites is a new field which is very much in flux. New approaches are being developed at a rapid rate. Some draw on traditional ways of interpreting documents, such as discourse analysis and qualitative content analysis; others have been developed specifically in relation to the Web, such as the examination of hyperlinks between web sites and their significance.
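The advice above about using more than one search engine can be sketched in code. The following is a minimal illustration, not an established procedure: the URLs are invented placeholders, and the function names `normalise` and `pool_results` are hypothetical. It pools the results returned by two engines for the same key words and removes duplicates, so that the researcher ends up with one combined list of sites to sample from.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to a comparable key: lower-case host without 'www.',
    followed by the path without a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def pool_results(*result_lists):
    """Merge several lists of URLs, keeping the first occurrence of each page."""
    seen = set()
    pooled = []
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                pooled.append(url)
    return pooled


# Hypothetical results for the same key words from two different engines
engine_a = ["https://www.example.org/blog/", "https://example.com/health"]
engine_b = ["http://example.org/blog", "https://example.net/diet"]

pooled = pool_results(engine_a, engine_b)
print(pooled)
```

Pooling in this way does not remove the bias of each engine's algorithm; it only widens the selection, which is why the resulting list should still be treated as a sample rather than the whole population of web sites.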
Most researchers who use documents accept the fact that it can be difficult to determine the population from which they are sampling, and when researching documents online, the speed of development and change of the Web accentuates this problem. The experience of researching documents online can be like trying to hit a target that not only moves, but is in a constant state of metamorphosis.
Three examples of content analysis of documents online
Boepple and Thompson (2014) conducted a quantitative analysis of 21 ‘healthy living blogs’. Their sampling frame consisted only of blogs which had received an award, and from those, they selected the blogs with the largest number of page views.
They found that content emphasised appearance and disordered messages about food/nutrition, with five bloggers using very negative language about being fat or overweight and four invoking admiration for being thin. They concluded that these blogs spread messages that are ‘potentially problematic’ for anyone changing their behaviour on the basis of advice contained in them.
Davis et al (2015) conducted an analysis of postings that followed a blog post concerning a cyberbullying suicide by a 15-year-old named Amanda Todd. There were 1094 comments, of which 482 contained stories about being bullied. Of these, 12% were about cyberbullying, 75% were about traditional bullying and the rest were a mixture of both.
The research found that the main reason victims of bullying are targeted is because they do not conform in one way or another to society’s mainstream norms and values, with the most common specific reason for bullying being a victim’s physical appearance.
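To get a feel for the proportions reported in the Davis et al study, the percentages can be turned into approximate numbers of stories. This is only a rough sketch: it assumes the percentages refer to the 482 bullying stories rather than to all 1094 comments, and the resulting counts are derived here for illustration, not reported by the authors.

```python
total_comments = 1094
stories = 482

# Percentages as given in the summary; the mixed category is the remainder
pct_cyber = 12
pct_traditional = 75
pct_mixed = 100 - pct_cyber - pct_traditional

# Approximate counts, assuming the percentages are of the 482 stories
counts = {
    "cyberbullying": stories * pct_cyber / 100,
    "traditional bullying": stories * pct_traditional / 100,
    "mixture of both": stories * pct_mixed / 100,
}

# Share of all comments that contained a bullying story
story_share = stories / total_comments

print(f"Comments containing a bullying story: {story_share:.0%}")
for label, count in counts.items():
    print(f"{label}: roughly {count:.1f} stories")
```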
Humphries et al (2014) conducted content analysis on the kinds of personal information disclosed on Twitter. The authors collected an initial sample of users and then searched the friends of this initial sample. In total they collected 101,069 tweets and took a random sample of 2100 tweets from this.
One of their findings was that Twitter users not only share information about themselves, they frequently share information about others too.
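The final sampling step in the Twitter study, drawing a random sample from the collected tweets, can be illustrated as follows. This is a sketch of simple random sampling without replacement, not the authors' actual procedure: it assumes the total was 101,069 tweets, uses integers as stand-ins for real tweets, and the seed value is arbitrary.

```python
import random

POPULATION_SIZE = 101_069  # assumed total number of collected tweets
SAMPLE_SIZE = 2_100        # size of the random sample reported

# Stand-ins for the collected tweets: one integer identifier per tweet
tweet_ids = range(POPULATION_SIZE)

# A seeded generator makes the draw repeatable; the seed itself is arbitrary
rng = random.Random(2014)

# Simple random sample without replacement: each tweet can be picked at most once
sample_ids = rng.sample(tweet_ids, SAMPLE_SIZE)

sampling_fraction = SAMPLE_SIZE / POPULATION_SIZE
print(f"Sampled {len(sample_ids)} of {POPULATION_SIZE} tweets "
      f"({sampling_fraction:.1%} of the total)")
```

Note that the sample is random only with respect to the tweets that were collected. Because those tweets came from an initial set of users and their friends, the wider population of Twitter users is still not fully known, which is the sampling problem described earlier.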
Researching documents online may be challenging, but it is difficult to see how sociologists can avoid it as more and more of our lives are lived out online. Researching documents such as web sites, and especially blogs and social media postings, is, I think, very much set to become a growth area in social research.