Univariate Analysis in Quantitative Social Research

Univariate analysis reviews one variable at a time and typically uses frequency tables and diagrams like bar charts and pie charts. Measures of central tendency and dispersion are tools for analyzing data, with central tendency often involving the mean, median, or mode while dispersion relies on range and standard deviation. Understanding these statistical methods aids in the comprehension of data distribution in areas of interest such as wealth statistics.

Univariate analysis refers to the analysis of one variable at a time.

The most common approaches are:

  • Frequency tables.
  • Diagrams: bar charts, histograms, pie charts.
  • Measures of central tendency: mean, median, mode.
  • Measures of dispersion: range and standard deviation.

Frequency Tables

A frequency table provides the number of cases and the percentages belonging to each of the categories for a variable. Frequency tables can be used for all the different types of variable.

Below is a simple example of a frequency table showing the number of schools in three different categories of the ‘type of school’ variable for the 2022-2023 academic year. I rounded the percentages below.

Type of school     Number of schools   Percent
Local Authority    11858               48
Academy            10176               42
Independent        2408                10
Total              24442               100
Number of schools by type of school, England and Wales 2022-23.
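
As a rough sketch of how a frequency table like the one above could be produced, the Python below takes the three counts from the table and works out each category's percentage of the total. The function name frequency_table is made up for illustration, and the percentages are shown to one decimal place rather than rounded to whole numbers as in the table.

```python
# Counts taken from the frequency table above (England and Wales, 2022-23).
school_counts = {
    "Local Authority": 11858,
    "Academy": 10176,
    "Independent": 2408,
}


def frequency_table(counts):
    """Return (category, count, percent) rows, followed by a total row."""
    total = sum(counts.values())
    rows = [(name, n, 100 * n / total) for name, n in counts.items()]
    rows.append(("Total", total, 100.0))
    return rows


table = frequency_table(school_counts)
for name, n, pct in table:
    # Left-align the category, right-align the two numeric columns.
    print(f"{name:<18}{n:>8}{pct:>8.1f}")
```

With real survey data you would normally count the categories from the raw cases first (for example with collections.Counter) and then pass those counts to a function like this one.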

Analysts usually clean raw data and summarise it in frequency tables so that people can understand and visualise it more easily.

Frequency tables are the starting point for generating diagrams which put the data into visual form making trends stand out.

Diagrams

Diagrams represent quantitative data in visual form to make the data easier to understand and interpret. Bar charts and pie charts are two of the most commonly used visual representations of quantitative data.

Bar charts

The chart below shows the same data as in the frequency table above. Each bar represents one of the three school types.

The bar chart below shows the largest category is Local Authority (LA) maintained schools with academies the second largest category. You can also see there are relatively few independent schools.

Bar chart showing different school types in England and Wales.

Pie Charts

The main advantage of a pie chart is that you can see the proportion of each category in relation to the total. A pie chart shows this sense of relation to the whole more clearly than a bar chart.

For example you can clearly see below that LA Maintained schools make up nearly 50% of the total. This doesn’t stand out as much in the bar chart.

You can also see that Independent schools represent around 10% of schools from the pie chart.

Pie chart showing number of LA maintained schools, academies and independent schools in England and Wales.

Frequency tables and diagrams: final thoughts

Diagrams are useful to make frequency tables easier to understand.

Bar charts are more useful when you want to look at proportions in relation to each other. Pie charts are more useful when you want to look at proportions in relation to the whole.
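
The difference between the two views can be sketched in a few lines of Python. This is only an illustration using the school counts from the frequency table: the 'pie chart view' divides each count by the total, while the 'bar chart view' compares each count with the largest category. The text bars made of '#' characters stand in for a real chart, and the bar width of 40 characters is an arbitrary choice.

```python
school_counts = {"Local Authority": 11858, "Academy": 10176, "Independent": 2408}

total = sum(school_counts.values())
largest = max(school_counts.values())

# Pie-chart view: each category as a share of the whole.
share_of_total = {name: n / total for name, n in school_counts.items()}

# Bar-chart view: each category relative to the largest category.
relative_to_largest = {name: n / largest for name, n in school_counts.items()}

for name in school_counts:
    # Draw a text bar whose length is proportional to the largest category.
    bar = "#" * round(40 * relative_to_largest[name])
    print(f"{name:<16}{share_of_total[name]:>7.1%}  {bar}")
```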

Keep in mind however that charts are only as useful as the data. For example, one limitation with the above data is that it tells you nothing about pupil numbers, only school numbers!

Sources

Gov.UK (accessed July 2023) Schools, Pupils and their Characteristics 2022-23.

Measures of Central Tendency

Measures of central tendency encapsulate in one figure a value which is typical for a distribution of values. In effect, we are seeking out an average for a distribution.

Quantitative social research analysts recognise three different forms of average:

  • mean
  • median
  • mode
Diagram 1: Mean, median and mode for a random distribution of ages.

Arithmetic mean

The mean is the sum of all values in a distribution divided by the number of values.

In diagram one above, we add ALL the ages together and divide by 20 which is the total number of ages in the sample. This gives us a mean of 51.6.

The mean should be applied to interval/ ratio variables. It can also be applied to ordinal variables.
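
A minimal sketch of the calculation in Python is below. The ages are made up for illustration and are NOT the 20 ages shown in diagram one, so the result will not match the 51.6 quoted above.

```python
from statistics import mean

# Hypothetical ages for illustration only (not the sample in diagram one).
ages = [19, 23, 28, 28, 34, 41, 47, 52, 66, 82]

# The mean is the sum of all values divided by the number of values.
manual_mean = sum(ages) / len(ages)

# The standard library gives the same answer.
library_mean = mean(ages)

print(manual_mean, library_mean)
```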

Median

The median is the mid-point in a distribution of values. We arrive at the median by lining up all the values smallest to largest and then finding the middle value.

The mean is vulnerable to outliers, which are extreme values at either end of the distribution. Outliers can greatly increase or decrease the mean, but they have much less of an effect on the median.

We see this in diagram one above, where the median is 45.5, considerably lower than the mean of 51.6. In this case the mean is higher because the oldest four people skew the mean upwards. The four oldest people are a lot older than the people in the middle of the distribution, and a lot older than the average age of the rest of the sample.

The median can be used in relation to both interval/ratio and ordinal variables.
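
To see why the median resists outliers, the sketch below uses a made-up list of ten ages (not the sample in diagram one) and then replaces the oldest person with a much older one. The median stays where it is while the mean moves.

```python
from statistics import mean, median

# Hypothetical ages for illustration only.
ages = [19, 23, 28, 28, 34, 41, 47, 52, 66, 82]

# Same sample, but the oldest person is replaced by an extreme value.
ages_with_outlier = ages[:-1] + [102]

# With ten values the median is halfway between the 5th and 6th values.
print(median(ages), median(ages_with_outlier))

# The mean is pulled upwards by the single extreme value.
print(mean(ages), mean(ages_with_outlier))
```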

Mode

The mode is simply the value that occurs most frequently in a distribution. The mode can be applied to all types of variable.

In the diagram above, the mode is 28, because that is the only age which occurs twice.
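
Finding the mode is just a matter of counting how often each value occurs. The sketch below uses a made-up list of ages (not the sample in the diagram) in which 28 is deliberately the only value that appears twice. It also uses multimode from the standard library, which returns every value tied for most frequent, since a distribution can have more than one mode.

```python
from collections import Counter
from statistics import multimode

# Hypothetical ages for illustration only.
ages = [19, 23, 28, 28, 34, 41, 47, 52, 66, 82]

# Count how many times each age occurs and take the most common one.
counts = Counter(ages)
mode_age, mode_count = counts.most_common(1)[0]

# multimode returns a list, which matters when several values tie.
all_modes = multimode(ages)

print(mode_age, mode_count, all_modes)
```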

Median more useful than the Mean?

With social data it is often more useful to know the median rather than the mean. This is especially true with wealth statistics in the UK.

Wealth and income distribution are of special interest to sociologists, because there is a lot of variation in distribution. Neither wealth nor income is equally distributed. Understanding how they are distributed has significant implications for life chances and social policy.

Table showing household wealth distribution in the UK by decile, 2018 to 2020.

Visualising the total wealth in a bar chart looks like this:

Bar chart showing UK wealth distribution 2018-2020.

Here you can clearly see a skew towards the top two deciles, especially the top decile. The richest 10% of households have an average of almost £2 million in wealth, which is 8 times more than even the 4th decile.

In cases where there is a lot of variation in the data, with a large skew at one end as above, the mean and the median will be very different.

In the chart above the mean is £489 000, pulled up by the huge relative wealth of the top 20%.

The median wealth is only £280 000 and 50% of households have less than this.

Mean wealth in the UK gives you a misleading picture of the amount of wealth most people in the UK have!

Sources

ONS: Household Wealth in the UK, 2018-2022.

Measures of Dispersion

Measures of dispersion show the variation in a distribution.

Two common measures of dispersion are:

  • the range (the simplest)
  • the standard deviation.

Range

The range of data is the distance between the minimum and maximum values in a distribution. Like the mean, the range can be greatly affected by outliers.

The range of household wealth (grouped by decile) in the UK is £1.9 Million (see chart below).

This is a very simple measure which doesn’t tell us very much about how much wealth ordinary people have.

For example it doesn’t tell us that the top decile of households are almost twice as wealthy as the next decile down.
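
The calculation itself is very short, as the sketch below shows. The wealth figures are invented (in £ thousands) and are NOT the ONS decile values; they are simply skewed towards the top in a similar way. Dropping the single largest value roughly halves the range, which illustrates how sensitive the range is to outliers.

```python
# Hypothetical household wealth in £ thousands (not the ONS figures).
wealth = [5, 20, 60, 120, 200, 300, 450, 700, 1000, 2000]


def value_range(values):
    """Distance between the smallest and largest value."""
    return max(values) - min(values)


full_range = value_range(wealth)

# The same calculation with the single largest value left out.
range_without_top = value_range(wealth[:-1])

print(full_range, range_without_top)
```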

Standard Deviation

We calculate the standard deviation by taking the difference between each value in a distribution and the mean, squaring each difference, dividing the total of the squared differences by the number of values, and then taking the square root of the result.

The standard deviation is, roughly speaking, the average amount of variation around the mean.

For example, the standard deviation of wealth in the UK (grouped by decile) is £575 211.

Outliers don’t affect the standard deviation as much as the range. The impact of outliers on the standard deviation is offset by dividing by the number of values.
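
The steps can be sketched in Python as below, using a small made-up set of values rather than the wealth data. This version is the population standard deviation, which squares each difference from the mean before averaging and then takes the square root (the raw differences always sum to zero, so they cannot simply be added up). If the data are a sample from a larger population, many textbooks divide by the number of values minus one instead, which is what statistics.stdev does.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical values for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = sum(values) / len(values)

# Differences from the mean cancel out, which is why they are squared.
raw_differences = [v - mean_value for v in values]
squared_differences = [d ** 2 for d in raw_differences]

# Average the squared differences, then take the square root.
variance = sum(squared_differences) / len(values)
std_dev = sqrt(variance)

print(sum(raw_differences), std_dev, pstdev(values))
```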

Box Plots

Box plots are popular for showing dispersion for interval/ratio variables.

The box plot provides an indication of both central tendency (the median) and dispersion (the spread of the box and whiskers, and any outliers).

The box plot of wealth below treats the richest decile as an outlier. It clearly shows you that the skew is at the top.

The box shows you where the middle 50% of households sit: between £50 000 and £800 000.

The line in the box shows you the median value of household wealth: £280 000.

Box plot of wealth, UK 2018-2020

The shape of a box plot will vary depending on whether cases tend to be high or low in relation to the median. They show us whether there is more or less variation above or below the median.
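
The numbers behind a box plot can be worked out without drawing anything. The sketch below uses invented wealth figures (in £ thousands, NOT the ONS data) and finds the quartiles that form the box and the median line. Flagging outliers as anything more than 1.5 times the interquartile range beyond the box is a common convention and is an assumption here; the post does not say which rule the box plot generator used, and different quartile methods give slightly different values.

```python
from statistics import quantiles

# Hypothetical household wealth in £ thousands (not the ONS figures).
wealth = [5, 20, 60, 120, 200, 300, 450, 700, 1000, 2000]

# Lower quartile, median and upper quartile: the box and the line inside it.
q1, q2, q3 = quantiles(wealth, n=4, method="inclusive")

# The interquartile range covers the middle 50% of cases.
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [w for w in wealth if w < lower_fence or w > upper_fence]

print(q1, q2, q3, iqr, outliers)
```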

Sources

ONS: Household Wealth in the UK, 2018-2022.

Boxplot generator.

Signposting and related posts

This material is most relevant to the Research Methods module. It might be a little advanced for A-level sociology. You are more likely to need this during a first year university statistical methods course.

To return to the homepage – revisesociology.com

How do Londoners feel about the ULEZ charge?

The Ultra Low Emission Zone (ULEZ) is going to be expanded to outer London boroughs on 29th August 2023. Anyone not driving a low emission vehicle will have to pay £12.50 a day to drive in those areas.

This will affect around 15% of car drivers and almost 50% of van drivers. These are primarily people driving older vehicles. (1)

Labour London mayor Sadiq Khan is enacting ULEZ. It has been blamed for Labour losing the recent Uxbridge by-election by just 500 votes.

Ironically for Labour this is a policy which affects the poor disproportionately. Anyone who could afford to buy a newer, lower emission vehicle would buy one, and drive it with no penalty.

Unfortunately for the environment and younger people, this issue has become a battleground for the coming 2024 election.

The Tories are now thinking of scrapping commitments to ULEZs in order to win votes. They are thinking of trashing environmental policies to try and stay in power.

But what do Londoners actually think about this issue?

What do Londoners think of Ultra Low Emission Zones?

There have been several polls on this issue. The results differ depending on how the questions about ULEZ are framed.

If you ask a question purely about ULEZ, then most people support the expansion. However, if you include reference to charges in relation to ULEZ then most people are against it.

This is a useful example of framing bias in social surveys.

ULEZ Survey, charges not mentioned

Question: “To tackle air pollution in the capital, the Mayor of London and Transport for London are proposing to expand the Ultra Low Emission Zone (ULEZ) London-wide. The proposed implementation date for this is 29th August 2023. Which, if any, of the following comes closest to your view?”

Support for ULEZ

ULEZ Survey: charges mentioned

Question: “To tackle air pollution, some places across the UK like London, are introducing charges in Low Emission Zones (sometimes called Ultra Low Emission Zones or ULEZ) where those who drive the more polluting cars or vehicles have to pay a fee or charge to drive into these areas. Would you support or oppose a similar ULEZ charge in your local area?”

Opposition to ULEZ

Attitudes to ULEZ Conclusions

I’d be inclined to say the question which mentions charges is a more accurate reflection of public opinion. This is because it includes more specific details so people can provide a more informed response.

We can also see from the above that lower social classes are more likely to be against ULEZ. This makes sense because these are the people who can’t afford to buy newer vehicles.

We also see that younger people are more in favour, which reflects attitudes to the environment more generally.

It is shameful that the Tories are prepared to use the environment as a political tool. They are sacrificing the future of younger people to win a few more seats in the hope of staying in power.

The fact that they are prepared to do this shows us they are no longer worthy to govern.

Sources and signposting

This is mainly relevant to the social research methods topic. It is a good example of how social surveys are more useful with more detailed questions.

(1) On London (July 2023) How do Londoners really feel about Sadiq Khan’s ULEZ expansion scheme?

Cross National Comparison Research Task

Cross national comparisons are a useful way for students to learn more about the strengths and limitations of quantitative data and positivist methods.

Below is a task students of A-level sociology can usefully do to give them a feel for doing Cross National Research.

The main aim of this research task is to illustrate some of the strengths and limitations of doing cross national comparisons.

Cross National comparisons are one of the main methods used by positivists and so doing this will help to get students thinking like positivists!

Select any one of the questions below and use the resources under the relevant headings below to explore it.

  1. Why are some countries richer than others?
  2. Why do some countries have higher levels of gender equality than others?
  3. Why do some countries perform better in the PISA tests than others?
  4. Why are some countries happier than others?
  5. Why are some countries more peaceful than others?

Why are some countries richer than others?

This is a list of countries by Gross National Income per Capita, provided by the World Bank. The countries should appear listed in order.

Look at the top 10 countries, the bottom 10 countries, and look at ten in the middle.

NB you may need to screen out certain odd countries (such as islands with very small populations!)

Using your own knowledge, and further research on these countries if necessary, try to find out if any of the above three groups (top 10, middle 10, bottom 10) have anything in common.

Can you come up with a theory for why rich countries are rich and poor countries are poor?
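
If you download the list as a spreadsheet, picking out the three groups is a simple sort-and-slice job. The sketch below shows one way to do it in Python. The country labels and income figures are placeholders generated by the code, NOT World Bank data, and the function name top_middle_bottom is made up for illustration.

```python
# Placeholder data: invented labels and figures, NOT real World Bank values.
gni_per_capita = {f"Country {i:02d}": 500 * i for i in range(1, 41)}


def top_middle_bottom(values, k=10):
    """Rank countries from highest to lowest and return three groups of k."""
    ranked = sorted(values, key=values.get, reverse=True)
    # Start the middle group so that it sits in the centre of the ranking.
    mid_start = (len(ranked) - k) // 2
    return ranked[:k], ranked[mid_start:mid_start + k], ranked[-k:]


top, middle, bottom = top_middle_bottom(gni_per_capita)
print(top[0], middle[0], bottom[-1])
```

Any countries you decide to screen out would need removing from the data before the ranking step.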

Why do some countries have higher levels of gender equality than others?

Go to the World Economic Forum’s Global Gender Gap Report, 2023.

Look at the top 10 countries, the bottom 10 countries, and look at ten in the middle.

NB you may need to screen out certain odd countries (such as islands with very small populations!)

Using your own knowledge, and further research on these countries if necessary, try to find out if any of the above three groups (top 10, middle 10, bottom 10) have anything in common.

Can you come up with a theory for why some countries are more gender equal than others?

Why do some countries perform better in the PISA tests than others?

The Programme for International Student Assessment (PISA) assesses students from dozens of countries in their ability in maths, reading and science. All students do the same test and so we get national league tables as a result.

This is the hub page for the 2018 PISA results (results are normally released every three years). Have a look at the countries at the top of the league tables compared to those at the bottom – can you think of a theory for why students in some countries do better than students in others?

Why are some countries happier than others?

This is a link to the World Happiness Report 2023.

Can you think of a theory for why people in some countries report higher levels of happiness than people in others?

Why are some countries more peaceful than others?

The Global Peace Index uses 23 indicators to measure how peaceful countries are and reports every year. This is a link to the 2023 Peacefulness results.

table showing top ten most peaceful countries in 2023

Can you think of a theory which explains why countries such as Iceland are at the top, while countries such as Afghanistan are at the bottom?

Try to think of why some countries might be more prone to war and conflict than others.


Variables in quantitative research

What is the difference between interval/ ratio, ordinal, nominal and categorical variables? This post answers this question!

Interval/ ratio variables

These are variables where the distances between the categories are identical across the range of categories.

For example, in question 2, the age intervals go up in years, and the distance between the years is the same between every interval.

Interval/ ratio variables are regarded as the highest level of measurement because they permit a wider variety of statistical analyses to be conducted.

There is also a difference between interval and ratio variables… the latter have a fixed zero point.

Ordinal variables

These are variables that can be rank ordered but the distances between the categories are not equal across the range. For example, in question 6, the periods can be ranked, but the distances between the categories are not equal.

NB if you choose to group an interval variable like age in question 2 into groups (e.g. 20 and under, 21-30, 31-40 and so on) you are converting it into an ordinal variable.
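
Grouping ages in this way is easy to sketch in Python. The function below follows the bands mentioned above (20 and under, 21-30, 31-40 and so on); the function name age_group and the list of ages are made up for illustration. Note that once ages are grouped, you can still rank the bands but you can no longer calculate an exact mean age from them.

```python
from collections import Counter


def age_group(age):
    """Map an exact age to an ordered band: 20 and under, 21-30, 31-40..."""
    if age <= 20:
        return "20 and under"
    # Find the lower edge of the ten-year band that contains this age.
    lower = ((age - 21) // 10) * 10 + 21
    return f"{lower}-{lower + 9}"


# Hypothetical ages for illustration only.
ages = [18, 21, 30, 31, 40, 45, 67]

# Counting the bands gives a frequency table for the new ordinal variable.
band_counts = Counter(age_group(a) for a in ages)
print(band_counts)
```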

Nominal or categorical variables

These consist of categories that cannot be rank ordered. For example, in questions 7-9, it is not possible to rank subjective responses of respondents here into an order.

Dichotomous variables

These variables contain data that have only two categories – e.g. ‘male’ and ‘female’. Their relationship to the other types of variable is slightly ambiguous. In the case of question one, this dichotomous variable is also a categorical variable. However, some dichotomous variables may be ordinal variables as they could have one distinct interval between responses – e.g. a question might ask ‘have you ever heard of Karl Marx?’ – a yes response could be regarded as higher in rank order than a no response.

Multiple-indicator measures such as Likert Scales strictly speaking provide ordinal variables. However, many writers argue they can be treated as though they produce interval/ ratio variables if they generate a large number of categories.

In fact Bryman and Cramer (2011) make a distinction between ‘true’ interval/ ratio variables and those generated by Likert Scales.

A flow chart to help define variables

*A nominal variable – aka categorical variable! 

Questionnaire Example 

This section deals with how different types of question in a questionnaire can be designed to yield different types of variable in the responses from respondents.

If you look at the example of a questionnaire below, you will notice that the information you receive varies by question.

Some of the questions ask for answers in terms of real numbers, such as question 2, which asks ‘how old are you’, or questions 4 and 5, which ask students how many hours a day they spend doing sociology class work and homework. These will yield interval/ ratio variables.

Some of the questions ask for either/ or answers or yes/ no answers and are thus in the form of dichotomies. For example, question 1 asks ‘are you male or female’ and question 10 asks students to respond ‘yes’ or ‘no’ to whether they intend to study sociology at university. These will yield dichotomous variables.

The rest of the questions ask the respondent to select from lists of categories:

The responses to some of these list questions can be rank ordered – for example in question 6, once a day is clearly more than once a month! Responses to these questions will yield ordinal variables. 

Some other ‘categorical list’ questions yield responses which cannot be ranked in order – for example it is impossible to say that studying sociology because you find it generally interesting is ranked higher than studying it because it fits in with your career goals.  These will yield categorical variables.

These different types of response correspond to the four main types of variable above.
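
One way to keep track of this when analysing the responses is a simple lookup, sketched below in Python. The question numbers and variable types are the ones described above (question 3 is not discussed, so it is left out). The list of suitable averages follows the usual guidance on measures of central tendency: the mode for any variable, the median for ordinal and interval/ ratio variables, and the mean mainly for interval/ ratio variables. Treating the mode as the only average for dichotomous variables is an assumption made here, and the names VARIABLE_TYPE, AVERAGES and averages_for are made up for illustration.

```python
# Variable type for each question, as described in the text above.
VARIABLE_TYPE = {
    1: "dichotomous",
    2: "interval/ratio",
    4: "interval/ratio",
    5: "interval/ratio",
    6: "ordinal",
    7: "nominal",
    8: "nominal",
    9: "nominal",
    10: "dichotomous",
}

# Averages usually considered suitable for each type (see the caveats above).
AVERAGES = {
    "interval/ratio": ("mean", "median", "mode"),
    "ordinal": ("median", "mode"),
    "nominal": ("mode",),
    "dichotomous": ("mode",),
}


def averages_for(question):
    """Return the averages that suit the variable a question produces."""
    return AVERAGES[VARIABLE_TYPE[question]]


for question in sorted(VARIABLE_TYPE):
    print(question, VARIABLE_TYPE[question], averages_for(question))
```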