Univariate analysis refers to the analysis of one variable at a time.
The most common approaches are:
- Frequency tables.
- Diagrams: bar charts, histograms, pie charts.
- Measures of central tendency: mean, median, mode.
- Measures of dispersion: range and standard deviation.
Frequency Tables
A frequency table provides the number of cases and the percentages belonging to each of the categories for a variable. Frequency tables can be used for all the different types of variable.
Below is a simple example of a frequency table showing the number of schools in three different categories of the ‘type of school’ variable for the 2022-2023 academic year. I rounded the percentages below.
Type of school | Number of schools | Percent |
Local Authority | 11858 | 48 |
Academy | 10176 | 42 |
Independent | 2408 | 10 |
Total | 24442 | 100 |
Analysts usually build frequency tables from cleaned raw data so that people can understand and visualise the data more easily.
Frequency tables are the starting point for generating diagrams, which put the data into visual form and make trends stand out.
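As a rough illustration, the frequency table above can be rebuilt in a few lines of Python. This is only a sketch: the variable names are my own, and the percentages are shown to one decimal place rather than as whole numbers.

```python
# Counts taken from the frequency table above (2022-2023 academic year).
school_counts = {
    "Local Authority": 11858,
    "Academy": 10176,
    "Independent": 2408,
}

total = sum(school_counts.values())

# Percentage of all schools in each category, to one decimal place.
percentages = {
    school_type: round(100 * count / total, 1)
    for school_type, count in school_counts.items()
}

for school_type, count in school_counts.items():
    print(f"{school_type:<16} | {count:>6} | {percentages[school_type]:>5}")
print(f"{'Total':<16} | {total:>6} | {100:>5}")
```

The same pattern works for any categorical variable: count the cases in each category, then divide each count by the total.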
Diagrams
Diagrams represent quantitative data in visual form to make it easier to understand and interpret. Bar charts and pie charts are two of the most commonly used visual representations of quantitative data.
Bar charts
The chart below shows the same data as in the frequency table above. Each bar represents one of the three school types.
The bar chart below shows that the largest category is Local Authority (LA) maintained schools, with academies the second largest. You can also see that there are relatively few independent schools.
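To show how a bar chart is derived from the frequency table, here is a minimal text-only sketch in Python. The function name text_bar_chart and the bar width of 40 characters are my own choices for illustration; a plotting library would normally draw the chart.

```python
# Counts taken from the frequency table of school types.
school_counts = {"Local Authority": 11858, "Academy": 10176, "Independent": 2408}


def text_bar_chart(counts, width=40):
    """Return one line per category, with bar length proportional to the count."""
    largest = max(counts.values())
    lines = []
    for label, count in counts.items():
        # Scale each bar against the largest category.
        bar = "#" * round(width * count / largest)
        lines.append(f"{label:<16} {bar} {count}")
    return lines


chart_lines = text_bar_chart(school_counts)
print("\n".join(chart_lines))
```

Because every bar is scaled against the largest category, the chart makes it easy to compare categories with each other.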
Pie Charts
The main advantage of a pie chart is that you can see the proportion of each category in relation to the total. A pie chart shows this sense of relation to the whole more clearly than a bar chart.
For example, you can clearly see below that LA maintained schools make up nearly 50% of the total. This doesn’t stand out as much in the bar chart.
You can also see that Independent schools represent around 10% of schools from the pie chart.
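The arithmetic behind a pie chart is simple: each slice takes its share of the full 360 degrees. The sketch below works this out for the school data. The variable names are illustrative only.

```python
# Counts taken from the frequency table of school types.
school_counts = {"Local Authority": 11858, "Academy": 10176, "Independent": 2408}

total = sum(school_counts.values())

# Each slice's angle is the category's share of the whole, times 360 degrees.
slice_angles = {
    label: 360 * count / total for label, count in school_counts.items()
}

for label, angle in slice_angles.items():
    print(f"{label:<16} {angle:6.1f} degrees")
```

A slice of just under 180 degrees, as for LA maintained schools here, is what "nearly 50% of the total" looks like on the chart.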
Frequency tables and diagrams: final thoughts
Diagrams are useful for making frequency tables easier to understand.
Bar charts are more useful when you want to look at proportions in relation to each other. Pie charts are more useful when you want to look at proportions in relation to the whole.
Keep in mind, however, that charts are only as useful as the data. For example, one limitation of the above data is that it tells you nothing about pupil numbers, only school numbers!
Sources
Gov.UK (accessed July 2023) Schools, Pupils and their Characteristics 2022-23.
Measures of Central Tendency
Measures of central tendency encapsulate in one figure a value which is typical for a distribution of values. In effect, we are seeking out an average for a distribution.
Quantitative social research analysts recognise three different forms of average:
- mean
- median
- mode
Arithmetic mean
The mean is the sum of all values in a distribution divided by the number of values.
In diagram one above, we add ALL the ages together and divide by 20 which is the total number of ages in the sample. This gives us a mean of 51.6.
The mean should be applied to interval/ratio variables. It can also be applied to ordinal variables.
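Here is a minimal sketch of the calculation in Python. The ages are hypothetical values made up for illustration, not the ones from diagram one.

```python
import statistics

# Hypothetical ages, invented for this example.
ages = [22, 25, 28, 28, 31, 35, 40, 44, 47, 90]

# The mean by hand: sum of all values divided by the number of values.
mean_by_hand = sum(ages) / len(ages)

# The same result from the standard library.
mean_from_library = statistics.mean(ages)

print(mean_by_hand, mean_from_library)
```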
Median
The median is the mid-point in a distribution of values. We arrive at the median by lining up all the values smallest to largest and then finding the middle value.
The mean is vulnerable to outliers, which are extreme values at either end of the distribution. Outliers can greatly increase or decrease the mean, but they have much less of an effect on the median.
We see this in diagram one above, where the median point is 45.5, considerably lower than the mean of 51.6. In this case the mean is higher because the oldest four people skew it upwards: they are a lot older than the people in the middle of the distribution and than the rest of the sample.
The median can be used in relation to both interval/ratio and ordinal variables.
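The sketch below illustrates how a single outlier moves the mean much more than the median. The ages are hypothetical values made up for the example, not the ones from diagram one.

```python
import statistics

# Hypothetical ages, invented for this example.
ages_without_outlier = [22, 25, 28, 28, 31, 35, 40, 44, 47]
ages_with_outlier = ages_without_outlier + [90]

# How far each average moves when the 90-year-old is added.
mean_shift = statistics.mean(ages_with_outlier) - statistics.mean(ages_without_outlier)
median_shift = statistics.median(ages_with_outlier) - statistics.median(ages_without_outlier)

print(f"Mean shifts by {mean_shift:.2f} years")
print(f"Median shifts by {median_shift:.2f} years")
```

With an even number of values there is no single middle value, so the median is taken as the midpoint of the two middle values.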
Mode
The mode is simply the value that occurs most frequently in a distribution. The mode can be applied to all types of variable.
In the diagram above, the mode is 28, because that is the only age which occurs twice.
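Finding the mode amounts to counting how often each value occurs. The sketch below uses hypothetical ages (not the ones in the diagram), chosen so that one age occurs twice.

```python
import statistics
from collections import Counter

# Hypothetical ages, invented for this example.
ages = [22, 25, 28, 28, 31, 35, 40, 44, 47, 90]

# Count how often each age occurs and pick the most frequent one.
age_counts = Counter(ages)
most_common_age, occurrences = age_counts.most_common(1)[0]

# The standard library gives the same answer directly.
mode_from_library = statistics.mode(ages)

print(most_common_age, occurrences, mode_from_library)
```

If several values tie for most frequent, statistics.multimode returns all of them.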
Median more useful than the Mean?
With social data it is often more useful to know the median rather than the mean. This is especially true with wealth statistics in the UK.
Wealth and income distribution are of special interest to sociologists because there is a lot of variation in both. Neither wealth nor income is equally distributed. Understanding how they are distributed has significant implications for life chances and social policy.
Visualising the total wealth in a bar chart looks like this:
Here you can clearly see a skew towards the top two deciles, especially the richest decile. The richest 10% of households have an average of almost £2 million in wealth, which is 8 times more than even the 4th decile.
In cases where the data is heavily skewed towards one end, as above, the mean and the median will be very different.
In the chart above the mean is £489 000, pulled up by the huge relative wealth of the top 20%.
The median wealth is only £280 000, and 50% of households have less than this.
Mean wealth in the UK gives you a misleading picture of the amount of wealth most people in the UK have!
Sources
ONS: Household Wealth in the UK, 2018-2022.
Measures of Dispersion
Measures of dispersion show the variation in a distribution.
Two measures of dispersion include:
- the range (the simplest)
- the standard deviation.
Range
The range of data is the distance between the minimum and maximum values in a distribution. Like the mean, the range can be greatly affected by outliers.
The range of household wealth (grouped by decile) in the UK is £1.9 Million (see chart below).
This is a very simple measure which doesn’t tell us very much about how much wealth ordinary people have.
For example, it doesn’t tell us that the top decile of households are almost twice as wealthy as the next decile down.
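The range is simple to calculate, which also makes its sensitivity to outliers easy to demonstrate. The sketch below uses hypothetical ages rather than the wealth figures, because the decile values behind the chart are not reproduced here.

```python
# Hypothetical ages, invented for this example.
ages_without_outlier = [22, 25, 28, 28, 31, 35, 40, 44, 47]
ages_with_outlier = ages_without_outlier + [90]


def value_range(values):
    """Distance between the smallest and largest value."""
    return max(values) - min(values)


range_without = value_range(ages_without_outlier)
range_with = value_range(ages_with_outlier)

print(range_without, range_with)
```

One extreme value more than doubles the range here, even though the other nine values are unchanged.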
Standard Deviation
We calculate the standard deviation by taking the difference between each value in a distribution and the mean, squaring each difference, dividing the total of the squared differences by the number of values, and then taking the square root of the result.
The standard deviation is, roughly speaking, the typical amount of deviation around the mean.
For example, the standard deviation of wealth in the UK (grouped by decile) is £575 211.
Outliers don’t affect the standard deviation as much as the range. The impact of outliers on the standard deviation is offset by dividing by the number of values.
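As a worked sketch, the standard deviation is conventionally the square root of the average squared difference from the mean. The values below are hypothetical, chosen so the arithmetic comes out neatly. They are not the wealth figures.

```python
import math
import statistics

# Hypothetical values, invented so that the result is a round number.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Square each difference from the mean, average the squares,
# then take the square root.
squared_differences = [(value - mean) ** 2 for value in values]
variance = sum(squared_differences) / len(values)
standard_deviation = math.sqrt(variance)

# statistics.pstdev divides by the number of values, as above.
# statistics.stdev divides by one fewer, which is usual for samples.
print(standard_deviation, statistics.pstdev(values))
```

The differences are squared because the plain differences from the mean always add up to zero.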
Box Plots
Box plots are popular for showing dispersion for interval/ratio variables.
The box plot provides an indication of both the central tendency (the median) and dispersion (the spread of the box, plus any outliers).
The box plot of wealth below treats the richest decile as an outlier. It clearly shows you that the skew is at the top.
The box shows you where the middle 50% of households sit: between £50 000 and £800 000.
The line in the box shows you the median value of household wealth: £280 000.
The shape of a box plot will vary depending on whether cases tend to be high or low in relation to the median. They show us whether there is more or less variation above or below the median.
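The numbers a box plot draws can be worked out directly. The sketch below uses hypothetical ages and assumes the common rule that any value more than 1.5 times the box height beyond the box counts as an outlier. That rule is a widely used convention, not something stated in the source, and the chart here may use a different one.

```python
import statistics

# Hypothetical ages, invented for this example.
ages = [22, 25, 28, 28, 31, 35, 40, 44, 47, 90]

# The three quartiles: the edges of the box and the median line inside it.
q1, q2, q3 = statistics.quantiles(ages, n=4)

# Height of the box, which covers the middle 50% of values.
iqr = q3 - q1

# Assumed convention: values beyond 1.5 box heights from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [age for age in ages if age < lower_fence or age > upper_fence]

print(f"Box from {q1} to {q3}, median {q2}, outliers {outliers}")
```

Different software calculates quartiles in slightly different ways, so the box edges can vary a little between tools.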
Sources
ONS: Household Wealth in the UK, 2018-2022.
Signposting and related posts
This material is most relevant to the Research Methods module. It might be a little advanced for A-level sociology. You are more likely to need this during a first year university statistical methods course.