Bivariate analysis involves analysing two variables at a time in order to uncover whether the two variables are related.

Exploring relationships between variables means searching for evidence that the variation in one variable coincides with variation in another variable.

A variety of techniques can be used to conduct bivariate analysis, but the appropriate one depends on the type of each of the two variables being analysed.

| Type of variable | Nominal | Ordinal | Interval/ratio | Dichotomous |
|---|---|---|---|---|
| Nominal | Contingency table + chi-square + Cramer’s V | Contingency table + chi-square + Cramer’s V | Contingency table + chi-square + Cramer’s V, compare means and eta | Contingency table + chi-square + Cramer’s V |
| Ordinal | Contingency table + chi-square + Cramer’s V | Spearman’s rho | Spearman’s rho | Spearman’s rho |
| Interval/ratio | Contingency table + chi-square + Cramer’s V, compare means and eta | Spearman’s rho | Pearson’s R | Spearman’s rho |
| Dichotomous | Contingency table + chi-square + Cramer’s V | Spearman’s rho | Spearman’s rho | phi |

*Bivariate analysis for different types of variable*

## Bivariate Analysis: Relationships, not causality

If there is a relationship between two variables, this does not necessarily mean one causes the other.

Even if there is a causal relationship, we need to take care to make sure the direction of causality is correct. Researchers must be careful not to let their assumptions influence the direction of causality.

For example, Sutton and Rafaeli (1988) conducted bivariate analysis on the relationship between the display of positive emotions by retail staff and levels of retail sales.

Common sense might tell you that positive staff sell more. However, Sutton and Rafaeli found that the relationship was the other way around: levels of sales influenced the emotions staff displayed, with staff in busier, higher-selling stores under more pressure and less inclined to display positive emotions. This was unexpected, but also makes sense.

Sometimes you can infer the direction of causality with certainty, as with the relationship between age and voting patterns. If younger people are less likely to vote, age must be the independent variable, because there is no way voting patterns can influence age.

## Contingency Tables

A contingency table is like a frequency table but it allows two variables to be analysed simultaneously so that relationships between them can be examined.

They usually contain percentages since these make the relationships easier to see.

| Subject | Male: number | Male: percent | Female: number | Female: percent |
|---|---|---|---|---|
| Sociology | 60 | 30 | 120 | 40 |
| Maths | 20 | 10 | 60 | 20 |
| English | 20 | 10 | 60 | 20 |
| Dance | 100 | 50 | 60 | 20 |
| Total | 200 | 100 | 300 | 100 |

*Students studying subjects in one college, by gender.*

The table above contains both the number of students in each category and, next to it, that number as a percentage of the column total.

The percentages are column percentages: each one expresses the number in a cell as a percentage of the total number in that column. This is why each percent column adds up to 100.
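As a minimal sketch of how column percentages are calculated, the snippet below recomputes them from the counts in the college table. The variable names are illustrative only.

```python
# Counts from the college table above: subject -> (male, female).
counts = {
    "Sociology": (60, 120),
    "Maths": (20, 60),
    "English": (20, 60),
    "Dance": (100, 60),
}

# Column totals: all male students and all female students.
male_total = sum(male for male, _ in counts.values())
female_total = sum(female for _, female in counts.values())

# Column percentage = cell count / column total * 100.
column_percentages = {
    subject: (100 * male / male_total, 100 * female / female_total)
    for subject, (male, female) in counts.items()
}

for subject, (male_pct, female_pct) in column_percentages.items():
    print(f"{subject:<10} male {male_pct:.0f}%  female {female_pct:.0f}%")
```

Each percentage is relative to its own column, so the male and female percentages for a subject can be compared directly even though the two groups differ in size.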

In the above table we can see that there are more female students than male students, and that females outnumber males in every subject other than dance, which is much more popular among male students. (It’s quite an unusual college!)

Contingency tables can be applied to all types of variable, but they are not always an efficient method.

## Pearson’s R

Pearson’s R is a method for examining relationships between interval/ratio variables. The main features of this method of analysis are:

- The coefficient lies between -1 and 1. Its size, ignoring the sign, indicates the strength of the relationship: 0 means no relationship, and 1 or -1 means a perfect relationship.
- The closer the coefficient is to 1 or -1, the stronger the relationship; the closer to 0, the weaker the relationship.
- The sign of the coefficient, positive or negative, indicates the direction of the relationship.

### Examples of Pearson’s R correlations

The table below shows the relationship between age and four other variables. (Note that this data is made up and for illustrative purposes only!)

| Age group | Happiness score | Wealth (£) | Hours watching TV per week | Average number of friends |
|---|---|---|---|---|
| 20 | 10 | £10,000 | 15 | 5 |
| 30 | 8 | £20,000 | 10 | 8 |
| 40 | 6 | £30,000 | 33 | 11 |
| 50 | 4 | £40,000 | 22 | 10 |
| 60-69 | 2 | £50,000 | 9 | 16 |
| Pearson’s R | -1 | 1 | 0 | 0.93 |

The correlations are as follows:

- between age and happiness: perfect negative correlation.
- between age and wealth: perfect positive correlation.
- between age and watching TV: no correlation.
- between age and number of friends: strong positive correlation.
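The coefficients in the table can be checked with a short calculation. The sketch below implements the standard Pearson formula by hand. It assumes each age group is represented by its lower bound (so 60 for the "60-69" row), and the function and variable names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / sqrt(ss_x * ss_y)


# Assumption: each age group is coded by its lower bound (60 for "60-69").
age = [20, 30, 40, 50, 60]
columns = {
    "happiness": [10, 8, 6, 4, 2],
    "wealth": [10_000, 20_000, 30_000, 40_000, 50_000],
    "tv_hours": [15, 10, 33, 22, 9],
    "friends": [5, 8, 11, 10, 16],
}

results = {name: pearson_r(age, values) for name, values in columns.items()}
for name, r in results.items():
    print(f"age vs {name}: r = {r:.2f}")
```

Under that coding assumption the results agree with the table: happiness and wealth give perfect negative and positive correlations, TV hours give a coefficient of zero (up to floating-point rounding), and friends give roughly 0.93.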

*Scatter plots for the above data (not shown): age and happiness; age and wealth; age and TV; age and friends.*

## Spearman’s Rho

Spearman’s Rho is often represented by the Greek letter ρ (rho) and is designed for use with pairs of ordinal variables. It can also be used when one variable is ordinal and the other is interval/ratio.

It is interpreted in the same way as Pearson’s R: the computed value lies between -1 and 1, its sign shows the direction of the relationship, and its distance from 0 shows the strength.

The difference lies in when each can be used: Pearson’s R requires both variables to be interval/ratio, whereas Spearman’s Rho can be used when one or both of the variables are ordinal.
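One common way to compute Spearman’s Rho is to replace each value with its rank and then calculate Pearson’s R on the ranks. The sketch below does this for the hypothetical age and friends data from the Pearson’s R section. It assumes ages are coded by the lower bound of each group, and the helper names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's R applied to the ranks of the data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Assumption: age groups coded by their lower bound, as in the table above.
age = [20, 30, 40, 50, 60]
friends = [5, 8, 11, 10, 16]

rho = spearman_rho(age, friends)
print(round(rho, 2))
```

Because only the ordering of the values matters, the result (about 0.9 here) would be unchanged if the friend counts were replaced by any other numbers in the same order.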

## Phi and Cramer’s V

The Phi coefficient is used to analyse the relationship between two dichotomous variables. Like Pearson’s R, it produces a statistic between -1 and 1: the sign indicates the direction of the relationship and the distance from 0 indicates its strength.
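As a rough illustration, phi can be computed directly from the four cells of a 2x2 contingency table. The counts below are entirely hypothetical (an invented age-by-voting table) and the function name is illustrative.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table laid out as [[a, b], [c, d]]."""
    numerator = a * d - b * c
    # Product of the two row totals and the two column totals.
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Hypothetical counts, made up for illustration:
# rows = under 30 / 30 and over, columns = voted / did not vote.
phi = phi_coefficient(10, 40, 30, 20)
print(round(phi, 2))
```

With these made-up counts phi is about -0.41. The sign depends only on how the categories are ordered in the table, so it needs to be read alongside the table itself.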

Cramer’s V can be used with nominal variables. Its value lies between 0 and 1, so it can only show the strength of the relationship between two variables, not the direction.

Cramer’s V is usually reported along with a contingency table and a chi-square test.
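To show how these pieces fit together, the sketch below computes the chi-square statistic and Cramer’s V for the college table from the contingency tables section. It uses the standard formulas by hand, does not compute a p-value, and the variable names are illustrative.

```python
from math import sqrt

# Observed counts from the college table: rows = subjects, columns = (male, female).
observed = [
    [60, 120],   # Sociology
    [20, 60],    # Maths
    [20, 60],    # English
    [100, 60],   # Dance
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

# Cramer's V = sqrt(chi-square / (n * (smaller table dimension - 1))).
k = min(len(row_totals), len(col_totals)) - 1
cramers_v = sqrt(chi_square / (n * k))

print(round(chi_square, 2), round(cramers_v, 2))
```

For this table the chi-square statistic is about 52.08 and Cramer’s V is about 0.32. The value says nothing about direction, which has to be read from the percentages in the table.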

## Comparing means and eta

If you need to examine the relationship between an interval/ratio variable and a nominal variable, and the nominal variable can be relatively unambiguously identified as the independent variable, it can be useful to compare the means of the interval/ratio variable for each subgroup of the nominal variable.

This procedure is often accompanied by a measure of association called eta. Eta expresses the level of association between the two variables and is always positive.

Eta-squared expresses the proportion of the variation in the interval/ratio variable that is accounted for by the nominal variable.
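A minimal sketch of comparing means and computing eta follows. The data are invented for illustration (weekly study hours for students on three subjects), and eta-squared is calculated as the between-group sum of squares divided by the total sum of squares.

```python
from math import sqrt

# Hypothetical data, made up for illustration:
# weekly study hours (interval/ratio) grouped by subject (nominal).
hours_by_subject = {
    "Sociology": [4, 5, 6],
    "Maths": [7, 8, 9],
    "Dance": [1, 2, 3],
}

# Compare means: one mean per subgroup of the nominal variable.
group_means = {
    subject: sum(hours) / len(hours)
    for subject, hours in hours_by_subject.items()
}

all_hours = [h for hours in hours_by_subject.values() for h in hours]
grand_mean = sum(all_hours) / len(all_hours)

# Variation between groups, weighted by group size.
ss_between = sum(
    len(hours) * (group_means[subject] - grand_mean) ** 2
    for subject, hours in hours_by_subject.items()
)
# Total variation of every value around the grand mean.
ss_total = sum((h - grand_mean) ** 2 for h in all_hours)

eta_squared = ss_between / ss_total
eta = sqrt(eta_squared)

print(group_means)
print(round(eta_squared, 2), round(eta, 2))
```

In this invented example eta-squared is 0.9, meaning 90% of the variation in study hours is accounted for by subject. Real data would rarely produce a value this high.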

## Signposting and sources

This material should be of interest to anyone studying quantitative social research methods.

Bryman, A (2016) Social Research Methods