
How to Understand and Calculate Covariance in Statistics




The Covariance Explained

Covariance is a statistical concept that measures how two variables are related to each other. It tells us whether the variables tend to move in the same or opposite directions, and how much they vary together. In this blog post, we will explain what covariance is, how to calculate it, and how to interpret it.

What is Covariance?

Covariance is defined as the expected value of the product of the deviations of two variables from their respective means. Mathematically, it can be written as:

$$\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$$

where $X$ and $Y$ are two random variables, $E[X]$ and $E[Y]$ are their means, and $E$ is the expectation operator.

Covariance can also be computed using the following formula:

$$\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y]$$

where $E[XY]$ is the expected value of the product of $X$ and $Y$.
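To see that the two formulas agree, here is a minimal Python sketch. It assumes a small, made-up list of paired values that is treated as the whole population (each pair equally likely), so every expectation becomes a plain average. The function names `cov_definition` and `cov_shortcut` are illustrative, not from any library.

```python
from statistics import fmean

def cov_definition(xs, ys):
    """Population covariance: average of the products of deviations from the means."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def cov_shortcut(xs, ys):
    """Population covariance via E[XY] - E[X]E[Y]."""
    exy = fmean([x * y for x, y in zip(xs, ys)])
    return exy - fmean(xs) * fmean(ys)

# Hypothetical paired observations, used only for illustration
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

print(cov_definition(xs, ys))  # 2.75
print(cov_shortcut(xs, ys))    # 2.75
```

The shortcut form is convenient by hand, but in floating point it can lose precision when the means are large compared with the spread of the data, which is why numerical code usually prefers the deviation form.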

Covariance can be positive, negative, or zero. A positive covariance means that the two variables tend to move in the same direction, i.e., when one variable is above its mean, the other also tends to be above its mean, and vice versa. A negative covariance means that the two variables tend to move in opposite directions, i.e., when one variable is above its mean, the other tends to be below its mean, and vice versa. A zero covariance means that the two variables are uncorrelated, i.e., they have no linear relationship. This is weaker than independence: independent variables always have zero covariance, but two variables with zero covariance can still be related in a nonlinear way.
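Zero covariance rules out only a linear relationship; it does not guarantee independence. In this hypothetical data set, $Y = X^2$, so $Y$ is completely determined by $X$, yet the covariance is zero because the relationship is symmetric rather than linear. The helper `population_cov` is an illustrative name.

```python
from statistics import fmean

def population_cov(xs, ys):
    """Average of the products of deviations from the means (divides by n)."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # Y depends entirely on X, but not linearly

print(ys)                      # [4, 1, 0, 1, 4]
print(population_cov(xs, ys))  # 0.0
```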

How to Calculate Covariance?

To calculate the covariance of two variables, we need paired data on both variables for a sample or a population. For example, suppose we have the following data on the heights and weights of 10 people:


| Height (cm) | Weight (kg) |
|---|---|
| 170 | 65 |
| 180 | 75 |
| 160 | 60 |
| 190 | 80 |
| 175 | 70 |
| 165 | 55 |
| 185 | 85 |
| 155 | 50 |
| 195 | 90 |
| 150 | 45 |


To calculate the covariance of height and weight, we first need to find the means of both variables:

$$E[X] = \frac{1}{n}\sum_{i=1}^n X_i = \frac{1}{10}(170 + 180 + \cdots + 150) = 172.5$$

$$E[Y] = \frac{1}{n}\sum_{i=1}^n Y_i = \frac{1}{10}(65 + 75 + \cdots + 45) = 67.5$$

where $n$ is the sample size, and $X_i$ and $Y_i$ are the values of the $i$-th observation.
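As a quick check, both means can be reproduced with the standard library's `statistics.fmean`; the lists `heights` and `weights` are simply the table above typed out.

```python
from statistics import fmean

# Heights (cm) and weights (kg) of the 10 people in the table above
heights = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]
weights = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]

mean_height = fmean(heights)  # E[X]
mean_weight = fmean(weights)  # E[Y]
print(mean_height, mean_weight)  # 172.5 67.5
```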

Next, we need to find the product of the deviations of each observation from the means:


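| Height $X$ (cm) | Weight $Y$ (kg) | $X - E[X]$ | $Y - E[Y]$ | $(X - E[X])(Y - E[Y])$ |
|---|---|---|---|---|
| 170 | 65 | -2.5 | -2.5 | 6.25 |
| 180 | 75 | 7.5 | 7.5 | 56.25 |
| 160 | 60 | -12.5 | -7.5 | 93.75 |
| 190 | 80 | 17.5 | 12.5 | 218.75 |
| 175 | 70 | 2.5 | 2.5 | 6.25 |
| 165 | 55 | -7.5 | -12.5 | 93.75 |
| 185 | 85 | 12.5 | 17.5 | 218.75 |
| 155 | 50 | -17.5 | -17.5 | 306.25 |
| 195 | 90 | 22.5 | 22.5 | 506.25 |
| 150 | 45 | -22.5 | -22.5 | 506.25 |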
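The deviation table can be generated the same way. This sketch (variable names are illustrative) recomputes the means, then prints each person's deviations and their product, reproducing the rows of the table.

```python
from statistics import fmean

heights = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]  # X, cm
weights = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]            # Y, kg
mx, my = fmean(heights), fmean(weights)

# One tuple per person: (X - E[X], Y - E[Y], product of the two deviations)
rows = [(x - mx, y - my, (x - mx) * (y - my)) for x, y in zip(heights, weights)]

for x, y, (dx, dy, prod) in zip(heights, weights, rows):
    print(f"{x:>4} {y:>3} {dx:>6.1f} {dy:>6.1f} {prod:>8.2f}")
```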


Finally, we need to find the expected value of the product of the deviations, which is the same as the average of the last column:

$$E[(X - E[X])(Y - E[Y])] = \frac{1}{n}\sum_{i=1}^n (X_i - E[X])(Y_i - E[Y]) = \frac{1}{10}(6.25 + 56.25 + \cdots + 506.25) = 201.25$$

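The products sum to 2012.5, so, treating these 10 people as the entire population, the covariance of height and weight is:

$$\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = 201.25$$

Its units are cm·kg. If the 10 people are instead a sample from a larger population, the usual unbiased sample covariance divides the same sum by $n - 1$ rather than $n$: $2012.5 / 9 \approx 223.61$.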
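Putting the steps together gives a minimal covariance function. It is a sketch rather than a library API: the `ddof` ("delta degrees of freedom") parameter borrows NumPy's naming and switches the denominator from $n$ (population, as in the worked example) to $n - 1$ (the usual sample estimate).

```python
from statistics import fmean

heights = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]
weights = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]

def covariance(xs, ys, ddof=0):
    """Covariance of paired data.

    ddof=0 divides by n (population, as in the worked example);
    ddof=1 divides by n - 1 (the usual unbiased sample estimate).
    """
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)
    total = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return total / (n - ddof)

print(covariance(heights, weights))                     # 201.25
print(round(covariance(heights, weights, ddof=1), 2))   # 223.61
```

For reference, NumPy's `np.cov` and Python's `statistics.covariance` (Python 3.10+) both default to the $n - 1$ denominator, so they would report about 223.61 rather than 201.25 for this data.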

How to Interpret Covariance?

The covariance of height and weight is positive, which means that there is a positive relationship between the two variables. In other words, taller people in this data set tend to weigh more, and shorter people tend to weigh less. However, the size of the covariance depends on the units of measurement (here cm·kg; measuring height in inches would give a different number), so on its own it does not tell us how strong the relationship is. For that, we need a scale-free measure such as correlation.

Correlation is a normalized version of covariance, which ranges from -1 to 1. It measures the degree of linear dependence between two variables, regardless of their scales. A correlation of 1 means that the variables have a perfect positive linear relationship, a correlation of -1 means that they have a perfect negative linear relationship, and a correlation of 0 means that they have no linear relationship. To calculate the correlation, we need to divide the covariance by the product of the standard deviations of the two variables:

$$\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$$

where $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$ are the variances of $X$ and $Y$, respectively.
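The correlation formula is short enough to write directly. In this sketch, the helper names `pop_cov` and `corr` are hypothetical, and $\mathrm{Var}(X)$ is computed as $\mathrm{Cov}(X,X)$. To illustrate what "regardless of their scales" means, it also converts heights from cm to inches (1 in = 2.54 cm): the covariance shrinks by a factor of 2.54, while the correlation stays the same.

```python
import math
from statistics import fmean

def pop_cov(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def corr(xs, ys):
    # Var(X) = Cov(X, X), so the denominator is sqrt(Var(X) * Var(Y))
    return pop_cov(xs, ys) / math.sqrt(pop_cov(xs, xs) * pop_cov(ys, ys))

heights_cm = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]
weights_kg = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]
heights_in = [h / 2.54 for h in heights_cm]  # same heights, measured in inches

# Covariance changes with the unit of height...
print(pop_cov(heights_cm, weights_kg), pop_cov(heights_in, weights_kg))
# ...but the correlation is the same in both units (up to rounding)
print(corr(heights_cm, weights_kg), corr(heights_in, weights_kg))
```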

Using the same data as before, we can find the variances and standard deviations of height and weight as follows:

$$\mathrm{Var}(X) = E[(X - E[X])^2] = \frac{1}{n}\sum_{i=1}^n (X_i - E[X])^2 = \frac{1}{10}((-2.5)^2 + (7.5)^2 + \cdots + (-22.5)^2) = \frac{2062.5}{10} = 206.25$$

$$\mathrm{Var}(Y) = E[(Y - E[Y])^2] = \frac{1}{n}\sum_{i=1}^n (Y_i - E[Y])^2 = \frac{1}{10}((-2.5)^2 + (7.5)^2 + \cdots + (-22.5)^2) = \frac{2062.5}{10} = 206.25$$

$$\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{206.25} \approx 14.36$$

$$\mathrm{SD}(Y) = \sqrt{\mathrm{Var}(Y)} = \sqrt{206.25} \approx 14.36$$

The two variances are equal because both deviation columns contain exactly the same values ($\pm 2.5, \pm 7.5, \pm 12.5, \pm 17.5, \pm 22.5$), only paired in a different order.
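These figures are easy to double-check. Python's `statistics.pvariance` divides by $n$, matching the formulas above (`statistics.variance` would divide by $n - 1$ instead).

```python
import math
from statistics import pvariance

heights = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]
weights = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]

var_x = pvariance(heights)  # population variance: divides by n
var_y = pvariance(weights)
sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)

print(var_x, var_y)                    # 206.25 206.25
print(round(sd_x, 2), round(sd_y, 2))  # 14.36 14.36
```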

Therefore, the correlation of height and weight is:

$$\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}} = \frac{201.25}{\sqrt{206.25 \times 206.25}} = \frac{201.25}{206.25} \approx 0.98$$

The correlation of height and weight is close to 1, which means that there is a strong positive linear relationship between the two variables. This confirms what we observed from the covariance, but also gives us a more precise and scale-free measure of the relationship.
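Here is a sketch that computes the correlation end to end. It also shows that, unlike the covariance, the correlation does not depend on whether we divide by $n$ or $n - 1$: the same factor appears in the numerator and in both variances, so it cancels. The `cov` and `corr` helpers and their `ddof` parameter are illustrative names, not a library API.

```python
import math
from statistics import fmean

heights = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]
weights = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]

def cov(xs, ys, ddof=0):
    """Covariance with denominator n - ddof (ddof=0: population, ddof=1: sample)."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - ddof)

def corr(xs, ys, ddof=0):
    # Var(X) is Cov(X, X); the n or n - 1 denominators cancel in the ratio
    return cov(xs, ys, ddof) / math.sqrt(cov(xs, xs, ddof) * cov(ys, ys, ddof))

print(round(corr(heights, weights), 3))          # 0.976
print(round(corr(heights, weights, ddof=1), 3))  # 0.976
```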

Summary

In this blog post, we explained covariance. We saw that covariance measures how two variables vary together, and that it can be positive, negative, or zero. We also learned how to calculate covariance, either as the expected value of the product of the deviations from the means or as the expected value of the product minus the product of the expected values. Finally, we saw that the magnitude of covariance depends on the units of measurement, and that correlation, a normalized and scale-free measure of the linear dependence between two variables, is the better tool for judging how strong the relationship is. We hope this blog post was helpful and informative for you. If you have any questions or feedback, please leave a comment below. Thank you for reading!

