How to Understand and Calculate Covariance in Statistics




The Covariance Explained

Covariance is a statistical concept that measures how two variables are related to each other. It tells us whether the variables tend to move in the same or opposite directions, and how much they vary together. In this blog post, we will explain what covariance is, how to calculate it, and how to interpret it.

What is Covariance?

Covariance is defined as the expected value of the product of the deviations of two variables from their respective means. Mathematically, it can be written as:

$$\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$$

where $X$ and $Y$ are two random variables, $E[X]$ and $E[Y]$ are their means, and $E$ is the expectation operator.

Covariance can also be computed using the following formula:

$$\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y]$$

where $E[XY]$ is the expected value of the product of $X$ and $Y$.
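The two formulas agree because expectation is linear. Here is a short derivation, writing $\mu_X = E[X]$ and $\mu_Y = E[Y]$ (a shorthand introduced only for this derivation):

$$\begin{aligned}
\mathrm{Cov}(X,Y) &= E[(X - \mu_X)(Y - \mu_Y)] \\
&= E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y] \\
&= E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y \\
&= E[XY] - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y \\
&= E[XY] - E[X]E[Y]
\end{aligned}$$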

Covariance can be positive, negative, or zero. A positive covariance means that the two variables tend to move in the same direction, i.e., when one variable is above its mean, the other tends to be above its mean as well. A negative covariance means that the two variables tend to move in opposite directions, i.e., when one variable is above its mean, the other tends to be below its mean. A zero covariance means that the two variables have no linear relationship. This is weaker than independence: independent variables always have zero covariance, but variables with zero covariance can still be related in a nonlinear way.
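Zero covariance rules out only a linear relationship, so it is worth seeing a case where the covariance is zero and the variables are still clearly related. The sketch below is a made-up illustration (the values and the helper names `mean` and `population_covariance` are not from the data in this post): $X$ takes the values -1, 0, and 1 with equal probability, and $Y = X^2$.

```python
# Hypothetical example: Y is completely determined by X, yet Cov(X, Y) = 0.
xs = [-1, 0, 1]
ys = [x ** 2 for x in xs]  # Y = X^2, a nonlinear function of X


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def population_covariance(a, b):
    """Average product of deviations from the means (divides by n)."""
    mean_a, mean_b = mean(a), mean(b)
    return mean([(ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)])


cov_xy = population_covariance(xs, ys)
print(cov_xy)  # 0.0, even though Y depends entirely on X
```

The positive and negative products cancel because the relationship is symmetric around the mean of $X$, which is exactly the kind of pattern covariance cannot detect.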

How to Calculate Covariance?

To calculate the covariance of two variables, we need to have data on both variables for a sample or a population. For example, suppose we have the following data on the heights and weights of 10 people:


| Height (cm) | Weight (kg) |
|---|---|
| 170 | 65 |
| 180 | 75 |
| 160 | 60 |
| 190 | 80 |
| 175 | 70 |
| 165 | 55 |
| 185 | 85 |
| 155 | 50 |
| 195 | 90 |
| 150 | 45 |


To calculate the covariance of height and weight, we first need to find the means of both variables:

$$E[X] = \frac{1}{n}\sum_{i=1}^n X_i = \frac{1}{10}(170 + 180 + \cdots + 150) = 172.5$$

$$E[Y] = \frac{1}{n}\sum_{i=1}^n Y_i = \frac{1}{10}(65 + 75 + \cdots + 45) = 67.5$$

where $n$ is the sample size, and $X_i$ and $Y_i$ are the values of the $i$-th observation.

Next, we need to find the product of the deviations of each observation from the means:


| Height (cm) | Weight (kg) | $X - E[X]$ | $Y - E[Y]$ | $(X - E[X])(Y - E[Y])$ |
|---|---|---|---|---|
| 170 | 65 | -2.5 | -2.5 | 6.25 |
| 180 | 75 | 7.5 | 7.5 | 56.25 |
| 160 | 60 | -12.5 | -7.5 | 93.75 |
| 190 | 80 | 17.5 | 12.5 | 218.75 |
| 175 | 70 | 2.5 | 2.5 | 6.25 |
| 165 | 55 | -7.5 | -12.5 | 93.75 |
| 185 | 85 | 12.5 | 17.5 | 218.75 |
| 155 | 50 | -17.5 | -17.5 | 306.25 |
| 195 | 90 | 22.5 | 22.5 | 506.25 |
| 150 | 45 | -22.5 | -22.5 | 506.25 |


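Finally, we need to find the expected value of the product of the deviations, which is the average of the last column. Dividing by $n$ treats the 10 people as the whole population; to estimate the covariance of a larger population from a sample, we would divide by $n - 1$ instead.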

$$E[(X - E[X])(Y - E[Y])] = \frac{1}{n}\sum_{i=1}^n (X_i - E[X])(Y_i - E[Y]) = \frac{1}{10}(6.25 + 56.25 + \cdots + 506.25) = 201.25$$

Therefore, the covariance of height and weight is:

$$\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = 201.25$$
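The same calculation can be sketched in a few lines of plain Python. The variable names below are illustrative choices, and the code divides by $n$, matching the population-style average used in this post. It also checks the shortcut formula $E[XY] - E[X]E[Y]$ against the definition.

```python
heights_cm = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]
weights_kg = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]

n = len(heights_cm)
mean_height = sum(heights_cm) / n  # 172.5
mean_weight = sum(weights_kg) / n  # 67.5

# Definition: average product of the deviations from the means.
products = [(h - mean_height) * (w - mean_weight)
            for h, w in zip(heights_cm, weights_kg)]
cov_definition = sum(products) / n

# Shortcut: E[XY] - E[X]E[Y].
mean_product = sum(h * w for h, w in zip(heights_cm, weights_kg)) / n
cov_shortcut = mean_product - mean_height * mean_weight

print(cov_definition)  # 201.25
print(cov_shortcut)    # 201.25
```

Note that library routines often default to the sample version. For instance, `statistics.covariance` in Python 3.10+ divides by $n - 1$, so on this data it would return about 223.61 rather than 201.25.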

How to Interpret Covariance?

The covariance of height and weight is positive, which means that there is a positive relationship between the two variables. In other words, taller people tend to weigh more, and shorter people tend to weigh less. However, the covariance alone does not tell us how strong this relationship is, because its magnitude depends on the units of the two variables (here, cm times kg). To judge the strength of the relationship, we need a scale-free measure, such as correlation.
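One way to see why the size of a covariance is hard to read on its own is to change the units. The sketch below (the helper name `population_covariance` is an illustrative choice) recomputes the covariance after converting heights from centimetres to metres. The data and the relationship are unchanged, but the number shrinks by a factor of 100.

```python
def population_covariance(a, b):
    """Average product of deviations from the means (divides by n)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


heights_cm = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]
weights_kg = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]
heights_m = [h / 100 for h in heights_cm]  # same heights, different unit

cov_cm = population_covariance(heights_cm, weights_kg)
cov_m = population_covariance(heights_m, weights_kg)

print(cov_cm)           # 201.25  (cm * kg)
print(round(cov_m, 4))  # 2.0125  (m * kg)
```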

Correlation is a normalized version of covariance, which ranges from -1 to 1. It measures the degree of linear dependence between two variables, regardless of their scales. A correlation of 1 means that the variables have a perfect positive linear relationship, a correlation of -1 means that they have a perfect negative linear relationship, and a correlation of 0 means that they have no linear relationship. To calculate the correlation, we need to divide the covariance by the product of the standard deviations of the two variables:

$$\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$$

where $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$ are the variances of $X$ and $Y$, respectively.

Using the same data as before, we can find the variances and standard deviations of height and weight as follows:

$$\mathrm{Var}(X) = E[(X - E[X])^2] = \frac{1}{n}\sum_{i=1}^n (X_i - E[X])^2 = \frac{1}{10}((-2.5)^2 + (7.5)^2 + \cdots + (-22.5)^2) = 206.25$$

$$\mathrm{Var}(Y) = E[(Y - E[Y])^2] = \frac{1}{n}\sum_{i=1}^n (Y_i - E[Y])^2 = \frac{1}{10}((-2.5)^2 + (7.5)^2 + \cdots + (-22.5)^2) = 206.25$$

The two variances are equal because, in this data set, the squared deviations of height and weight happen to take the same set of values in a different order.

$$\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{206.25} \approx 14.36$$

$$\mathrm{SD}(Y) = \sqrt{\mathrm{Var}(Y)} = \sqrt{206.25} \approx 14.36$$

Therefore, the correlation of height and weight is:

$$\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}} = \frac{201.25}{\sqrt{206.25 \times 206.25}} = \frac{201.25}{206.25} \approx 0.98$$

The correlation of height and weight is close to 1, which means that there is a strong positive linear relationship between the two variables. This confirms what we observed from the covariance, but also gives us a more precise and scale-free measure of the relationship.
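As a check on the arithmetic, here is a minimal sketch that recomputes the variances and the correlation directly from the ten observations. It uses the fact that $\mathrm{Var}(X) = \mathrm{Cov}(X,X)$ and divides by $n$ throughout. The function name is an illustrative choice, not a library API.

```python
import math

heights_cm = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]
weights_kg = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]


def population_covariance(a, b):
    """Average product of deviations from the means (divides by n)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


# Variance is the covariance of a variable with itself.
var_height = population_covariance(heights_cm, heights_cm)
var_weight = population_covariance(weights_kg, weights_kg)
cov = population_covariance(heights_cm, weights_kg)

corr = cov / math.sqrt(var_height * var_weight)

print(var_height, var_weight)  # 206.25 206.25
print(round(corr, 4))          # 0.9758
```

Because the same divisor appears in the numerator and the denominator, the correlation comes out the same whether we divide by $n$ or by $n - 1$.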

Summary

In this blog post, we learned about covariance. We saw that covariance is a measure of how two variables vary together, and that it can be positive, negative, or zero. We also learned how to calculate covariance using the expected value of the product of the deviations, or using the expected value of the product minus the product of the expected values. Finally, we learned how to interpret covariance using correlation, which is a normalized and scale-free measure of the linear dependence between two variables. We hope this blog post was helpful and informative for you. If you have any questions or feedback, please leave a comment below. Thank you for reading!

