
How to Understand and Calculate Covariance in Statistics




The Covariance Explained

Covariance is a statistical concept that measures how two variables are related to each other. It tells us whether the variables tend to move in the same or opposite directions, and how much they vary together. In this blog post, we will explain what covariance is, how to calculate it, and how to interpret it.

What is Covariance?

Covariance is defined as the expected value of the product of the deviations of two variables from their respective means. Mathematically, it can be written as:

$$\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$$

where $X$ and $Y$ are two random variables, $E[X]$ and $E[Y]$ are their means, and $E$ is the expectation operator.

Covariance can also be computed using the following formula:

$$\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y]$$

where $E[XY]$ is the expected value of the product of $X$ and $Y$.
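
The two formulas agree. A short derivation, expanding the product and using the linearity of expectation (with $E[X]$ and $E[Y]$ treated as constants), shows why:

$$\begin{aligned}
\mathrm{Cov}(X,Y) &= E[(X - E[X])(Y - E[Y])] \\
&= E[XY - X\,E[Y] - E[X]\,Y + E[X]E[Y]] \\
&= E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y] \\
&= E[XY] - E[X]E[Y]
\end{aligned}$$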

Covariance can be positive, negative, or zero. A positive covariance means that the two variables tend to move in the same direction: when one is above its mean, the other tends to be above its mean too. A negative covariance means that they tend to move in opposite directions: when one is above its mean, the other tends to be below its mean. A zero covariance means that the two variables have no linear relationship (they are uncorrelated). It does not by itself mean that they are independent, because two variables can be related in a nonlinear way and still have zero covariance. Independence implies zero covariance, but not the other way around.
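
Zero covariance only rules out a linear relationship. As a small illustration, here is a sketch with a made-up toy distribution (not data from this post): $X$ takes the values -1, 0, and 1 with equal probability, and $Y = X^2$. The helper name `expect` is our own.

```python
from fractions import Fraction

# Hypothetical toy distribution: X is -1, 0 or 1 with equal probability, Y = X**2.
xs = [-1, 0, 1]
ys = [x ** 2 for x in xs]
p = Fraction(1, 3)  # probability of each outcome

def expect(values):
    """Expected value over the three equally likely outcomes."""
    return sum(p * v for v in values)

# Cov(X, Y) = E[XY] - E[X]E[Y]
cov_xy = expect([x * y for x, y in zip(xs, ys)]) - expect(xs) * expect(ys)
print(cov_xy)  # 0

# Independence would require P(X=1, Y=1) == P(X=1) * P(Y=1).
p_joint = sum(p for x, y in zip(xs, ys) if x == 1 and y == 1)
p_x1 = sum(p for x in xs if x == 1)
p_y1 = sum(p for y in ys if y == 1)
print(p_joint, p_x1 * p_y1)  # 1/3 2/9
```

The covariance is exactly zero even though $Y$ is completely determined by $X$, so the two variables are clearly not independent.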

How to Calculate Covariance?

To calculate the covariance of two variables, we need to have data on both variables for a sample or a population. For example, suppose we have the following data on the heights and weights of 10 people:


| Height (cm) | Weight (kg) |
|---|---|
| 170 | 65 |
| 180 | 75 |
| 160 | 60 |
| 190 | 80 |
| 175 | 70 |
| 165 | 55 |
| 185 | 85 |
| 155 | 50 |
| 195 | 90 |
| 150 | 45 |


To calculate the covariance of height and weight, we first need to find the means of both variables:

$$E[X] = \frac{1}{n}\sum_{i=1}^n X_i = \frac{1}{10}(170 + 180 + \cdots + 150) = 172.5$$

$$E[Y] = \frac{1}{n}\sum_{i=1}^n Y_i = \frac{1}{10}(65 + 75 + \cdots + 45) = 67.5$$

where $n$ is the sample size, and $X_i$ and $Y_i$ are the values of the $i$-th observation.

Next, we need to find the product of the deviations of each observation from the means:


| Height (cm) | Weight (kg) | $X - E[X]$ | $Y - E[Y]$ | $(X - E[X])(Y - E[Y])$ |
|---|---|---|---|---|
| 170 | 65 | -2.5 | -2.5 | 6.25 |
| 180 | 75 | 7.5 | 7.5 | 56.25 |
| 160 | 60 | -12.5 | -7.5 | 93.75 |
| 190 | 80 | 17.5 | 12.5 | 218.75 |
| 175 | 70 | 2.5 | 2.5 | 6.25 |
| 165 | 55 | -7.5 | -12.5 | 93.75 |
| 185 | 85 | 12.5 | 17.5 | 218.75 |
| 155 | 50 | -17.5 | -17.5 | 306.25 |
| 195 | 90 | 22.5 | 22.5 | 506.25 |
| 150 | 45 | -22.5 | -22.5 | 506.25 |


Finally, we need to find the expected value of the product of the deviations, which is the same as the average of the last column:

$$E[(X - E[X])(Y - E[Y])] = \frac{1}{n}\sum_{i=1}^n (X_i - E[X])(Y_i - E[Y]) = \frac{1}{10}(6.25 + 56.25 + \cdots + 506.25) = 201.25$$

Therefore, the covariance of height and weight is:

$$\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = 201.25$$
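
The same calculation can be sketched in a few lines of plain Python. The function names (`mean`, `covariance`, `covariance_shortcut`) are our own, chosen for illustration. Both functions divide by $n$, matching the formulas in this post.

```python
heights = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]  # cm
weights = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]            # kg

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def covariance(xs, ys):
    """Average product of the deviations from the means (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def covariance_shortcut(xs, ys):
    """The same quantity computed as E[XY] - E[X]E[Y]."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

print(mean(heights), mean(weights))           # 172.5 67.5
print(covariance(heights, weights))           # 201.25
print(covariance_shortcut(heights, weights))  # 201.25
```

Note that when the data are a sample used to estimate a population covariance, the sum is usually divided by $n - 1$ rather than $n$, which gives a slightly larger value.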

How to Interpret Covariance?

The covariance of height and weight is positive, which means that there is a positive relationship between the two variables. In other words, taller people tend to weigh more, and shorter people tend to weigh less. However, the covariance does not tell us how strong this relationship is, because its magnitude depends on the units in which the variables are measured. For that, we need to use another measure, such as correlation.
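
To see why the size of a covariance is hard to read on its own, here is a minimal sketch that measures the same heights in metres instead of centimetres. The helper `covariance` is our own illustrative function and divides by $n$, as in the calculation above.

```python
heights_cm = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]
weights_kg = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]

def covariance(xs, ys):
    """Average product of the deviations from the means (divides by n)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Same people, same heights, different unit.
heights_m = [h / 100 for h in heights_cm]

cov_cm = covariance(heights_cm, weights_kg)
cov_m = covariance(heights_m, weights_kg)
print(cov_cm)           # 201.25
print(round(cov_m, 4))  # 2.0125
```

Nothing about the relationship has changed, yet the covariance shrinks by a factor of 100 simply because the unit of height changed.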

Correlation is a normalized version of covariance, which ranges from -1 to 1. It measures the degree of linear dependence between two variables, regardless of their scales. A correlation of 1 means that the variables have a perfect positive linear relationship, a correlation of -1 means that they have a perfect negative linear relationship, and a correlation of 0 means that they have no linear relationship. To calculate the correlation, we need to divide the covariance by the product of the standard deviations of the two variables:

$$\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$$

where $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$ are the variances of $X$ and $Y$, respectively.

Using the same data as before, we can find the standard deviations of height and weight as follows:

$$\mathrm{Var}(X) = E[(X - E[X])^2] = \frac{1}{n}\sum_{i=1}^n (X_i - E[X])^2 = \frac{1}{10}((-2.5)^2 + (7.5)^2 + \cdots + (-22.5)^2) = 206.25$$

$$\mathrm{Var}(Y) = E[(Y - E[Y])^2] = \frac{1}{n}\sum_{i=1}^n (Y_i - E[Y])^2 = \frac{1}{10}((-2.5)^2 + (7.5)^2 + \cdots + (-22.5)^2) = 206.25$$

The two variances are equal here because the height and weight deviations take the same set of absolute values (2.5, 7.5, 12.5, 17.5, and 22.5, each appearing twice), only in a different order.

$$\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{206.25} \approx 14.36$$

$$\mathrm{SD}(Y) = \sqrt{\mathrm{Var}(Y)} = \sqrt{206.25} \approx 14.36$$

Therefore, the correlation of height and weight is:

$$\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}} = \frac{201.25}{\sqrt{206.25 \times 206.25}} = \frac{201.25}{206.25} \approx 0.98$$

The correlation of height and weight is close to 1, which means that there is a strong positive linear relationship between the two variables. This confirms what we observed from the covariance, but also gives us a more precise and scale-free measure of the relationship.
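
As a check on the arithmetic, the correlation can be computed with a short Python sketch. The function names are our own, and the variance is obtained as $\mathrm{Cov}(X,X)$, which is the same as $E[(X - E[X])^2]$.

```python
import math

heights = [170, 180, 160, 190, 175, 165, 185, 155, 195, 150]  # cm
weights = [65, 75, 60, 80, 70, 55, 85, 50, 90, 45]            # kg

def covariance(xs, ys):
    """Average product of the deviations from the means (divides by n)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    var_x = covariance(xs, xs)  # Var(X) = Cov(X, X)
    var_y = covariance(ys, ys)  # Var(Y) = Cov(Y, Y)
    return covariance(xs, ys) / math.sqrt(var_x * var_y)

print(covariance(heights, heights))             # 206.25
print(covariance(weights, weights))             # 206.25
print(round(correlation(heights, weights), 4))  # 0.9758
```

Because the same factor appears in the numerator and the denominator, rescaling either variable (for example, centimetres to metres) leaves the correlation unchanged.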

Summary

In this blog post, we explained covariance. We saw that covariance measures how two variables vary together, and that it can be positive, negative, or zero. We also learned how to calculate it, either as the expected value of the product of the deviations or as the expected value of the product minus the product of the expected values. Finally, we learned how to interpret covariance with the help of correlation, a normalized, scale-free measure of the linear dependence between two variables. We hope this blog post was helpful and informative for you. If you have any questions or feedback, please leave a comment below. Thank you for reading!

