Variance Formula: Definition, How to Calculate, and Examples

The variance formula tells a statistician how widely the values in a data set are spread around their mean. Typically, you’ll use two slightly different formulas: one to calculate the variance of an entire data set and one to calculate the variance of just a sample of that data set.

In addition, the variance is closely tied to the standard deviation (it is the standard deviation squared), and both statistical concepts are useful in a variety of settings.

In this article, we will explore what the variance formula is, why it is important, how it differs from the standard deviation and how to use each formula to calculate the variance of a population and a small sample.

Contents

What is Variance?
Variance and Standard Deviation
How to Calculate the Variance of a Data Set?
Calculating the Variance in the Data Sample
Population Variance and Sample Variance

5.1 Example of population variance
5.2 Example of sample variance
Conclusion

What is Variance?

The variance is the average of the squared differences from the mean; its square root is the standard deviation. Simply put, variance is a statistical measure of how scattered the data points are in a sample or data set.

In addition to the mean and standard deviation, the variance of a sample set allows statisticians to understand, organize, and evaluate the data they collect for research purposes.

Basically, variance has two formulas that you can use depending on the data set you’re measuring. For example, if you were measuring data from an entire population set, such as the grades of an entire college class, you would calculate the variance using this formula:

Variance = Sum of (each term – mean)^2 / n

Here are the elements of the formula:

  • The variance is the value you want to find for your entire population; it equals the square of the standard deviation.
  • Each term represents each value or number in your data set.
  • The mean is the average of your data set, which you need to know before you can calculate the variance.
  • The expression ^2 means squaring, or in other words, multiplying a number by itself.
  • The variable n represents the number of values you have in your population.
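As a minimal sketch, the population formula can be written as a short Python function. The function name and the grade values are illustrative assumptions, not part of the original example.

```python
def population_variance(values):
    """Return the average of the squared differences from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


# Hypothetical grades for every student in a four-person class.
grades = [70, 80, 90, 80]
print(population_variance(grades))  # 50.0
```

The mean here is 80, the squared differences are 100, 0, 100 and 0, and their sum of 200 divided by n = 4 gives 50.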

When calculating the variance from only a sample of the population, you would use this formula:

Variance = Sum of (each term – mean)^2 / (n-1)

Here are the elements of the above formula:

  • The variance is what you want to find for your sample set.
  • Each term is a value in your sample. You subtract the mean from each term, so you need to know the mean before calculating the variance.
  • The variable n represents the number of values you have in your sample.

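You use n-1 because you are calculating the variance for a sample of the population rather than for the entire population itself. The sample mean sits closer to the sample's own values than the true population mean does, so dividing by n would tend to underestimate the spread; dividing by n-1 corrects for this.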
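For comparison, Python's standard `statistics` module provides both versions: `pvariance` divides by n and `variance` divides by n-1. The sketch below uses hypothetical values to show that the sample formula always gives the larger result for the same data.

```python
import statistics

data = [70, 80, 90, 80]  # hypothetical values

population_var = statistics.pvariance(data)  # divides by n
sample_var = statistics.variance(data)       # divides by n - 1

print(population_var, sample_var)
```

With these four values the sum of squared differences is 200, so the two results are 200 / 4 and 200 / 3.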

Variance and Standard Deviation

Simply put, the standard deviation measures how spread out a set of data points is from the population or sample mean, expressed in the same units as the data. The variance measures the same spread as the average squared difference between each data point and the mean, so it is expressed in squared units. This means the variance is an average of squared distances, while the standard deviation is a typical distance from the mean in the data's own units.

Although there is a slight difference between these two concepts, variance and standard deviation depend on each other. When you find the standard deviation in a sample set or the entire population, you can square this result to get the variance.
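A quick way to check this relationship is with the standard library, as in the sketch below. The data values are hypothetical, and `math.isclose` is used because floating-point rounding can make the two sides differ in the last digit.

```python
import math
import statistics

data = [70, 80, 90, 80]  # hypothetical values

variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Squaring the standard deviation recovers the variance (up to rounding).
print(math.isclose(std_dev ** 2, variance))        # True
# Taking the square root of the variance recovers the standard deviation.
print(math.isclose(math.sqrt(variance), std_dev))  # True
```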

While this is the simplest relationship between variance and standard deviation, it shows why you need to understand how the two calculations work together to provide insight into different aspects of the data you are studying.

In addition, both measures are calculated from every value in the data set, so both are affected by outliers on either side of the mean. Because the variance squares each difference, a single value far from the mean contributes disproportionately to it, and the standard deviation, as the square root of the variance, reflects that contribution as well.

When both measures are zero, there is no variability in the sample set at all, which means every value equals the mean. The smaller they are, the more the values in the data set cluster around the mean; the larger they are, the more the values spread out.

How to Calculate the Variance of a Data Set?

In statistics, you can calculate the variance of an entire data set, such as an annual sales report that lists total daily net sales for the year. You can also calculate the variance of only a sample of the data points.

In the simple annual sales report example, the sample could be summer sales totals. In this case, the statistician will measure a sample set within a specific date range. In both of these examples, you can calculate the variance using one of two formulas:

Calculate the variance of the entire data set
If you are measuring the entire data set, use the following steps to calculate its variance:

Variance = Sum of (each term – mean)^2 / n

  • Subtract the mean from each value in your data set. Your first step is to subtract your population mean from each term in your set. For example, assume you have a population of three data points. You will subtract the mean from each of these three terms. Here is an example that takes the mean as 35 purely to illustrate the arithmetic (with real data you would compute the mean from the values themselves, and these three values actually average about 95.3): (108-35, 100-35, 78-35), where 35 is subtracted from each term.
  • Square each of these differences. After you subtract the mean from all your terms, square each of these results by multiplying the value by itself. Using the example above, the differences are (73), (65) and (43), and each of these squared gives (5,329), (4,225) and (1,849), respectively.
  • Add up all the resulting squares. Add up these new values to get a total, like this: (5,329) + (4,225) + (1,849) = 11,403.
  • Divide the resulting number by the number of values in your data set. Now you can divide the sum from step three by the total number of values you have in the population you are measuring. Using the values from the previous step, the number you divide is 11,403 and the value you use for n is three, because there are only three terms in the population. Here’s how it looks: (11,403) / (3) = 3,801. So, with the mean taken as 35, the variance of the entire population is 3,801.

Here is a simplified version of the above example:

Variance = ((108-35)^2 + (100-35)^2 + (78-35)^2) / 3
= (73^2 + 65^2 + 43^2) / 3
= (5,329 + 4,225 + 1,849) / 3
= 11,403 / 3
= 3,801
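The four steps above can be traced in Python as a rough sketch. It follows the worked arithmetic exactly, so the mean of 35 is taken from the text as a given rather than computed from the three values; the variable names are illustrative.

```python
values = [108, 100, 78]
assumed_mean = 35  # taken from the text as given, not computed from the values

differences = [x - assumed_mean for x in values]  # step 1: subtract the mean
squares = [d ** 2 for d in differences]           # step 2: square each difference
total = sum(squares)                              # step 3: add up the squares
result = total / len(values)                      # step 4: divide by n

print(differences)  # [73, 65, 43]
print(squares)      # [5329, 4225, 1849]
print(total)        # 11403
print(result)       # 3801.0
```

Printing each intermediate list makes it easy to compare against the hand calculation one step at a time.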

Calculating the Variance in the Data Sample

If you’re only measuring a sample of the entire data set, you’ll use the formula that divides by n-1 instead of n. You start this calculation the same way as the variance formula for the entire population. Follow the steps below:

Variance = Sum of (each term – mean)^2 / (n-1)

  • Subtract the mean from each value in your sample set. Just as you would with the entire data set, subtract the mean from each term in your sample. Here’s an example that takes the mean as 25 purely to illustrate the arithmetic (with real data you would compute the mean from the sample itself, and these three values actually average about 31.3) and has three values in the sample: (33-25), (16-25), (45-25). The differences are (8), (-9) and (20), respectively.
  • Square each of these differences. Once you’ve got each difference, go ahead and square each of these values. Using the values from the previous step, here are the resulting squares: (64), (81) and (400). With this example, you can see how the value (-9) is squared to give you a positive value. This is essential for variance: because every squared difference is positive, differences above and below the mean cannot cancel each other out.
  • Add up all the resulting squares. Just like the previous variance formula, add up all the squares resulting from the second step: (64) + (81) + (400) = 545.
  • Subtract one from the total number of values in your sample set. Before you divide, subtract one from the number of values in your sample set. Using the previous example, you only have three terms. Plug three into the n-1 part of the formula: n-1 = (3) – 1. The result is two.
  • Divide the sum by the resulting n-1 difference. Finally, divide the sum from step three by two, as this is the difference you got in step four: (545) / (2) = 272.5. So, with the mean taken as 25, the variance of the sample set is 272.5.

Variance = ((33-25)^2 + (16-25)^2 + (45-25)^2) / (3-1)
= (8^2 + (-9)^2 + 20^2) / (3-1)
= (64 + 81 + 400) / (3-1)
= 545 / (3-1)
= 545 / 2
= 272.5
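When reproducing this calculation in code, the negative difference deserves some care. The sketch below, which again takes the mean of 25 from the text as a given, shows that in Python the power operator binds more tightly than the unary minus, so a literal -9 has to be parenthesized before squaring.

```python
sample = [33, 16, 45]
assumed_mean = 25  # taken from the text as given

differences = [x - assumed_mean for x in sample]
print(differences)  # [8, -9, 20]

# Precedence pitfall: ** is applied before the unary minus.
print(-9 ** 2)    # -81
print((-9) ** 2)  # 81

# Squaring a variable that already holds -9 is safe.
squares = [d ** 2 for d in differences]
sample_variance = sum(squares) / (len(sample) - 1)
print(sample_variance)  # 272.5
```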

Population Variance and Sample Variance

The variance of a small sample gives researchers and statisticians only a limited perspective on what is actually happening across the entire population or data set.

Population variance, however, uses every value, so it provides an exact measure of the spread of the data and its relationship to the mean. Here are some examples of how each works:

Example of population variance

Assume a statistician wants to measure the variance in weight of the zebra population in a wildlife sanctuary. The statistician will first find the mean of the population's weights and then subtract that value from each weight. Assume there are five zebras currently being held in the sanctuary. The statistician measures the weight of each zebra at the following values:

Zebra 1: 670 pounds
Zebra 2: 765 pounds
Zebra 3: 780 pounds
Zebra 4: 820 pounds
Zebra 5: 735 pounds

The statistician then adds up all these values to get a total of 3,770 pounds and divides this total by five, because five is the number of zebras in the entire population. The resulting mean is 754, which means the average weight of the five zebras is 754 pounds. The statistician then subtracts this mean from the weight of each zebra:

670 – 754 = -84
765 – 754 = 11
780 – 754 = 26
820 – 754 = 66
735 – 754 = -19

The statistician then squares each of these differences and adds up the resulting squares:

(-84)^2 = 7,056
(11)^2 = 121
(26)^2 = 676
(66)^2 = 4,356
(-19)^2 = 361
(7,056) + (121) + (676) + (4,356) + (361) = 12,570

The statistician then divides this number by the number of zebras in the population: (12,570) / (5) = 2,514. This value represents the variance of the entire population.
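The zebra calculation can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and `statistics.pvariance` from the standard library is used as an independent cross-check of the manual steps.

```python
import statistics

weights = [670, 765, 780, 820, 735]  # pounds, from the example above

mean = sum(weights) / len(weights)
squared_diffs = [(w - mean) ** 2 for w in weights]
population_var = sum(squared_diffs) / len(weights)

print(mean)            # 754.0
print(population_var)  # 2514.0

# The standard library agrees with the manual calculation.
print(statistics.pvariance(weights) == population_var)  # True
```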

Example of sample variance

If the five zebras are instead a sample drawn from a larger population, the statistician will subtract one from five before dividing. This is what it will look like:

(12,570) / (5-1) = 12,570/4 = 3,142.5. This means that the variance of that small sample will be 3,142.5.
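Because the two formulas differ only in the divisor, the sample variance equals the population variance multiplied by n / (n-1). The sketch below illustrates this with the zebra weights, using the standard library's `statistics.variance` for the sample version; the variable names are illustrative.

```python
import math
import statistics

weights = [670, 765, 780, 820, 735]  # pounds, from the example above
n = len(weights)

population_var = statistics.pvariance(weights)  # divides by n
sample_var = statistics.variance(weights)       # divides by n - 1

# Rescaling the population variance gives the sample variance.
rescaled = population_var * n / (n - 1)
print(sample_var, rescaled)
print(math.isclose(sample_var, rescaled))  # True
```

For five values the factor is 5/4, which is why 2,514 becomes 3,142.5. The factor approaches one as the sample grows, so the choice of formula matters most for small samples.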

Conclusion

The variance allows the statistician to understand the extent of variation in a sample or an entire population. Because it squares each difference from the mean, it is especially sensitive to any outliers in the population.

The variance formula is also useful in many business situations, including measuring and assessing sales figures, developing products based on market research, and many other applicable uses that can benefit businesses and organizations.

In addition to business use, statisticians rely on variance to compare the spread of different data sets. In an entire data set, variance is useful for gauging the influence of outliers, i.e. data points that lie far from the mean. The closer the variance is to zero, the more tightly the data points cluster around the mean. The higher the variance, the more spread out (and thus diverse) the data points are.
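A small sketch with made-up numbers illustrates both points: tightly clustered values give a variance near zero, widely spread values give a large one, and a single outlier raises the variance sharply. All three data sets are hypothetical.

```python
import statistics

clustered = [49, 50, 51]          # values close to the mean
spread = [10, 50, 90]             # same mean, values far apart
with_outlier = clustered + [150]  # clustered values plus one distant point

for name, data in [
    ("clustered", clustered),
    ("spread", spread),
    ("with outlier", with_outlier),
]:
    print(name, statistics.pvariance(data))
```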
