Variance, denoted by 'σ²', measures how far the values in a dataset are spread out from their mean. It is the square of the standard deviation, denoted by 'σ'. Both variance and standard deviation are used to measure the fluctuation in data.
In MS Excel, variance is calculated using the "=Var.P(Range)" or "=Var.S(Range)" function, where 'P' means population and 'S' means sample.
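For reference, the two Excel functions differ only in the divisor. The formulas below are the standard textbook definitions; the symbols N, n, μ, x̄ and s² are notation introduced here and do not appear elsewhere in this article.

```latex
% Population variance (Var.P): divide by the number of values N
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

% Sample variance (Var.S): divide by n - 1 instead of n
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```

In both cases the standard deviation is the square root of the variance.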
1. Open MS Excel and enter 4.2,4.3,4.1,3.9,4.5,3.6,3.5,3.7,3.8,4.0 in range A2:A11.
2. In cell A12, enter "=Var.P(A2:A11)". This will calculate the variance of the dataset.
3. In cell A13, enter "=Stdev.P(A2:A11)". This will calculate the standard deviation of the dataset.
4. In cell A14, enter "=Average(A2:A11)". This will calculate the mean of the dataset. We will use this later.
5. We are assuming that this is the entire data; therefore we are using 'P', which is for the entire population, instead of 'S', which is for a sample.
6. To better understand the concept of variance and standard deviation, create another column named 'Deviation'.
7. The deviation of a value is its distance from the mean. Simply put, it means x-mean. The standard deviation summarizes these individual deviations: it is the square root of the average squared deviation.
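8. In cell B2, enter "=A2-$A$14". This will compute the deviation of the first value from the mean. Now drag this formula down to B11.
9. If you calculate the variance and standard deviation of B2:B11, they will be the same as for A2:A11, because subtracting the mean shifts every value by the same amount and does not change the spread.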
10. Now create another column named 'Squared deviation'.
11. In cell C2, enter "=B2*B2". This will calculate the square of B2, i.e. the squared deviation. Now drag this formula down to C11.
12. In cell C12, enter "=Average(C2:C11)". Now observe that this is the same as the variance calculated in A12 (and as the variance of B2:B11 from step 9).
13. Now draw a chart of B2:B11. This shows how far the values are spread out from the mean.
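The column-by-column arithmetic above can also be checked outside Excel. The following is a minimal sketch in plain Python with no libraries; the variable names are hypothetical and chosen only to mirror the spreadsheet columns. It also computes the sample variance to show how the 'S' divisor differs from the 'P' divisor.

```python
# Same ten values as in A2:A11.
values = [4.2, 4.3, 4.1, 3.9, 4.5, 3.6, 3.5, 3.7, 3.8, 4.0]

n = len(values)
mean = sum(values) / n                       # like =Average(A2:A11)

deviations = [v - mean for v in values]      # column B: x - mean
squared = [d * d for d in deviations]        # column C: (x - mean) squared

population_variance = sum(squared) / n       # like =Var.P(A2:A11)
sample_variance = sum(squared) / (n - 1)     # like =Var.S(A2:A11)
population_sd = population_variance ** 0.5   # like =Stdev.P(A2:A11)

print("Mean:", round(mean, 4))
print("Population variance:", round(population_variance, 4))
print("Sample variance:", round(sample_variance, 4))
print("Population standard deviation:", round(population_sd, 4))
```

For this dataset the mean is 3.96 and the population variance is 0.0924. The sample variance is slightly larger because the same sum of squared deviations is divided by 9 instead of 10.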
In Python, you can use the "statistics" library to compute variance, standard deviation, and mean.
If you do not have Python installed, please download a distribution that includes Spyder. The simplest ones are portable distributions that do not need to be installed, such as WinPython.
1. To import statistics, type "import statistics as st".
2. You will also need the pyplot library to draw a chart. Import pyplot using "import matplotlib.pyplot as pp".
3. Now define your dataset as variable x using "x = [4.2,4.3,4.1,3.9,4.5,3.6,3.5,3.7,3.8,4.0]".
4. We assume that this is the entire data and not just sample, therefore we will use population functions i.e. pvariance() and pstdev().
5. Compute and store the value of standard deviation using "sd = st.pstdev(x)".
6. Compute and store the value of variance using "v = st.pvariance(x)".
7. Compute and store the value of Mean using "m = st.mean(x)".
8. You can display the values computed above using print command.
9. Type "print("Standard Deviation: " + str(sd))" to display standard deviation.
10. Type "print("Variance: " + str(v))" to display variance.
11. Type "print("Mean: " + str(m))" to display mean.
12. Now compute the deviation from the mean (x-mean) for each value in 'x' by using "sdx = tuple(xi-m for xi in x)".
13. Here we loop through our dataset stored in 'x' and subtract the mean stored in 'm' from each value.
14. Now print sdx using "print("Deviations: " + str(sdx))".
15. Multiply each value in 'sdx' by itself to create a new tuple of squared deviations using "varx = tuple(d*d for d in sdx)". The mean of these values is the variance.
16. Print these values using "print("Squared deviations: " + str(varx))".
17. Show the variation in data on a bar chart using "pp.bar(tuple(str(xi) for xi in x),sdx)". If the chart does not appear automatically, also type "pp.show()".
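The steps above can be assembled into one script. This is a minimal sketch, not a definitive implementation: the helper functions describe() and plot_deviations() are hypothetical names introduced here for illustration, while st.mean(), st.pvariance(), st.pstdev() and pp.bar() are the library calls used in the steps. The chart is kept in its own function so the numeric part runs even where matplotlib is not installed.

```python
import statistics as st


def describe(data):
    """Return mean, population variance, population standard deviation,
    the deviation of each value from the mean, and the squared deviations."""
    m = st.mean(data)
    v = st.pvariance(data)
    sd = st.pstdev(data)
    deviations = tuple(value - m for value in data)
    squared = tuple(d * d for d in deviations)
    return m, v, sd, deviations, squared


def plot_deviations(data, deviations):
    """Draw a bar chart of each value's deviation from the mean.
    matplotlib is imported here so it is only needed when plotting."""
    import matplotlib.pyplot as pp
    pp.bar(tuple(str(value) for value in data), deviations)
    pp.show()


x = [4.2, 4.3, 4.1, 3.9, 4.5, 3.6, 3.5, 3.7, 3.8, 4.0]
m, v, sd, sdx, varx = describe(x)

print("Standard Deviation: " + str(sd))
print("Variance: " + str(v))
print("Mean: " + str(m))

# The mean of the squared deviations should match the population variance
# (up to floating-point rounding).
print("Mean of squared deviations: " + str(st.mean(varx)))
```

To draw the chart, call plot_deviations(x, sdx) after running the script.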