Measures of spread summarise how scattered the values in a data set are and how much they differ from the mean value. Common examples are the range, the quartiles, and the interquartile range. The variance and standard deviation are measures of the spread of the data about the mean: they summarise how close the data values are to the mean value. The smaller the variance and standard deviation, the better the mean value represents the whole data set.
The standard deviation is the square root of the variance. The standard deviation $s$ of a sample of $n$ values $x_1, \dots, x_n$ with mean $\bar{x}$ can be found using the formula: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$.
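As a minimal sketch of the calculation described above, the following computes the sample variance and standard deviation step by step. The data values are hypothetical and chosen only for illustration, and the variable names are assumptions, not part of the original text.

```python
import math
import statistics

# Hypothetical sample values, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = sum(sample) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(variance, statistics.variance(sample))
assert math.isclose(std_dev, statistics.stdev(sample))

print(f"mean = {mean}, variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
```

In practice `statistics.variance` and `statistics.stdev` give the same results directly. The manual version is shown only to make each step visible.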
The median and mean are both measures of the centre of a set of data. The median is not affected by the presence of extreme values in the data set. However, when there is an even total number of values, there is a complication for ordinal data: we can't average two ordinal values, as we can with ratio or interval-level values, to find a "middle value". Suppose, for example, that the two middle ranks in a sorted list of playing cards are a jack (J) and a queen (Q).
What would their average be? Due to the difficulty in answering this question, some texts suggest that for an even-length list of ordinal data, one should instead simply choose the lower of the two middle values to be the median. The mode is the most frequent data value in the population or sample. There can be more than one mode, although in the case where there are no repeated data values, we say there is no mode.
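The two conventions above (taking the lower of the two middle values for ordinal data, and reporting no mode when nothing repeats) can be sketched as follows. The rank ordering with ace high, the helper names `ordinal_median_low` and `modes`, and the example data are all assumptions made for illustration.

```python
import statistics
from collections import Counter

# Assumed ordering of card ranks from low to high (ace high is an assumption).
RANK_ORDER = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]


def ordinal_median_low(ranks):
    """Return the middle rank, or the lower of the two middle ranks."""
    positions = [RANK_ORDER.index(rank) for rank in ranks]
    # median_low never averages: for even-length data it returns the lower middle value.
    return RANK_ORDER[statistics.median_low(positions)]


def modes(values):
    """Return every most-frequent value, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no repeated values, so we say there is no mode
    return [value for value, count in counts.items() if count == highest]


print(ordinal_median_low(["9", "J", "Q", "K"]))        # J
print(modes(["red", "blue", "red", "green", "blue"]))  # ['red', 'blue']
print(modes(["red", "blue", "green"]))                 # []
```

The ranks are converted to positions only so they can be ordered. No arithmetic is performed on them, which is why this approach remains valid for ordinal data, and `modes` works unchanged on nominal values such as colours.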
Modes can be used even for nominal data. The midrange is just the average of the highest and lowest data values. While easily understood, it is strongly affected by extreme values in the data set, and does not reliably find the center of a distribution. In addition to knowing where the center is for a given distribution, we often want to know how "spread out" the distribution is -- this gives us a measure of the variability of values taken from this distribution.
Three symmetric unimodal distributions can share identical measures of center yet have very different amounts of "spread". Just as there were multiple measures of center, there are multiple measures of spread, each having some advantages in certain situations and disadvantages in others.
The range is technically the difference between the highest and lowest values of a distribution, although it is often reported by simply listing the minimum and maximum values seen. It is strongly affected by extreme values present in the distribution.
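Both the range and the midrange described earlier depend only on the minimum and maximum, so a single extreme value moves both. The sketch below illustrates this with hypothetical data. The function names `value_range` and `midrange` are assumptions.

```python
def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)


def midrange(values):
    """Average of the highest and lowest values."""
    return (min(values) + max(values)) / 2


# Hypothetical data, chosen only for illustration.
data = [10, 12, 13, 15, 16]
with_outlier = data + [100]  # the same data plus one extreme value

print(value_range(data), midrange(data))                  # 6 13.0
print(value_range(with_outlier), midrange(with_outlier))  # 90 55.0
```

Adding one extreme value changes the range from 6 to 90 and the midrange from 13.0 to 55.0, even though the other five values are unchanged.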
Another measure of spread is given by the mean absolute deviation, which is the average distance of the data values from the mean. The corresponding sample statistic, however, is a biased estimator of the population's mean absolute deviation.
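A minimal sketch of the mean absolute deviation as defined above, using hypothetical data. The function name `mean_absolute_deviation` is an assumption.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


# Hypothetical data, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# The mean is 5 and the absolute deviations sum to 12, so the result is 12 / 8.
print(mean_absolute_deviation(sample))  # 1.5
```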