Key Concepts
- Understand how to find measures of center and spread
- Understand how to use appropriate statistics to compare data sets
- Understand how to recognize a normal distribution
- Understand how to classify a data distribution
Critique and Explain
Chen and Dakota were asked to estimate the mean and median of the following data set.
Chen said, “The middle value is 11. Both the mean and median are approximately 11.” Dakota said,
“Most of the data are to the left. I think the mean and median will be about 9, with the mean slightly
larger.”
- Is either Chen or Dakota correct? Explain.
- What strategies could you use to estimate the mean and median more precisely?
- Which measure of center is more representative in this case, the mean or the median? Explain.
Solution:
- Neither Chen nor Dakota is correct. From the histogram, the mean is approximately 11 and the median is approximately 8.
- Use the following formulas for data grouped into classes, such as the bars of a histogram:
Mean: $\bar{x} = \dfrac{\sum x_i f_i}{n}$, where
$x_i$ = midpoint of each class interval
$f_i$ = frequency of each class
$n$ = total frequency
Median $= l + \dfrac{\frac{n}{2} - cf}{f} \times h$, where
$l$ = lower boundary of the median class
$n$ = total frequency
$cf$ = cumulative frequency of the class just before the median class
$f$ = frequency of the median class
$h$ = class width
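- The median is more representative in this case. Most of the data are on the left with a tail to the right, so the distribution is skewed right, and the values in the tail pull the mean (about 11) above the median (about 8). The median is the middle value of the data when the values are written in ascending order, so it is not affected by the extreme values in the tail.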
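The histogram for this exercise is not reproduced here, so the Python sketch below applies the two grouped-data formulas to a hypothetical frequency table (the classes and frequencies are made up for illustration, and the function names are our own). It is one way to carry out the strategy, not the only one.

```python
# Hypothetical grouped data (NOT the histogram from the exercise):
# each class is (lower boundary, upper boundary, frequency).
classes = [(0, 5, 4), (5, 10, 9), (10, 15, 5), (15, 20, 3), (20, 25, 2)]

def grouped_mean(classes):
    """Estimate the mean as sum(x_i * f_i) / n, using class midpoints x_i."""
    n = sum(f for _, _, f in classes)
    total = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    return total / n

def grouped_median(classes):
    """Estimate the median as l + (n/2 - cf) / f * h."""
    n = sum(f for _, _, f in classes)
    cf = 0  # cumulative frequency of the classes before the current one
    for lo, hi, f in classes:
        if cf + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            return lo + (n / 2 - cf) / f * (hi - lo)
        cf += f
    raise ValueError("empty frequency table")

print(round(grouped_mean(classes), 2))    # 10.33
print(round(grouped_median(classes), 2))  # 9.17
```

In this made-up table the long right tail pulls the estimated mean above the estimated median, the same pattern described in the answer above.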
Example 1: Find measures of center and spread
- What are the mean and standard deviation of the following data set?
4, 12, 15, 9, 14, 13, 6, 7, 6, 25, 3, 13, 17, 22, 4, 16
The mean, or average, of a data set is the sum of the values in the data set divided by the number of values in the data set. The standard deviation is a measure of how much the values in a data set vary, or deviate, from the mean; it measures the variability, or spread, of the data.
You can use a spreadsheet to calculate the mean and standard deviation.
- The mean and the standard deviation are used together to measure the center and spread of the data.
The mean is $\bar{x} \approx 11.6$, and the standard deviation is $\sigma \approx 6.3$.
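In place of a spreadsheet, Python's standard `statistics` module gives the same numbers. This sketch uses the 16 values listed in Step 1 of the next question. The value σ ≈ 6.3 matches the population standard deviation (dividing by n); dividing by n − 1 instead gives the sample standard deviation, about 6.5.

```python
import statistics

# The 16 values from Example 1 (order does not affect the results).
data = [3, 4, 4, 6, 6, 7, 9, 12, 13, 13, 14, 15, 16, 17, 22, 25]

mean = statistics.mean(data)     # 186 / 16 = 11.625
sigma = statistics.pstdev(data)  # population SD: divides by n
s = statistics.stdev(data)       # sample SD: divides by n - 1

print(f"mean = {mean:.1f}, population SD = {sigma:.1f}, sample SD = {s:.1f}")
# mean = 11.6, population SD = 6.3, sample SD = 6.5
```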
- What is the five-number summary of the data set?
The five-number summary includes the minimum value, first quartile, median, third quartile and maximum value.
Step 1: Rearrange the data in ascending numerical order.
3, 4, 4, 6, 6, 7, 9, 12, 13, 13, 14, 15, 16, 17, 22, 25
Step 2: Note the minimum and maximum values:
Minimum = 3
Maximum = 25
Step 3:
Calculate the median, the number in the middle of the data set. Since there is an even number of values (16), the median is the average of the two middle values, 12 and 13, which is 12.5.
Step 4:
Calculate the first and third quartiles. The quartiles show how the data are distributed. The first quartile
is the median of the lower half of the data, 6. The third quartile is the median of the upper half
of the data, 15.5.
These data can be represented in a box-and-whisker plot. Notice that the third quartile is closer to the median than the first quartile is.
The five-number summary of this data set is: minimum = 3, 1st quartile = 6, median = 12.5, 3rd quartile = 15.5, maximum = 25.
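Steps 1 to 4 can be sketched as a small Python function. It assumes the same convention used throughout this lesson: the quartiles are the medians of the lower and upper halves, and when n is odd the middle value is left out of both halves. Spreadsheets and other software may use interpolation rules that give slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)            # Step 1: ascending order
    n = len(data)
    lower = data[: n // 2]           # lower half (excludes the middle value if n is odd)
    upper = data[(n + 1) // 2 :]     # upper half (excludes the middle value if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

print(five_number_summary([3, 4, 4, 6, 6, 7, 9, 12, 13, 13, 14, 15, 16, 17, 22, 25]))
# (3, 6.0, 12.5, 15.5, 25)
```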
Try it
- List the mean, standard deviation, and five-number summary of the following data set: 3, 4, 9, 12, 12, 14, 15, 19, 30, 32, 33, 34, 34, 35
Solution:
Mean $= \dfrac{3+4+9+12+12+14+15+19+30+32+33+34+34+35}{14}$
$= \dfrac{286}{14}$
$\approx 20.4$
Standard deviation $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}}$
$= \sqrt{\dfrac{1883.43}{13}}$
$\approx 12.04$
Five number summary:
Minimum value = 3
Maximum value = 35
Median = mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations
= mean of the 7th and 8th observations
$= \dfrac{15+19}{2}$
$= 17$
First quartile = 12
Third quartile = 33
Then,
The five-number summary of this data set is: minimum = 3, 1st quartile = 12, median = 17, 3rd quartile = 33, maximum = 35.
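A quick Python check of this solution is shown below, using the standard `statistics` module. This solution divides by n − 1 (sample standard deviation), while Example 1 reports σ, which divides by n; the last line shows how much the choice matters for this data set.

```python
import statistics

data = [3, 4, 9, 12, 12, 14, 15, 19, 30, 32, 33, 34, 34, 35]
xbar = statistics.mean(data)                  # 286 / 14
ss = sum((x - xbar) ** 2 for x in data)       # sum of squared deviations
print(round(xbar, 1), round(ss, 2))           # 20.4 1883.43
print(round(statistics.stdev(data), 2))       # sample SD (n - 1): 12.04
print(round(statistics.pstdev(data), 2))      # population SD (n): 11.6
```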
Example 2: Use appropriate statistics to compare data sets
- How can you describe different types of distributions?
To compare the different types of distributions, look at the shape, the center, and the spread of the distributions.
The standard deviation, range, and interquartile range are three measures of spread. The range of a data set is the difference between the maximum and minimum values. The interquartile range is the difference between the third quartile and the first quartile.
- When measuring center and spread, the median and interquartile range are used together, and the mean and standard deviation are used together.
A skewed distribution is one with a shape that is stretched out in either the positive or negative direction. A symmetric distribution has a shape that looks roughly the same when it is reflected across the mean.
The shape of a distribution can affect the measures of center and spread and determines which measures of center and spread best describe the data.
The mean, median, and mode are all about the same in a symmetric distribution. You can use the mean and the standard deviation to describe the center and spread.
- What measures of center and spread would you use for the following data set?
10, 13, 16, 21, 22, 26, 29, 29, 30, 32, 33, 33, 33, 35, 37
You can use a histogram to determine the shape. The histogram shows that the data are skewed left. Since a skewed distribution affects the mean more than the median, it is better to use the median and interquartile range as the measures of center and spread. Also, the quartiles show how differently the data are distributed on either side of the center.
The data are already in numerical order.
The median is 29, the range is 37 − 10 = 27, and the interquartile range is 33 − 21 = 12.
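The arithmetic for this example can be checked with a short Python sketch (standard `statistics` module; quartiles as medians of the lower and upper halves). Comparing the mean with the median is only a rule of thumb, but here the mean falls below the median, which is consistent with a long tail to the left.

```python
from statistics import mean, median

data = [10, 13, 16, 21, 22, 26, 29, 29, 30, 32, 33, 33, 33, 35, 37]  # already sorted
n = len(data)
q1 = median(data[: n // 2])          # median of the lower 7 values
q3 = median(data[(n + 1) // 2 :])    # median of the upper 7 values

print("range =", data[-1] - data[0])  # range = 27
print("IQR =", q3 - q1)               # IQR = 12
print("mean =", round(mean(data), 1), "median =", median(data))  # mean = 26.6 median = 29
```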
Try it
- What are the better measures of center and spread of the following data sets?
- 55, 55, 57, 57, 57, 58, 58, 59, 59, 61, 61
- 110, 110, 110, 120, 120, 130, 140, 150, 160, 170, 180, 190
Solution:
- Step 1: Make a histogram of the data set.
The histogram is skewed to the left. So, it is better to use the median and interquartile range as
the measures of center and spread.
- 110, 110, 110, 120, 120, 130, 140, 150, 160, 170, 180, 190
Solution:
Step 1: Make the histogram for the data.
The histogram is skewed to the right. So, it is better to use the median and interquartile range as the measures of center and spread.
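As a rough numerical companion to the histograms, the sketch below compares each data set's mean with its median. Treating the sign of the difference as a hint of skew direction is a rule of thumb we are assuming here, not a formal test; the helper name `mean_minus_median` is hypothetical. For the first data set the difference is tiny, so the skew is slight and the histogram remains the main evidence.

```python
from statistics import mean, median

def mean_minus_median(values):
    """Rule of thumb (an assumption, not a test): positive hints at right skew,
    negative hints at left skew, near zero suggests rough symmetry."""
    return mean(values) - median(values)

set_a = [55, 55, 57, 57, 57, 58, 58, 59, 59, 61, 61]
set_b = [110, 110, 110, 120, 120, 130, 140, 150, 160, 170, 180, 190]
print(round(mean_minus_median(set_a), 2))  # -0.09: mean slightly below the median
print(round(mean_minus_median(set_b), 2))  # 5.83: mean above the median
```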
Example 3: Recognize a normal distribution
Are the following variables likely to have a normal distribution?
- The heights of all people in a large group.
A normal distribution can be modeled by a particular bell-shaped curve that is symmetric about the mean. This is called the normal curve.
Approximately normal distributions can be found in many real-world situations where the data are symmetric and mostly clustered near the mean.
The heights of people in a large group are likely to be normally distributed.
- The probability of landing on each of 8 equal parts of a spinner.
This distribution is not normal. Each outcome has the same probability of occurring as any other, so the distribution is uniform.
- The scores on any test.
The scores on a test are often skewed to the left and not normally distributed, because most students tend to receive higher scores and only a few receive low scores.
- The number of children in a family.
The number of children in a family is not normally distributed. The distribution is skewed to the right because many families have 0, 1, 2, or 3 children, but very few families have 10 or more children.
Example 4: Classify a data distribution
How would you classify the following data set? Describe the shape of the distribution and
the center and spread of the data.
106, 96, 86, 120, 98, 76, 112, 64, 99, 72, 119, 115, 76, 120, 97
Step 1. Make a histogram of the data.
Step 2: Analyze the shape of the histogram.
Since the data are bunched to the right and have a long tail to the left, the data are skewed left.
Step 3:
Determine the center and spread of the data. Use the median and interquartile range.
64, 72, 76, 76, 86, 96, 97, 98, 99, 106, 112, 115, 119, 120, 120
1st quartile = 76, median = 98, 3rd quartile = 115
The interquartile range is 115 − 76 = 39. Notice that the 3rd quartile is closer to the median than
the 1st quartile is. This is characteristic of a distribution that is skewed left.
The distribution is skewed left with median 98 and interquartile range 39.
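The quartile comparison in Step 3 can be sketched in Python with the standard `statistics` module. This assumes the lesson's convention that, for an odd number of values, the median is left out of both halves when finding the quartiles.

```python
from statistics import median

data = sorted([106, 96, 86, 120, 98, 76, 112, 64, 99, 72, 119, 115, 76, 120, 97])
n = len(data)                       # 15 values
q1 = median(data[: n // 2])         # median of the lower 7 values
med = median(data)                  # 8th value
q3 = median(data[(n + 1) // 2 :])   # median of the upper 7 values

print(q1, med, q3, q3 - q1)         # 76 98 115 39
print(med - q1, q3 - med)           # 22 17: Q3 sits closer to the median
```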
Try it
- What are the type of distribution and the center and spread of the data?
20, 17, 17, 12, 18, 21, 19, 18, 13, 14, 17, 23, 25
Solution:
Step 1: Write the data set in ascending order:
12, 13, 14, 17, 17, 17, 18, 18, 19, 20, 21, 23, 25
The histogram is skewed right. So, it is better to use the median and interquartile range as
the measures of center and spread.
Step 2:
Determine the center and spread of the data. Use the median and interquartile range.
12, 13, 14, 17, 17, 17, 18, 18, 19, 20, 21, 23, 25
Median = $\dfrac{13+1}{2}$ = 7th observation = 18
1st quartile $= \dfrac{14+17}{2} = 15.5$
3rd quartile $= \dfrac{20+21}{2} = 20.5$
Interquartile range = 20.5 – 15.5 = 5
The distribution is skewed right with median 18 and interquartile range 5.
Concept Summary
Data Distributions
Shapes
For distributions that are approximately normal, use mean and standard deviation to describe the data. For skewed distributions, use median and quartiles to describe the data.
Let’s check our knowledge:
- Determine the mean, standard deviation, and five-number summary of the following data set.
5, 8, 5, 9, 6, 14, 9, 3, 8, 7, 10, 12
- For each data set, describe the shape of the distribution and determine which measures of center and spread best represent the data.
- 28, 13, 23, 34, 55, 38, 44, 65, 49, 33, 50, 59, 67, 45
- 12, 2, 14, 4, 1, 6, 11, 7, 8, 5, 9, 10, 8, 15
Answers:
- Determine the mean, standard deviation, and five-number summary of the following data set.
5, 8, 5, 9, 6, 14, 9, 3, 8, 7, 10, 12
Solution:
Mean $= \dfrac{5+8+5+9+6+14+9+3+8+7+10+12}{12}$
$= \dfrac{96}{12}$
$= 8$
Standard deviation:
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}}$
$= \sqrt{\dfrac{106}{11}}$
$\approx 3.10$
Five number summary:
Minimum = 3
Maximum = 14
Ascending order of the data set: 3, 5, 5, 6, 7, 8, 8, 9, 9, 10, 12, 14
Median $= \dfrac{8+8}{2}$
$= 8$
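First quartile = median of 3, 5, 5, 6, 7, 8 $= \dfrac{5+6}{2} = 5.5$
Third quartile = median of 8, 9, 9, 10, 12, 14 $= \dfrac{9+10}{2} = 9.5$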
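This answer can be checked with a short Python sketch using the standard `statistics` module. It follows the median-of-halves rule for quartiles used in Example 1; other quartile conventions can give slightly different values.

```python
from statistics import mean, median, stdev

data = sorted([5, 8, 5, 9, 6, 14, 9, 3, 8, 7, 10, 12])
n = len(data)  # 12 values, so each half has 6

print(f"mean = {mean(data):g}, s = {stdev(data):.2f}")  # mean = 8, s = 3.10

summary = (
    data[0],                       # minimum
    median(data[: n // 2]),        # Q1: median of 3, 5, 5, 6, 7, 8
    median(data),                  # median: average of the 6th and 7th values
    median(data[(n + 1) // 2 :]),  # Q3: median of 8, 9, 9, 10, 12, 14
    data[-1],                      # maximum
)
print(summary)  # (3, 5.5, 8.0, 9.5, 14)
```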
- For each data set, describe the shape of the distribution and determine which measures of center and spread best represent the data.
- 28, 13, 23, 34, 55, 38, 44, 65, 49, 33, 50, 59, 67, 45
- 12, 2, 14, 4, 1, 6, 11, 7, 8, 5, 9, 10, 8, 15
Solution:
- Step 1: Make the histogram for the data set
The shape of the histogram is symmetric.
So, it is better to use standard deviation and mean to describe center and spread.
- 12, 2, 14, 4, 1, 6, 11, 7, 8, 5, 9, 10, 8, 15
Solution:
Step 1: Make the histogram of the data set
The shape of the histogram is symmetric.
So, it is better to use standard deviation and mean to describe the center and spread.
Exercise
Determine if each situation is likely to be uniformly distributed, normally distributed, skewed left or skewed right.
- The age at which people die in the United States
- Number of pets owned by students at your school.
- Selling price of cars in 2018.
- The test scores from a history test are 88, 95, 92, 60, 86, 78, 95, 98, 92, 96, 70, 80, 89, and 96
- Find the mean and the standard deviation
- Find the five number summary of the test scores
- Describe the type of distribution