D1. Data Literacy
Specific Expectations
Data Collection and Organization
D1.1
describe the difference between discrete and continuous data, and provide examples of each
- discrete data (data that can be counted):
- number of siblings
- number of buttons
- number of First Nations in Ontario
- continuous data (infinite number of possible values for a given range):
- height
- length of time
- temperature
- Quantitative data is either discrete or continuous.
- Discrete data includes variables that can be counted using whole numbers, such as the number of students in a class, the number of pencils in a pencil case, or the number of words in a sentence.
- Continuous data can have an infinite number of possible values for a given range of a variable (e.g., height, length, distance, mass, time, perimeter, and area). Continuous data can take on any numerical value, including decimals and fractions.
Note
- A variable is any attribute, number, or quantity that can be measured or counted.
Provide students with different scenarios that deal with qualitative data and both discrete and continuous quantitative data. Have them sort the scenarios into these three categories and explain their choices.
Reinforce the distinctions between the different types of data on an ongoing basis as students determine which type of data they need in order to answer their questions of interest.
D1.2
collect qualitative data and discrete and continuous quantitative data to answer questions of interest about a population, and organize the sets of data as appropriate, including using intervals
- questions of interest involving discrete quantitative data:
- How many large cities are in each province and territory in Canada?
- How many First Nations are in each province and territory in Canada?
- questions of interest involving continuous quantitative data:
- What is the average height of the students in the junior classes in our school?
- Using the total amount of rainfall we collected in our barrel last year, what would be the average monthly rainfall over the past year?
- questions of interest involving qualitative data:
- What woke me up this morning?
- In what season is your birthday?
- frequency table with intervals:
- shows data that might answer the above question “What is the average height of the students in the junior classes in our school?”
- The type and amount of data to be collected will be based on the question of interest.
- Some questions of interest may require answering multiple questions that involve any combination of qualitative data and quantitative data.
- Depending on the question of interest, the data may need to be collected from a primary or a secondary source.
- Depending on the question of interest, a random sample of the population may need to be taken. Types of sampling methods include simple random sampling, stratified random sampling, and systematic random sampling.
- When continuous data is collected, it can be recorded and organized using intervals in frequency tables.
Note
- A census is an attempt to collect data from an entire population.
- Data from every subject in the sample must be collected in the same manner in order for the data to be representative of the population.
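The three sampling methods named above can be sketched in Python. This is a minimal illustration only: the population of 30 student ID numbers, the three grade groups, and the function names are all assumptions made for the example, not part of the expectation.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being chosen."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random starting point, then take every k-th member."""
    k = len(population) // n      # sampling step
    start = rng.randrange(k)      # random start within the first step
    return [population[start + i * k] for i in range(n)]

def stratified_sample(groups, n_per_group, rng):
    """Take a simple random sample from each group (stratum)."""
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, n_per_group))
    return sample

# Hypothetical population: student IDs 1 to 30, ten per grade (assumed).
students = list(range(1, 31))
grades = {
    "Grade 4": students[0:10],
    "Grade 5": students[10:20],
    "Grade 6": students[20:30],
}

rng = random.Random(6)  # fixed seed so the example is repeatable
print("simple random:", simple_random_sample(students, 6, rng))
print("systematic:   ", systematic_sample(students, 6, rng))
print("stratified:   ", stratified_sample(grades, 2, rng))
```

Students could compare the three samples and discuss which method guarantees that every grade is represented.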
The Grade 6 students at School F measured the length of their feet and recorded the results in the chart below. Ask students to organize the data in a frequency table. Then have them compare their table with that of another student and answer: What is the same? What is different? Support students in recognizing that the more intervals (bins) that are used, the more spread out the data will be.
Other questions that can be asked about this data set include:
- What do you know from this data set?
- What is the median foot length for the Grade 6 students? Explain its significance.
- What is the mode for this data set? Explain its significance.
- What is the average foot length of the Grade 6 students? Explain its significance.
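Organizing continuous data into intervals can be sketched as follows. The School F chart is not reproduced here, so this example borrows the foot lengths listed under D1.5 as stand-in data; the starting value of 15 cm and the interval width of 5 cm are assumptions, and students may reasonably choose others.

```python
def frequency_table(values, start, width):
    """Count the values in each interval, where low <= value < low + width."""
    counts = {}
    low = start
    while low <= max(values):
        counts[(low, low + width)] = sum(
            1 for v in values if low <= v < low + width
        )
        low += width
    return counts

# Stand-in data: foot lengths in centimetres from the D1.5 example.
foot_lengths_cm = [25.5, 32.5, 25.5, 40.6, 34.7, 28.3,
                   15.3, 25.0, 30.0, 27.4, 31.5, 18.8]

table = frequency_table(foot_lengths_cm, start=15, width=5)
for (low, high), count in table.items():
    print(f"{low} cm to less than {high} cm: {count}")
```

Each value belongs to exactly one interval because the lower bound is included and the upper bound is not.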
Data Visualization
D1.3
select from among a variety of graphs, including histograms and broken-line graphs, the type of graph best suited to represent various sets of data; display the data in the graphs with proper sources, titles, and labels, and appropriate scales; and justify their choice of graphs
- choice of graphs:
- histogram (with continuous data on the horizontal axis):
- The histogram can be interpreted as:
- there are 20 students with height ≥ 120 cm and < 125 cm
- there are 15 students with height ≥ 125 cm and < 130 cm
- and so on
- broken-line graph (with continuous data on the vertical axis):
- "I can see that August had the most precipitation."
- "The largest decrease in precipitation took place between August and September."
- "There was very little change in precipitation from February to March and June to July."
- Understanding the features and purposes of different kinds of graphs is important when selecting appropriate displays for a set of data.
- Pictographs, line plots, bar graphs, multiple-bar graphs, and stacked-bar graphs are used to display qualitative data and discrete quantitative data.
- Histograms display continuous quantitative data using intervals. The bars on a histogram do not have gaps between them due to the continuous nature of the data. This contrasts with bar graphs, which do have gaps between the bars to show the discrete categories.
- Broken-line graphs are used to show change over time and are helpful for identifying trends. To create a broken-line graph, students apply their understanding of scales and estimation.
- The source, titles, labels, and scales provide important information about data in a graph or table.
Note
- It is important for students to understand the difference between a bar graph and a histogram and to recognize that they are not the same.
- At least one of the variables of a broken-line graph is not continuous.
Have students graph the height of two different plants (e.g., a tomato plant and a sunflower plant) weekly for a period of two months. After the data collection is over, ask students to draw three conclusions from the data.
Use the data from “Student Heights in the Junior Division” to make a histogram.
- histogram (continuous data on the horizontal axis):
Once students have completed their histogram, have them compare it with that of a partner who used a bigger interval and that of a partner who used a smaller interval. Support students in recognizing that different choices of intervals will result in different histograms, thereby affecting the appearance or shape of the histogram. The smaller the bins, the greater the detail, and vice versa.
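The effect of interval size on the shape of a histogram can be sketched with a simple text display. The student height data is not reproduced here, so this example assumes the foot lengths from D1.5 as stand-in data and compares interval widths of 10 cm and 5 cm, both starting at 10 cm.

```python
def text_histogram(values, start, width):
    """Return one line per interval, with one '#' for each data value."""
    lines = []
    low = start
    while low <= max(values):
        count = sum(1 for v in values if low <= v < low + width)
        lines.append(f"{low:>2}-{low + width:<2} | {'#' * count}")
        low += width
    return lines

# Stand-in data: foot lengths in centimetres from the D1.5 example.
foot_lengths_cm = [25.5, 32.5, 25.5, 40.6, 34.7, 28.3,
                   15.3, 25.0, 30.0, 27.4, 31.5, 18.8]

for width in (10, 5):
    print(f"Interval width: {width} cm")
    for line in text_histogram(foot_lengths_cm, start=10, width=width):
        print(line)
    print()
```

With the wider intervals the data looks like a single hump; with the narrower intervals, empty intervals appear that the wider bins hide.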
Have students create appropriate graphs in various contexts throughout the year, including cross-curricular applications.
D1.4
create an infographic about a data set, representing the data in appropriate ways, including in tables, histograms, and broken-line graphs, and incorporating any other relevant information that helps to tell a story about the data
- infographic on the topic of “Book Drive”:
- Infographics are used in real life to share data and information on a topic, in a concise and appealing way.
- Infographics contain different representations, such as tables, plots, and graphs, with limited text.
- Information to be included in an infographic needs to be carefully considered so that it is clear, concise, and connected.
- Infographics tell a story about the data with a specific audience in mind. When creating infographics, students need to create a narrative about the data for that audience.
Note
- Creating infographics has applications in other subject areas, such as communicating key findings and messages in STEM projects.
To deepen their understanding of infographics and their purpose, have students examine the features and messages of an infographic, such as “Book Drive”, which is found in the examples for D1.4. Ask questions such as:
- What audience do you think the infographic was intended for?
- What messages do you think the author was trying to share?
- What data visualizations has the author used? Why do you think they were chosen?
Have students create an infographic for previously collected data, such as information they gathered for a STEM project. Ask them to identify their audience, what message they want to get across, what data visualization techniques they will use, and any other information that will help them share their message. Have them share their ideas with a peer to check that their message is coming through.
Data Analysis
D1.5
determine the range as a measure of spread and the measures of central tendency for various data sets, and use this information to compare two or more data sets
- determining the range, mean, median, and mode:
- A school team measured the length of their feet in centimetres and found them to be:
- 25.5, 32.5, 25.5, 40.6, 34.7, 28.3, 15.3, 25.0, 30.0, 27.4, 31.5, 18.8
- range: 25.3 cm
- 40.6 − 15.3 = 25.3
- mode: 25.5 cm
- 15.3, 18.8, 25.0, 25.5, 25.5, 27.4, 28.3, 30.0, 31.5, 32.5, 34.7, 40.6
- mean: 27.9 cm
- 15.3 + 18.8 + 25.0 + 25.5 + 25.5+ 27.4 + 28.3 + 30.0 + 31.5 + 32.5 + 34.7 + 40.6 = 335.1
- 335.1 ÷ 12 ≈ 27.9, or 28 (rounded to the nearest one)
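- median: 27.85 cm
- Organize the data from least to greatest:
- 15.3, 18.8, 25.0, 25.5, 25.5, 27.4, 28.3, 30.0, 31.5, 32.5, 34.7, 40.6
- With 12 values, the median is the mean of the two middle values, 27.4 and 28.3:
- (27.4 + 28.3) ÷ 2 = 27.85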
- The mean, median, and mode are the three measures of central tendency. The mean, median, and mode can be determined for quantitative data. Only the mode can be determined for qualitative data.
- A variable can have one mode, multiple modes, or no modes.
- The use of the mean, median, or mode to make an informed decision is relative to the context.
- The range is one type of measure to describe the spread of a data set, and it is the difference between the greatest and least data values.
- Data sets are compared by the mean, median, or mode of the same variable.
- If the data sets are both representative of a similar population, then it is possible to compare the mean, median, and mode of data sets that have a different number of data values.
- If the data sets represent different populations, then it is important for the comparison of the mean, median, and mode to be based on the same number of data values.
Note
- The range and the measures of central tendency provide information about the shape of the data and how this can be visualized graphically (e.g., when the three measures of central tendency are the same, then a histogram is symmetrical).
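The range and the measures of central tendency for the foot-length example can be checked with Python's standard `statistics` module. This is a sketch for verifying hand calculations, not a replacement for them; the variable names are illustrative.

```python
import statistics

# Foot lengths in centimetres from the school team example.
foot_lengths_cm = [25.5, 32.5, 25.5, 40.6, 34.7, 28.3,
                   15.3, 25.0, 30.0, 27.4, 31.5, 18.8]

data_range = max(foot_lengths_cm) - min(foot_lengths_cm)  # greatest minus least
mean = statistics.mean(foot_lengths_cm)                   # sum divided by count
median = statistics.median(foot_lengths_cm)               # middle of the ordered data
modes = statistics.multimode(foot_lengths_cm)             # most frequent value(s)

print(f"range:  {data_range:.1f} cm")
print(f"mean:   {mean:.1f} cm")
print(f"median: {median:.2f} cm")
print(f"modes:  {modes}")
```

`statistics.multimode` returns a list, which fits the idea that a variable can have one mode or multiple modes.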
It is important for students to understand the difference between the range, the mode, the median, and the mean. Give them a set of data values, and ask them to determine the range, the mode, the median, and the mean. For example, the cost of various T-shirts (in dollars) at Store Y is:
- $15.50, $12.25, $15.50, $35.00, $44.50, $28.75, $15.50, $35.00, $20.00, $17.25, $31.50, $8.75, $22.25, $10.75, $46.00
- range: $37.25
- The prices range from $8.75 to $46.00, which is a difference of $37.25.
- $46.00 − $8.75 = $37.25
- mode: $15.50
- More T-shirts cost $15.50 than any other amount.
- mean: $23.90
- $15.50 + $12.25 + $15.50 + $35.00 + $44.50 + $28.75 + $15.50 + $35.00 + $20.00 + $17.25 + $31.50 + $8.75 + $22.25 + $10.75 + $46.00 = $358.50
- $358.50 ÷ 15 = $23.90
- The average cost of the T-shirts is $23.90.
- median: $20.00
- Ordered from least to greatest, with the middle (eighth) value in brackets:
- $8.75, $10.75, $12.25, $15.50, $15.50, $15.50, $17.25, [$20.00], $22.25, $28.75, $31.50, $35.00, $35.00, $44.50, $46.00
- The median price of the T-shirts is $20.00.
- Seven T-shirts cost less than $20.00, and seven T-shirts cost more than $20.00.
Help students understand the difference between the range, the mode, the median, and the mean by posing questions like:
- What is the difference between the greatest and the least value? (range)
- What is the most frequent value? (mode)
- What is the median for this data set? What does it tell you? In this case, where there is an odd number of data points, half of the rest of the values are less than the median and the other half are more than the median.
- How would you calculate the mean for this set of data? What does it tell you?
- What would happen to these measures if the three lowest-price T-shirts were removed from the list?
- What would happen to these measures if the three highest-price T-shirts were removed from the list?
- What would happen to these measures if the cost of all the T-shirts increased by 50%?
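The "what would happen" questions above can be explored with a short script once students have made their own predictions. This is a minimal sketch using the Store Y prices; the `summarize` helper and the scenario names are illustrative assumptions.

```python
import statistics

# T-shirt prices in dollars at Store Y.
prices = [15.50, 12.25, 15.50, 35.00, 44.50, 28.75, 15.50, 35.00,
          20.00, 17.25, 31.50, 8.75, 22.25, 10.75, 46.00]

def summarize(values):
    """Return the range, mode(s), median, and mean of a list of values."""
    return {
        "range": max(values) - min(values),
        "modes": statistics.multimode(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
    }

ordered = sorted(prices)
scenarios = {
    "original": prices,
    "three lowest removed": ordered[3:],
    "three highest removed": ordered[:-3],
    "all prices up 50%": [p * 1.5 for p in prices],
}

for name, values in scenarios.items():
    s = summarize(values)
    print(f"{name}: range ${s['range']:.2f}, modes {s['modes']}, "
          f"median ${s['median']:.2f}, mean ${s['mean']:.2f}")
```

Students can compare the printed results with their predictions, for example noticing which measures change when values are removed from one end of the data.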
Have students determine the mean, the median, and the mode for data collected from a variety of sources, including those that involve cross-curricular applications, such as science experiments.
D1.6
analyse different sets of data presented in various ways, including in histograms and broken-line graphs and in misleading graphs, by asking and answering questions about the data, challenging preconceived notions, and drawing conclusions, then make convincing arguments and informed decisions
- data presented in various ways:
- question that requires reading and interpreting data from a graph or table:
- What interval do the heights of most students fall in?
- question that requires finding data from a graph or table and using it in a calculation:
- Looking at the multiple-bar graph, which division collected the most books for the book drive?
- question that requires using data to make an inference or prediction:
- What might explain the spikes in the broken-line graph showing the growth of the plants over two months?
- misleading graph:
- The graph below is misleading because the scale for the number of guests does not start at zero, so it exaggerates the differences, and the age intervals are not equal.
- A histogram provides a picture of the distribution or shape of the data.
- A normal distribution results in a symmetrical histogram that looks like a bell. In this case, the mode, mean, and median are the same.
- If the data are skewed to the left (the graph goes up from left to right), then the mean is likely to be less than the median. If the data are skewed to the right (the graph goes down from left to right), then the mean is likely to be greater than the median.
- Broken-line graphs show changes in data over time.
- Sometimes graphs misrepresent data or show it inappropriately, which can influence conclusions about the data. Therefore, it is important to always interpret presented data with a critical eye.
- Data presented in tables, plots, and graphs can be used to ask and answer questions, draw conclusions, and make convincing arguments and informed decisions.
- Sometimes presented data challenges current thinking and leads to new and different conclusions and decisions.
- Questions of interest are intended to be answered through the analysis of the representations. Sometimes the analysis raises more questions that require further collection, representation, and analysis of data.
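The relationship between skew and the mean and median described above can be illustrated with a small sketch. It assumes the Store Y T-shirt prices from D1.5 as sample data and builds a mirror-image data set to show the opposite case. Comparing the mean and median is only a rough indicator, and the `skew_direction` helper is hypothetical.

```python
import statistics

def skew_direction(values):
    """Compare the mean and median as a rough indicator of skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right"   # longer tail toward the greater values
    if mean < median:
        return "left"    # longer tail toward the lesser values
    return "none"        # mean and median are equal

# T-shirt prices in dollars from the D1.5 example (mean $23.90, median $20.00).
prices = [15.50, 12.25, 15.50, 35.00, 44.50, 28.75, 15.50, 35.00,
          20.00, 17.25, 31.50, 8.75, 22.25, 10.75, 46.00]

# Mirror image of the same data: each value is reflected between the least and greatest.
mirrored = [max(prices) + min(prices) - p for p in prices]

print(skew_direction(prices))           # right
print(skew_direction(mirrored))         # left
print(skew_direction([1, 2, 3, 4, 5]))  # none
```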
Note
- Broken-line graphs are not used to make predictions, only to show what has happened to the data over time. Only data values that show a strong relationship between two variables can be used to make predictions.
- There are three levels of graph comprehension that students should learn about and practise:
- Level 1: information is read directly from the graph and no interpretation is required.
- Level 2: information is read and used to compare (e.g., greatest, least) or perform operations (e.g., addition, subtraction).
- Level 3: information is read and used to make inferences about the data using background knowledge of the topic.
- Working with misleading graphs helps students analyse their own graphs for accuracy.
Provide students with a bar graph or histogram that presents information in a misleading way. For example, the histogram below does not start at zero on the vertical axis, nor does it have a consistent scale for the age of guests. Have students describe what makes this graph misleading. Ask them to recreate the graph so that it presents the information accurately.
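One way to quantify how a vertical axis that does not start at zero exaggerates differences is to compare the drawn heights of two bars. The guest counts below are hypothetical values chosen for illustration, not data from the graph, and `apparent_ratio` is an illustrative helper.

```python
def apparent_ratio(a, b, axis_start=0):
    """Ratio of the drawn bar heights when the vertical axis begins at axis_start."""
    return (a - axis_start) / (b - axis_start)

# Hypothetical guest counts for two age intervals (assumed for illustration).
guests_a, guests_b = 60, 50

print(apparent_ratio(guests_a, guests_b))                 # axis starts at 0: 1.2
print(apparent_ratio(guests_a, guests_b, axis_start=45))  # axis starts at 45: 3.0
```

With the axis starting at 45, the first bar is drawn three times as tall as the second, even though the first count is only 1.2 times the second.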
Throughout the year, have students collect representations of data about real-life topics that are of interest to them. Model asking the three types of questions outlined in the examples in D1.6, and then have students pose and answer their own questions that require thinking critically about the data.