D1. Data Literacy
Specific Expectations
Data Collection and Organization
D1.1
explain why percentages are used to represent the distribution of a variable for a population or sample in large sets of data, and provide examples
- large data set in a frequency table:
- large data set in a relative frequency table:
- large data set in a circle graph:
- When comparing categories of a large population, it is easier to compare them by relative amounts (i.e., using percentages) rather than by their exact quantities. For example, in a survey administered to 45 896 respondents, 36 572 respondents selected “yes” and 592 selected “maybe”. It is easier to interpret the data if you know that about 80% selected “yes” and about 1% selected “maybe”.
- A variable for data sets of various populations with different sizes can be compared relatively.
Note
- Samples of varying sizes can also be compared relatively.
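The percentages in the survey example above can be checked with a short calculation. The following is a minimal Python sketch; the function name `to_percent` is only illustrative, and the counts are the ones given in the example.

```python
def to_percent(count, total):
    """Return count expressed as a percentage of total."""
    return count * 100 / total

total_respondents = 45896
responses = {"yes": 36572, "maybe": 592}  # counts from the survey example

for answer, count in responses.items():
    # One decimal place shows how close the rounded values (80% and 1%) are.
    print(f"{answer}: {to_percent(count, total_respondents):.1f}%")
```

Rounded to the nearest whole percent, the results are 80% for “yes” and 1% for “maybe”, which is the form used in the example.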
Provide students with a frequency table showing the distribution of data for a large population. Ask them to describe any relationships that they notice in the data. Next, provide students with a relative-frequency table showing the same distribution represented in the form of percentages. Ask students to describe any relationships that they notice now. Discuss how using percentages can support understanding the data and making predictions.
Have students revisit data that they have recently collected and re-represent it using percentages. Discuss the new discoveries the students make about their data.
D1.2
collect qualitative data and discrete and continuous quantitative data to answer questions of interest, and organize the sets of data as appropriate, including using percentages
- questions of interest requiring discrete data:
- How many tickets were sold at the drive-in theatre every day for the past month?
- How many times did the top five genres appear on the Top 50 Movies list in each of the past five years?
- questions of interest requiring continuous data:
- Did the tomato plant or the sunflower plant grow more over the past month? How do you know?
- At what time in the movie does the main character make an appearance?
- questions of interest requiring qualitative data:
- Which three genres of movie are most popular with 12- to 16-year-olds?
- If a drive-in theatre were to choose two movies of different genres to show, what combination of genres would attract the most customers?
- relative frequency table with percentages:
- The type and amount of data to be collected is based on the questions of interest. Some questions of interest may require answering multiple questions that involve any combination of qualitative data and quantitative data.
- Depending on the question of interest, the data may need to be collected from a primary or a secondary source.
- Depending on the question of interest, a random sample of the population may need to be taken. Types of sampling methods include simple random sampling, stratified random sampling, and systematic random sampling.
- Relative frequency tables are helpful for recording and analysing data and necessary to prepare certain kinds of graphs. The frequencies in a relative frequency table must add to 100% if expressed as percentages and add to 1 if expressed as decimal numbers.
- In order to prepare a circle graph, the angle measures are determined by calculating the percentage of 360 degrees that each sector (category) requires.
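The two calculations described above (relative frequencies as percentages, and sector angles as a share of 360 degrees) can be sketched in Python. The genre counts below are hypothetical values chosen for illustration, not the data from the Top 50 Movies list, and `relative_frequency_table` is an assumed helper name.

```python
def relative_frequency_table(counts):
    """Map each category to its percentage of the total and its sector angle."""
    total = sum(counts.values())
    return {
        category: {
            "percent": count * 100 / total,  # share of the whole, out of 100
            "angle": count * 360 / total,    # same share, out of 360 degrees
        }
        for category, count in counts.items()
    }

# Hypothetical counts for a list of 50 movies (not the curriculum's data).
genre_counts = {"Adventure": 20, "Comedy": 12, "Drama": 10, "Horror": 8}
table = relative_frequency_table(genre_counts)

for genre, row in table.items():
    print(f"{genre}: {row['percent']:.0f}%, {round(row['angle'])} degrees")
```

With these counts the percentages add to 100 and the rounded angles add to 360. With other data, angles rounded to the closest degree may add to 359 or 361, so one sector may need a small adjustment.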
The following table shows the genres of all the movies that appeared on the Top 50 Movies list last year. Ask students to present the data in two different ways, using percentages.
Provide student pairs with a copy of the following chart that has been cut into strips as shown. Their task is to sort the paper strips under the headings “Qualitative”, “Quantitative (Discrete)”, and “Quantitative (Continuous)”.
After students have completed the sorting, discuss the following questions for a selection of the topics:
- whether the data should be collected from a primary or secondary source;
- appropriate populations to collect data from;
- possible sampling techniques;
- data collection techniques to avoid and mitigate bias.
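When discussing possible sampling techniques, it may help to see how the three methods named in D1.2 differ in practice. The following Python sketch is one way to illustrate them, under stated assumptions: the population is a made-up list of 120 student numbers, the two strata are hypothetical grade groups, and the function names are not taken from any library.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being chosen."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Choose a random starting point, then take every step-th member."""
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Take the same fraction, at random, from each subgroup (stratum)."""
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(7)                # fixed seed so the run is repeatable
students = list(range(1, 121))        # hypothetical student numbers 1 to 120
by_grade = {"Grade 7": students[:60], "Grade 8": students[60:]}  # hypothetical strata

print(simple_random_sample(students, 12, rng))
print(systematic_sample(students, 12, rng))
print(stratified_sample(by_grade, 0.1, rng))
```

Each call returns 12 students, but only the stratified sample guarantees that both grade groups are represented in proportion to their size.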
Data Visualization
D1.3
select from among a variety of graphs, including circle graphs, the type of graph best suited to represent various sets of data; display the data in the graphs with proper sources, titles, and labels, and appropriate scales; and justify their choice of graphs
- choice of graphs:
- circle graph:
- Circle graphs are used to show how categories represent parts of a whole data set that can be either qualitative or quantitative data. Histograms are used to display intervals of continuous quantitative data.
- Broken-line graphs are used to show changes over time.
- Pictographs, line plots, bar graphs, multiple-bar graphs, and stacked-bar graphs may be used to display qualitative data and discrete quantitative data.
- The source, titles, labels, and scales provide important information about data in a graph:
- The source indicates where the data was collected.
- The title introduces the data contained in the graph.
- Labels provide additional information, such as the categories that are represented in the sectors of a circle graph. Percentages are often used in circle graphs to describe the categories.
- Scales identify the possible values of a variable along an axis of a graph. Values are arranged in ascending order on a scale.
Note
- When there are too many sections in the circle graph, it gets too crowded and hard to read. A possible strategy is to group more than one category together.
Provide students with data presented in a frequency table or a relative frequency table and ask them to create a circle graph. Before they begin, ask them to order the categories from what they expect to be the greatest to the least portions of the circle. The following is an example of data on the “top 50 movies” represented in a relative-frequency table, angle measures to the closest degree needed to create the circle graph, and the resulting circle graph.
- relative frequency table:
- circle graph:
D1.4
create an infographic about a data set, representing the data in appropriate ways, including in tables and circle graphs, and incorporating any other relevant information that helps to tell a story about the data
- infographic on the topic of “Earth Day Clean-Up”:
- Infographics are used in real life to share data and information on a topic in a concise and appealing way.
- Infographics contain different representations, such as tables, plots, and graphs, with limited text including quotes.
- Information to be included in an infographic needs to be carefully considered so that it is clear, concise, connected, and makes an impact.
- Infographics tell a story about the data with a specific audience in mind. When creating infographics, students need to create a narrative about the data for that audience.
Note
- Creating infographics has applications in other subject areas, such as communicating key findings and messages in STEM projects.
To deepen students' understanding of what an infographic is and what it is used for, provide them with an infographic that has already been created, such as the “Earth Day Clean-Up” infographic found in the examples for D1.4. Ask questions such as:
- What audience do you think the infographic was intended for?
- What messages do you think the author was trying to share?
- What data visualizations has the author used? Why do you think they were chosen?
- How might you change the infographic to make it more effective?
Have students collect infographics and, as a class, make a list of the features they notice in the infographics. Discuss how these features can change depending on the audience and the story that the author is trying to tell about the data.
Have students create an infographic about data they have previously collected. Ask them to develop an outline of the message they would like the infographic to convey, with a specific audience in mind, such as the student council. Have students give and receive feedback on their plan, data visualization techniques, and design elements.
Data Analysis
D1.5
determine the impact of adding or removing data from a data set on a measure of central tendency, and describe how these changes alter the shape and distribution of the data
Height (cm) of Group H with Original Five Members | Height (cm) of Group H with Two Additional Members |
156 | 156 |
149 | 149 |
160 | 160 |
168 | 168 |
160 | 160 |
(none) | 231 |
(none) | 231 |
Mode: 160 | Modes: 160 and 231 |
Median: 160 | Median: 160 |
Mean: 158.6 | Mean: 179.3 |
- impact on measures of central tendency:
- Whereas the original set of data had one mode, the second set of data has two modes.
- The median remains the same for both sets of data.
- The mean rises by more than 20 cm in the second data set because the two new group members are so much taller than the original five members.
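- Adding or removing a data value that is not the most frequent value in the set will often not impact the mode. The mode does change, however, if the added values make another value occur as often as, or more often than, the current mode.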
- Adding data values that are extremely different from the existing data values can have a significant impact on the measures of central tendency. As a result, the distribution and the shape of the data shown in the graphs can change.
- Removing data values that are clustered to one end or the other of an ordered data set can significantly impact the measures of central tendency. As a result, the distribution and the shape of the data shown in the graphs can change.
Note
- Outliers are measures that are significantly different from the other measures. They may mean that something has gone wrong in the data collection or they may represent a valid, unexpected piece of the population needing further clarification.
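The effect of adding data on each measure of central tendency can be checked with Python's standard `statistics` module. This sketch uses the Group H heights from the example; the helper name `summarize` is an assumption made for illustration.

```python
from statistics import mean, median, multimode

def summarize(values):
    """Return the mean, median, and mode(s) of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),  # a list, since there can be more than one mode
    }

original = [156, 149, 160, 168, 160]   # heights (cm) of the original five members
expanded = original + [231, 231]       # the two additional members

before = summarize(original)
after = summarize(expanded)

print("original:", before)
print("expanded:", after)
print("change in mean:", round(after["mean"] - before["mean"], 1), "cm")
```

The median is the same for both lists, a second mode appears, and the mean rises by more than 20 cm, which matches the observations listed in the example.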
Have students add more data to a qualitative data set that they have previously collected. Ask them to identify the impact of adding the new data by looking for similarities and differences between the two data sets. For example, the data below shows the genres of the “top 50 movies”:
Then more data is added in order to show the “top 100 movies”:
Ask students what they notice and wonder about when comparing the original set of data with the expanded set. For example:
- What impact does the addition of the new data have on the mode?
- What movies are the most popular with the addition of the new data? How does this compare with the original data?
- What new genres appear with the addition of the new data?
- How has the circle graph changed with the addition of the new data?
Provide students with a quantitative data set that has outliers. Have them determine the mean, median, and mode for the data set with and without the outliers, and then compare the results.
For example, assign students to work in groups of five. Ask them to measure their heights in centimetres and to then find the mean, median, and mode for this data set. Next, ask students to add the following data values, representing the tallest players in NBA (National Basketball Association) history, to their data set:
- Gheorghe Mureșan, 231 cm
- Manute Bol, 231 cm
Ask students to recalculate the mean, median, and mode with this additional data and note how each of these measures has been affected by the addition of the new data.
Discuss why outlier data is important to consider, and introduce the idea that outliers must be considered with respect to the context. In this scenario, the outlier data does not help in understanding the typical height of Grade 7 students and should therefore be discounted.
D1.6
analyse different sets of data presented in various ways, including in circle graphs and in misleading graphs, by asking and answering questions about the data, challenging preconceived notions, and drawing conclusions, then make convincing arguments and informed decisions
- data presented in a variety of ways:
- question that requires reading and interpreting data from a graph or table:
- What percentage of the top 100 movies are comedy?
- What genres appear only in the top 100 movies and not in the top 50?
- question that requires finding data from a graph or table and using it in a calculation:
- The circle graph shows the proportions of students that worked in Park 1, Park 2, and Park 3 for the Earth Day clean-up. If there are 500 students in the school, how many more students worked in Park 2 than Park 3?
- How many more comedies were there in the top 100 movies than in the top 50 movies?
- question that requires using data to make an inference or prediction:
- Why do you think adventure movies appeared the most in the top 50 movies?
- Why were more students assigned to Park 1 than Park 3?
- misleading circle graph:
- The proportions in the graph below are not representative of the percentages given:
- When interpreting a circle graph, the size of the slices (sectors) will help indicate which category is greatest or least. Sometimes the actual amount is needed, and this will require the percentage to be multiplied by the total number of data values.
- All of the slices (sectors) in a circle graph should add up to 100%.
- Looking at the angle of a sector can help in estimating the percentage that a sector takes up.
- Fractions can also describe the sectors of a circle graph, e.g., if a sector takes up half of the circle, it would represent half of the total data.
- Sometimes graphs misrepresent data or show it inappropriately and this can influence the conclusions made about the data. Therefore, it is important to always interpret presented data with a critical eye.
- Data presented in tables, plots, and graphs can be used to ask and answer questions, draw conclusions, and make convincing arguments and informed decisions.
- Sometimes presented data challenges current thinking and leads to new and different conclusions and decisions.
- Questions of interest are intended to be answered through the analysis of the representations. Sometimes the analysis raises more questions that require further collection, representation, and analysis of data.
Note
- There are three levels of graph comprehension that students should learn about and practise:
- Level 1: information is read directly from the graph and no interpretation is required.
- Level 2: information is read and used to compare (e.g., greatest, least) or perform operations (e.g., addition, subtraction).
- Level 3: information is read and used to make inferences about the data using background knowledge of the topic.
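A Level 2 question, such as the Earth Day clean-up question in the examples, combines reading percentages from a circle graph with a calculation. The following is a minimal Python sketch of that calculation. Since the circle graph itself is not reproduced here, the park percentages are hypothetical, and both function names are assumptions made for illustration.

```python
def count_from_percent(percent, total):
    """Convert a percentage of the total into an actual amount."""
    return percent * total / 100

def percent_from_angle(angle_degrees):
    """Estimate the percentage a sector represents from its angle."""
    return angle_degrees * 100 / 360

# Hypothetical percentages; the real values would be read from the circle graph.
park_percentages = {"Park 1": 50, "Park 2": 30, "Park 3": 20}
total_students = 500

# The sectors of a circle graph should add up to 100%.
assert sum(park_percentages.values()) == 100

students = {
    park: count_from_percent(percent, total_students)
    for park, percent in park_percentages.items()
}
difference = students["Park 2"] - students["Park 3"]
print(f"{difference:.0f} more students worked in Park 2 than in Park 3")
```

The check that the percentages add to 100 is also a quick first test when deciding whether a circle graph might be misleading.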
Provide students with, or have students bring in, misleading graphs found in the news or advertising media, including bar graphs, histograms, and circle graphs. Discuss why and how they are being used, as well as who may benefit or be disadvantaged by the representation. For example, have students discuss the ways in which a three-dimensional circle graph can be misleading: