D1. Collection, Representation, and Analysis of Data
Specific Expectations
Application of Data
D1.1
identify a current context involving a large amount of data, and describe potential implications and consequences of its collection, storage, representation, and use
- contexts involving large amounts of data:
- census data collected by the government
- personal health data and biometric data
- personal user data collected when purchasing goods and services at stores or online
- personal user data collected through social media websites, apps, and the search history of users of online search engines
- big data used for machine learning
- data that is centrally collected through fitness websites and apps
- long-term or large-scale scientific research in climate science or epidemiology and population biology
- potential implications and consequences:
- the ability to better assess the need for community programs and services and to enact policy and funding changes
- the ability to develop targeted advertising campaigns and evaluate their success and impact
- the ability to refine business intelligence to determine what individuals might be interested in and, accordingly, the content that they are exposed to online
- the ability to make advancements in scientific and technological research and development
- the need for privacy protection and other security aspects of data storage
Teachers can:
- facilitate student-led discussions of current contexts that are of interest to the students;
- invite students to share their own experiences of potential implications and consequences of data collection, storage, representation, and use, creating an inclusive learning environment where students feel safe sharing;
- highlight the implications and consequences of data collection, storage, representation, and use from the perspectives of individuals, various levels of government, corporations, and community organizations;
- provide opportunities for students to reflect on the ways in which data can be misrepresented and used to mislead audiences.
- What are some everyday situations in which personal data is collected? Who is collecting this data? Who might be using it? How might they be using it?
- How is the importance of individual privacy weighed against the societal value of collecting certain data?
- Social media platforms are constantly collecting data on their users: what you post, which posts you look at and like, whom you follow, what topics you search, and even whether you linger on a post while scrolling through your feed. They often use this data to personalize your experience on their platform and may share it with third parties that use it to target ads to you personally. What implications does this have for how you use social media?
- Mapping apps that display traffic information get that information by tracking the phones of the people in traffic. What are some advantages and disadvantages of this method of collecting traffic data?
- What are some issues related to the collection and use of Indigenous data? What could be some possible solutions to those issues?
- How can large amounts of data that have been collected over a long period of time (e.g., data related to climate change) be used to make predictions about the future?
Have students examine how cryptocurrencies use large server farms to collect, store, and mine information. Have students describe implications and consequences of these large server farms, including the amount of energy they consume.
Have students discuss how biometric data is captured and used to identify people using facial recognition software. Have students describe implications and consequences of the collection, storage, and use of biometric data by various companies and organizations.
Have students investigate the sources of data used to monitor the effects of climate change. Have students discuss how this data is being used to make predictions about future scenarios, and how these scenarios could inform the actions that people take now.
Representation and Analysis of Data
D1.2
represent and statistically analyse data from a real-life situation involving a single variable in various ways, including the use of quartile values and box plots
- real-life situations involving a single variable:
- lengths of commutes to school for a given group of students
- amount of a given pesticide found in water samples collected from a local river
- magnitudes of earthquakes in a given year, using the Richter scale
- salaries of employees in an organization
- amount of caffeine or sugar in various beverages
- various representations:
- graphical:
- box plot representing data involving a single variable
- back-to-back box plots comparing the distributions of multiple groups
- numerical:
- measures of central tendency (mean, median, or mode, as appropriate for the data)
- measures of spread (range and interquartile range)
- five-number summary (lowest value, first quartile, median, third quartile, greatest value)
- statistical analysis:
- descriptions of the centre, spread, outliers, and shape of the data set based on the numerical and graphical representations
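The numerical representations listed above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `five_number_summary` and the commute times are hypothetical, and it assumes the convention that the first and third quartiles are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the number of values is odd. Other quartile conventions exist, and technology tools may give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, greatest) for a non-empty list of numbers."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the middle position
    upper = data[half + n % 2:]   # values above the middle position
    # Assumption: quartiles are the medians of the lower and upper halves,
    # excluding the overall median when the count is odd.
    q1 = median(lower) if lower else data[0]
    q3 = median(upper) if upper else data[-1]
    return data[0], q1, median(data), q3, data[-1]


# Hypothetical commute times, in minutes, for nine students.
commutes = [12, 5, 22, 30, 7, 36, 14, 42, 15]
low, q1, med, q3, high = five_number_summary(commutes)
print("Five-number summary:", low, q1, med, q3, high)
print("Range:", high - low)
print("Interquartile range:", q3 - q1)
```

For this invented data set the summary is 5, 9.5, 15, 33, and 42, so the range is 37 and the interquartile range is 23.5. These five values are the ones a box plot displays.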
Teachers can:
- support students in selecting an appropriate data set from a real-life situation involving a single variable;
- provide appropriate technological tools (e.g., statistical tools, spreadsheets, coding environments) as necessary for students to represent and analyse the data;
- revisit learning from earlier grades related to the graphical representations of data involving a single variable, such as histograms, stem-and-leaf plots, circle graphs, and various types of bar graphs, and distinguish between discrete and continuous data;
- support students in understanding the differences in the measures of central tendency and knowing when each one might be appropriate;
- continue to support students in developing their proportional reasoning skills, including the use of appropriate scaling in their representations;
- support students in expanding their communicative repertoire to include a broader range of related terminology and conventions, particularly for English language learners.
- How do you identify quartile values?
- What steps do you follow to create a box plot?
- When do you use a box plot to represent data?
- How do you know which data values might be outliers?
- What information does a box plot show that a histogram does not?
- What information does a stem-and-leaf plot show that a box plot does not?
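One common way to approach the question about outliers is the 1.5 × IQR rule, which flags any value more than 1.5 interquartile ranges below the first quartile or above the third quartile. The sketch below assumes that rule. The curriculum text does not name a specific test, and the function name `find_outliers` and the salary data are hypothetical.

```python
def find_outliers(values, q1, q3):
    """Return the values lying outside the 1.5 x IQR fences."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical salaries in thousands of dollars.
# The quartiles are the medians of the lower and upper halves, found by hand.
salaries = [38, 41, 44, 45, 47, 50, 52, 55, 120]
outliers = find_outliers(salaries, q1=42.5, q3=53.5)
print("Possible outliers:", outliers)
```

Here the fences are 26 and 70, so only the value 120 is flagged. A flagged value is a candidate for closer inspection, not automatically an error.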
Have students represent the CO2 emissions in metric tons per capita from countries with populations of more than 20 million, using appropriate graphical and numerical representations.
Have students describe the shape, centre, and spread of the distribution of the number of cyclones formed over the Atlantic basin over the last 50 years, and any outliers.
Have students compare the distributions of the average number of points per game that two basketball players scored in each season of their careers. They might create a box plot for each player, drawn on a shared scale so that the two distributions can be compared directly.
Have students write code using subprograms to determine the range for a data set.
The following is an example of pseudocode for a subprogram that scans through a list of data to determine the minimum number.
findMinimum subprogram
subprogram findMinimum (numList)
  numOfItems = number of items in the list
  minimum = value of the first item in the list
  itemNum = 2
  repeat while (itemNum <= numOfItems)
    if value of item at position itemNum < minimum
      minimum = value of item at position itemNum
    itemNum = itemNum + 1
  return minimum
The following is an example of pseudocode for a subprogram that scans through a list of data to determine the maximum number.
findMaximum subprogram
subprogram findMaximum (numList)
  numOfItems = number of items in the list
  maximum = value of the first item in the list
  itemNum = 2
  repeat while (itemNum <= numOfItems)
    if value of item at position itemNum > maximum
      maximum = value of item at position itemNum
    itemNum = itemNum + 1
  return maximum
The following is an example of pseudocode that calls the two subprograms to determine the range.
main program
numList = the list of data values
range = 0.00
maximum = run subprogram findMaximum (numList)
minimum = run subprogram findMinimum (numList)
range = maximum - minimum
output "The range of the set of values is", range
Pseudocode does not represent a specific programming language. It can be adapted to work with a variety of programming languages and/or environments.
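As one possible adaptation, the findMinimum, findMaximum, and main program pseudocode could be written in Python as sketched below. The Python names (`find_minimum`, `find_maximum`, `find_range`) and the caffeine values are hypothetical. Each subprogram returns its result, the maximum search uses a greater-than comparison, and the list is assumed to be non-empty.

```python
def find_minimum(num_list):
    """Scan the list and return its smallest value."""
    minimum = num_list[0]
    for value in num_list[1:]:
        if value < minimum:
            minimum = value
    return minimum


def find_maximum(num_list):
    """Scan the list and return its largest value."""
    maximum = num_list[0]
    for value in num_list[1:]:
        if value > maximum:
            maximum = value
    return maximum


def find_range(num_list):
    """Return the range: greatest value minus lowest value."""
    return find_maximum(num_list) - find_minimum(num_list)


# Hypothetical amounts of caffeine, in milligrams, in six beverages.
caffeine_mg = [95, 34, 0, 80, 150, 47]
print("The range of the set of values is", find_range(caffeine_mg))
```

Python also provides built-in `min` and `max` functions. Writing the loops out by hand keeps the scanning logic of the pseudocode visible.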
D1.3
create a scatter plot to represent the relationship between two variables, determine the correlation between these variables by testing different regression models using technology, and use a model to make predictions when appropriate
- two variables with relationships:
- the fuel consumption of a car and its speed
- the amount of saturated fats (in grams) and the number of calories in different granola bars
- the amount of money borrowed and the interest rate that is offered
- the size of the labour force and the employment rate
- correlation:
- use of the correlation coefficient r to describe the strength and the direction of a linear relationship between two variables
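- strong positive linear correlation: the points lie close to an upward-sloping line, and r is close to 1
- weak positive linear correlation: the points are loosely scattered around an upward-sloping line, and r is positive but closer to 0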
- regression models constructed using technology:
- linear regression models
- non-linear regression models
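To make the correlation coefficient concrete, the sketch below computes r from its standard definition: the sum of the products of the deviations from the means, divided by the square root of the product of the sums of squared deviations. The function name `correlation_coefficient` and the granola bar data are hypothetical. In practice, students would more likely read r from a spreadsheet or statistical tool.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Return the linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # r is undefined if either variable has no variation (sxx or syy is 0).
    return sxy / sqrt(sxx * syy)


# Hypothetical granola bars: saturated fat in grams and calories.
saturated_fat = [1, 2, 3, 4, 5]
calories = [100, 120, 150, 160, 190]
r = correlation_coefficient(saturated_fat, calories)
print("r =", round(r, 3))
```

For this invented data r is close to 1, which indicates a strong positive linear correlation. The sign of r gives the direction of the relationship and its distance from 0 gives the strength.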
Teachers can:
- support students in describing the relationship observed on the scatter plot by discussing the direction (positive or negative), strength (strong, moderate, or weak), outliers, and form (linear or non-linear);
- ensure students have access to appropriate technological tools (e.g., statistical software, spreadsheets, coding environments) when creating the scatter plot, determining the correlation, and testing different regression models;
- highlight, through the use of technology, linear and non-linear regression models as applications of linear and non-linear relations;
- support students in selecting the appropriate strategies to make predictions;
- facilitate conversations with students about when a regression model is and is not appropriate for making predictions;
- support students in expanding their communicative repertoire to include a broader range of related terminology and conventions, particularly for English language learners.
- How do you make a scatter plot?
- What is the purpose of a scatter plot?
- In what ways can you describe the relationship between two variables on a scatter plot?
- What information does the correlation coefficient give us?
- How do outliers influence the value of the correlation coefficient?
- What is the difference between correlation and causation?
- What are the limitations involved in making predictions using regression models?
Have students create a scatter plot to show the relationship between average temperature and average wind speed at a given location over a period of time. Have students determine the appropriate regression model and use it to make predictions.
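If a scatter plot suggests that a linear model is appropriate, the least-squares line and a prediction could be computed as sketched below. This is an illustration under stated assumptions: the function names `fit_line` and `predict` are hypothetical, and the temperature and wind speed values are invented for the example, not measured data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Use the linear model to estimate y for a given x."""
    return slope * x + intercept


# Invented data: average temperature (degrees Celsius) and average wind speed (km/h).
temperatures = [5, 10, 15, 20, 25]
wind_speeds = [20, 18, 15, 13, 10]
slope, intercept = fit_line(temperatures, wind_speeds)
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("estimated wind speed at 18 degrees:", round(predict(slope, intercept, 18), 1))
```

The estimate at 18 degrees falls inside the range of the observed temperatures. Predictions far outside that range rely on the assumption that the same pattern continues, which is one of the limitations students can discuss.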
Give students different regression models of the same set of two-variable data and have them determine which model best represents the relationship.
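One way to compare candidate models for the same data is to total the squared differences between the observed values and each model's predictions, where a smaller total indicates a closer fit. The sketch below assumes that criterion. The data, the two candidate models, and the function names are all hypothetical, and neither model was fitted by technology.

```python
def sum_squared_residuals(model, xs, ys):
    """Total of the squared differences between observed and predicted values."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Invented data: speed (km/h) and fuel consumption (L/100 km).
speeds = [40, 60, 80, 100, 120]
fuel = [8.0, 6.5, 6.0, 6.6, 8.1]


def linear_model(x):
    # Hypothetical linear candidate.
    return 0.001 * x + 7.0


def quadratic_model(x):
    # Hypothetical quadratic candidate.
    return 0.00125 * (x - 80) ** 2 + 6.0


linear_error = sum_squared_residuals(linear_model, speeds, fuel)
quadratic_error = sum_squared_residuals(quadratic_model, speeds, fuel)
print("linear:", round(linear_error, 3), "quadratic:", round(quadratic_error, 3))
```

For this invented data the quadratic candidate has the much smaller total, which matches the U-shaped pattern in the values. A numerical comparison like this complements, but does not replace, looking at the scatter plot and considering whether the model makes sense in context.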
Have students arrange six scatter plots of varying correlations according to the direction and strength of the association, and explain their reasoning.