Start by watching the short video What is Big Data and how does it work?
In previous lessons, we've made simple tools and games. This lesson is about combining the power of arrays and loops to write programs for one of the most important tasks computers have: processing data.
We'll see that arrays and loops work hand-in-hand. Arrays store hundreds, thousands or millions of values, and loops allow us to access and manipulate each stored value in the same way.
Without the simple techniques in this lesson, we'd never be able to work with big data.
In this lesson, students will:
Begin by watching the video demonstrating the student scores program:
As a class, read over the structural pseudocode below, then answer the following questions one by one to help understand the program and what it does. (Click here for a Scratch adaptation of the program.)
BEGIN
    student_marks ← [83, 72, 92, 65, 54, 54, 78, 67, 52, 54, 48, 69, 87, 55, 51, 52, 44, 57, 79, 64, 66, 19, 82, 71, 66, 31, 87, 83, 64, 78]
    Using a loop...
    Display each student's number and mark from student_marks
    high_mark ← determine the highest mark in the student_marks
    Using a loop...
    low_mark ← determine the lowest mark in the student_marks
    Using a loop
    grade_a ← assemble an array of student numbers with marks >=85
    grade_b ← assemble an array of student numbers with >=65 and <85
    grade_c ← assemble an array of student numbers with >=45 and <65
    grade_d ← assemble an array of student numbers with >=25 and <45
    grade_e ← assemble an array of student numbers with <25
    Display high_mark
    Display low_mark
    Display grade_a
    Display grade_b
    Display grade_c
    Display grade_d
    Display grade_e
END
The array grade_a stores the student numbers for all marks from the input array that are greater than or equal to 85. grade_b works on marks greater than or equal to 65 and less than 85, etc.
(Student numbers are just the positions in the input array, i.e. the first mark in the input array is student number 0, the second mark is student number 1, and so on.)
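To make the grading step concrete, here is a minimal Python sketch of how the five grade arrays could be assembled with a single loop. The variable names follow the pseudocode; the if/elif chain is just one possible way to express the grade boundaries, not necessarily the approach used in the video.

```python
student_marks = [83, 72, 92, 65, 54, 54, 78, 67, 52, 54, 48, 69, 87, 55, 51,
                 52, 44, 57, 79, 64, 66, 19, 82, 71, 66, 31, 87, 83, 64, 78]

# One empty array per grade. Each will collect student numbers (positions).
grade_a = []
grade_b = []
grade_c = []
grade_d = []
grade_e = []

for student_number in range(len(student_marks)):
    mark = student_marks[student_number]
    # Each elif is only reached if the earlier tests failed,
    # so we only need to check the lower boundary each time.
    if mark >= 85:
        grade_a.append(student_number)
    elif mark >= 65:
        grade_b.append(student_number)
    elif mark >= 45:
        grade_c.append(student_number)
    elif mark >= 25:
        grade_d.append(student_number)
    else:
        grade_e.append(student_number)

print("Grade A:", grade_a)
print("Grade B:", grade_b)
print("Grade C:", grade_c)
print("Grade D:", grade_d)
print("Grade E:", grade_e)
```

Note that the arrays hold student numbers, not marks, so a student's mark can still be looked up later with student_marks[student_number].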
But these functions are limited in how they do what they do. While print(student_marks) will display the array contents, it does not allow control over how it is displayed. The function max(student_marks) gives us the highest value, but what if we wanted to ignore outliers, or perform some custom operation while working through the array?
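As a rough illustration of the extra control a loop gives us, the sketch below displays each mark in a format of our own choosing, then finds the highest and lowest marks by hand. The wording of the printed message is an arbitrary choice made for this example.

```python
student_marks = [83, 72, 92, 65, 54, 54, 78, 67, 52, 54, 48, 69, 87, 55, 51,
                 52, 44, 57, 79, 64, 66, 19, 82, 71, 66, 31, 87, 83, 64, 78]

# Display each student's number and mark, formatted however we like.
for student_number in range(len(student_marks)):
    print("Student", student_number, "scored", student_marks[student_number])

# Start with the first mark as both the highest and lowest seen so far,
# then compare every mark against those running values.
high_mark = student_marks[0]
low_mark = student_marks[0]
for mark in student_marks:
    if mark > high_mark:
        high_mark = mark
    if mark < low_mark:
        low_mark = mark

print("Highest mark:", high_mark)
print("Lowest mark:", low_mark)
```

Because we wrote the comparison ourselves, it would be easy to add an extra condition inside the loop, for example to skip marks we consider outliers.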
For more on setting up and choosing a language, see Setting Up.
The above video demonstrates coding the first part of the student marks program in Python. Try it yourself! Choose a link to start with the data already in place, then check the completed code so far.
First, add a few more scores to the array at the start: 78, 71, 14, 96 and 84. Test your program to make sure it still works as expected.
Next, see if you can determine the average (mean) score and display it. You'll need a loop to find the sum of all the scores first.
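One possible solution is sketched below, assuming the five extra scores have been added to the end of the array. The names total and average are choices made for this example.

```python
# The original 30 scores followed by the five extra scores.
student_marks = [83, 72, 92, 65, 54, 54, 78, 67, 52, 54, 48, 69, 87, 55, 51,
                 52, 44, 57, 79, 64, 66, 19, 82, 71, 66, 31, 87, 83, 64, 78,
                 78, 71, 14, 96, 84]

# Use a loop to add up every score.
total = 0
for mark in student_marks:
    total = total + mark

# The mean is the sum divided by the number of scores.
average = total / len(student_marks)
print("Average mark:", round(average, 1))
```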
These challenges use the skills covered so far.
A class of students has calculated the grams of sugar in their lunches, and you have collected the following data:
| 17 g | 10 g | 14 g | 15 g | 12 g | 12 g | 14 g | 9 g |
| 12 g | 23 g | 6 g | 12 g | 24 g | 10 g | 4 g | 16 g |
| 17 g | 10 g | 21 g | 23 g | 3 g | 20 g | 8 g | 7 g |
Your task is to write a program to assign star ratings to the lunches based on the amount of sugar.
For this challenge, you don't need to create arrays for the star ratings, but you do need to loop through the main data array and display a rating for each lunch as follows:
Lunch no. 0 gets 2 stars.
Lunch no. 1 gets 3 stars.
Lunch no. 2 gets 3 stars.
a. Prepare pseudocode first.
BEGIN
    lunches ← [17, 10, 14, 15, 12, 12, 14, 9, 12, 23, 6, 12, 24, 10, 4, 16, 17, 10, 21, 23, 3, 20, 8, 7]
    noOfLunches ← 24
    For i from 0 to noOfLunches - 1
        If lunches[i] < 6
            starRating ← 5
        Else If lunches[i] < 10
            starRating ← 4
        Else If lunches[i] < 15
            starRating ← 3
        Else If lunches[i] < 20
            starRating ← 2
        Else
            starRating ← 1
        End If
        Display 'Lunch no.', i, 'gets', starRating, 'stars.'
    End For
END
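The pseudocode above could be translated into Python along the following lines. Wrapping the rating logic in a function named star_rating is a choice made for this sketch. The pseudocode keeps everything inside the loop, and that works just as well.

```python
lunches = [17, 10, 14, 15, 12, 12, 14, 9, 12, 23, 6, 12,
           24, 10, 4, 16, 17, 10, 21, 23, 3, 20, 8, 7]


def star_rating(grams):
    """Return a rating from 5 stars (least sugar) to 1 star (most sugar)."""
    if grams < 6:
        return 5
    elif grams < 10:
        return 4
    elif grams < 15:
        return 3
    elif grams < 20:
        return 2
    else:
        return 1


# Loop through the data and display a rating for each lunch.
for i in range(len(lunches)):
    print("Lunch no.", i, "gets", star_rating(lunches[i]), "stars.")
```

Using len(lunches) instead of a fixed count of 24 means the loop still covers every lunch if more data is added later.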
a. A Selection Sort is one of the simplest algorithms for sorting values in an array, that is, putting the values in order. In this challenge, you'll write code for a selection sort that works by repeatedly moving the smallest value in an unsortedList into a sortedList.
First, read the pseudocode below. As a class or in pairs, use a trace table to test the algorithm.
BEGIN
    unsortedList ← [11, 25, 12, 22, 64]
    sortedList ← []
    noOfValues ← length of unsortedList
    Display "Here's the unsorted array: ", unsortedList
    // Repeat the whole algorithm enough times to move every value.
    For i from 0 to noOfValues - 1
        // Identify the smallest value currently in the unsorted list.
        smallest ← 100
        For j from 0 to length of unsortedList - 1
            If unsortedList[j] < smallest
                smallest ← unsortedList[j]
            End If
        End For
        // Move the smallest value across to the sorted list.
        Remove smallest from unsortedList
        Append smallest to sortedList
        // Display as we go.
        Display "Here's the sorted array: ", sortedList
    End For
END
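Once the trace table makes sense, the algorithm might be coded in Python roughly as follows. This is a direct translation of the pseudocode, with the variable names changed to Python's usual underscore style. Like the pseudocode, it assumes every value is below 100.

```python
unsorted_list = [11, 25, 12, 22, 64]
sorted_list = []
no_of_values = len(unsorted_list)
print("Here's the unsorted array:", unsorted_list)

# Repeat the whole algorithm enough times to move every value.
for i in range(no_of_values):
    # Identify the smallest value currently in the unsorted list.
    # Starting at 100 only works because all values are below 100.
    smallest = 100
    for j in range(len(unsorted_list)):
        if unsorted_list[j] < smallest:
            smallest = unsorted_list[j]

    # Move the smallest value across to the sorted list.
    unsorted_list.remove(smallest)
    sorted_list.append(smallest)

    # Display as we go.
    print("Here's the sorted array:", sorted_list)
```

The outer loop uses the length measured at the start, while the inner loop uses the current length, because the unsorted list shrinks by one value on every pass.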
In Python, the list method remove searches for a value and removes its first occurrence from an array.
// Search for and remove "Australia" from the array.
// Remove one country at position 3 in the array.
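The two operations described in the comments above might look like this in Python. The countries array and its contents are made up for this illustration, and using pop for removal by position is one option among several (the del statement is another).

```python
# A hypothetical array of countries, invented for this example.
countries = ["Australia", "Brazil", "Canada", "Denmark", "Egypt"]

# Search for and remove "Australia" from the array.
countries.remove("Australia")
print(countries)

# Remove one country at position 3 in the array.
# After the first removal, position 3 holds "Egypt".
removed = countries.pop(3)
print("Removed:", removed)
print(countries)
```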
Finally, try adding more values (below 100) to the unsorted list. Does the algorithm still work?
Your friend Zippy is developing a fairground game in which a player is asked to choose a number between 1 and 60 inclusive. To decide whether the player wins, the chosen number goes through the following test:
Zippy says the game is very generous because most numbers between 1 and 60 will result in a win. But your other friend Wanda says that people don't choose numbers evenly. She says it's more of a curve with extreme numbers like 2 and 59 being much rarer than numbers like 29 or 31.
To really investigate how fair the game is, you decide to code a simulator:
Next, use a loop to run the winning or losing test on simpleNumbers. You need to be able to display the total number of winners and the total size of the array, e.g.
"Out of 60 numbers, there were ??? winners. This is a ??? percent chance of winning."