Notebook
ASSESSMENT - IDENTIFY AN APPROPRIATE DATASET FOR USE IN A DATA INVESTIGATION
ASSESSMENT - DEMONSTRATE HOW TO USE AN SQL STATEMENT TO RETRIEVE A DATASET
NOTE: WE COULD USE pandasql TO RUN A SQL QUERY ON A DATAFRAME PUSHED INTO A SQLITE DATABASE
ASSESSMENT - DEMONSTRATE HOW TO LOAD IN FROM, OR SAVE DATA TO, A DATA FILE IN A RECOGNISED FORMAT
ASSESSMENT - DEMONSTRATE TWO OR MORE TECHNIQUES THAT PROVIDE AN OVERVIEW OF A NEW DATASET
ASSESSMENT - DEMONSTRATE TWO OR MORE TECHNIQUES THAT CAN BE APPLIED TO CLEAN A DATASET
ASSESSMENT - DEMONSTRATE HOW TO SORT A DATASET
ASSESSMENT - ADD A NEW COLUMN TO A DATASET
ASSESSMENT - GENERATE A NEW COLUMN FROM A PRE-EXISTING COLUMN
ASSESSMENT - DEMO
ASSESSMENT - DEMONSTRATE A WAY OF SUBSETTING A DATASET BASED ON ONE OR MORE ROW BASED CRITERIA
ASSESSMENT - DEFINE AND APPLY A SIMPLE PYTHON FUNCTION THAT ACCEPTS ONE OR MORE PARAMETERS AND RETURNS ONE OR MORE VALUES
ASSESSMENT - DEMONSTRATE HOW TO RESHAPE A DATASET, EG USING PIVOT, MELT, STACK OR UNSTACK OPERATORS
ASSESSMENT - DEMONSTRATE A WAY OF SUBSETTING A DATASET BASED ON ONE OR MORE COLUMN BASED CRITERIA
ASSESSMENT - CRITIQUE THE APPROPRIATENESS OF A PARTICULAR QUESTION ASKED OF THE DATA IN A PARTICULAR WAY
ASSESSMENT - DEMONSTRATE A TECHNIQUE FOR CLEANING OR REDUCING A DATASET BASED ON THE PRESENCE OF NULL VALUES
ASSESSMENT - DEMONSTRATE HOW TO PERFORM OPERATIONS ACROSS A ROW
ASSESSMENT - USE A BOOLEAN OPERATOR TO FILTER A DATASET BASED ON TWO OR MORE CRITERIA
ASSESSMENT - GENERATE ONE OR MORE QUESTIONS TO ASK OF A SELECTED DATASET
ASSESSMENT - DEMONSTRATE HOW TO GROUP A DATASET ACCORDING TO ONE OR MORE CRITERIA
ASSESSMENT - DEMONSTRATE HOW TO ACCESS A PARTICULAR GROUP AS A GROUP
ASSESSMENT - DEMONSTRATE HOW TO PROCESS ELEMENTS IN A GROUP BY GROUP
ASSESSMENT - INTERPRET THE RESULTS GENERATED BY ASKING A PARTICULAR QUESTION OF A SELECTED DATASET
ASSESSMENT - DEMONSTRATE HOW TO PROCESS A GROUP BASED ON GROUP PROPERTIES
ASSESSMENT - DEMONSTRATE HOW TO MERGE TWO OR MORE DATASETS
ASSESSMENT - DEMONSTRATE HOW TO MANIPULATE THE MARGINAL PROPERTIES OF A DATA TABLE (EG INDICES, COLUMN HEADINGS)
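Several of the checklist items above can be sketched in a few lines of pandas. The following is a minimal illustration, not the notebook's own code: the `results` and `clubs` tables, their column names, and the `points_per_event` helper are all hypothetical and were invented purely to show the shape of each step.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
results = pd.DataFrame({
    "athlete": ["Ann", "Bob", "Cat", "Dan", "Eve"],
    "club": ["North", "North", "South", "South", "South"],
    "events": [4, 5, 4, 0, 3],
    "points": [38.0, 41.5, None, 0.0, 27.0],
})
clubs = pd.DataFrame({"club": ["North", "South"], "region": ["NE", "SE"]})


def points_per_event(points, events):
    """Return mean points per event, or None if no events were entered."""
    if events == 0:
        return None
    return points / events


# Clean: drop rows where the score is null.
clean = results.dropna(subset=["points"]).copy()

# New column from existing columns, applying the function across each row.
clean["per_event"] = clean.apply(
    lambda row: points_per_event(row["points"], row["events"]), axis=1
)

# Boolean filter on two row-based criteria, then sort.
strong = clean[(clean["events"] >= 3) & (clean["points"] > 30)]
strong = strong.sort_values("points", ascending=False)

# Group by club, aggregate, and pull out one group as a group.
by_club = clean.groupby("club")
club_totals = by_club["points"].sum()
south = by_club.get_group("South")

# Merge with a second dataset.
merged = clean.merge(clubs, on="club", how="left")

# Reshape from wide to long with melt.
long = clean.melt(
    id_vars=["athlete"],
    value_vars=["events", "points"],
    var_name="measure",
    value_name="value",
)
```

Each step here is deliberately tiny; in a real investigation the same operations would be motivated by a specific question asked of the data rather than run in sequence for their own sake.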
ASSESSMENT NOTES

Sports such as gymnastics use criteria-based scoring, where participants must demonstrate several elements from different difficulty groups (eg http://www.british-gymnastics.org/technical-information/selection/womens-artistic/cat_view/334-regions-and-home-countries/467-south-east/578-event-info ). One approach to assessing notebooks on an investigation around a free-data-choice activity might be to require students to demonstrate a range of technical skills (perhaps self-identifying them to reinforce reflection about their work) in an appropriate context.

I have tried to identify - and abstract - assessment opportunities along the way; should students be required to do the same, as part of a critique of their own work?

Looking back over the assessment points, many of the steps were included *because the data needed treating in some way in order to ask a particular question or perform a particular transformation*. How can we capture the relationship between the questions asked of the data and the transformations those questions prompt in order to answer them?

Many data analyses are likely to include false starts that still take time to explore. Students should be allowed to include 'false-start' components in their script if they derive from a plausible initial line of investigation and demonstrate a required element.

This notebook has focussed on the demonstration of particular skills using a particular programming language (Python) and a particular programming library within that language (pandas). Some (many? all?) of the questions could have been asked directly of the dataset using SQL. Should the notebook require students to demonstrate solutions to the same problem in different languages?

What assessment points are missing?
The notebook does not include any graphical representations of the data (no charts).
The notebook does not include any statistical analyses, other than simple rankings, sorting and extrema detection.
The notebook does not require the student to model any data for, or ingest any data into, a database.
The notebook does not require students to do any more than single-line programming at each step. That is, the student is not required to develop any functions (other than one-line lambda functions) at any stage.

As a rule of thumb, I estimate that each combination of question cell, code cell and interpretation cell will take of the order of 5-15 minutes to produce.
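On the question of whether the same questions could be asked directly in SQL, one way to compare the two approaches side by side is to push a DataFrame into an in-memory SQLite database and run the equivalent query there. The sketch below assumes the standard-library `sqlite3` module together with `DataFrame.to_sql` and `pandas.read_sql_query`, rather than pandasql; the `results` table, its columns, and the question asked of it are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical data, invented for illustration only.
results = pd.DataFrame({
    "athlete": ["Ann", "Bob", "Cat", "Dan"],
    "club": ["North", "North", "South", "South"],
    "points": [38.0, 41.5, 35.0, 27.0],
})

# pandas version: total points per club, highest first.
pandas_answer = (
    results.groupby("club", as_index=False)["points"]
    .sum()
    .rename(columns={"points": "total_points"})
    .sort_values("total_points", ascending=False)
    .reset_index(drop=True)
)

# SQL version: push the DataFrame into SQLite and ask the same question.
conn = sqlite3.connect(":memory:")
results.to_sql("results", conn, index=False)
sql_answer = pd.read_sql_query(
    """
    SELECT club, SUM(points) AS total_points
    FROM results
    GROUP BY club
    ORDER BY total_points DESC
    """,
    conn,
)
conn.close()
```

Comparing `pandas_answer` with `sql_answer` gives students a simple check that both formulations answer the same question, and it also touches on the missing assessment point about ingesting data into a database.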