Next: Base software environment Up: Computational thinking for statisticians: Previous: The Problem

The Plan

As with any educational offering, we need to consider four essential interdependent components of course design: the goals, the content, the audience, and the actual delivery. In this course the goals and delivery carry over to different audiences, while the details of the content may need adaptation for particular audiences.

The primary goal is for the student to experience computational thinking within the context of statistics. They should come away with a sense of excitement and some confidence about what they can achieve with a computer. A secondary goal is the exploration of the interplay between Computer Science and Statistics -- the topics and rigour will depend upon the audience.

The course is delivered in a single 12-week semester, the first two-thirds of which consists of formal lectures with topics covered fairly quickly. During that time, assignments can be given to encourage students to engage with the course early. The remaining four weeks (or 12 hours of regularly scheduled classes) are given over to the students to work on a large group project. It is important that the project be run entirely by the students; in the final third of the course the instructor's role becomes that of critic, cheerleader, and information resource.

Like everyone else, students will become actively engaged if given real responsibility and creative opportunity. Responsibility is encouraged in many ways, the use of a group project being particularly effective. At the end of term, each student hands in an independent report describing the overall project, the contributions of the individual members, and, in more detail, their own contribution. Each student is assessed on their written report, their software, and their contribution to the discussion and direction of the project. Giving a clear expectation of activity is also important. I make it clear that I expect students to have worked together or independently between meetings in order to have material for presentation and discussion at the next meeting. I also declare a rarely invoked rule that I leave the meeting if no activity occurs for five consecutive minutes. The instructor can encourage creativity by recognizing and supporting students' interests. A willingness on the instructor's part to give impromptu lectures and pointers to resource material (or persons) is particularly helpful in this regard.

Groups of size five to seven seem about right. Our class sizes are small in statistical computing, typically about 7 to 12 students and at most 20. For larger class sizes the number of lectures could be reduced to cover the intersection of topics needed by all groups; other topics would be researched by the students as needed.

As regards our audience, the students are typically senior year undergraduates and have had several terms of work experience. Minimally they will have had 3 courses (a course being 12 weeks of lectures) in calculus, 3 in algebra, 2 in Computer Science, 1 in probability, 1 in statistics generally, and 1 course in applied linear regression. Before graduating they will have had at least 9 further courses in the mathematical and computational sciences. Typically students in this course will have previously taken other statistics courses and/or are taking other statistics courses at the same time. They will have used some statistical package, typically S.

Because it is the only advanced (third year) statistics course guaranteed to have been taken by everyone, the statistical content of the present course builds around applied linear regression. The formal lectures therefore concentrate on computational issues related to regression. These include such traditional statistical computing topics as roundoff error analysis, solving linear systems of equations, and discussion of various matrix decompositions. This discussion allows one to focus on the various calculations needed for least-squares regression and diagnostics (influence and collinearity). The statistical theory is review for the students, with the possible exception of the depth of treatment given to diagnostics. For the most part, the computational detail is new to them. The design, use, and critical assessment of statistical graphics are also explored in lectures. The remainder of the lectures is spent on programming principles and practice: data and procedural abstraction, object-oriented programming, and details of the programming language being used. Finally, some pointers are given on the software implementation of statistical analysis strategy, where objects represent steps in an analysis (e.g. Oldford and Peters, 1988). In the most recent offering the impromptu lectures given were on generating random variates, spline-based scatterplot smoothing, and nonparametric regression methods such as additive models and projection pursuit; these topics vary from offering to offering.
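
As a rough illustration of the kind of calculation these lectures address, the following is a minimal sketch of least-squares regression computed through a matrix decomposition rather than the normal equations. It is written in Python purely for illustration; it is not the language or the code used in the course. The function names qr_mgs and least_squares are hypothetical, the decomposition chosen (a thin QR factorization by modified Gram-Schmidt) is only one of several that could be discussed, and the model matrix is assumed to have full column rank. The leverages (the diagonal of the hat matrix) are returned as one simple influence diagnostic.

```python
import math


def qr_mgs(X):
    """Thin QR factorization of X (a list of rows) by modified Gram-Schmidt.

    Assumes X has full column rank. Returns (Q, R), where Q is stored as a
    list of orthonormal columns and R is upper triangular (list of rows).
    """
    n, p = len(X), len(X[0])
    # Start from a copy of the columns of X; they are orthogonalized in place.
    Q = [[float(X[i][j]) for i in range(n)] for j in range(p)]
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        R[j][j] = math.sqrt(sum(v * v for v in Q[j]))
        Q[j] = [v / R[j][j] for v in Q[j]]
        # Remove the component along column j from every later column.
        for k in range(j + 1, p):
            R[j][k] = sum(a * b for a, b in zip(Q[j], Q[k]))
            Q[k] = [b - R[j][k] * a for a, b in zip(Q[j], Q[k])]
    return Q, R


def least_squares(X, y):
    """Least-squares coefficients and leverages computed from X = QR."""
    Q, R = qr_mgs(X)
    n, p = len(X), len(R)
    # Solve R beta = Q'y by back substitution.
    qty = [sum(q * yi for q, yi in zip(col, y)) for col in Q]
    beta = [0.0] * p
    for j in reversed(range(p)):
        s = qty[j] - sum(R[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = s / R[j][j]
    # Leverage of case i is the squared length of row i of Q.
    leverage = [sum(col[i] ** 2 for col in Q) for i in range(n)]
    return beta, leverage


# Small made-up example: an intercept column plus one explanatory variable.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta, leverage = least_squares(X, y)
```

Working from the factorization avoids forming X'X explicitly, which squares the condition number of the problem and so is more exposed to roundoff error when the explanatory variables are nearly collinear. The data above are invented for the example only.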

The students were assigned the task of organizing and developing an interactive, display-oriented system for carrying out a linear regression analysis. Throughout, they were encouraged to think about what could be done, to move beyond what they had seen in commercial systems, and to imagine what some idealized system should look like. And then to produce one.


2000-05-17