Statistical Models in S
                               edited by
                 John M. Chambers and Trevor J. Hastie
                                 1992
                              608 pages 
                          ISBN 0-534-16765-9
             Wadsworth and Brooks/Cole Advanced Books and Software


Contents:

1.  An Appetizer
    (by J.M. Chambers, T.J. Hastie)
2.  Statistical Models 
    (by J.M. Chambers, T.J. Hastie)
3.  Data for Models
    (by J.M. Chambers)
4.  Linear Models
    (by J.M. Chambers)
5.  Analysis of Variance; Designed Experiments
    (by J.M. Chambers, A.E. Freeny, R.M. Heiberger)
6.  Generalized Linear Models
    (by T.J. Hastie, D. Pregibon)
7.  Generalized Additive Models
    (by T.J. Hastie)
8.  Local Regression Models
    (by W.S. Cleveland, E. Grosse, W.M. Shyu)
9.  Tree-Based Models
    (by L.A. Clark, D. Pregibon)
10. Nonlinear Models
    (by D.M. Bates, J.M. Chambers)
Appendix A. Classes and Methods: Object-Oriented Programming in S
    (by J.M. Chambers)
Appendix B. S Functions and Classes

New programming functionality has been added to the ``New S'' language
since the publication of the ``New S'' manual in 1988 (The New S Language:
A Programming Environment for Data Analysis and Graphics, by R.A. Becker,
J.M. Chambers, and A.R. Wilks).
By using this extension to the ``New S'' language, the ten authors of this
manual are able to develop a
unified approach to the fitting and analysis
of a fairly complete collection of response models (traditional and recent).
The book is highly recommended to anyone
interested in using New S (or is it New-New S?),
in applying more recent response models, or
in research in statistical computing.

The editors are to be congratulated for a surprisingly well organized
book (given ten authors were involved).
Each chapter is organized into four primary sections.
The first describes the statistical methodology, the second how to use
the S functions and data structures, the third how to extend or
specialize the given software, and the fourth contains more detail on the
computations.
Consequently, by reading only the first two primary sections of
each chapter one comes away well equipped to use the software in
a host of applications.
For most readers this will be enough.
As interest and circumstance demand, the remaining sections of any chapter
can be read with profit.
Although early chapters are required reading for later chapters,
chapters 7 through 10 can be read independently of one another.

The only problem with an otherwise excellent piece of work is the software
model on which the statistical models are based.
``Object-oriented programming in S'' is unlike any object-oriented programming
language I know.
At best, New S's functional programming style has been extended so that the
user can write functions which dispatch to other functions depending only
on the value of the ``class'' attribute of one of their arguments.
True, this makes it possible to write functions which are ``generic''
but it is a far cry from object-oriented programming.
Despite the impression given to the casual reader,
there is no such thing as a ``class'' in this New S; there is merely an
attribute called ``class'' which can appear on any S data structure.
The generic functions look to this attribute to decide which one of a collection
of New S functions (called methods) to invoke.
The dispatching is often called ``method lookup'' and in the New S model is
confused with the definition of a class.

In an object-oriented programming language, classes are data structures
which can themselves be manipulated.
Minimally, they can be related one to another
through the notion of inheritance.
As an example consider using classes as data structures to describe
birds.
We might define
a general class called ``bird'' which would be a template data structure
representing the properties held by birds in general.
A second class called ``flightless-bird'' could be introduced to represent
birds which have evolved to a flightless state (e.g. penguins and ostriches).
It is clear that every element of the class ``flightless-bird'' is also
an element of the class ``bird''.
It is also clear that the converse does not hold; an element of the class
``bird'' is not necessarily also an element of the class ``flightless-bird''.
This distinction is reflected in the software by asserting that
``flightless-bird'' is a subclass of ``bird''.
Consequently any property of ``bird'' is inherited by ``flightless-bird''.
If I had a pet ostrich called Frank, he would be represented in this system
as an ``instance'' of the class ``flightless-bird''  -- a ``flightless-bird
object''.
A generic function that operated on birds might be ``fly'' which would cause
the bird-object to fly from its present position to a new specified position.
If applied to the object representing Frank, however, nothing should happen
because Frank is a ``flightless-bird''.
This is implemented in software by defining a generic function called ``fly''
and separate ``fly'' methods for each of the classes ``bird'' and 
``flightless-bird''.
The method lookup procedure typically traverses the inheritance hierarchy
of the classes to determine which is the most specific method for a given
argument to the generic function call.  (In some systems, this lookup
can be redefined.)
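
The bird example can be sketched in a language that has classes as
first-class data structures. The following is a minimal illustration in
Python, not taken from the book; the names Bird, FlightlessBird and
tweety are assumptions made for the example, and Python attaches methods
to classes rather than to a separate generic function, so it shows only
the inheritance and lookup behaviour described above.

```python
class Bird:
    """Template data structure for birds in general."""

    def __init__(self, name, position=(0, 0)):
        self.name = name
        self.position = position

    def fly(self, new_position):
        # A bird in general moves to the requested position.
        self.position = new_position
        return self.position


class FlightlessBird(Bird):
    """Subclass of Bird: inherits its properties, specializes fly."""

    def fly(self, new_position):
        # Nothing happens: a flightless bird stays where it is.
        return self.position


frank = FlightlessBird("Frank")   # an instance of the subclass
tweety = Bird("Tweety")           # hypothetical second bird

frank.fly((10, 5))
tweety.fly((10, 5))
print(frank.position)    # (0, 0)
print(tweety.position)   # (10, 5)

# The classes exist as objects that can be inspected and related.
print(issubclass(FlightlessBird, Bird))               # True
print(isinstance(frank, Bird))                        # True
print([c.__name__ for c in FlightlessBird.__mro__])   # lookup order
```

The last line prints the order in which the lookup traverses the
hierarchy, most specific class first.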

In the extended New S system, Frank would be represented by making a
``bird'' data structure and pushing the string ``flightless-bird'' onto
its class attribute vector.
No class called ``flightless-bird'' would exist as a data structure.
A separate New S function, fly.flightless-bird, would be defined to represent
the fly method for flightless-birds.
So far so good.
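
For contrast, here is a rough Python emulation of that representation. It
is a sketch under stated assumptions, not S code: a dictionary stands in
for an S data structure with attributes, and make_bird and
fly_flightless_bird are hypothetical names (the latter standing in for
the S function fly.flightless-bird).

```python
def make_bird(name, position=(0, 0)):
    # A plain record carrying a "class" attribute vector.
    return {"name": name, "position": position, "class": ["bird"]}


def fly_flightless_bird(obj, new_position):
    # Stand-in for fly.flightless-bird: the bird does not move.
    return obj["position"]


frank = make_bird("Frank")
# Push the string onto the front of the class attribute vector.
frank["class"].insert(0, "flightless-bird")
print(frank["class"])                        # ['flightless-bird', 'bird']
print(fly_flightless_bird(frank, (10, 5)))   # (0, 0)

# The label is only a string. Nothing defines "flightless-bird" as a
# data structure, and nothing stops the same label from being placed
# on an unrelated record.
not_a_bird = {"voltage": 5.0, "class": ["flightless-bird"]}
```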

In an object-oriented programming language,
extending a system developed using classes and generic functions
is simple and powerful.
Should I wish to isolate the properties and behaviours which distinguish
migratory birds,
I do so by defining a class, ``migratory-bird'', as a subclass
of ``bird'' distinct from the ``flightless-bird'' class.
I need not re-implement any methods and data-fields
for migratory birds which are defined for birds in general.

Consider now a statistical example
-- implementing generalized linear models (glm)
and standard linear models (lm).
Because a linear model is really a special kind of generalized
linear model one might naturally define two classes, say ``lm'' and ``glm''
and assert that ``lm'' is a specialized subclass of ``glm''.
As a consequence, whatever property one expects of a ``glm'' would also be
found on an ``lm'' through ``inheritance'', since an ``lm'' is simply a
special kind of ``glm''.
Method lookup will typically traverse 

Many object-oriented systems have long since left the Smalltalk-80 model,
in which a generic function can dispatch on the type of only one of its
arguments, and now permit dispatching to depend on the types of any number
of its arguments.
The Common Lisp Object System (CLOS) is one such system.
It is difficult to see how New S could be extended yet again to accommodate
this kind of method lookup.
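
To make the idea concrete, here is a small Python sketch of dispatch on
the types of two arguments. It is an illustration only and does not
reproduce the CLOS algorithm; the classes Glm and Lm, the decorator
defmethod, and the generic function compare are all hypothetical names
introduced for the example.

```python
from itertools import product


class Glm:
    pass


class Lm(Glm):   # assumed hierarchy: an lm as a special kind of glm
    pass


_methods = {}    # maps a tuple of classes to a method


def defmethod(*types):
    """Register a method for one combination of argument classes."""
    def register(fn):
        _methods[types] = fn
        return fn
    return register


def compare(a, b):
    # Try pairs of classes, most specific first, the first argument
    # taking precedence over the second.
    for key in product(type(a).__mro__, type(b).__mro__):
        if key in _methods:
            return _methods[key](a, b)
    raise TypeError("no applicable method")


@defmethod(Glm, Glm)
def _compare_glm_glm(a, b):
    return "deviance comparison"


@defmethod(Lm, Lm)
def _compare_lm_lm(a, b):
    return "F test"


print(compare(Lm(), Lm()))     # F test
print(compare(Lm(), Glm()))    # deviance comparison
print(compare(Glm(), Glm()))   # deviance comparison
```

The method chosen depends on the classes of both arguments together,
which a lookup keyed on a single ``class'' attribute cannot express
directly.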


There are no classes in this extension of New S.
While it is claimed throughout that object-oriented programming is used,
this is not so.

The standard response models of the classic linear model (lm) and the 
generalized linear model (glm) (including quasi-likelihood models)
provide the basic intuition for the design of the unified approach.
Newer methodologies
like tree-based models for classification and regression, local regression
models (loess), and generalized additive models (gam) are treated in similar
fashion.
Here ``similar fashion'' is an understatement; any common elements
of the analysis in these response models are enforced by the
design of the software.
For example, with
the exception of the non-linear model, the fitting procedure of any
response model accepts an extended version of the Wilkinson and Rogers 
notation for specifying the structural part of the model (1973,
Applied Statistics, 22, pp 392-399).
(A non-linear model must explicitly define its parameters.)
While the commonality is emphasised, specialized treatment in specific
circumstances is encouraged.
For example, to ensure the correct analysis of variance for
some experimental designs
the formula specification is extended to allow identification of different
error sources for analysis of variance data structures (aov).

Common and specialized behaviour for different response models is easily
specified programmatically
through the two
extensions to the New S language described in this book.
The first is given by the twin notions of generic functions and specialized
methods.
As an example, consider the function ``anova''.
As its first argument, it takes a fitted ``model'' data structure and produces
an anova-style table summarizing the fitted model.
``Anova'' should (and does) work for an aov,
an lm, a glm, a gam,
or a loess fit.
By this it is meant that there is some sense in which we would like
to produce an anova-like table
for any of these fits.
Yet what should be produced will depend on the kind of
data structure given as its first argument.
If for example a glm fit is given, then an appropriate ``analysis of deviance''
table is printed.
This specialization is achieved by having the ``anova'' function automatically
dispatch
to the function ``anova.glm'' whenever it is presented with a glm fitted model.
Here the ``anova'' is a generic function and ``anova.glm'' one of its
specialized methods.

The second extension allows arbitrary S data structures to be related
to one another through some
kind of inheritance.
This is implemented by adding a new attribute called ``class''
to S data structures.
For example, a glm fitted model will have as its class attribute the
vector (in New S terminology) given by <"glm", "lm">.
Operationally this means that any generic function (e.g. "anova")
that is called on a
glm will look first for a function of the same name but ending in ".glm"
(e.g. "anova.glm") to apply to the argument.
If there is one then it is used.
If there is not, it looks again but this time for one ending in ".lm" (e.g.
"anova.lm").
If the entire vector of class attributes fails to turn up an appropriate
function, then finally the ending ".default" is tried (e.g. "anova.default")
-- there may or may not be a ".default" method defined.
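
The lookup just described can be emulated in a few lines of Python. This
is a sketch of the rule as stated in this review, not the S
implementation: dictionaries stand in for S data structures, a table of
named functions stands in for the S functions, and make_generic and the
``describe'' methods are hypothetical names added to show the fallback.

```python
def make_generic(name, methods):
    """Build a generic function that dispatches on the "class" attribute."""
    def generic(obj, *args):
        # Walk the class attribute vector in order, e.g. ["glm", "lm"].
        for cls in obj.get("class", []):
            method = methods.get(name + "." + cls)
            if method is not None:
                return method(obj, *args)
        # Nothing matched: try the ".default" method, if one is defined.
        default = methods.get(name + ".default")
        if default is None:
            raise LookupError("no method for " + name)
        return default(obj, *args)
    return generic


methods = {
    "anova.glm": lambda fit: "analysis of deviance table",
    "anova.lm": lambda fit: "analysis of variance table",
    "describe.lm": lambda fit: "described as an lm",         # hypothetical
    "describe.default": lambda fit: "described by default",  # hypothetical
}
anova = make_generic("anova", methods)
describe = make_generic("describe", methods)

glm_fit = {"class": ["glm", "lm"]}
lm_fit = {"class": ["lm"]}
plain = {}                    # no class attribute at all

print(anova(glm_fit))     # analysis of deviance table
print(anova(lm_fit))      # analysis of variance table
print(describe(glm_fit))  # described as an lm (no "describe.glm" exists)
print(describe(plain))    # described by default
```

Calling anova(plain) raises LookupError in this sketch, because no
"anova.default" has been defined.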

On the surface, these two extensions seem to endow S with some of
the principal features
that have come to be associated with the phrase "object-oriented programming"
(oo-programming for short).
Indeed, the book is strewn throughout
with the common terminology of oo-programming
(e.g. classes, generic functions, methods and the like).
The reader should not be misled.
What is described as ``Object-oriented programming in S'' in Appendix A
is at best a poor cousin to what is generally understood to be oo-programming.

In oo-programming languages with which I have worked (Smalltalk-80, LOOPS,
CLOS) classes exist as data structures in their own right.
They can be manipulated, instantiated and ...

THIS IS GOING TO BE FAIRLY TECHNICAL AND NOT A LITTLE NEGATIVE.
The difficulty I have with the book (and hence the software) is its view of
object-oriented programming and the consequences this view has had on the
design of the statistical software.

1. Classes do not exist in their own right; there is no relationship between
classes.
DINDE, Arizona, and Quail are what are known as class-based systems.
In such systems a distinction is drawn between a class data structure and
the data structure called an instance of that class; the former is a template
for the latter.
Classes are related one to another through inheritance.
In contrast to class-based systems,
Lisp-Stat is a prototype- or exemplar-based system.
That means there is no distinction between instances and classes -- any
object can inherit properties and behaviour from any other object.
New S is purportedly a class-based system but more accurately is