Statistical Data Mining
B. D. Ripley
May 2002
© B. D. Ripley 1998–2002. Material from Ripley (1996) is © B. D. Ripley 1996.
Material from Venables and Ripley (1999, 2002) is © Springer-Verlag, New York 1994–2002.
Introduction
This material is partly based on Ripley (1996), Venables & Ripley (1999, 2002)
and the on-line complements available at
http://www.stats.ox.ac.uk/pub/MASS4/
My copyright agreements allow me to use the material on courses, but no further
distribution is allowed.
The S code in this version of the notes was tested with S-PLUS 6.0 for
Unix/Linux and Windows, and S-PLUS 2000 release 3. With minor changes it
works with R version 1.5.0.
The specific add-ons for the material in this course are available at
http://www.stats.ox.ac.uk/pub/bdr/SDM2001/
All the other add-on libraries mentioned are available for Unix and for Windows.
Compiled versions for S-PLUS 2000 are available from
http://www.stats.ox.ac.uk/pub/SWin/
and for S-PLUS 6.x from
http://www.stats.ox.ac.uk/pub/MASS4/Winlibs/
Contents
1 Overview of Data Mining
1.1 Multivariate analysis
1.2 Graphical methods
1.3 Cluster analysis
1.4 Kohonen’s self organizing maps
1.5 Exploratory projection pursuit
1.6 An example of visualization
1.7 Categorical data
2 Tree-based Methods
2.1 Partitioning methods
2.2 Implementation in rpart
3 Neural Networks
3.1 Feed-forward neural networks
3.2 Multiple logistic regression and discrimination
3.3 Neural networks in classification
3.4 A look at support vector machines
4 Near-neighbour Methods
4.1 Nearest neighbour methods
4.2 Learning vector quantization
4.3 Forensic glass
5 Assessing Performance
5.1 Practical ways of performance assessment
5.2 Calibration plots
5.3 Performance summaries and ROC curves
5.4 Assessing generalization
References
Index