                Data Mining In Excel: Lecture Notes and Cases
                                Draft December 30, 2005
                                     Galit Shmueli
                                     Nitin R. Patel
                                     Peter C. Bruce
                      (c) 2005 Galit Shmueli, Nitin R. Patel, Peter C. Bruce
                                       Distributed by:
                                    Resampling Stats, Inc.
                                      612 N. Jackson St.
                                     Arlington, VA 22201
                                          USA
                                      info@xlminer.com
                                      www.xlminer.com
                            Contents
                            1 Introduction                                                                                               1
                                1.1  Who Is This Book For? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .           1
                                1.2  What Is Data Mining? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .          2
                                1.3  Where Is Data Mining Used? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .            3
                                1.4  The Origins of Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .          3
                                1.5  The Rapid Growth of Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . .             4
                                1.6  Why are there so many different methods? . . . . . . . . . . . . . . . . . . . . . . . .             5
                                1.7  Terminology and Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .          5
                                1.8  Road Maps to This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .            7
                            2 Overview of the Data Mining Process                                                                        9
                                2.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      9
                                2.2  Core Ideas in Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         9
                                     2.2.1    Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      9
                                     2.2.2    Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     9
                                     2.2.3    Association Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     10
                                     2.2.4    Predictive Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      10
                                     2.2.5    Data Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      10
                                     2.2.6    Data Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      10
                                     2.2.7    Data Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      10
                                2.3  Supervised and Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . .           11
                                2.4  The Steps in Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         11
                                2.5  Preliminary Steps      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   12
                                     2.5.1    Organization of Datasets      . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   12
                                     2.5.2    Sampling from a Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        13
                                     2.5.3    Oversampling Rare Events        . . . . . . . . . . . . . . . . . . . . . . . . . . . .   13
                                     2.5.4    Pre-processing and Cleaning the Data . . . . . . . . . . . . . . . . . . . . . .          13
                                     2.5.5    Use and Creation of Partitions . . . . . . . . . . . . . . . . . . . . . . . . . .        18
                                2.6  Building a Model - An Example with Linear Regression . . . . . . . . . . . . . . . .               20
                                2.7  Using Excel For Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .          27
                                2.8  Exercises    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   30
                            3 Data Exploration and Dimension Reduction                                                                 33
                                3.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     33
                                3.2  Practical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       33
                                3.3  Data Summaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         34
                                3.4  Data Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       36
                                3.5  Correlation Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       38
                                3.6  Reducing the Number of Categories in Categorical Variables . . . . . . . . . . . . . .             39
                                3.7   Principal Components Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .           39
                                      3.7.1    Example 2: Breakfast Cereals . . . . . . . . . . . . . . . . . . . . . . . . . . .         39
                                      3.7.2    The Principal Components . . . . . . . . . . . . . . . . . . . . . . . . . . . .           43
                                      3.7.3    Normalizing the Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         44
                                      3.7.4    Using Principal Components for Classification and Prediction . . . . . . . . .              46
                                3.8   Exercises    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    47
                            4 Evaluating Classification and Predictive Performance                                                         49
                                4.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      49
                                4.2   Judging Classification Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . .           49
                                      4.2.1    Accuracy Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        49
                                      4.2.2    Cutoff For Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        52
                                      4.2.3    Performance in Unequal Importance of Classes . . . . . . . . . . . . . . . . .             55
                                      4.2.4    Asymmetric Misclassification Costs . . . . . . . . . . . . . . . . . . . . . . . .          59
                                      4.2.5    Oversampling and Asymmetric Costs . . . . . . . . . . . . . . . . . . . . . . .            62
                                      4.2.6    Classification Using a Triage Strategy . . . . . . . . . . . . . . . . . . . . . .          67
                                4.3   Evaluating Predictive Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . .           68
                                4.4   Exercises    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    70
                            5 Multiple Linear Regression                                                                                  73
                                5.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      73
                                5.2   Explanatory Vs. Predictive Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . .           73
                                5.3   Estimating the Regression Equation and Prediction . . . . . . . . . . . . . . . . . . .             74
                                      5.3.1    Example: Predicting the Price of Used Toyota Corolla Automobiles               . . . . .   75
                                5.4   Variable Selection in Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . .         78
                                      5.4.1    Reducing the Number of Predictors          . . . . . . . . . . . . . . . . . . . . . . .   78
                                      5.4.2    How to Reduce the Number of Predictors . . . . . . . . . . . . . . . . . . . .             79
                                5.5   Exercises    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    83
                            6 Three Simple Classification Methods                                                                          87
                                6.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      87
                                      6.1.1    Example 1: Predicting Fraudulent Financial Reporting . . . . . . . . . . . . .             87
                                      6.1.2    Example 2: Predicting Delayed Flights . . . . . . . . . . . . . . . . . . . . . .          88
                                6.2   The Naive Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        88
                                6.3   Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       89
                                      6.3.1    Bayes Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        89
                                      6.3.2    A Practical Difficulty and a Solution: From Bayes to Naive Bayes               . . . . . .   90
                                      6.3.3    Advantages and Shortcomings of the Naive Bayes Classifier . . . . . . . . . .               94
                                6.4   k-Nearest Neighbor (k-NN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .           97
                                      6.4.1    Example 3: Riding Mowers . . . . . . . . . . . . . . . . . . . . . . . . . . . .           98
                                      6.4.2    Choosing k . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       99
                                      6.4.3    k-NN for a Quantitative Response . . . . . . . . . . . . . . . . . . . . . . . .          100
                                      6.4.4    Advantages and Shortcomings of k-NN Algorithms . . . . . . . . . . . . . . . 100
                                6.5   Exercises    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   102
                            7 Classification and Regression Trees                                                                        105
                                7.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     105
                                7.2   Classification Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      105
                                7.3   Recursive Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       105
                                7.4   Example 1: Riding Mowers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .           106
                                      7.4.1    Measures of Impurity      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   108
                                7.5   Evaluating the Performance of a Classification Tree . . . . . . . . . . . . . . . . . . .           113