

 164x       Filetype PPT       File size 0.32 MB       Source: www.cse.ust.hk


File: Business Spread Sheet 42846 | Biintro

Filetype: PowerPoint (PPT) | Posted on 16 Aug 2022
Partial capture of text on file.
     Also adapted from sources:
      •  Tan, Steinbach, Kumar (TSK) Book: Introduction to Data Mining
      •  Weka Book: Witten and Frank (WF): Data Mining
      •  Han and Kamber (HK) Book: Data Mining
     BI Book is denoted as “BI Chapter #...”
               BI1.4: Business Intelligence Architectures
      •  Data Sources
          – Gather and integrate data
          – Challenges
      •  Data Warehouses and Data Marts
          – Extract, transform and load data
          – Multidimensional Exploratory Analysis
      •  Data Mining and Data Analytics
          – Extraction of Information and Knowledge from Data
          – Build Models of Prediction
      •  An example
          – Building a telecom customer retention model
              • Given a customer’s telecom behavior, predict if the customer will stay or leave
          – KDDCUP 2010 Data
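
The "extract, transform and load" step listed on this slide can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the CSV layout, the column names (`customer_id`, `monthly_minutes`, `plan`) and the use of a plain list as the "warehouse" are hypothetical and do not come from the slides.

```python
import csv
import io

# Hypothetical raw export from a source system (not from the slides).
RAW_CSV = """customer_id,monthly_minutes,plan
 c001 ,120,basic
c002,,premium
c003,310,Basic
"""


def extract(text):
    """Extract: read raw rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim ids, normalize plan names, drop incomplete rows."""
    clean = []
    for row in rows:
        if not row["monthly_minutes"].strip():
            continue  # incomplete source record: one of the integration challenges
        clean.append({
            "customer_id": row["customer_id"].strip(),
            "monthly_minutes": int(row["monthly_minutes"]),
            "plan": row["plan"].strip().lower(),
        })
    return clean


def load(rows, warehouse):
    """Load: append the cleaned rows to the warehouse (a plain list here)."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(transform(extract(RAW_CSV)), [])
for record in warehouse:
    print(record)
```

A real pipeline would load into a database rather than a list; the three-function split is only meant to mirror the three ETL stages.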
                    BI3: Data Warehousing
       •   Data warehouse:
            – Repository for the data available to BI and decision support systems
            – Holds internal data, external data and personal data
            – Internal data:
                 •  Back office: transactional records, orders, invoices, etc.
                 •  Front office: call center, sales office, marketing campaigns, etc.
                 •  Web-based: sales transactions on e-commerce websites
            – External data:
                 •  Market surveys, GIS systems
            – Personal data: data about individuals
            – Metadata: data about a whole data set, systems, etc. E.g., what structure is
               used in the data warehouse? The number of records in a data table, etc.
       •   Data marts: subset of a data warehouse for one function (e.g., marketing).
       •   OLAP (online analytical processing): set of tools that perform BI analysis and support decision making.
       •   OLTP (online transaction processing): transaction-related online tools, focusing on dynamic data.
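
To make the warehouse / data mart / OLAP distinction concrete, here is a small sketch using Python's built-in `sqlite3` module. The table name `warehouse_sales`, the view name `marketing_mart`, the columns and all the rows are hypothetical, invented only for illustration. The data mart is modeled as a view restricted to one function (marketing), and the OLAP-style analysis is a simple roll-up by region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding data for all departments.
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(department TEXT, region TEXT, channel TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("marketing", "north", "web", 120.0),
        ("marketing", "south", "call center", 80.0),
        ("marketing", "north", "call center", 50.0),
        ("finance", "north", "web", 200.0),
    ],
)

# Data mart: the subset of the warehouse that serves one function.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, channel, amount FROM warehouse_sales "
    "WHERE department = 'marketing'"
)

# OLAP-style roll-up: aggregate the mart along the region dimension.
rollup = conn.execute(
    "SELECT region, SUM(amount) FROM marketing_mart "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rollup)
```

An OLTP workload, by contrast, would consist of many small inserts and updates against the transactional tables rather than aggregate queries like the one above.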
                      Working with Data: BI Chap 7
          •    Let’s first consider an example dataset
          •    Univariate Analysis (7.1)
          •    Histograms
                 – Empirical density = e_h/m, where e_h = number of values that belong to class h and m = total number of values
                 – X-axis = value range
                 – Y-axis = empirical density

          Example dataset (independent variables: Outlook, Temp, Humidity, Windy; dependent variable: Play)

          Outlook     Temp    Humidity    Windy    Play
          sunny       85      85          FALSE    no
          sunny       80      90          TRUE     no
          overcast    83      86          FALSE    yes
          rainy       70      96          FALSE    yes
          rainy       68      80          FALSE    yes
          rainy       65      70          TRUE     no
          overcast    64      65          TRUE     yes
          sunny       72      95          FALSE    no
          sunny       69      70          FALSE    yes
          rainy       75      80          FALSE    yes
          sunny       75      70          TRUE     yes
          overcast    72      90          TRUE     yes
          overcast    81      75          FALSE    yes
          rainy       71      91          TRUE     no
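
As a worked illustration of the empirical density e_h/m, the sketch below bins the Temp column of the example dataset. The bin width of 5 and the half-open classes [lo, lo + 5) are assumptions chosen for the example; the slides do not specify how the classes are formed.

```python
from collections import Counter

# Temp column of the example dataset, in row order.
temps = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

BIN_WIDTH = 5  # assumed class width, not given in the slides


def empirical_density(values, width):
    """Return {class lower bound: e_h / m} for classes [lo, lo + width)."""
    m = len(values)
    counts = Counter((v // width) * width for v in values)
    return {lo: counts[lo] / m for lo in sorted(counts)}


density = empirical_density(temps, BIN_WIDTH)
for lo, p in density.items():
    # X-axis: value range of the class; Y-axis: empirical density.
    print(f"[{lo}, {lo + BIN_WIDTH}): {p:.3f}")
```

The densities sum to 1 by construction, since every value falls in exactly one class.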
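                     Measures of Dispersion
       • Variance: $\bar{\sigma}^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{\mu})^2$, where $\bar{\mu}$ is the sample mean of the m values
       • Standard deviation: $\bar{\sigma} = \left( \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{\mu})^2 \right)^{1/2}$
       • Normal Distribution: interval $\bar{\mu} \pm r\bar{\sigma}$
             – r=1: contains approximately 68% of the observed values
             – r=2: approximately 95% of the observed values
             – r=3: approximately 99.7% (nearly 100%) of the observed values
             – Thus, if a sample is outside $(\bar{\mu} \pm 3\bar{\sigma})$, it may be an outlier
       • Thm 7.1 (Chebyshev’s Theorem): let r>=1 and let (x1, x2, …, xm) be a group of m values. Then at least $(1 - 1/r^2)$ of the values will fall within the interval $\bar{\mu} \pm r\bar{\sigma}$.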
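
These dispersion measures can be checked numerically on the Humidity column of the example dataset. The sketch below uses Python's `statistics` module, whose `variance` and `stdev` functions divide by m - 1 as in the formulas on this slide. The helper `fraction_within` is a hypothetical name introduced here for illustration.

```python
import statistics

# Humidity column of the example dataset, in row order.
humidity = [85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91]

mean = statistics.mean(humidity)
variance = statistics.variance(humidity)  # sample variance, divides by m - 1
std_dev = statistics.stdev(humidity)      # square root of the sample variance


def fraction_within(values, center, spread, r):
    """Fraction of values inside the interval center +/- r * spread."""
    lo, hi = center - r * spread, center + r * spread
    return sum(lo <= v <= hi for v in values) / len(values)


print(f"mean={mean:.2f} variance={variance:.2f} std_dev={std_dev:.2f}")
for r in (1, 2, 3):
    observed = fraction_within(humidity, mean, std_dev, r)
    chebyshev_bound = 1 - 1 / r**2  # lower bound from Chebyshev's theorem
    print(f"r={r}: observed={observed:.3f}, Chebyshev bound={chebyshev_bound:.3f}")

# Values more than three standard deviations from the mean are outlier candidates.
outliers = [v for v in humidity if abs(v - mean) > 3 * std_dev]
print("outlier candidates:", outliers)
```

Chebyshev's bound holds for any set of values, so it is much looser than the percentages quoted for the normal distribution; for r=1 it gives no information at all (the bound is 0).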