Source: www.cse.ust.hk
Also adapted from sources:
• Tan, Steinbach, Kumar (TSK) Book: Introduction to Data Mining
• Weka Book: Witten and Frank (WF): Data Mining
• Han and Kamber (HK Book): Data Mining
• BI Book is denoted as “BI Chapter #...”

BI1.4 Business Intelligence Architectures
• Data Sources
  – Gather and integrate data
  – Challenges
• Data Warehouses and Data Marts
  – Extract, transform and load data
  – Multidimensional Exploratory Analysis
• Data Mining and Data Analytics
  – Extraction of Information and Knowledge from Data
  – Build Models of Prediction
• An example
  – Building a telecom customer retention model
    • Given a customer’s telecom behavior, predict if the customer will stay or leave
  – KDDCUP 2010 Data

BI3: Data Warehousing
• Data warehouse:
  – Repository for the data available for BI and Decision Support Systems
  – Holds internal data, external data and personal data
  – Internal data:
    • Back office: transactional records, orders, invoices, etc.
    • Front office: call center, sales office, marketing campaigns
    • Web-based: sales transactions on e-commerce websites
  – External:
    • Market surveys, GIS systems
  – Personal: data about individuals
  – Meta: data about a whole data set, systems, etc. E.g., what structure is used in the data warehouse? The number of records in a data table, etc.
• Data marts: subsets of the data warehouse that serve one function (e.g., marketing).
• OLAP: set of tools that perform BI analysis and support decision making.
• OLTP: transaction-related online tools, focusing on dynamic data.

Working with Data: BI Chap 7
• Let’s first consider an example dataset (table below)
• Univariate Analysis (7.1)
• Histograms
  – Empirical density = $e_h / m$, where $e_h$ is the number of values that belong to class h and m is the total number of values
  – X-axis = value range
  – Y-axis = empirical density

Example dataset. Independent variables: Outlook, Temp, Humidity, Windy. Dependent variable: Play.

Outlook    Temp  Humidity  Windy  Play
sunny      85    85        FALSE  no
sunny      80    90        TRUE   no
overcast   83    86        FALSE  yes
rainy      70    96        FALSE  yes
rainy      68    80        FALSE  yes
rainy      65    70        TRUE   no
overcast   64    65        TRUE   yes
sunny      72    95        FALSE  no
sunny      69    70        FALSE  yes
rainy      75    80        FALSE  yes
sunny      75    70        TRUE   yes
overcast   72    90        TRUE   yes
overcast   81    75        FALSE  yes
rainy      71    91        TRUE   no

Measures of Dispersion
• Variance: $\sigma^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \mu)^2$, where $\mu$ is the sample mean
• Standard deviation: $\sigma = \left( \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \mu)^2 \right)^{1/2}$
• Normal Distribution: interval $(\mu \pm r\sigma)$
  – r=1: contains approximately 68% of the observed values
  – r=2: approximately 95% of the observed values
  – r=3: approximately 99.7% (nearly all) of the observed values
  – Thus, if a sample falls outside $(\mu \pm 3\sigma)$, it may be an outlier
• Thm 7.1 (Chebyshev’s Theorem): let $r \ge 1$ and let $(x_1, x_2, \ldots, x_m)$ be a group of m values. Then a proportion of at least $(1 - 1/r^2)$ of the values will fall within the interval $(\mu \pm r\sigma)$.
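The dispersion measures above can be checked on the Temp column of the example dataset. The following is a minimal sketch using only the Python standard library; the function names (`sample_mean`, `sample_variance`, `fraction_within`) are illustrative choices, not taken from the slides. It computes the sample variance and standard deviation with the m-1 denominator, then compares the observed share of values inside $(\mu \pm r\sigma)$ with the Chebyshev lower bound $1 - 1/r^2$.

```python
import math

# Temp column of the 14-row example dataset, in row order.
temps = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]


def sample_mean(values):
    """Arithmetic mean of the observations."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance with the (m - 1) denominator used on the slide."""
    mu = sample_mean(values)
    return sum((x - mu) ** 2 for x in values) / (len(values) - 1)


def fraction_within(values, r):
    """Share of observations inside the interval mu +/- r * sigma."""
    mu = sample_mean(values)
    sigma = math.sqrt(sample_variance(values))
    inside = sum(1 for x in values if mu - r * sigma <= x <= mu + r * sigma)
    return inside / len(values)


mu = sample_mean(temps)
variance = sample_variance(temps)
sigma = math.sqrt(variance)
print(f"mean={mu:.3f} variance={variance:.3f} std={sigma:.3f}")

for r in (1, 2, 3):
    chebyshev_bound = 1 - 1 / r ** 2  # guaranteed minimum share for any data
    observed = fraction_within(temps, r)
    print(f"r={r}: observed={observed:.3f} Chebyshev bound={chebyshev_bound:.3f}")
```

The Chebyshev bound holds for any group of values, so the observed share should never fall below it; the 68/95/99.7 figures apply only when the data are approximately normal, which a sample of 14 values cannot establish.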
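For the histogram definition on the Working with Data slide, the empirical density $e_h / m$ can be computed directly from the Temp column. This is a sketch under assumptions: the slides do not specify the classes, so the six classes of width 5 starting at 60 are chosen here purely for illustration, and `empirical_density` is a hypothetical helper name.

```python
# Temp column of the 14-row example dataset, in row order.
temps = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]


def empirical_density(values, lower, width, n_classes):
    """Return (counts, densities) for classes [lower + h*width, lower + (h+1)*width).

    counts[h] is e_h, the number of values falling in class h;
    densities[h] is e_h / m, where m is the total number of values.
    """
    counts = [0] * n_classes
    for x in values:
        h = int((x - lower) // width)
        if 0 <= h < n_classes:  # values outside the class range are ignored
            counts[h] += 1
    m = len(values)
    return counts, [e_h / m for e_h in counts]


# Assumed classes: [60, 65), [65, 70), ..., [85, 90).
counts, density = empirical_density(temps, lower=60, width=5, n_classes=6)
for h, (e_h, p_h) in enumerate(zip(counts, density)):
    low = 60 + 5 * h
    print(f"[{low}, {low + 5}): e_h={e_h} density={p_h:.3f}")
```

Plotting the class ranges on the x-axis against these densities on the y-axis gives the histogram described on the slide; a different choice of class width would change its shape.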