Ppt Seminar Proposal 42610 | Tpctc2014 Rabl Discussion Of Bigbenchː A Pro
              THE BIGBENCH PROPOSAL
               End to end benchmark
                Application level
               Based on a product retailer (TPC-DS)
               Focused on Parallel DBMS  and MR engines 
               History
                Launched at 1st WBDB, San Jose
                Published at SIGMOD 2013
                Spec at WBDB proceedings 2012 (queries & data set)
                Full kit at WBDB 2014
               Collaboration with Industry & Academia
                First: Teradata, University of Toronto, Oracle, InfoSizing
                Now: bankmark, CLDS, Cisco, Cloudera, Hortonworks, InfoSizing, Intel, Microsoft, MSRG, Oracle, Pivotal, SAP
              05.09.2014                                                                                           EXTENDING BIGBENCH   2
              DATA MODEL
               Structured: TPC-DS + market prices
               Semi-structured: website click-stream
               Unstructured: customers' reviews
               [Diagram: entities Marketprice, Item, Sales, Customer, Web Page, Web Log and Reviews, grouped into Structured Data, Semi-Structured Data and Unstructured Data; legend: Adapted TPC-DS, BigBench Specific]
              DATA MODEL – 3 VS
               Variety
                Different schema parts
               Volume
                Based on scale factor
                Similar to TPC-DS scaling, but continuous
                Weblogs & product reviews also scaled 
               Velocity
                Refresh for all data
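One way to picture "continuous" scaling: any positive, real-valued scale factor yields a data set size, rather than only a fixed list of permitted sizes. The sketch below is illustrative only. The table names echo the data model, but the base row counts and the per-table scaling laws are hypothetical assumptions and are not taken from the BigBench specification or the TPC-DS kit.

```python
import math

# HYPOTHETICAL row counts at scale factor 1.0 (not from any specification).
BASE_ROWS = {
    "sales": 1_000_000,    # structured
    "item": 10_000,        # structured
    "web_log": 5_000_000,  # semi-structured
    "reviews": 100_000,    # unstructured
}

# HYPOTHETICAL scaling laws: most tables grow linearly with the scale
# factor, while "item" is assumed to grow with its square root.
SCALING = {
    "sales": lambda sf: sf,
    "item": lambda sf: math.sqrt(sf),
    "web_log": lambda sf: sf,
    "reviews": lambda sf: sf,
}


def row_counts(scale_factor):
    """Return the row count of every table for a real-valued scale factor."""
    if scale_factor <= 0:
        raise ValueError("scale factor must be positive")
    return {
        table: max(1, round(base * SCALING[table](scale_factor)))
        for table, base in BASE_ROWS.items()
    }


if __name__ == "__main__":
    # A non-integer scale factor is accepted just like any other value.
    for table, rows in row_counts(2.5).items():
        print(f"{table:<8} {rows:>12,}")
```

The point of the sketch is only that weblogs and reviews grow together with the structured tables, and that the scale factor does not have to come from a predefined set.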
              WORKLOAD
                Workload Queries
                30 “queries”
                Specified in English (sort of)
                No required syntax (first implementation in Aster SQL MR)
                Kit implemented in Hive, HadoopMR, Mahout, OpenNLP
               Business functions (Adapted from McKinsey)
                Marketing
                  Cross-selling, Customer micro-segmentation, Sentiment analysis, Enhancing multichannel consumer experiences
                Merchandising
                  Assortment optimization, Pricing optimization
                Operations
                  Performance transparency, Product return analysis
                Supply chain
                  Inventory management
                Reporting (customers and products)
              WORKLOAD - TECHNICAL ASPECTS
               Generic Characteristics
                 Data Sources           #Queries   Percentage
                 Structured             18         60%
                 Semi-structured        7          23%
                 Un-structured          5          17%

                 Analytic techniques    #Queries   Percentage
                 Statistics analysis    6          20%
                 Data mining            17         57%
                 Reporting              8          27%
               Hive Implementation Characteristics
                 Query Types            #Queries   Percentage
                 Pure HiveQL            14         46%
                 Mahout                 5          17%
                 OpenNLP                5          17%
                 Custom MR              6          20%
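The percentages on this slide can be rechecked from the query counts and the total of 30 queries stated on the workload slide. The following is a minimal sketch that does only that arithmetic; the variable and function names are illustrative, not part of the BigBench kit. It shows that 14 of 30 is 46.7% (listed as 46% on the slide) and that the analytic-technique counts add up to 31 rather than 30, which would be consistent with one query being counted under two techniques, although the slide does not say so.

```python
# Query counts copied from the slide; the workload has 30 queries in total.
TOTAL_QUERIES = 30

data_sources = {"Structured": 18, "Semi-structured": 7, "Un-structured": 5}
analytic_techniques = {"Statistics analysis": 6, "Data mining": 17, "Reporting": 8}
hive_query_types = {"Pure HiveQL": 14, "Mahout": 5, "OpenNLP": 5, "Custom MR": 6}


def shares(counts, total=TOTAL_QUERIES):
    """Return each category's share of `total` in percent, to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    groups = [
        ("Data sources", data_sources),
        ("Analytic techniques", analytic_techniques),
        ("Hive query types", hive_query_types),
    ]
    for title, counts in groups:
        print(f"{title} (counts sum to {sum(counts.values())})")
        for name, pct in shares(counts).items():
            print(f"  {name:<20} {pct:5.1f}%")
```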