Source: indico.cern.ch
HEP-Google TIM March 24-26, 2020

HEP Data Format Activity
  There has been a flurry of activity for some time
  Focus on columnar formats (storage or conversion)
    DIANA-HEP
      Parquet, Awkward Array, Femtocode, OAMap, etc.
    IRIS-HEP
      iDDS, ServiceX, DOMA R&D
    ROOT Project
      RDataFrame
    Others
      COFFEA

HEP Public Cloud Activity
  Many projects leverage public clouds
    HEPCloud (AWS)
    HTCondor (AWS)
    ICEPP GCPM Project (GCP)
    ATLAS Data Ocean Project (GCP)
    Many other independent projects

The Synthesis
  Why not combine all these ideas?
    Analysis using a public cloud
      e.g. Google Cloud Platform (GCP)
    With a cloud-storage-friendly data format
      e.g. Parquet (https://parquet.apache.org/)
    Suitable for efficient memory representation
      e.g. pandas (https://pandas.pydata.org/)
    That Python-oriented physicists find useful
  We should learn quite a lot

How We Got Here
  August 2019
    Informal discussion started (Andrew Hanushevsky & Ross Thomson)
  September 2019
    Project conceptualized
  October 2019
    Project formalized
    On-boarded 20% Google engineer (Guilhem Tesseyre)
  November 2019 onwards
    Various approaches investigated and tried
  February 2020
    On-boarded physics analyst (Shawfeng Dong, SLAC ACF)

Project Goals I
  Demonstrate efficient use of GCP for physics analysis
    We are only addressing analysis here
    Using Python as the language
  The demonstration has two aspects
    Workflow for needed data flow setup
      This usually requires data conversion
    Workflow for running an analysis job