NoisePy: a new high-performance Python tool for ambient noise seismology

Chengxin Jiang¹* and Marine A. Denolle¹

¹Department of Earth and Planetary Sciences, Harvard University, MA, USA
*Corresponding author: Chengxin Jiang (chengxinjiang@gmail.com)

Abstract

The fast-growing interest in seismic imaging at high spatial resolution and seismic monitoring at high temporal resolution poses great challenges for fast, efficient, and stable data processing in ambient noise seismology. This coincides with the explosion of available seismic data in the last few years. However, the current computational landscape of ambient seismic field seismology remains highly heterogeneous, with individual researchers building their own homegrown codes. Here, we present NoisePy, a new high-performance Python tool designed specifically for large-scale ambient noise seismology. NoisePy provides most of the processing techniques for the ambient field data and the correlations found in the literature, along with parallel download routines, dispersion analysis, and monitoring subroutines. NoisePy takes advantage of ASDF, a parallel I/O enabled HDF5 data format designed for seismology, for a structured organization of the cross-correlation data. NoisePy exploits the embarrassingly parallel nature of computing the noise correlations over time windows using MPI. Thus, NoisePy exhibits strong scaling with the number of cores, a small memory overhead, and stable memory usage. Benchmark comparisons with the latest version of MSNoise demonstrate an approximately fourfold improvement in the compute time of the cross correlations, which is the slowest step of ambient noise seismology. NoisePy is suitable for ambient noise seismology of various data sizes, and it has been tested successfully on data sets ranging from a few GBs to several tens of TBs.
1 Introduction

With more than two decades of flourishing developments both in methodologies and in scientific discoveries, the use of the ambient seismic field in seismology is now well established. It has been a prime tool for structure imaging at a broad range of length scales, from reservoir scales (de Ridder and Dellinger, 2011; Lin et al., 2013; Mordret et al., 2013; Nakata et al., 2015; Chmiel et al., 2019), to regional scales (Campillo and Paul, 2003; Shapiro et al., 2005; Sabra et al., 2005; Yao et al., 2006; Brenguier et al., 2007; Lin et al., 2008; Gao et al., 2011; Porritt et al., 2011; Yang et al., 2012; Ward et al., 2013; Chen et al., 2014; Jiang et al., 2014; Xie et al., 2015; Bao et al., 2015; Obermann et al., 2016; Lynner and Porritt, 2017; Bowden et al., 2017; Li et al., 2017; Delph et al., 2018; Jiang et al., 2018; Berg et al., 2018; Wang et al., 2018; Deng et al., 2019; Liu et al., 2019), and continental scales (Yang et al., 2007; Ekström et al., 2009; Saygin and Kennett, 2010, 2012; Zhao et al., 2016; Shen and Ritzwoller, 2016; Shen et al., 2016). Ambient-noise seismology is also used to monitor tectonic, volcanic, and environmental processes (e.g., Brenguier et al., 2008a,b; Ermert et al., 2015; Mordret et al., 2016; Viens et al., 2017; Wang et al., 2017a; Clements and Denolle, 2018; Taira et al., 2018; Yates et al., 2019; Mao et al., 2019a). Finally, other applications include the prediction of long-period ground motions (Prieto and Beroza, 2008; Viens et al., 2015; Denolle et al., 2014a,b, 2018). These standard approaches rely mainly on the processing of continuous time series. To enable new scientific discovery, seismologists have to enhance spatial and temporal resolution and thus have to process larger amounts of seismic data.

The key challenges in data processing are the computation and the storage of the inter-station cross correlations, both of which scale quadratically with the number of stations (or channels) and linearly with time. Large-N arrays (Lin et al., 2013; Nakata et al., 2015; Wang et al., 2017b; Karplus and Schmandt, 2018; Ranasinghe et al., 2018; Mordret et al., 2018; Keifer et al., 2019; Meng et al., 2019), Distributed Acoustic Sensing with optic fibers (e.g., Dou et al., 2017; Zeng et al., 2017; Martin et al., 2018; Yu et al., 2019; Williams et al., 2019), and whole network analysis (e.g., Shen and Ritzwoller, 2016; Zhao et al., 2016; Bowden et al., 2017) are becoming the standards for data collection and analysis. These studies often involve hundreds to thousands of sensors/channels. The computational requirements increase further when studies track the temporal evolution of the cross-correlation functions (Wang et al., 2017a). In general, seismic studies involving hundreds of channels or more at moderate-to-high sampling rates (10-100 Hz) require more elaborate code designs and data management, such as I/O strategies and choices in parallelization on clusters.

One companion, yet basic, problem with such large data sets is the organization of the database, which usually becomes an individual choice based on specific research projects. When studying the temporal evolution of a local structure, seismologists may tend to organize the data per time period. When imaging the spatial variations in the elastic structure of a wide area, which requires information buried in the inter-station correlations regardless of the time period, seismologists may tend to organize the data per station/channel. Another challenge in data storage is the number of files: a single cross-correlation function is usually small in size, and individual studies may generate millions of such small files.
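
The quadratic growth in station pairs and the resulting file counts described above can be illustrated with a short calculation. This is only a sketch under stated assumptions: the function names `count_pairs` and `count_files` are hypothetical and not part of NoisePy, each channel pair is assumed to produce one cross-correlation function per time window, auto-correlations are ignored, and each function is assumed to be written to its own file.

```python
from math import comb


def count_pairs(n_channels: int) -> int:
    """Number of distinct channel pairs (auto-correlations excluded)."""
    return comb(n_channels, 2)  # n * (n - 1) / 2, i.e. quadratic in n


def count_files(n_channels: int, n_windows: int) -> int:
    """Number of small files if every pair and every time window is
    stored separately (an assumption made for illustration only)."""
    return count_pairs(n_channels) * n_windows  # linear in time


if __name__ == "__main__":
    # Hypothetical network sizes, one year of daily time windows.
    for n in (100, 1000, 2000):
        print(n, count_pairs(n), count_files(n, 365))
```

Under these assumptions, doubling the number of channels roughly quadruples the number of pairs, while doubling the recording duration only doubles the number of stored functions. A network of 1000 channels already yields several hundred thousand pairs, so storing one file per pair and per window quickly reaches the millions of small files mentioned above.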
High-performance computing (HPC) and high-throughput computing (HTC) centers use filesystems that handle large numbers of small files inefficiently. Therefore, the data ought to be stored and organized in large, parallel I/O enabled data containers such as HDF5 and NetCDF. The Adaptable Seismic Data Format (ASDF, Krischer et al., 2016) is one such file format that uses the HDF5 container to store large time series and metadata. ASDF is easily read in C++, Fortran, Julia, and Python, leading to a transportable data format between high-level languages (like Python) and high-performance computing languages (like C++ and Fortran).

Python has become the new standard, open-source, high-level language for seismic processing. For example, ObsPy is now the most popular toolbox for seismology. Python is also the most popular language for machine learning algorithms (e.g., TensorFlow, PyTorch, and Keras), which are also becoming standard practice in the seismology community (e.g., Kong et al., 2018; Bergen et al., 2019).

Several generic software packages provide the functionality to compute the cross correlation of time series, such as Seismic Analysis Code (SAC; Goldstein et al., 2003), Computer Programs in Seismology (CPS; Herrmann, 2013), and ObsPy (Beyreuther et al., 2010; Megies et al., 2011). MSNoise