WO2013033651A1 - Full elastic wave equation for 3d data processing on gpgpu - Google Patents


Info

Publication number
WO2013033651A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
wave
stress
velocity
lag
Prior art date
Application number
PCT/US2012/053546
Other languages
French (fr)
Inventor
Igor PODLADTCHIKOV
Brad Artman
Original Assignee
Spectraseis Ag
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spectraseis Ag filed Critical Spectraseis Ag
Publication of WO2013033651A1 publication Critical patent/WO2013033651A1/en

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01VGEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V1/00Seismology; Seismic or acoustic prospecting or detecting
    • G01V1/28Processing seismic data, e.g. for interpretation or for event detection
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01VGEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V2210/00Details of seismic processing or analysis
    • G01V2210/60Analysis
    • G01V2210/67Wave propagation modeling
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01VGEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V2210/00Details of seismic processing or analysis
    • G01V2210/60Analysis
    • G01V2210/67Wave propagation modeling
    • G01V2210/679Reverse-time modeling or coalescence modelling, i.e. starting from receivers

Definitions

  • Fig. 8 illustrates a global memory address configuration.
  • Aligning data in this way poses three major problems: 1) the operands (the variables Vx, Vz, σxx, σzz and σxz) are potentially scattered all over global memory, so to update Vx from σxx and σxz, potentially large distances in memory space may need to be traversed, and this applies to all 5 equations (equations 2 to 6); 2) operands for a dz operation are at least nx-1 addresses apart; 3) the element sizes are different: Vx is nx+1 by nz, Vz is nx by nz+1, σxx and σzz are nx by nz, and σxz is nx-1 by nz-1.
  • Fig. 9 illustrates a global memory address configuration with the element sizes made equal.
  • Fig. 10 illustrates a global memory address configuration for thread-block grids. For simplicity, the thread-block grid has nine sectors, but it will be appreciated that thread-block grids usually have a larger number of sectors.
  • The data are accessed by thread-blocks, each of which in this example is a team of 16 by 16 threads.
  • The data are arranged such that everything a thread-block will ever need (except halos) is in one place, globally arranged block after block.
  • A block contains portions of Vx, Vz, σxx, σzz and σxz, exactly the corresponding data that thread-blocks require.
  • A thread-block "knows" its coordinates inside the thread-block grid (gx, gy) and can use those coordinates to access its relevant data.
  • TRI processing includes computing imaging conditions. Imaging conditions collapse the propagation time axis into physical space to allow visualization of wave-field focusing. The absolute particle velocity and the P- and S-wave potentials may be computed after elastic propagation from spatial derivatives of the wave field.
  • Example imaging conditions include i) the zero-lag of the P-wave autocorrelation, ii) the zero-lag of the S-wave autocorrelation, iii) the zero-lag of the cross-correlation of the P- and S-wave energy densities, iv) the autocorrelation of the absolute value of particle motion, v) the maximum over all time, and vi) the cross-correlation of the energy density functions EpEs. Additionally, these imaging conditions may be generally applied to any of: i) particle velocity measurements, ii) particle acceleration measurements, iii) particle pressure measurements and iv) particle displacement measurements.
  • The imaging conditions are updated on every grid point at every time step by reading the previous time-step value from global memory, calculating the incremental value, and then writing the updated value back to global memory.
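  • The kernel below is a hedged illustration of such an update for three of the conditions, not the patent's kernel: the P and S wave potentials are approximated by the divergence and curl of particle velocity, the staggered-grid offsets are ignored, and vx and vz are treated as collocated nx-by-nz arrays. The kernel name and all parameter names are assumptions.
```cuda
#include <cmath>

// Hedged sketch: accumulate three zero-lag imaging conditions at one time step.
__global__ void imagingConditionSketch(const float* vx, const float* vz,
                                       float* imgAbsV, float* imgPP, float* imgSS,
                                       int nx, int nz, float inv2dx, float inv2dz)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iz = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix < 1 || iz < 1 || ix >= nx - 1 || iz >= nz - 1) return;  // skip grid edges

    int i = iz * nx + ix;
    // Centered spatial derivatives of the velocity field.
    float dvx_dx = (vx[i + 1]  - vx[i - 1])  * inv2dx;
    float dvz_dz = (vz[i + nx] - vz[i - nx]) * inv2dz;
    float dvx_dz = (vx[i + nx] - vx[i - nx]) * inv2dz;
    float dvz_dx = (vz[i + 1]  - vz[i - 1])  * inv2dx;

    float p  = dvx_dx + dvz_dz;                        // divergence ~ P-wave potential
    float s  = dvz_dx - dvx_dz;                        // curl (2D)  ~ S-wave potential
    float av = sqrtf(vx[i] * vx[i] + vz[i] * vz[i]);   // absolute particle velocity

    // Zero-lag autocorrelations: accumulate squared values over the time loop.
    imgAbsV[i] += av * av;
    imgPP[i]   += p * p;
    imgSS[i]   += s * s;
}
```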
  • The propagator consists of two kernels, one to update the stress elements (σxx, σzz, σxz) and the other to update the velocities (Vx, Vz). For each time step, stress is updated first, then velocity. Each thread updates either the stresses or the velocities at a given array element location.
  • The free-surface condition is achieved by setting the stress value to zero at the surface boundary during the velocity pass.
  • The absorbing boundaries may be applied as a cosine taper function to the other three sides during the stress pass.
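  • One common way to realize such a cosine-tapered absorbing layer is sketched below. This is a hedged illustration only; the layer width TAPER_W, the function names and the exact taper shape are assumptions, not taken from the patent.
```cuda
#include <math.h>

#define TAPER_W 40   // assumed absorbing-layer width in grid points

// Smooth damping factor: ~0 at the outer edge, 1 once TAPER_W points inside.
__host__ __device__ inline float cosineTaper(int distToEdge)
{
    if (distToEdge >= TAPER_W) return 1.0f;              // outside the absorbing layer
    float x = (float)distToEdge / (float)TAPER_W;        // 0 at the edge, 1 inside
    return 0.5f * (1.0f - cosf(x * 3.14159265f));
}

// Damp a stress value near the left, right and bottom edges; the top (iz = 0)
// is left untouched because it carries the free-surface condition.
__host__ __device__ inline float absorb(float stress, int ix, int iz, int nx, int nz)
{
    int d = ix;                                           // distance to the left edge
    if (nx - 1 - ix < d) d = nx - 1 - ix;                 // ... or the right edge
    if (nz - 1 - iz < d) d = nz - 1 - iz;                 // ... or the bottom edge
    return stress * cosineTaper(d);
}
```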
  • Wave field decomposition and imaging conditions are computed in a separate kernel from the propagation.
  • The computational complexity is on par with the elastic element updates, so roughly half of the computation time is spent on elastic element updates and the other half on wave field decomposition and updating imaging conditions.
  • The propagator may also be used for forward time modeling by recording particle velocities at desired locations.
  • Fig. 11 illustrates a GPU data arrangement, which is a visualization of 3D elements (e.g. Velocities, Stresses and Cross Stresses).
  • The most critical optimization parameter to realize the significant performance capabilities of GPU cards is to minimize the stride-length of memory skips across the 1D memory array streaming through each processing thread.
  • The size of each spatial volume in the model domain is also limited. Both issues are addressed by an appropriate domain-decomposition strategy focused on light-weight 2D planes of wave field and parameter values. Therefore, small planes (P0 to Pn-1, see Fig. 12) from the propagation domain are sequentially stored in memory for every wave field and parameter field, followed by the exterior loop over the third spatial dimension.
  • This set of planes defines a Thread-Block domain, and a set of blocks contains the entire model domain, consisting of N Elements, each n1 by n2 by n3. Note that the wave and parameter fields are consecutive planes before considering the third spatial dimension.
  • Fig. 13 illustrates the distribution of values in GPU memory, though it adds some of the concepts of 3D spatial structure for an alternative view of the organization utilized.
  • The "speed" of an axis is then determined by the distance between neighboring elements:
  • The fast axis has its elements neighboring directly, the middle axis has n1 elements between neighbors, and the slow axis has n1*n2 elements between neighbors.
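  • The linear address implied by this fast / middle / slow ordering can be written as below (a sketch; the function name linearIndex and the index names follow the n1, n2, n3 convention of the text and are otherwise assumptions).
```cuda
#include <cstddef>

// Linear address of element (i1, i2, i3) of an n1 x n2 x n3 Element.
__host__ __device__ inline std::size_t linearIndex(int i1, int i2, int i3, int n1, int n2)
{
    return (std::size_t)i1                              // fast axis: neighbors are adjacent
         + (std::size_t)i2 * n1                         // middle axis: neighbors are n1 apart
         + (std::size_t)i3 * n1 * (std::size_t)n2;      // slow axis: neighbors are n1*n2 apart
}
```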
  • GPU threads are organized into a Thread-Block. Functions on the GPU are executed by Thread-Blocks, meaning that the instructions defined in the functions are executed by a group of threads (almost) at once.
  • The size of the Thread-Block is configurable in one, two or three dimensions.
  • The number of Thread-Blocks scheduled is also configurable, in one-, two- or three-dimensional grids.
  • The grid will be configured as N1 / BLOCK_DIM by N2 / BLOCK_DIM, where N1 and N2 are n1 and n2 padded to multiples of BLOCK_DIM.
  • n1: length of the fast axis, or axis 1.
  • n2: length of the middle axis, or axis 2.
  • n3: length of the slow axis, or axis 3.
  • When executing a function on the GPU, a Thread-Block has access to its grid and block coordinates. The block coordinates, threadIdx.x and threadIdx.y, inform a thread where in the block it resides. This is used to access the proper addresses in the n1-n2 plane.
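  • A hedged sketch of this launch configuration and threadIdx-based addressing is shown below; BLOCK_DIM, the padding helper padToBlock and the kernel touchPlane are illustrative names, not the patent's code.
```cuda
#include <cuda_runtime.h>

#define BLOCK_DIM 16   // assumed thread-block edge length

// Pad a length up to the next multiple of BLOCK_DIM (the N1, N2 of the text).
static int padToBlock(int n) { return ((n + BLOCK_DIM - 1) / BLOCK_DIM) * BLOCK_DIM; }

// Minimal kernel showing how block and thread coordinates map into the n1-n2 plane.
__global__ void touchPlane(float* plane, int n1, int n2)
{
    int i1 = blockIdx.x * blockDim.x + threadIdx.x;   // position along the fast axis
    int i2 = blockIdx.y * blockDim.y + threadIdx.y;   // position along the middle axis
    if (i1 < n1 && i2 < n2)
        plane[i2 * n1 + i1] += 1.0f;                  // coalesced along the fast axis
}

// Host-side launch configuration (a sketch under the above assumptions).
void launchSketch(float* d_plane, int n1, int n2)
{
    int N1 = padToBlock(n1), N2 = padToBlock(n2);
    dim3 block(BLOCK_DIM, BLOCK_DIM);                 // 16 x 16 = 256 threads per block
    dim3 grid(N1 / BLOCK_DIM, N2 / BLOCK_DIM);        // N1/BLOCK_DIM by N2/BLOCK_DIM blocks
    touchPlane<<<grid, block>>>(d_plane, n1, n2);
}
```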
  • Fig. 15 shows an example of pseudocode for GPU-based 2D TRI. Threads are synchronized between each kernel to ensure all the values are updated for the entire simulation space before moving on to the next calculation.
  • Memory is allocated on the host (CPU) and GPU device for Vx, Vz, σxx, σzz and σxz, sources and imaging conditions. Sources, as used herein, are the seismic data input to the TRI.
  • The operands Vx, Vz, σxx, σzz and σxz are initialized and source data are loaded. Data are transferred from the host CPU to the GPU device.
  • An example of a physical model is ~15,000 m long by ~4,000 m deep, discretized into a 10 m x 10 m grid, with 11 calculations per time step, resulting in ~6,600,000 elements. Threads are organized into a 2D grid of blocks, each block consisting of 16 x 16 threads. With this block/grid arrangement, the thread and block IDs are used to index into the simulation space following the CUDA paradigm.
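  • A hedged host-side skeleton of this flow is sketched below. The kernel names (stressKernel, velocityKernel, imagingKernel), argument lists and the function runTRISketch are placeholders; the update bodies, boundary handling and source injection are omitted.
```cuda
#include <cuda_runtime.h>
#include <vector>

// Placeholder kernels; the real bodies would implement equations (2)-(6),
// the boundary conditions and the imaging-condition updates.
__global__ void stressKernel(float* sxx, float* szz, float* sxz,
                             const float* vx, const float* vz, int nx, int nz) { }
__global__ void velocityKernel(float* vx, float* vz, const float* sxx,
                               const float* szz, const float* sxz, int nx, int nz) { }
__global__ void imagingKernel(const float* vx, const float* vz, float* img, int nx, int nz) { }

std::vector<float> runTRISketch(int nx, int nz, int nt)
{
    size_t n = (size_t)nx * nz, bytes = n * sizeof(float);
    float *vx, *vz, *sxx, *szz, *sxz, *img;
    cudaMalloc(&vx, bytes);  cudaMemset(vx, 0, bytes);    // allocate and zero-initialize
    cudaMalloc(&vz, bytes);  cudaMemset(vz, 0, bytes);
    cudaMalloc(&sxx, bytes); cudaMemset(sxx, 0, bytes);
    cudaMalloc(&szz, bytes); cudaMemset(szz, 0, bytes);
    cudaMalloc(&sxz, bytes); cudaMemset(sxz, 0, bytes);
    cudaMalloc(&img, bytes); cudaMemset(img, 0, bytes);
    // (The time-reversed receiver data, the "sources", would also be copied to the device here.)

    dim3 block(16, 16), grid((nx + 15) / 16, (nz + 15) / 16);
    for (int t = 0; t < nt; ++t) {
        stressKernel  <<<grid, block>>>(sxx, szz, sxz, vx, vz, nx, nz);   // stress pass
        velocityKernel<<<grid, block>>>(vx, vz, sxx, szz, sxz, nx, nz);   // velocity pass
        imagingKernel <<<grid, block>>>(vx, vz, img, nx, nz);             // imaging conditions
        // Launches on the default stream execute in order, which gives the per-kernel
        // synchronization mentioned above.
    }
    std::vector<float> image(n);
    cudaMemcpy(image.data(), img, bytes, cudaMemcpyDeviceToHost);          // write back result
    cudaFree(vx); cudaFree(vz); cudaFree(sxx); cudaFree(szz); cudaFree(sxz); cudaFree(img);
    return image;
}
```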
  • Fig. 16a illustrates horizontal velocity data (Ricker wavelet, fundamental frequency 3 Hz) recorded at every grid point at the surface from a forward model of a single vertical point source.
  • Fig. 16b illustrates a forward model of a single vertical point source. Receiver data are reversed in time and propagated.
  • Fig. 16c shows the TRI result, which highlights the source location using the P-S-wave potential cross-correlation imaging condition. All simulations were performed on the GPU.
  • Nx and Nz are the numbers of horizontal and vertical grid points and Nt is the number of time steps.
  • TJ and TC are the Java and CUDA execution times, measured in seconds.
  • EJ and EC are the Java and CUDA computational throughputs, in million elements per second. They are calculated as Nx x Nz x Nt x (5 + 6) / runtime. The factor of (5 + 6) arises because there are 5 elastic properties (or Elements) and 6 imaging conditions to update at every time step in the 2D case.
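  • For a sense of scale (assuming, purely for illustration, Nt = 10,000 time steps on the roughly 1,500 x 400 grid of the example model above): Nx x Nz x Nt x (5 + 6) = 1,500 x 400 x 10,000 x 11 = 6.6 x 10^10 element updates, so a throughput of 5,000 million elements per second would complete such a run in roughly 13 seconds.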
  • The CUDA implementation of TRI contains six imaging conditions and a finite difference full elastic wave propagation solver, all of which run fully on a GPU. Data are divided into small, manageable pieces to leverage CUDA's built-in texture functions and to ensure coalesced memory addressing from threads for optimal memory reads and writes.
  • The embodiment disclosed herein illustrates a CUDA implementation that achieves a computational throughput of more than 5,000 million elements per second.
  • The map view of Elements E(n) in Fig. 11 schematically illustrates the data Block arrangement, B0 to B8, so that the same blocks, meaning like-indexed blocks, from other Elements (e.g. volumes of physical parameters) are closer together in memory to enable more rapid data transfer and higher computational speed.
  • Fig. 12 illustrates the arrangement in memory of adjacent Planes (P0 to Pn-1), where the like-indexed Blocks of Elements are placed closer together in memory for data transfer efficiency. As illustrated, the adjacent Planes of each Block for all Elements are arranged in memory so that reads and writes are most efficient.
  • Fig. 13 then illustrates this result, where threads are arranged in memory for maximum efficiency.
  • Fig. 14 illustrates this data arrangement hierarchy. Starting from the bottom, the linear memory array of BLOCKS (like-indexed Blocks extracted from the Elements) is arranged in adjacent PLANES of the ELEMENTS. When the ELEMENTS are arranged in this manner, the thread blocks are laid out in memory so that the GPU read and write sequences facilitate maximally efficient throughput for GPU operations.
  • Fig. 17 illustrates a schematic example of the hardware and operating environment in which embodiments as described herein and their equivalents may be practiced.
  • The description of Fig. 17 includes a general description of computer hardware, a computing environment or an information handling system with which the embodiments may be implemented. Although specific hardware may not be required, embodiments may be implemented in the general context of computer-executable instructions, such as program modules, being executed by a computer. Various embodiments may be practiced with a personal computer, a mainframe computer or combinations that include workstations with servers.
  • Program modules include routines, programs, objects, components and data structures for performing tasks, processing data, and recording and displaying information.
  • An information handling system is any instrumentality or aggregate of instrumentalities primarily designed to compute, classify, process, transmit, receive, retrieve, originate, switch, store, display, manifest, measure, detect, record, reproduce, handle or utilize any form of information, intelligence or data for business, scientific, control or other purposes. Examples include personal computers and larger processors such as servers, mainframes, etc, and may contain elements illustrated in Fig. 17.
  • Embodiments may be practiced with various computer or information handling system configurations that separately or in combination may include handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network computers, minicomputers, mainframe computers, and the like. Embodiments may be practiced with tasks performed in and over distributed computing environments that include remote processing devices linked through a communications network. Program modules operating in distributed computing environments may be located in various memory locations, both local and remote.
  • Fig. 17 is illustrative of hardware and an operating environment for implementing a general purpose computing device or information handling system in the form of a computer 10.
  • Computer 10 includes a processor or processing unit 11 that may include 'onboard' instructions 12.
  • Computer 10 has a system memory 20 attached to a system bus 40 that operatively couples various system components including system memory 20 to processing unit 11.
  • the system bus 40 may be any of several types of bus structures using any of a variety of bus architectures as are known in the art.
  • While one processing unit 11 is illustrated in Fig. 17, there may be a single central processing unit (CPU) or a graphics processing unit (GPU), or both, or a plurality of processing units.
  • Computer 10 may be a standalone computer, a distributed computer, or any other type of computer.
  • System memory 20 includes read only memory (ROM) 21 with a basic input/output system (BIOS) 22 containing the basic routines that help to transfer information between elements within the computer 10, such as during start-up.
  • System memory 20 of computer 10 further includes random access memory (RAM) 23 that may include an operating system (OS) 24, an application program 25 and data 26.
  • Computer 10 may include a disk drive 30 to enable reading from and writing to an associated computer or machine readable medium 31.
  • Computer readable media 31 includes application programs 32 and program data 33.
  • computer readable medium 31 may include programs to process seismic data, which may be stored as program data 33, according to the methods disclosed herein.
  • the application program 32 associated with the computer readable medium 31 includes at least one application interface for receiving and/or processing program data 33.
  • the program data 33 may include seismic data acquired according to embodiments disclosed herein.
  • At least one application interface may be associated with calculating imaging conditions for locating subsurface hydrocarbon reservoirs.
  • The disk drive may be a hard disk drive (e.g., for a magnetic disk), a magnetic disk drive for reading from or writing to removable magnetic media, or an optical disk drive for reading from or writing to a removable optical disk such as a CD-ROM, DVD or other optical media.
  • The drive 30 and associated computer-readable media 31 enable nonvolatile storage and retrieval for one or more application programs 32 and data 33 that include computer-readable instructions, data structures, program modules and other data for the computer 10.
  • Any type of computer-readable media that can store data accessible by a computer including but not limited to cassettes, flash memory, digital video disks in all formats, random access memories (RAMs), read only memories (ROMs), may be used in a computer 10 operating environment.
  • the application programs 32 may be associated with one or more application program interfaces.
  • An application programming interface (API) 35 may be an interface that a computer system, library or application provides in order to allow requests for services to be made of it by other computer programs, and/or to allow data to be exchanged between them.
  • An API 35 may also be a formalized set of software calls and routines that can be referenced by an application program 32 in order to access supporting application programs or services, which programs may be accessed over a network 90.
  • APIs 35 are provided that allow for higher level programming for displaying and mapping subsurface reservoirs.
  • APIs are provided for receiving seismic data, and decomposing, merging, smoothing and averaging the data.
  • the APIs allow for determining the output of imaging condition processing and storing it for display.
  • Serial interface 50 may be a universal serial bus (USB).
  • a user may enter commands or data into computer 10 through input devices connected to serial interface 50 such as a keyboard 53 and pointing device (mouse) 52.
  • Other peripheral input/output devices 54 may include without limitation a microphone, joystick, game pad, satellite dish, scanner or fax, speakers, wireless transducer, etc.
  • Other interfaces (not shown) that may be connected to bus 40 to enable input/output to computer 10 include a parallel port or a game port.
  • Computers often include other peripheral input/output devices 54 that may be connected with serial interface 50 such as a machine readable media 55 (e.g., a memory stick), a printer 56 and a data sensor 57.
  • A seismic sensor or seismometer for practicing embodiments disclosed herein is a non-limiting example of data sensor 57.
  • A video display 72 may be, e.g., a liquid crystal display (LCD), a flat panel, a solid state display, or a cathode ray tube (CRT).
  • a map or subsurface display created from imaging condition values as disclosed herein may be displayed with video display 72.
  • a computer 10 may operate in a networked environment using logical connections to one or more remote computers. These logical connections are achieved by a communication device associated with computer 10.
  • a remote computer may be another computer, a server, a router, a network computer, a workstation, a client, a peer device or other common network node, and typically includes many or all of the elements described relative to computer 10.
  • the logical connections depicted in Fig. 17 include a local-area network (LAN) or a wide-area network (WAN) 90.
  • LAN local-area network
  • WAN wide-area network
  • When used in a networking environment, the computer 10 may be connected to a network 90 through a network interface or adapter 60.
  • Computer 10 may include a modem 51 or any other type of communications device for establishing communications over the network 90, such as the Internet.
  • Modem 51, which may be internal or external, may be connected to the system bus 40 via the serial interface 50.
  • In a networked deployment, computer 10 may operate in the capacity of a server or a client user machine in a server-client user network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • program modules associated with computer 10, or portions thereof may be stored in a remote memory storage device.
  • the network connections schematically illustrated are for example only and other communications devices for establishing a communications link between computers may be used.

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Environmental & Geological Engineering (AREA)
  • Geology (AREA)
  • General Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Geophysics (AREA)
  • Image Processing (AREA)

Abstract

A method and system for imaging subsurface energy sources, comprising: acquiring multicomponent seismic data; arranging 3D volumes of velocity and stress parameters in a global computer memory by ordering adjacent planes from like-indexed blocks extracted from the 3D parameter volumes so that like-indexed blocks are adjacent in the computer memory; reverse propagating the multicomponent seismic data to obtain a reverse time wave field; and solving a full elastic wave equation for the reverse time wave field using a graphics processing unit to obtain wave field decomposition data. The method may also include applying at least one imaging condition to the wave field decomposition data to obtain imaging data and/or storing the imaging data for display.

Description

TITLE: FULL ELASTIC WAVE EQUATION FOR 3D DATA PROCESSING ON GPGPU
CROSS-REFERENCE TO RELATED APPLICATION
This application claims the benefit of U.S. Provisional Application No. 61/529,758 filed 31-August-2011, which is fully incorporated by reference.
BACKGROUND OF THE DISCLOSURE
Technical Field
[0001] The disclosure is related to 3D data processing of physical parameters. Applications include seismic exploration for oil and gas, and more particularly to processing and displaying seismic data attributes of the earth's subsurface.
Description of the Related Art
[0002] Computer processing of three dimensional data volumes may be challenging because of the volume of data necessary to adequately provide realistic parameterization of real world situations. This is particularly true for the case of the full elastic wave equation for seismic data processing.
[0003] Seismic exploration for hydrocarbons generally is conducted using a source of seismic energy and receiving and recording the energy generated by the source using seismic detectors. On land, the seismic energy source may be an explosive charge or another energy source having the capacity to impart impacts or mechanical vibrations at or near the earth's surface. Seismic waves generated by these sources travel into the earth subsurface and are reflected back from boundaries and reach the surface of the earth at varying intervals of time depending on the distance traveled and the characteristics of the subsurface material traversed. The return waves are detected by the sensors and representations of the seismic waves as representative electrical signals are recorded for processing.
[0004] Normally, signals from sensors located at varying distances from the source are combined together during processing to produce "stacked" seismic traces. In marine seismic surveys, the seismic energy typically comes from air guns. Marine seismic surveys typically employ a plurality of sources and/or a plurality of streamer cables, in which seismic sensors are mounted, to gather three dimensional data.
[0005] The process of exploring for and exploiting subsurface hydrocarbon reservoirs is often costly and inefficient because operators have imperfect information from geophysical and geological characteristics about reservoir locations. Furthermore, a reservoir's characteristics may change as it is produced.
[0006] Data acquisition for oil exploration may have a negative impact on the environment. The impact of oil exploration methods on the environment may be reduced by using low-impact methods and/or by narrowing the scope of methods requiring an active source, including reflection seismic and electromagnetic surveying methods.
[0007] Geophysical and geological methods are used to determine well locations. Expensive exploration investment is often focused in the most promising areas using relatively slow methods, such as reflection seismic data acquisition and processing. The acquired data are used for mapping potential hydrocarbon-bearing areas within a survey area to optimize exploratory well locations and to minimize costly nonproductive wells.
[0008] The time from mineral discovery to production may be shortened if the total time required to evaluate and explore a survey area can be reduced by applying selected methods alone or in combination with other geophysical methods. Some methods may be used as a standalone decision tool for oil and gas development decisions when no other data is available. Preferable methods will be economical, have a low environmental impact, and relatively efficient with rapid data acquisition and processing.
SUMMARY
[0009] Disclosed herein is a computerized method for imaging subsurface energy sources comprising acquiring multicomponent seismic data, arranging 3D volumes of velocity and stress parameters in a global computer memory by ordering adjacent planes from like-indexed blocks extracted from the 3D parameter volumes so that like-indexed blocks are adjacent in the computer memory, reverse propagating the multicomponent seismic data to obtain a reverse time wave field and solving a full elastic wave equation for the reverse time wave field using a graphics processing unit to obtain wave field decomposition data. The method may also include applying at least one imaging condition to the wave field decomposition data to obtain imaging data and/or storing the imaging data for display.
[0010] BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates an elementary 2D cell or grid point;
Fig. 2 illustrates 16 grid points, a thread-block grid;
Fig. 3 illustrates relationships of stresses and velocities for computations;
Fig. 4 illustrates the comparison of a 2D cell or grid point with a 3D cell or grid point representing stresses and velocities for computations;
Fig. 5 illustrates a 3D cell or grid point representing stresses and velocities for computations indexed using x, y and z as well as 1, 2 and 3;
Fig. 6 illustrates a thread-block grid with a halo for the stress update calculations;
Fig. 7 illustrates a thread-block grid with a halo for the velocity update calculations;
Fig. 8 illustrates an example global memory address configuration relative to a 2D calculation;
Fig. 9 illustrates a global memory address configuration with the element sizes equal;
Fig. 10 illustrates a global memory address configuration for thread-block grids;
Fig. 11 illustrates a visualization of the 3D Elements arranged into blocks to facilitate rapid calculations;
Fig. 12 illustrates a block memory organization schematic relative to organizing data planes to facilitate rapid calculations;
Fig. 13 illustrates the relationship between GPU memory threads and the planes of data for elements for each block;
Fig. 14 illustrates a GPU memory structure showing the relationship of data blocks to data planes to elements to threads;
Fig. 15 illustrates pseudo code for a non-limiting embodiment of the present invention;
Figs. 16a, 16b and 16c illustrate various subsurface related images according to non-limiting embodiments of the present invention; and
Fig. 17 illustrates a computer system for implementing various non-limiting embodiments of the present invention.
DETAILED DESCRIPTION
[0011] General purpose graphics processing units (GPGPU) enable rapid computations compared to conventional CPU systems. However, taking advantage of the increase in computing power can be difficult when the datasets to be processed are large three-dimensional (3D) datasets. This disclosure includes descriptions of embodiments to arrange data to be efficiently processed with GPGPU systems.
[0012] An example of a 3D problem is used to illustrate how 3D data may be arranged to efficiently take advantage of the computing power available on a GPGPU system. The 3D problem used as an example is a time reverse imaging example.
[0013] Information to enable the determination of the location of underground fluids, underground fluid reservoirs or seismic source locations may be extracted from seismic waves and vibrations measured at the earth's surface using various imaging conditions in data processing. These waves may be measured using passive or active seismic data acquisition methods.
[0014] One or more sensors are used to measure vertical and horizontal components of motion due to background seismic waves at multiple locations within a survey area. These components may be measured separately or in combination and may be recorded as signals representing displacement, velocity, and/or acceleration.
[0015] The sensor equipment for measuring seismic waves may be any type of seismometer. Seismometer equipment having a large dynamic range and enhanced sensitivity compared with other transducers may provide the best results (e.g., multicomponent earthquake seismometers). A number of commercially available sensors utilizing different technologies may be used, e.g. a balanced force feed-back instrument or an electrochemical sensor. An instrument with high sensitivity at very low frequencies and good coupling with the earth enhances the efficacy of the method.
[0016] Ambient noise conditions representative of seismic wave energy can negatively affect the recorded data. Techniques for removing unwanted artifacts and artificial signals from the data, such as cultural and industrial noise, may be important for applying this method successfully in areas where there is relatively high ambient noise.
[0017] With the advancement of programmable graphics processing unit (GPU) hardware and supporting infrastructure to make GPU hardware more easily addressable and accessible, general-purpose computing on graphics processing units (GPGPU) is becoming more common. GPUs are highly parallel, multithreaded, many-core processors that are capable of very high computation and data throughput and are effective on problems that require stream processing, where data points have few dependencies and can be processed in the same way. In geophysics, many applications fall into this category. GPU-based implementations, from trace processing to time migration, show significant performance gains over traditional CPU-based implementations.
[0018] CUDA (Compute Unified Device Architecture), a parallel computing architecture developed by NVIDIA, defines both a parallel programming model as well as a memory model for the user to develop GPU based programs in a high-level C-like programming language without worrying about low level hardware architecture, thus making the GPU more accessible and programmable. At the heart of CUDA is the ability for a programmer to define kernels, or C-like functions, that are executed many times in parallel by a number of CUDA threads.
[0019] Unlike CPU threads, CUDA threads are extremely lightweight: they have very little creation overhead and are fast-switching. CUDA uses thousands of threads to achieve high efficiency and high throughput, whereas a current multi-core CPU may utilize only a few threads. A kernel is executed by an array of threads in parallel, and threads may be arranged into one-, two- or three-dimensional blocks. Thread blocks must be able to execute independently, in any order, in parallel or in series. The thread and block IDs are used as a convenient index to compute memory addresses and make control decisions. Threads can access multiple memory spaces: per-thread local memory, per-block shared memory and device global memory.
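A minimal, generic example (not taken from the patent) of how thread and block IDs index into data is shown below; the kernel name scale and the wrapper scaleOnGpu are illustrative only.
```cuda
#include <cuda_runtime.h>

// Each CUDA thread scales one array element.
__global__ void scale(float* a, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global index from block and thread IDs
    if (i < n) a[i] *= s;                            // guard the padded tail
}

void scaleOnGpu(float* d_a, float s, int n)
{
    int threads = 256;                               // threads per block
    int blocks  = (n + threads - 1) / threads;       // enough blocks to cover n elements
    scale<<<blocks, threads>>>(d_a, s, n);
}
```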
[0020] Local and shared memories are small, low-latency memories near each processor core; global memory, on the other hand, is a larger, off-chip memory that is high-latency and not cached, and is therefore costly to access. Following optimized access patterns is important when accessing global memory.
[0021] The number of blocks a multiprocessor can process at once depends on how many registers per thread and how much shared memory per block are required for a given kernel, because the registers and shared memory are scarce and split among all the threads. In this disclosure, we present a CUDA-based time reverse imaging (TRI) implementation that initially achieved a 60 to 100 times speedup compared to an existing CPU-based code. The CUDA implementation uses a finite difference full elastic wave propagator, wave field decomposition, and six imaging conditions that compute absolute particle velocities and P-wave and S-wave autocorrelations. Memory reads and writes may be optimized by using texture bindings. Initial and/or boundary condition calculations may be built into the GPU update kernels, resulting in substantial increases in elements-per-second throughput.
[0022] Time reverse imaging (TRI) is a migration-like algorithm to locate subsurface sources or diffractions by propagating data, which are reversed in time, through a velocity model. With no down-going source wave field, the algorithm may be roughly considered as approximately the data half of reverse-time migration that may be used in conventional seismic data processing.
[0023] As disclosed with the following non-limiting embodiments, one chain of operations is: elastic finite difference propagation, wave-field decomposition via spatial derivatives, and evaluation of up to six correlation based imaging conditions. Propagation time is the exterior loop for all operations, and images are in physical space. An illustration that may be used to evaluate the accuracy and performance of the GPU implementation is forward modeled data due to a subsurface vertical single force propagating through a homogeneous medium with an image showing focusing of the P and S wave modes at the appropriate location in space.
[0024] TRI is an ideal application of a real world 3D problem for the GPGPU, since elastic wave propagation, wave field decomposition and updating imaging conditions are all computationally intensive. The computations may be performed entirely on the GPU, with communication with the host (CPU) minimally required, only for initialization and for writing back the final solution.
[0025] Finite difference elastic wave propagation: At the core of TRI is elastic wave propagation,
[0026] $\rho\,\ddot{\mathbf{u}} = \mathbf{f} + (\lambda + 2\mu)\,\nabla(\nabla \cdot \mathbf{u}) - \mu\,\nabla \times (\nabla \times \mathbf{u})$.     (1)
[0027] Following a finite difference method (e.g., described in Virieux, 1984), the elastic wave equation, Equation 1, may be discretized using the centered finite difference technique. In the case of 2D, the elastic wave equation can be numerically propagated in time according to these following equations, which are first order in both time and space:
$$V_x^{t+1} = V_x^{t} + \frac{\Delta t}{\rho}\left(\frac{\partial \sigma_{xx}}{\partial x} + \frac{\partial \sigma_{xz}}{\partial z}\right) \qquad (2)$$
$$V_z^{t+1} = V_z^{t} + \frac{\Delta t}{\rho}\left(\frac{\partial \sigma_{xz}}{\partial x} + \frac{\partial \sigma_{zz}}{\partial z}\right) \qquad (3)$$
$$\sigma_{xx}^{t+1} = \sigma_{xx}^{t} + \Delta t\left((\lambda + 2\mu)\frac{\partial V_x}{\partial x} + \lambda\frac{\partial V_z}{\partial z}\right) \qquad (4)$$
$$\sigma_{zz}^{t+1} = \sigma_{zz}^{t} + \Delta t\left(\lambda\frac{\partial V_x}{\partial x} + (\lambda + 2\mu)\frac{\partial V_z}{\partial z}\right) \qquad (5)$$
$$\sigma_{xz}^{t+1} = \sigma_{xz}^{t} + \Delta t\,\mu\left(\frac{\partial V_x}{\partial z} + \frac{\partial V_z}{\partial x}\right) \qquad (6)$$
[0028] Density is represented by ρ, λ and μ are the Lamé coefficients, Vx and Vz are the horizontal and vertical particle velocities, σxx and σzz are the horizontal and vertical normal stresses, and σxz is the shear stress, all at time step t. These velocities and stresses can be conveniently arranged onto a staggered grid. To update the state of any element on the staggered grid, only four neighboring elements (top, bottom, left, and right) are needed in the 2D case. A free-surface (stress free) boundary condition is imposed for one side (the presumed surface) and tapered absorbing layers for the other three sides.
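A minimal sketch of what equations (2) through (6) look like at one interior grid point is given below. It is illustrative only, not the patent's optimized GPU kernels: the struct Elastic2D, the function names and the exact one-sided difference stencils (which fold in the staggered half-cell offsets) are assumptions, and shared memory, boundaries and sources are omitted.
```cuda
struct Elastic2D {
    float *vx, *vz, *sxx, *szz, *sxz;   // fields, each stored as index i = iz * nx + ix
    float rho, lam, mu, dt, dx, dz;     // density, Lame coefficients, time and space steps
    int nx, nz;
};

// Equations (4)-(6): stresses updated from spatial derivatives of velocity.
inline void updateStressPoint(Elastic2D& m, int ix, int iz)
{
    int i = iz * m.nx + ix;
    float dvxdx = (m.vx[i + 1]    - m.vx[i]) / m.dx;
    float dvzdz = (m.vz[i + m.nx] - m.vz[i]) / m.dz;
    float dvxdz = (m.vx[i + m.nx] - m.vx[i]) / m.dz;
    float dvzdx = (m.vz[i + 1]    - m.vz[i]) / m.dx;
    m.sxx[i] += m.dt * ((m.lam + 2.0f * m.mu) * dvxdx + m.lam * dvzdz);
    m.szz[i] += m.dt * (m.lam * dvxdx + (m.lam + 2.0f * m.mu) * dvzdz);
    m.sxz[i] += m.dt * m.mu * (dvxdz + dvzdx);
}

// Equations (2)-(3): velocities updated from spatial derivatives of stress.
inline void updateVelocityPoint(Elastic2D& m, int ix, int iz)
{
    int i = iz * m.nx + ix;
    m.vx[i] += m.dt / m.rho * ((m.sxx[i] - m.sxx[i - 1])    / m.dx
                             + (m.sxz[i] - m.sxz[i - m.nx]) / m.dz);
    m.vz[i] += m.dt / m.rho * ((m.sxz[i] - m.sxz[i - 1])    / m.dx
                             + (m.szz[i] - m.szz[i - m.nx]) / m.dz);
}
```
A full time step sweeps updateStressPoint over every interior point and then updateVelocityPoint, matching the stress-then-velocity ordering described later for the GPU kernels.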
[0029] Fig. 1 illustrates a 2D cell, or grid node, of the staggered finite difference grid for full elastic wave propagation. The horizontal velocity is Vx. The vertical velocity is Vz. The horizontal stress is Sxx (the same as σxx in the equations above). The vertical stress is Szz and the cross stress is Sxz. Fig. 2 illustrates 16 grid nodes of 2D cells. Fig. 3 illustrates the relationships of each of the stresses and velocities for the calculations of equations 2 through 6. Note that the symbols 'S' and 'σ' are used interchangeably in the text, equations and figures, but represent exactly the same thing.
[0030] Fig. 4 illustrates taking these concepts to the 3D case. A third velocity component, Vy, and normal stress, Syy, have been added, along with the cross stresses Sxy and Syz. Fig. 5 illustrates the 3D case of Fig. 4 using numerical indices.
[0031] GPUs are designed for floating point operations. The time spent on calculation is small compared to the time spent accessing data and updating results in memory. Having an IO access pattern where data are aligned for appropriately coalesced and optimized reads and writes is essential for a fast, efficient implementation of 3D data parameterizing real world situations.
[0032] Instead of interleaving the stresses and velocities into one large data structure, the stresses and velocities are each separated into a linear array in memory. For an N- by M-element model, there are (N+1) x M Vx elements, N x (M+1) Vz elements, N x M σxx and σzz elements, and (N-1) x (M-1) σxz elements. This arrangement keeps memory blocks relatively small to minimize memory paging. Smaller memory blocks are also easier to align with GPU memory warps to ensure coalesced reads and writes from a given thread. If the size of an array is not a multiple of the half warp, it is padded to the next multiple of the half warp. A warp is the number of parallel threads that are created, managed, scheduled and executed in a group by the GPU's SIMT (single instruction, multiple thread) unit. The graphics architecture for the example implementation disclosed herein has a warp size of 32, but this is not a limitation of the embodiments disclosed here, since the embodiments may be implemented in different architectures and with different warp sizes.
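The padding rule can be written as a small helper, sketched below under the assumption that the half warp is 16 (consistent with the warp size of 32 mentioned above); the names kHalfWarp and padToHalfWarp are illustrative.
```cuda
#include <cstddef>

// Pad an array dimension up to the next multiple of the half warp, so that each row
// starts on a nicely aligned address.
constexpr std::size_t kHalfWarp = 16;

constexpr std::size_t padToHalfWarp(std::size_t n)
{
    return ((n + kHalfWarp - 1) / kHalfWarp) * kHalfWarp;
}

// Example: an (N+1)-by-M Vx array with N = 301 has rows of 302 floats, which would be
// padded to padToHalfWarp(302) == 304 floats.
```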
[0033] One possibility for implementation of 2D seismic data processing is to take advantage of CUDA built-in texture memory support by binding each array to texture and by using texture fetches to read data. Since there is a localized access pattern (top, bottom, left, right neighbors), threads of the same warp always read texture addresses that are close together. Texture memory space is cached on hardware level and the latency of addressing calculation is hidden better through thread parallelization, so there are significant speedups for data loaded with texture fetches compared to direct global memory reads. Having each thread load the data as needed through texture fetch is faster than first putting all needed data for a block into shared memory. Even though this implementation, on average, has twice the read redundancy, it keeps the memory usage for each thread to only a few registers, and thus allows more threads to run in parallel, which results in more rapid processing.
[0034] However, for a 3D implementation of full wave field seismic processing, or other 3D data structures, the texture binding method is prohibitively inefficient. Therefore, implementing the method for 3D seismic processing requires arranging thread blocks to process data efficiently with the least number of time-consuming data handling steps.
[0035] CUDA's capabilities depend on the GPU type. This disclosure uses as an example the specifications of compute capability 1.3 GPUs (e.g. GTX 285, Tesla C1060), but the methodology is straightforward to generalize to all other GPU types. CUDA launches its threads in thread-blocks, configurable to be one to three dimensions, in this example a maximum of 512 x 512 x 64 for each dimension and a maximum of 512 total threads. These blocks can be arranged in grids, configurable to be one to two dimensions, maximum 65535 x 65535.
[0036] The thread-blocks are arranged in a configuration of 16 by 16, resulting in 256 total threads, in a grid of nx / 16 by nz / 16, resulting in nx * nz / 256 total blocks. Given 256 threads per block, a maximum of 4 blocks can run on one multiprocessor at the same time. This is due to the way instructions are executed: a multiprocessor has 8 cores with which it executes 4 x 8 = 32 instructions (again, called a warp) at a time. It is capable of holding 32 such warps, and therefore 32 x 32 = 1024 thread instructions.
[0037] The blocks in this example implementation have 256 threads each, which is why a multiprocessor can manage 4 at a time. Given 30 multiprocessors, this results in 120 active thread-blocks, or 30720 threads. Again, these numbers are for example only; it is the data arrangement that leads to the increased efficiencies.
[0038] The number of active threads (30720) cannot be mapped directly to the number of elementary cells processed in parallel. On the one hand, although a multiprocessor has 32 warps of 32 instructions each scheduled, it executes one warp at a time. On the other hand, global memory IO operations take hundreds of clock cycles, and the multiprocessor uses this time to process other instructions. This way, the memory fetch latency can be "hidden", or accommodated, to a certain extent.
[0039] Even though reading from graphics card RAM is relatively slow, the IO bus is wide: 512 bits, which allows reading 16 consecutive 32-bit words with one read instruction. As 32 bits is exactly the size of a float, this allows 16 values to be read at once if they are properly positioned in memory (i.e. consecutively). Arranging the data in a convenient way for the threads to read is key to optimizing the process on CUDA. A multiprocessor also has on-chip shared memory, which (in this example) is 16 KB in size and can be used by all threads inside a thread-block. The advantage of this memory is that it is very fast; the drawback is that the amount is limited, so it must be carefully allocated. The shared memory is also partitioned among thread-blocks, so the 16 KB may be divided among 4 simultaneously running thread-blocks, allowing 4 KB for each.
[0040] Given the thread-block size of 16 by 16 = 256 threads and the size of a float (4 bytes), a "tile" of data is 256 x 4 = 1 KB. Because halos are needed (data from the border with neighboring thread-block grids) to completely update any given element or grid block, this budget allows 3 layers of data in shared memory.
[0041] By comparison with Fig. 2, Fig. 6 illustrates how the halo outside and adjacent to the thread-block is juxtaposed in relation to each thread-block grid for the stress update calculation. The velocities, including the velocity data from the halo zones that are used to update the stresses, are loaded from global memory into shared (local) memory. The stresses are loaded from global memory into local registers for the update calculations. The updated stresses are then written from the local registers back to global memory.
[0042] Fig. 7 illustrates how the halo is juxtaposed in relation to each thread-block grid for the velocity update calculation. The stresses, including the stress data from the halo zones that are used to update the velocities, are loaded from global memory into shared (local) memory. The velocities are loaded from global memory into local registers for the update calculations. The updated velocities are then written from the local registers back to global memory.
[0043] Three layers are all that are needed: 1) update stress: load Vx and Vz into shared tiles and compute all differences; 2) update velocity: load εxx, εzz and εxz into shared tiles and compute all differences. In this implementation a maximum of 3 operands can all be put in shared memory and the differences computed efficiently, thanks to the short fetch times from shared memory.
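A minimal CUDA sketch of the tile-plus-halo pattern of Figs. 6 and 7 follows, assuming a 16 x 16 thread-block, grid dimensions that are multiples of the tile size, and placeholder update coefficients c1 and c2 in place of the full equations 2 through 6; the kernel and variable names are illustrative and not part of the original disclosure.

#define TILE 16

__global__ void stressPassSketch(const float* vx, const float* vz,
                                 float* sxx, float* szz,
                                 int nx, int nz, float c1, float c2)
{
    // Shared tiles with a one-cell halo ring around the 16 x 16 block (Fig. 6).
    __shared__ float sVx[TILE + 2][TILE + 2];
    __shared__ float sVz[TILE + 2][TILE + 2];

    // Assumes nx and nz are multiples of TILE (the arrays are padded), so no bounds checks.
    int ix = blockIdx.x * TILE + threadIdx.x;
    int iz = blockIdx.y * TILE + threadIdx.y;
    int tx = threadIdx.x + 1, tz = threadIdx.y + 1;
    int i  = iz * nx + ix;

    // Load the interior of the tile; edge threads also load their halo neighbor.
    sVx[tz][tx] = vx[i];
    sVz[tz][tx] = vz[i];
    if (threadIdx.x == 0        && ix > 0)      { sVx[tz][0]        = vx[i - 1];  sVz[tz][0]        = vz[i - 1]; }
    if (threadIdx.x == TILE - 1 && ix < nx - 1) { sVx[tz][TILE + 1] = vx[i + 1];  sVz[tz][TILE + 1] = vz[i + 1]; }
    if (threadIdx.y == 0        && iz > 0)      { sVx[0][tx]        = vx[i - nx]; sVz[0][tx]        = vz[i - nx]; }
    if (threadIdx.y == TILE - 1 && iz < nz - 1) { sVx[TILE + 1][tx] = vx[i + nx]; sVz[TILE + 1][tx] = vz[i + nx]; }
    __syncthreads();

    // Stresses stay in registers: read, increment with velocity differences, write back.
    // Boundary cells (free surface, absorbing taper) are handled separately and omitted here.
    float dvxdx = sVx[tz][tx + 1] - sVx[tz][tx];
    float dvzdz = sVz[tz + 1][tx] - sVz[tz][tx];
    sxx[i] += c1 * dvxdx + c2 * dvzdz;
    szz[i] += c1 * dvzdz + c2 * dvxdx;
}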
[0044] Even though the data are read only twice from global memory (once for the stress update, once for the velocity update) and written only once (after the update), memory input-output is still the bottleneck of the process, so it needs to be optimized.
[0045] Elements, as used here in the physical example, are the velocities (Vx, Vy and Vz), the stresses (Sxx, Syy and Szz) and the cross stresses (Sxy, Sxz and Syz). A straightforward way to arrange the data (for the 2D case) is to index element after element, and x-line after x-line inside each element:
Vx(0,0)  Vx(0,1)  ... Vx(0,nx+1)  Vx(1,0)  Vx(1,1)  ... Vx(1,nx+1)  ... Vx(nz,nx+1)
Vz(0,0)  Vz(0,1)  ... Vz(0,nx)    Vz(1,0)  Vz(1,1)  ... Vz(1,nx)    ... Vz(nz+1,nx)
εxx(0,0) εxx(0,1) ... εxx(0,nx)   εxx(1,0) εxx(1,1) ... εxx(1,nx)   ... εxx(nz,nx)
εzz(0,0) εzz(0,1) ... εzz(0,nx)   εzz(1,0) εzz(1,1) ... εzz(1,nx)   ... εzz(nz,nx)
εxz(0,0) εxz(0,1) ... εxz(0,nx-1) εxz(1,0) εxz(1,1) ... εxz(1,nx-1) ... εxz(nz-1,nx-1)
Fig. 8 illustrates this global memory address configuration. However, aligning data in this way poses three major problems: 1) the operands (the variables Vx, Vz, εxx, εzz and εxz) are potentially scattered all over global memory, so to update Vx from εxx and εxz, potentially large distances in memory space may need to be traversed; this applies to all 5 equations (equations 2 to 6); 2) the operands for a dz operation are at least nx-1 addresses apart; 3) the element sizes are different: Vx is nx+1 by nz, Vz is nx by nz+1, εxx and εzz are nx by nz, and εxz is nx-1 by nz-1. This variation prevents synchronized, parallel reads and writes. The element-size problem may be accommodated by making all element sizes equal to nx by nz. The extra elements in εxz can be ignored and padded with zeros, while the missing elements in Vx and Vz are always zero due to the rigid boundary condition. In this method, the data are written to memory so that later reading is extremely fast. Fig. 9 illustrates a global memory address configuration with the element sizes equal. Fig. 10 illustrates a global memory address configuration for thread-block grids. For simplicity, the thread-block grid has nine sectors, but it will be appreciated that thread-block grids usually have a larger number of sectors.
[0046] Therefore, the data are accessed by thread-blocks, which in this example are teams of 16 by 16 threads. The data are arranged such that everything a thread-block will ever need (except halos) is in one place, globally arranged block after block. A block contains portions of Vx, Vz, εxx, εzz and εxz, exactly the data that the thread-block requires.
[0047] A thread-block "knows" its coordinates inside the thread-block grid (gx, gz) and can use the coordinates to access its relevant data. The method includes: 1) shift the global data address by (gx + gz * gnx) * n, where n is the number of elements per thread-block, which is simply 16 * 16 * 5 (Vx, Vz, εxx, εzz and εxz); 2) read operands into shared memory and the elements to update into registers; this can be done in 5 * 16 = 80 reads for all 256 * 5 = 1280 elements, since they are arranged consecutively and 16 elements are read at once; 3) read the halos; this is more difficult and takes more time than with data arranged in a more classic way, but overall the implementation is still faster because the halos are only a small fraction of the total data to read and write; the procedure for accessing halo data extracts it based on an index mapping such as: +1 halo: + n; +nx halo: + n * gnx; -1 halo: - n; -nx halo: - n * gnx; 4) update the elements (V or ε) using data from shared memory, which can be accessed very fast; 5) write the updated elements to global memory, in 2 * 16 = 32 writes (velocities) or 3 * 16 = 48 writes (stresses).
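The address arithmetic of steps 1 and 3 may be sketched as follows; the device function names are illustrative, and the mapping omits the per-field offset within a block's slab of memory.

// Sketch of the block-relative addressing of paragraph [0047]; n = 16*16*5 values per
// thread-block (five 2D fields), gnx = number of thread-blocks along x.
__device__ int blockBaseAddress(int gx, int gz, int gnx, int n)
{
    return (gx + gz * gnx) * n;       // step 1: shift into this block's slab of memory
}

// Step 3: halo data live in the neighboring blocks' slabs, reached by fixed offsets.
__device__ int haloAddress(int base, int n, int gnx, int side)
{
    switch (side) {
    case 0:  return base + n;         // +1  halo (right neighbor block)
    case 1:  return base - n;         // -1  halo (left neighbor block)
    case 2:  return base + n * gnx;   // +nx halo (block below)
    default: return base - n * gnx;   // -nx halo (block above)
    }
}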
[0048] TRI processing includes computing imaging conditions. Imaging conditions collapse the propagation time axis into physical space to allow visualization of wave-field focusing. The absolute particle velocity and the P- and S-wave potentials may be computed after elastic propagation. The equations used are:
Absolute particle velocity: $V^2 = V_x^2 + V_z^2$

P-wave potential: $P^2 = \left[(\lambda + 2\mu)\left(\frac{\partial V_x}{\partial x} + \frac{\partial V_z}{\partial z}\right)\right]^2$

S-wave potential: $S^2 = \left[\mu\left(\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right)\right]^2$
[0049] From these, the following non-exclusive six imaging conditions may be computed: maximum absolute particle velocity (Ivmax), absolute particle velocity autocorrelation (Ivv), P- and S-wave potential autocorrelations (Ipp and Iss), and P- and S-wave potential and energy cross-correlations (Ips and IEpEs). For the 2D case:
$I_{v\max}(x,z) = \max_t |V(x,z)|$

$I_{vv}(x,z) = \sum_{t=0}^{T} V(x,z)\,V(x,z)$

$I_{pp}(x,z) = \sum_{t=0}^{T} P(x,z)\,P(x,z)$

$I_{ss}(x,z) = \sum_{t=0}^{T} S(x,z)\,S(x,z)$

$I_{ps}(x,z) = \sum_{t=0}^{T} P(x,z)\,S(x,z)$

$I_{E_pE_s}(x,z) = \sum_{t=0}^{T} P(x,z)\,P(x,z)\,S(x,z)\,S(x,z)$
[0050] For the 3D case, the analogous imaging conditions include i) the zero-lag of the P-wave autocorrelation, ii) the zero-lag of the S-wave autocorrelation, iii) the zero-lag of the cross-correlation of the P- and S-wave energy densities, iv) the autocorrelation of the absolute value of particle motion, v) the maximum over all time, and vi) the cross-correlation of the energy density functions EpEs. Additionally, these imaging conditions may be applied generally to any of: i) particle velocity measurements, ii) particle acceleration measurements, iii) particle pressure measurements and iv) particle displacement measurements.
[0051] Similar to the physical attributes at each grid point location, the imaging conditions are updated on every grid point at every time step by reading the previous time-step value from global memory, calculating the incremental value, and then writing the updated value back to global memory.
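A minimal sketch of this read-increment-write update follows, using the zero-lag P-wave autocorrelation and the running maximum of the absolute particle velocity as examples; the kernel and array names are illustrative assumptions, not part of the original disclosure.

__global__ void imagingUpdateSketch(float* ipp, float* ivmax,
                                    const float* p, const float* vabs, int nPoints)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nPoints) return;
    ipp[i]  += p[i] * p[i];                       // zero-lag autocorrelation accumulates a product
    ivmax[i] = fmaxf(ivmax[i], fabsf(vabs[i]));   // Ivmax keeps the running maximum over time
}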
[0052] The propagator consists of two kernels, one to update the stress elements (εxx, εzz, εxz) and the other to update the velocities (Vx, Vz). For each time step, the stress is updated first, then the velocity. Each thread updates either the stresses or the velocities at a given array element location. The free surface condition is achieved by setting the stress values to zero at the surface boundary during the velocity pass. The absorbing boundaries may be applied as a cosine taper to the other three sides during the stress pass.
[0053] Wave field decomposition and imaging conditions are computed in a separate kernel from the propagation. Their computational complexity is on par with the elastic element updates, so roughly half of the computation time is spent on elastic element updates and the other half on wave field decomposition and updating imaging conditions. The propagator may also be used for forward time modeling by recording particle velocities at desired locations.
[0054] Fig. 11 illustrates a GPU data arrangement, which is a visualization of 3D Elements (e.g. velocities, stresses and cross stresses). The most critical optimization parameter for realizing the significant performance capabilities of GPU cards is minimizing the stride length of memory skips across the 1D memory array streaming through each processing thread. As the total memory of GPU cards is relatively small, the size of each spatial volume in the model domain is also limited. Both issues are addressed by an appropriate domain-decomposition strategy focused on light-weight 2D planes of wave field and parameter values. Therefore, small planes (P0 to Pn-1, see Fig. 12) from the propagation domain are stored sequentially in memory for every wave field and parameter field, followed by the exterior loop over the third spatial dimension. This set of planes defines a Thread Block domain, and a set of blocks contains the entire model domain, consisting of N Elements, each n1 by n2 by n3. Note that the wave and parameter fields are consecutive planes before considering the third spatial dimension. Fig. 13 illustrates the distribution of values in GPU memory, though it adds some of the concepts of 3D spatial structure for an alternative view of the organization utilized.
[0055] The following table lists example address values for a model of nl=32, n2=16, n3=2, holding two Elements, processed using two Thread Blocks, 16 by 16 each. The GPU scheduling algorithm handles the order, priority, and distribution of Thread Blocks across the available cores on the card. Address Element Plane Block
[0 - 255] 0 0 0
[255 - 511] 1 0 0
[512 - 767] 0 1 0
[768 - 1023] 1 1 0
[1024 - 1279] 0 0 1
[1280 - 1535] 1 0 1
[1536 - 1791] 0 1 1
[1792 - 2047] 1 1 1
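The address ranges in the table follow from treating the block index as the slowest index, then the plane, then the Element, with 16 x 16 = 256 values per tile. The following host-side helper is an illustrative inference from the table rather than code from the disclosure; it reproduces the listed ranges for two Elements, two planes and two Thread Blocks.

#include <cstdio>

// Base address of one 16 x 16 tile: block is the slowest index, then plane, then element.
int tileBaseAddress(int block, int plane, int element, int nPlanes, int nElements)
{
    const int tile = 16 * 16;  // values per plane, per element, per block
    return ((block * nPlanes + plane) * nElements + element) * tile;
}

int main()
{
    for (int block = 0; block < 2; ++block)
        for (int plane = 0; plane < 2; ++plane)
            for (int element = 0; element < 2; ++element) {
                int base = tileBaseAddress(block, plane, element, 2, 2);
                printf("[%4d - %4d]  element %d  plane %d  block %d\n",
                       base, base + 255, element, plane, block);
            }
    return 0;
}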
[0056] The axis attributes "fast", "middle" and "slow" refer to their arrangement in computer memory. Because memory addressing is one-dimensional, representing a three-dimensional object is only possible by storing all of its values consecutively, as in:

for (int i3 = 0; i3 < n3; i3++)
    for (int i2 = 0; i2 < n2; i2++)
        for (int i1 = 0; i1 < n1; i1++)
            memory[i1 + i2 * n1 + i3 * n1 * n2] = object[i3][i2][i1];
[0057] The "speed" of the axis is then determined by the distance between neighboring elements: The fast axis has its elements neighboring directly, the middle axis has nl elements between neighbors and the slow axis has nl*n2 elements between neighbors.
[0058] Thread-Block: GPU threads are organized into Thread-Blocks. Functions on the GPU are executed by Thread-Blocks, meaning that the instructions defined in the functions are executed by a group of threads (nearly) at once. The size of the Thread-Block is configurable in one, two or three dimensions.
[0059] The number of Thread-Blocks scheduled is also configurable, in one-, two- or three-dimensional grids. To use a two-dimensional grid, if BLOCK_DIM is the size of both axes of the two-dimensional Thread-Block, the grid is configured as N1 / BLOCK_DIM by N2 / BLOCK_DIM, where N1 and N2 are n1 and n2 padded to multiples of BLOCK_DIM.
n1: Length of the fast axis, or axis 1.
n2: Length of the middle axis, or axis 2.
n3: Length of the slow axis, or axis 3.
[0060] threadIdx.x: When executing a function on the GPU, a Thread-Block has access to its grid and block coordinates. The block coordinates, threadIdx.x and threadIdx.y, inform a thread where in the block it resides. This is used to access the proper addresses in the n1-n2 plane.
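For illustration, the grid configuration of paragraphs [0059] and [0060] may be sketched as follows; the kernel name, the placeholder body, and the exterior loop over the slow axis are assumptions consistent with the description, not code from the disclosure.

#define BLOCK_DIM 16

// Kernel sketch: blockIdx/threadIdx locate a thread within the n1-n2 plane ([0060]).
__global__ void planeKernelSketch(float* field, int n1, int n2, int i3)
{
    int i1 = blockIdx.x * BLOCK_DIM + threadIdx.x;   // fast axis
    int i2 = blockIdx.y * BLOCK_DIM + threadIdx.y;   // middle axis
    if (i1 < n1 && i2 < n2)
        field[i1 + (size_t)i2 * n1 + (size_t)i3 * n1 * n2] += 0.0f;  // placeholder update
}

// Host-side configuration of [0059]: pad n1 and n2 up to multiples of BLOCK_DIM,
// launch a 2D grid of 2D thread-blocks, and loop over the slow axis outside the kernel.
void launchOverPlanes(float* dField, int n1, int n2, int n3)
{
    int N1 = ((n1 + BLOCK_DIM - 1) / BLOCK_DIM) * BLOCK_DIM;
    int N2 = ((n2 + BLOCK_DIM - 1) / BLOCK_DIM) * BLOCK_DIM;
    dim3 block(BLOCK_DIM, BLOCK_DIM);
    dim3 grid(N1 / BLOCK_DIM, N2 / BLOCK_DIM);
    for (int i3 = 0; i3 < n3; ++i3)
        planeKernelSketch<<<grid, block>>>(dField, n1, n2, i3);
}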
[0061] For illustration, Fig. 15 shows an example of pseudocode for GPU-based 2D TRI. Threads are synchronized between each kernel to ensure all the values are updated for the entire simulation space before moving on to the next calculation. Memory is allocated on the host (CPU) and on the GPU device for Vx, Vz, εxx, εzz and εxz, the sources and the imaging conditions. Sources, as used herein, are the seismic data input to the TRI. The operands Vx, Vz, εxx, εzz and εxz are initialized and the source data are loaded. Data are transferred from the host CPU to the GPU device. On the GPU, for every grid point the stresses εxx, εzz and εxz are updated and the threads are synched; then the velocities Vx and Vz are updated and the threads are synched again. This is followed by wavefield decomposition into V, P and S; the imaging conditions are computed and the threads are synched.
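A host-side sketch of the control flow that Fig. 15 outlines is given below. The kernel names and reduced bodies are illustrative placeholders for the propagation and imaging kernels of paragraphs [0052] and [0053]; on a single stream the kernels already execute in order, and the explicit synchronization simply mirrors the thread synchronization called out in the pseudocode.

// Illustrative kernel stubs standing in for the stress, velocity and imaging kernels.
__global__ void updateStressSketch(float* sxx, float* szz, float* sxz,
                                   const float* vx, const float* vz) { /* equations 2-4 */ }
__global__ void updateVelocitySketch(float* vx, float* vz, const float* sxx,
                                     const float* szz, const float* sxz) { /* equations 5-6 */ }
__global__ void imagingSketch(float* images, const float* vx, const float* vz) { /* V, P, S + conditions */ }

// Time loop corresponding to the Fig. 15 pseudocode; device memory is assumed allocated,
// initialized, and loaded with the source data before this loop runs.
void timeLoopSketch(int nt, dim3 grid, dim3 block,
                    float* vx, float* vz, float* sxx, float* szz, float* sxz, float* images)
{
    for (int it = 0; it < nt; ++it) {
        updateStressSketch<<<grid, block>>>(sxx, szz, sxz, vx, vz);
        cudaDeviceSynchronize();             // all stresses updated before the velocity pass
        updateVelocitySketch<<<grid, block>>>(vx, vz, sxx, szz, sxz);
        cudaDeviceSynchronize();             // all velocities updated before decomposition
        imagingSketch<<<grid, block>>>(images, vx, vz);
        cudaDeviceSynchronize();
    }
    // The final imaging volumes are then copied back to the host for logging and analysis.
}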
[0062] An example of a physical model is ~15,000 m long by ~4,000 m deep, discretized into a 10 m x 10 m grid, with 11 calculations per time step, resulting in ~6,600,000 elements. Threads are organized into a 2D grid of blocks, each block consisting of 16 x 16 threads. With this block/grid arrangement, the thread and block IDs are used to index into the simulation space following the CUDA paradigm.
[0063] The necessary memory is allocated and initialized first on the host (CPU), then the device (GPU), and then the propagation and imaging kernels are carried out on the GPU, with the final results transferred back to CPU for logging and analysis.
[0064] Fig. 16a illustrates horizontal velocity data (Ricker wavelet, fundamental frequency 3 Hz) recorded at every grid point at the surface from a forward model of a single vertical point source. Fig. 16b illustrates the forward model of the single vertical point source. The receiver data are reversed in time and propagated. Fig. 16c shows the TRI result, which highlights the source location using the P-S-wave potential cross-correlation imaging condition Ips. All simulations were performed on the GPU.
[0065] Examples of TRI with actual physical settings were run using a single Nvidia GTX 285 graphics processor, which has 30 multiprocessors (240 total compute cores), 1 gigabyte of device memory, and CUDA 2.3 with compute capability 1.3. Nx and Nz are the numbers of horizontal and vertical grid points and Nt is the number of time steps. Tj and Tc are the Java and CUDA execution times measured in seconds. Ej and Ec are the Java and CUDA computational throughputs, in millions of elements per second. They are calculated as Nx x Nz x Nt x (5+6) / runtime. The factor of (5 + 6) arises because there are 5 elastic properties (or Elements) and 6 imaging conditions to update at every time step in the 2D case.
[0066] Notwithstanding any inefficiency in the particular implementation, the throughput of more than 5,000 million elements per second achieved on a single GPU is non-trivial. Thirty seconds of physical simulation of elastic propagation takes around thirty seconds of runtime, enabling nearly real-time processing.
[0067] The non-limiting embodiment disclosed herein is a CUDA implementation of TRI, containing six imaging conditions and a finite difference full elastic wave propagation solver, that runs fully on a GPU. Data are divided into small, manageable pieces to leverage CUDA's built-in texture functions and to ensure coalesced memory addressing from threads for optimal memory reads and writes. The embodiment disclosed herein illustrates a CUDA implementation that achieves a computational throughput of more than 5,000 million elements per second.
[0068] To summarize the data arrangement, the necessary physical elements (velocities and stresses in the seismic example) are arranged in Blocks as illustrated in Fig. 11. The mapview of Elements E(n) in Fig. 11 schematically illustrates the data Block arrangement, B0 to B8, so that the same blocks, meaning like-indexed blocks, from other Elements (e.g. volumes of physical parameters) are closer together in memory to enable more rapid data transfer and higher computational speed. Fig. 12 illustrates the data arrangement in memory of adjacent Planes (P0 to Pn-1), where the like-indexed Blocks of the Elements are closer together in memory for data-transfer efficiency. As illustrated, the adjacent Planes of each Block for all Elements are arranged in memory so that reads and writes are most efficient. Fig. 13 then illustrates this result, where the data for the threads are arranged in memory for maximum efficiency. Fig. 14 illustrates this data arrangement hierarchy. Starting from the bottom, the linear memory array of BLOCKS (like-indexed Blocks extracted from the Elements) is arranged into adjacent PLANES of the ELEMENTS. When the ELEMENTS are arranged in this manner, the thread blocks are laid out in memory so that the GPU read and write sequences facilitate the maximally efficient throughput for GPU operations.
[0069] Fig. 17 illustrates a schematic example of the hardware and operating environment in which embodiments as described herein and their equivalents may be practiced. The description of Fig. 17 includes a general description of the computer hardware, computing environment or information handling system in which the embodiments may be implemented. Although specific hardware may not be required, embodiments may be implemented in the general context of computer-executable instructions, such as program modules, being executed by a computer. Various embodiments may be practiced with a personal computer, a mainframe computer or combinations that include workstations with servers. Program modules include routines, programs, objects, components and data structures for performing tasks, processing data, and recording and displaying information.
[0070] The products as defined herein may be particularly adapted for use in what are termed "information handling systems." An information handling system is any instrumentality or aggregate of instrumentalities primarily designed to compute, classify, process, transmit, receive, retrieve, originate, switch, store, display, manifest, measure, detect, record, reproduce, handle or utilize any form of information, intelligence or data for business, scientific, control or other purposes. Examples include personal computers and larger processors such as servers and mainframes, and may contain elements illustrated in Fig. 17.
[0071] Embodiments may be practiced with various computer or information handling system configurations that separately or in combination may include handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network computers, minicomputers, mainframe computers, and the like. Embodiments may be practiced with tasks performed in and over distributed computing environments that include remote processing devices linked through a communications network. Program modules operating in distributed computing environments may be located in various memory locations, both local and remote.
[0072] Fig. 17 is illustrative of hardware and an operating environment for implementing a general purpose computing device or information handling system in the form of a computer 10. Computer 10 includes a processor or processing unit 11 that may include 'onboard' instructions 12. Computer 10 has a system memory 20 attached to a system bus 40 that operatively couples various system components, including system memory 20, to processing unit 11. The system bus 40 may be any of several types of bus structures using any of a variety of bus architectures as are known in the art.
[0073] While one processing unit 11 is illustrated in Fig. 17, there may be a single central-processing unit (CPU) or a graphics processing unit (GPU), or both or a plurality of processing units. Computer 10 may be a standalone computer, a distributed computer, or any other type of computer.
[0074] System memory 20 includes read only memory (ROM) 21 with a basic input/output system (BIOS) 22 containing the basic routines that help to transfer information between elements within the computer 10, such as during start-up. System memory 20 of computer 10 further includes random access memory (RAM) 23 that may include an operating system (OS) 24, an application program 25 and data 26.
[0075] Computer 10 may include a disk drive 30 to enable reading from and writing to an associated computer or machine readable medium 31. Computer readable media 31 includes application programs 32 and program data 33.
[0076] For example, computer readable medium 31 may include programs to process seismic data, which may be stored as program data 33, according to the methods disclosed herein. The application program 32 associated with the computer readable medium 31 includes at least one application interface for receiving and/or processing program data 33. The program data 33 may include seismic data acquired according to embodiments disclosed herein. At least one application interface may be associated with calculating imaging conditions for locating subsurface hydrocarbon reservoirs.

[0077] The disk drive may be a hard disk drive (e.g., for a magnetic disk), a magnetic disk drive for reading from or writing to removable magnetic media, or an optical disk drive for reading from or writing to a removable optical disk such as a CD ROM, DVD or other optical media.
[0078] Disk drive 30, whether a hard disk drive, magnetic disk drive or optical disk drive is connected to the system bus 40 by a disk drive interface (not shown). The drive 30 and associated computer-readable media 31 enable nonvolatile storage and retrieval for one or more application programs 32 and data 33 that include computer-readable instructions, data structures, program modules and other data for the computer 10. Any type of computer-readable media that can store data accessible by a computer, including but not limited to cassettes, flash memory, digital video disks in all formats, random access memories (RAMs), read only memories (ROMs), may be used in a computer 10 operating environment.
[0079] The application programs 32 may be associated with one or more application program interfaces. An application programming interface (API) 35 may be an interface that a computer system, library or application provides in order to allow requests for services to be made of it by other computer programs, and/or to allow data to be exchanged between them. An API 35 may also be a formalized set of software calls and routines that can be referenced by an application program 32 in order to access supporting application programs or services, which programs may be accessed over a network 90.
[0080] APIs 35 are provided that allow for higher level programming for displaying and mapping subsurface reservoirs. For example, APIs are provided for receiving seismic data, and decomposing, merging, smoothing and averaging the data. Moreover, the APIs allow for determining the output of imaging condition processing and storing it for display.
[0081] Data input and output devices may be connected to the processing unit 11 through a serial interface 50 that is coupled to the system bus. Serial interface 50 may be a universal serial bus (USB). A user may enter commands or data into computer 10 through input devices connected to serial interface 50, such as a keyboard 53 and pointing device (mouse) 52. Other peripheral input/output devices 54 may include without limitation a microphone, joystick, game pad, satellite dish, scanner or fax, speakers, wireless transducer, etc. Other interfaces (not shown) that may be connected to bus 40 to enable input/output to computer 10 include a parallel port or a game port. Computers often include other peripheral input/output devices 54 that may be connected with serial interface 50, such as a machine readable media 55 (e.g., a memory stick), a printer 56 and a data sensor 57. A seismic sensor or seismometer for practicing embodiments disclosed herein is a nonlimiting example of a data sensor 57. A video display 72 (e.g., a liquid crystal display (LCD), a flat panel, a solid state display, or a cathode ray tube (CRT)) or other type of output display device may also be connected to the system bus 40 via an interface, such as a video adapter 70. A map or subsurface display created from imaging condition values as disclosed herein may be displayed with video display 72.
[0082] A computer 10 may operate in a networked environment using logical connections to one or more remote computers. These logical connections are achieved by a communication device associated with computer 10. A remote computer may be another computer, a server, a router, a network computer, a workstation, a client, a peer device or other common network node, and typically includes many or all of the elements described relative to computer 10. The logical connections depicted in Fig. 17 include a local-area network (LAN) or a wide-area network (WAN) 90. However, the designation of such networking environments, whether LAN or WAN, is often arbitrary as the functionalities may be substantially similar. These networks are common in offices, enterprise-wide computer networks, intranets and the Internet.
[0083] When used in a networking environment, the computer 10 may be connected to a network 90 through a network interface or adapter 60. Alternatively computer 10 may include a modem 51 or any other type of communications device for establishing communications over the network 90, such as the Internet. Modem 51, which may be internal or external, may be connected to the system bus 40 via the serial interface 50.
[0084] In a networked deployment computer 10 may operate in the capacity of a server or a client user machine in server-client user network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. In a networked environment, program modules associated with computer 10, or portions thereof, may be stored in a remote memory storage device. The network connections schematically illustrated are for example only and other communications devices for establishing a communications link between computers may be used.
[0085] While various embodiments have been shown and described, various modifications and substitutions may be made thereto without departing from the spirit and scope of the disclosure herein. Accordingly, it is to be understood that the present embodiments have been described by way of illustration and not limitation.

Claims:
1) A computerized method for imaging subsurface energy sources comprising:
acquiring multicomponent seismic data;
arranging 3D volumes of velocity and stress parameters in computer memory by ordering adjacent planes from like-indexed blocks extracted from the 3D volumes so that like-indexed blocks are adjacent in the computer memory;

reverse propagating the multicomponent seismic data to obtain a reverse time wave field; and
solving a full elastic wave equation for the reverse time wave field using a graphics processing unit to obtain wave field decomposition data.
2) The method of Claim 1 further comprising applying at least one imaging condition to the wave field decomposition data to obtain imaging data.
3) The method of Claim 2 further comprising storing the imaging data for display.
4) The method of Claim 1 further comprising updating the stress parameters, loading velocity parameters, synching all threadblocks and writing synched data to global memory.
5) The method of Claim 1 further comprising updating velocity parameters, loading stress parameters, synching all threadblocks and writing synched data to global memory.
6) The method of Claim 1 wherein the 3D volumes of velocity and stress parameters are at least one selected from the group consisting of i) a horizontal velocity, ii) vertical velocity, iii) a horizontal stress, iv) a vertical stress, and v) a cross stress.
7) The method of claim 2 wherein the at least one imaging condition is at least one selected from the group consisting of: i) the zero-lag of the P-wave autocorrelation, ii) the zero-lag of the S-wave autocorrelation, iii) the zero-lag of the cross-correlation of the P- and S-wave energy densities, iv) autocorrelation of the absolute value of particle motion, v) maximum over all time, and vi) the crosscorrelation of the energy density functions EpEs.

8) The method of claim 2 further comprising applying the group of imaging conditions consisting of: i) the zero-lag of the P-wave autocorrelation, ii) the zero-lag of the S-wave autocorrelation, iii) the zero-lag of the cross-correlation of the P- and S-wave energy densities, iv) autocorrelation of the absolute value of particle motion, v) maximum over all time, and vi) the crosscorrelation of the energy density functions EpEs.
9) A computerized method for imaging subsurface energy sources by solving a full elastic wave equation for the reverse time wave field using a graphics processing unit to obtain wave field decomposition data comprising:
acquiring multicomponent seismic data;
arranging volumes of velocity and stress parameters in computer memory by ordering adjacent planes from like-indexed blocks extracted from the volumes so that like-indexed blocks are adjacent in memory;
updating the stress parameters, loading velocity parameters, synching all threadblocks and writing synched data to global memory;
updating velocity parameters, loading stress parameters, synching all threadblocks and writing synched data to global memory; and
decomposing a seismic wavefield, computing an imaging condition, writing the computed imaging condition to memory.
10) The method of claim 9 further comprising applying at least one imaging condition to the wave field decomposition data to obtain imaging data.
11) The method of claim 10 further comprising storing the imaging data for display.
12) The method of Claim 9 wherein the 3D volumes of velocity and stress parameters are at least one selected from the group consisting of i) a horizontal velocity, ii) vertical velocity, iii) a horizontal stress, iv) a vertical stress, and v) a cross stress.
13) The method of claim 10 wherein the at least one imaging condition is at least one selected from the group consisting of: i) the zero-lag of the P-wave autocorrelation, ii) the zero-lag of the S-wave autocorrelation, iii) the zero-lag of the cross-correlation of the P- and S-wave energy densities, iv) autocorrelation of the absolute value of particle motion, v) maximum over all time, and vi) the crosscorrelation of the energy density functions EpEs.

14) The method of claim 10 further comprising applying the group of imaging conditions consisting of: i) the zero-lag of the P-wave autocorrelation, ii) the zero-lag of the S-wave autocorrelation, iii) the zero-lag of the cross-correlation of the P- and S-wave energy densities, iv) autocorrelation of the absolute value of particle motion, v) maximum over all time, and vi) the crosscorrelation of the energy density functions EpEs.
15) A computerized method for processing 3D seismic data comprising:
acquiring multicomponent seismic data;
acquiring a 3D volume of parameters associated with the multicomponent seismic data;
arranging the volume of parameters in computer memory by ordering adjacent planes from like-indexed blocks extracted from the volume so that like- indexed blocks are adjacent in the computer memory;
reverse propagating the multicomponent seismic data to obtain a reverse time wave field; and
solving a full elastic wave equation for the reverse time wave field using a graphics processing unit to obtain wave field decomposition data.
16) The method of claim 15 further comprising applying at least one imaging condition to the wave field decomposition data to obtain imaging data.
17) The method of claim 16 further comprising storing the imaging data for display.
18) The method of claim 16 wherein the at least one imaging condition is at least one selected from the group consisting of: i) the zero-lag of the P-wave autocorrelation, ii) the zero-lag of the S-wave autocorrelation, iii) the zero-lag of the cross-correlation of the P- and S-wave energy densities, iv) autocorrelation of the absolute value of particle motion, v) maximum over all time, and vi) the crosscorrelation of the energy density functions EpEs.
19) The method of Claim 15 wherein the 3D volumes of parameters associated with the multicomponent seismic data are at least one selected from the group consisting of i) a horizontal velocity, ii) vertical velocity, iii) a horizontal stress, iv) a vertical stress, and v) a cross stress.

20) The method of claim 15 further comprising updating the stress parameters, loading velocity parameters, synching all threadblocks and writing synched data to global memory, and updating velocity parameters, loading stress parameters, synching all threadblocks and writing synched data to global memory.
Kind code of ref document: A1