CN113945994B - Method for high-speed multi-source loading and wave field retrieval by using finite difference model - Google Patents

Method for high-speed multi-source loading and wave field retrieval by using finite difference model

Info

Publication number
CN113945994B
Authority
CN
China
Prior art keywords: processing unit, grid, source, receiver, central processing
Prior art date
Legal status
Active
Application number
CN202110734477.2A
Other languages
Chinese (zh)
Other versions
CN113945994A (en)
Inventor
张昌华
Current Assignee
China Petroleum and Chemical Corp
Original Assignee
China Petroleum and Chemical Corp
Priority date
Filing date
Publication date
Application filed by China Petroleum and Chemical Corp filed Critical China Petroleum and Chemical Corp
Publication of CN113945994A
Application granted
Publication of CN113945994B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/23 Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V11/00 Prospecting or detecting by methods combining techniques covered by two or more of main groups G01V1/00 - G01V9/00
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V20/00 Geomodelling in general
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V2210/00 Details of seismic processing or analysis
    • G01V2210/60 Analysis
    • G01V2210/62 Physical property of subsurface
    • G01V2210/624 Reservoir parameters
    • G01V2210/6248 Pore pressure
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V2210/00 Details of seismic processing or analysis
    • G01V2210/60 Analysis
    • G01V2210/67 Wave propagation modeling
    • G01V2210/679 Reverse-time modeling or coalescence modelling, i.e. starting from receivers

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Life Sciences & Earth Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Geophysics And Detection Of Objects (AREA)

Abstract

A method for efficiently injecting sources and retrieving receiver wavefields on a GPU when using a finite difference model for wavefield simulation, where the sources and receivers lie at arbitrary positions rather than on the numerical grid points. To this end, the method uses GPU texture memory to increase memory read bandwidth, places the seismic sources at arbitrary locations in the finite difference grid, and expands them onto the neighboring grid points.

Description

Method for high-speed multi-source loading and wave field retrieval by using finite difference model
Technical Field
The present invention relates generally to geophysical surveying and seismic data processing using high-speed computing methods in which finite difference models are employed in a graphics processing unit having bandwidth and storage constraints.
Background
1. Overview
Even with today's high performance computing machines, there may not be enough computing power to model a full slice of the survey area with a single, unified computing scheme. Practitioners therefore turn to advanced numerical programming solutions built on hybrid processing units. These hybrid approaches require efficient and practical implementations, because the different modeling kernels running on these units operate on different regions (or sub-domains) of the seismic survey model, possibly even at different finite difference grid resolutions.
The two main processing units are the Graphics Processing Unit (GPU) and the Central Processing Unit (CPU). GPUs provide high performance computing power at a lower price than comparable CPUs and are deployed in a variety of hardware configurations, from desktop workstations to data centers. The main purpose of a GPU is to provide improved numerical performance, so there is a significant difference in performance between GPUs and CPUs. For example, a particular GPU may perform one trillion floating point operations per second, while the performance of a CPU is determined by the number of cores per socket (e.g., 4 cores) and the performance of each socket (i.e., the number of clock cycles per second).
However, GPUs rely on thread-level parallelism to hide long off-chip memory access delays, so judicious use of on-chip memory resources (including register files, shared memory, and data caches) is critical to application performance. At the same time, one on-chip feature of GPUs is that different on-chip memories can be addressed in parallel, which increases memory bandwidth and hides latency. Managing the on-chip memory resources of a GPU is therefore a very important task for application developers. Moreover, performance portability is a difficult challenge because on-chip memory resources differ across GPU generations. It is also well known in the art that GPUs offer limited support for general purpose programming, because code executing on the GPU cannot allocate memory; allocation must be performed by the host CPU.
These complexities of the hybrid approach mean that programming across the CPU and the GPU is difficult and requires advanced algorithms that may execute at the same time or at different times. This, in turn, limits the amount of work a practitioner in the oil and gas industry can perform when computing large scale seismic wave equation models and simulating seismic wave propagation in complex media.
2. Current GPU architecture
One of the most widely used platforms for high-performance parallel applications written in C/C++ with the Compute Unified Device Architecture (CUDA) programming model is NVIDIA's Tesla architecture. An NVIDIA device of this kind typically consists of a set of streaming multiprocessors, each with its own processors and shared memory (a user-managed cache). These processors are fully capable of integer and single precision floating point operations, with additional cores for double precision. All stream processors in a multiprocessor share fetch, control, and memory units, and access global device memory that is not cached by hardware.
In these architectures, CUDA arranges threads into thread blocks, and the threads of a block can read and write any shared memory location allocated to that block. Threads within a block may therefore communicate through shared memory, or use shared memory as a user-managed cache, because the latency of shared memory is roughly two orders of magnitude lower than the latency of global memory.
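As an aside, the following minimal CUDA sketch (not from the patent; the kernel, array names and tile size are illustrative assumptions) shows a thread block staging data in shared memory and using it as a user-managed cache:

// Illustrative sketch: a thread block stages a tile of an array in shared memory
// (the user-managed cache) before operating on it. Launch with blockDim.x == TILE.
#define TILE 256
__global__ void scale_wavefield_tile(const float *wavefield, float *out, int n, float scale)
{
    __shared__ float tile[TILE];                 // on-chip memory shared by the block
    int gid = blockIdx.x*blockDim.x + threadIdx.x;
    if (gid < n)
        tile[threadIdx.x] = wavefield[gid];      // one coalesced global read per thread
    __syncthreads();                             // make the tile visible to the whole block
    if (gid < n)
        out[gid] = scale*tile[threadIdx.x];      // later accesses hit low-latency shared memory
}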
However, since the same data is accessed and modified in both the forward and backward stages in most architectures, a hybrid CPU/GPU implementation requires continual memory transfers between host and GPU memory to maintain data coherency. These frequent, large memory transfers greatly impact current GPU performance, so the computational steps are instead performed in GPU kernels that access data residing in the Graphics Double Data Rate (GDDR) memory modules. This provides the additional benefit that a GDDR module can request and receive data on the same clock cycle. The host then only needs to provide the execution environment, kernel calls, and, if necessary, I/O transfers to and from disk.
3. Acoustic wave propagation
Computational modeling of acoustic waves has been an active area of research that has spread from aerospace and similar fields into seismology, geophysics and even meteorology. Indeed, computational modeling and simulation of acoustic waves in spatial domains is the basis for many scientific and engineering applications, including oil and gas reservoir exploration. To date, however, obtaining good results from acoustic propagation models in large or complex structures remains a major computational challenge.
The most common approach to simulating acoustic wave propagation in this field involves a two-stage process: (1) calculating an impulse response representative of the acoustic space; and (2) convolving that impulse response with an anechoic recorded or synthetically generated source signal. Since these impulse responses rely on accurately modeling the time-varying spatial sound field, the primary challenge remains the computational requirement of accurate solution methods (e.g., finite differences). In most cases, solving for acoustic propagation requires about 3-5 nodes (i.e., grid points) per wavelength in the spatial discretization. Assuming a minimum frequency of 3 Hz and a maximum frequency of 100 Hz, the node spacing is 5-50 m, with the result that an acoustic volume of 10 km on a side contains roughly 8 million to 8 billion nodes. Numerical solvers of the acoustic wave equation are therefore limited to fairly simple, bounded, or small spaces, and are generally considered time consuming. To overcome this problem, most current acoustic simulation systems use geometric acoustic techniques, which are accurate only for higher frequencies and early reflections and may not model diffraction effects. Overcoming one problem thus creates another.
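As a quick check of the node counts quoted above (added here for clarity; the volume is assumed to be a cube 10 km on a side), the two spacing extremes give

$$\left(\frac{10{,}000\ \mathrm{m}}{50\ \mathrm{m}}\right)^{3} = 200^{3} = 8\times10^{6}\ \text{nodes}, \qquad \left(\frac{10{,}000\ \mathrm{m}}{5\ \mathrm{m}}\right)^{3} = 2000^{3} = 8\times10^{9}\ \text{nodes}.$$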
4. Full Wavefield Inversion (FWI)
FWI is a partial differential equation constrained optimization method that iteratively minimizes a misfit norm between the measured wavefield and the calculated wavefield. Seismic FWI involves multiple iterations, and a single iteration may involve the following calculations: (1) solving the forward equation; (2) solving the adjoint equation; and (3) convolving the forward solution with the adjoint solution to produce the gradient of the cost function. It should be noted that for second order optimization methods, such as the Gauss-Newton method, it is also necessary to (4) solve the perturbed forward equation.
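For concreteness, a common least-squares form of the misfit minimized by such iterations (a standard textbook choice, not a formula quoted from this patent) is

$$J(m) = \frac{1}{2}\sum_{s}\sum_{r}\int_{0}^{T}\big|d_{\mathrm{calc}}(\mathbf{x}_r, t; m, s) - d_{\mathrm{obs}}(\mathbf{x}_r, t; s)\big|^{2}\,dt,$$

where m is the model (e.g., velocity), s and r index sources and receivers, and $d_{\mathrm{calc}}$ and $d_{\mathrm{obs}}$ are the simulated and recorded data; the gradient $\partial J/\partial m$ is what steps (1)-(3) above assemble.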
Classical time domain FWI was originally proposed by Tarantola in 1984 to improve the velocity model by minimizing, in the least squares sense, the energy in the difference between predicted and observed data (see Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation, Geophysics, 49, 1259-1266; and Symes, W. W., 2008, Migration velocity analysis and waveform inversion, Geophysical Prospecting, 56, 765-790). The approach was further developed by Tarantola in 1986 and applied to the elastic case (see Pica, A., J. Diet and A. Tarantola, 1990, Nonlinear inversion of seismic reflection data in a laterally invariant medium, Geophysics, 55, 284-292). Frequency domain FWI has become an active research area and has provided a hierarchical framework for robust inversion (see Pratt, G., C. Shin, et al., 1998, Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion, Geophysical Journal International, 133, 341-362). More recently, Shin and Cha introduced Laplace domain FWI and its Laplace-Fourier domain variant (see Shin, C. and Y. H. Cha, 2008, Waveform inversion in the Laplace domain, Geophysical Journal International, 173, 922-931; and 2009, Waveform inversion in the Laplace-Fourier domain, Geophysical Journal International, 177, 1067-1079). FWI nevertheless remains a challenging problem and attracts growing effort from geophysicists (see Virieux, J. and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics, Geophysics, 74, WCC1-WCC26).
FWI has recently become a widely used approach by exploiting advances in GPU computational power and hardware to mitigate the computational burden of seismic imaging (see Micikevicius, P., 2009, 3D finite difference computation on GPUs using CUDA: Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, ACM, 79-84) and inversion (see Boonyasiriwat, C., G. Zhan, M. Hadwiger, M. Srinivasan and G. Schuster, 2010, Multisource reverse-time migration and full-waveform inversion on a GPGPU, 72nd EAGE Conference and Exhibition). However, a key problem with GPU computing remains that is not necessarily related to the computational resources of the GPU, but rather to the communication time of data transfers between the host and the device and to their memory resources.
5. Reverse Time Migration (RTM)
Reverse time migration is a well known seismic imaging migration method in the art that is primarily used to map subsurface reflectivity from recorded seismic waveforms. RTM is capable of handling the worst combinations of structural dip and high velocity contrast, conditions common in salt basins and other geological basins with complex structures and velocity profiles. These subsurface conditions can severely distort propagating wavefronts and the resulting images, complicating modeling and imaging of the seismic data. Under these types of conditions, shots from multiple sources are required to create a broad array of wavefield types that can be captured by the sensors.
Another advantage of the RTM method includes evaluating the quality of an image at the same spatial location as the location where the image was formed. Given that most seismic migration applications today still use primary reflections as the sole signal, the ability of RTM to use all calculable wave types is unique and helps to reduce imaging artifacts due to mistaking non-primary waves for primary reflections.
The basic idea of RTM is a three-step process: (1) forward modeling the wavefield through an appropriate velocity model; (2) back-propagating the measured data through the exact same model; and (3) superposing the two wavefields using an imaging condition. The complete wave equation is used for the simulation. Imaging is not dip limited because the entire wavefield, including multiple reflections, is used. Thus, even vertical boundaries or the lower edges of objects may be imaged.
In most cases, the RTM method seeks an image of subsurface reflectivity as the best match, in image space, between the extrapolation of time-reversed waveform data and the simulated prediction, based on the estimated velocity model and source parameters. In a mathematical sense, RTM extrapolates an initially zero x-y-z volume backwards in time, using the seismic data P(x, y, z=0, t) as the boundary condition at z=0 at each time step, in order to compute snapshots of the x-y-z volume at different times. At time t=0, this x-y-z volume yields the migration result, the image I(x, y, z).
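As an illustration of the superposition step, a standard zero-lag cross-correlation imaging condition (given here for context, not quoted from this patent) is

$$I(x,y,z) = \sum_{\text{shots}} \int_{0}^{T} S(x,y,z,t)\, R(x,y,z,t)\, dt,$$

where S is the forward-modeled source wavefield and R is the back-propagated receiver wavefield.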
6. Finite difference model
Finite difference modeling is a method widely used in the oil and gas industry for large scale seismic wave equation modeling and seismic wave propagation simulation in complex media, for synthetic data generation, migration (e.g., reverse time migration), and inversion (e.g., full waveform inversion). In this model, the time evolution of the seismic wavefield is sampled over a volume of discrete grid points. However, the excitation and the recording of the wavefield need not take place at these grid points.
Common commercial applications include reference synthetic seismic data generation, reverse time migration, and full waveform inversion. Commercial finite difference modelers typically solve acoustic or anisotropic pseudo-acoustic wave equations in schemes that apply a common kernel across the model domain. These schemes are usually highly optimized, including load balancing, which must account for the additional cost of the absorbing boundary regions present in certain sub-domains. However, the availability of computing hardware is leading to larger and increasingly heterogeneous cluster environments in exploration computing. This is due in part to the rapid pace of change of multi-core CPUs, so that clusters typically contain nodes spanning several generations of CPU technology, and also to the advent of hybrid CPU/GPU clusters and hybrid cores. This non-uniformity of computing nodes makes load balancing more difficult to achieve.
However, when using finite difference models, the cost of memory access, whether on the CPU or on the GPU, is often the performance bottleneck. For example, to numerically evaluate the first order spatial derivative of the wavefield f at grid point i,

$$\left.\frac{\partial f}{\partial x}\right|_{i} \approx \frac{1}{\Delta x}\sum_{l=-N/2}^{N/2} c_l\, f_{i+l},$$

N+1 values of f must be loaded from computer memory, resulting in a read-to-compute ratio of (N+1):1, where N is the order of the finite difference and $c_l$ are the finite difference coefficients. Therefore, reducing the number of memory accesses, or increasing the bandwidth of memory reads and writes, is critical to the efficiency of the finite difference method.
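The following minimal CUDA sketch (illustrative only; the 8th-order choice, coefficient array and names are assumptions, not the patent's code) makes the N+1 reads per output value explicit:

// Illustrative 8th-order (N = 8) first-derivative stencil along x: each output
// sample reads N+1 = 9 values of f, i.e. a read-to-compute ratio of (N+1):1.
#define HALF_ORDER 4   // N/2
__constant__ float c[2*HALF_ORDER + 1];   // finite-difference coefficients, assumed preloaded
__global__ void ddx(const float *f, float *dfdx, int nx, float inv_dx)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < HALF_ORDER || i >= nx - HALF_ORDER)
        return;                               // skip boundary points for simplicity
    float acc = 0.0f;
    for (int l = -HALF_ORDER; l <= HALF_ORDER; ++l)
        acc += c[l + HALF_ORDER]*f[i + l];    // N+1 global memory reads per output value
    dfdx[i] = acc*inv_dx;
}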
When the excitation source lies on a finite difference grid point, injecting that source requires only one read and one write memory access. However, when the source location is not on a grid point, a combination of grid point sources equivalent to the off-grid source must be found. The effective number of such grid point sources is typically $n_x n_y n_z$, where $n_{x(y,z)}$ is the number of grid points spanned by the source position in the x (y, z) direction, usually taken equal to half the finite difference order N required for accuracy. Thus, for a 16th order finite difference with $n_{x(y,z)} = N/2 = 8$, the number of combined grid point sources is as large as 512. More importantly, these grid points are not in contiguous memory, so the memory access cost increases by at least a factor of $n_x n_y n_z$. In Reverse Time Migration (RTM) or Full Waveform Inversion (FWI), the recorded data become the sources during backward wave propagation; in that case the number of effective grid point sources is $n_x n_y n_z N_x N_y$, where $N_x$ and $N_y$ are the numbers of receivers in the x and y directions, respectively. Similarly, to retrieve the wavefield at an arbitrary location, a combination of wavefield values at $n_x n_y n_z$ grid points is needed.
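A quick numerical illustration of these counts (the receiver layout here is hypothetical, chosen only to show the scale):

$$n_x n_y n_z = 8^{3} = 512, \qquad n_x n_y n_z N_x N_y = 512 \times 100 \times 100 \approx 5.1\times10^{6}$$

grid point source terms per time step for a 16th order scheme and an assumed 100 x 100 receiver layout.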
On a GPU, off-chip global memory access is very expensive compared to cached on-chip memory access. Thus, when the source or receiver is not on grid points, how to increase the bandwidth of source loading or wavefield retrieval on the GPU has a significant impact on the performance of finite difference codes.
For the above reasons, a better way is needed to increase the bandwidth of the source loading and wavefield retrieval processes and to make better use of the available computing resources on the GPU.
Disclosure of Invention
Accordingly, the present invention meets the above-described needs and overcomes one or more deficiencies in the prior art by providing a computer-implemented method of high-speed multi-source loading and wavefield retrieval using finite-difference models.
In general, according to one embodiment of the invention, spatial locality means that a thread tends to read memory addresses near those read by neighboring threads. In the finite difference scheme, GPU threads are mapped to process grid points, so a thread evaluating the source loading expression (4) or the wavefield retrieval expression (5) accesses only memory near that accessed by its neighboring threads. If that memory resides in an off-chip global memory resource, such accesses are expensive operations. By using texture memory according to embodiments of the present invention, threads can read from nearby memory addresses with higher bandwidth, reducing loading and retrieval times by roughly a factor of 10.
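A minimal sketch of this texture read path follows (illustrative only; it uses the legacy texture-reference API that the patent's own snippets use, and the kernel and variable names are assumptions):

// Illustrative sketch: reading spatially local but non-contiguous data through
// the texture cache. Uses the legacy texture-reference API, matching the
// cudaBindTexture/tex1Dfetch calls shown later in this patent.
texture<float,1,cudaReadModeElementType> d_w_tex;   // weights bound to texture memory
__global__ void gather_weights(const int *idx, float *out, int n)
{
    int t = blockIdx.x*blockDim.x + threadIdx.x;
    if (t >= n) return;
    // idx[t] points to a nearby but non-contiguous location; the texture cache
    // exploits the spatial locality of neighboring threads' fetches.
    out[t] = tex1Dfetch(d_w_tex, idx[t]);
}
// Host side (error checking omitted):
//   cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
//   cudaBindTexture(0, d_w_tex, d_weights, desc, nWeights*sizeof(float));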
According to one embodiment of the invention, the sequence of steps for fast multi-source loading at arbitrary locations in a finite difference method on a GPU generally includes: (1) using $n_{x(y,z)}$ to determine the effective number of combined grid points for each actual source, where $n_{x(y,z)}$ is typically taken as N/2, N being the finite difference order; (2) treating each effective grid point source term as an independent point source with a weight; (3) allocating the independent grid point source positions, weights, and wavefield variables in off-chip global memory; (4) mapping those global memory regions to texture memory; (5) using the CUDA texture memory fetch functions and API to read the grid point source positions, as well as the field values and weights at these grid points; and (6) loading the weighted grid point sources onto the fields at the grid points.
Similarly, another embodiment of the invention provides a sequence of steps for fast wavefield retrieval at arbitrary positions on the GPU using a finite difference method. The sequence generally includes: (1) using $n_{x(y,z)}$ to determine the effective number of combined grid points for each receiver, where $n_{x(y,z)}$ is typically taken as N/2, N being the finite difference order; (2) calculating the contribution weights; (3) allocating the grid point positions, weights, and wavefield variables in the off-chip global memory of the GPU; (4) mapping those global memory regions to texture memory; (5) reading the field values and weights at the grid points using the CUDA texture memory fetch functions; and (6) summing the weighted wavefields to obtain the retrieved wavefield.
Finally, while the embodiments disclosed herein describe methods and systems for multi-source injection and wavefield retrieval in wave propagation using finite difference approximations of scalar acoustic wave equations, it will be apparent to those of ordinary skill in the art that they may alternatively be applied to vector wave equations and elastic equations in isotropic and anisotropic media without departing from the true scope of the invention as defined in the following claims.
Symbol list
(The symbol definitions appear as a table in the figures of the original publication.)
Drawings
The teachings of the present invention can be readily understood by considering the following description in conjunction with the accompanying drawings.
FIG. 1 schematically shows a cross-sectional view of an illustrative environment including a seismic source, a seismic receiver, a well site, a borehole, various points of incidence of transmitted rays, and various angles of incidence, in accordance with certain embodiments of the invention;
FIG. 2 is a schematic diagram showing a top view of an exploration area, showing an acquisition modality having a receiver and shot grid points and lines, wherein the receiver and shot are each located at any location on the grid, in accordance with certain embodiments of the present invention;
FIG. 3 is a flowchart of a computer-implemented method for using a finite difference model to load source incident location points at any location on a grid at high speed, according to some embodiments of the invention;
FIG. 4 is a flowchart of a computer-implemented method for using a finite difference model to retrieve receiver locations at high speed at arbitrary locations on a grid, according to some embodiments of the invention; and
FIG. 5 illustrates a graphical user interface of a computer-implemented method according to some embodiments of the invention, showing pressure wavefields at different time-step intervals, and the locations of their respective source and receiver points.
Detailed Description
Reference will now be made in detail to several embodiments of the invention, examples of which are illustrated in the accompanying drawings. It should be noted that wherever applicable, the same or similar reference numbers are used in the drawings to refer to the same or like functions. The figures depict embodiments of the present invention for purposes of illustration only. Those skilled in the art will readily recognize from the following description that alternative embodiments of the structures, systems, and methods illustrated therein may be employed without departing from the principles described herein.
The computer-implemented method of the present invention introduces an efficient way to inject sources and retrieve wavefields with finite differences when the source or receiver is not on a numerical grid point, for wavefield simulation on the GPU. In the prior art, memory access is the bottleneck compared to computing instructions. Furthermore, when sources are located at arbitrary positions in the finite difference grid, they must be spread over multiple grid points, effectively increasing the number of source points injected into the simulation. Because these expanded grid points are generally not in one contiguous memory region, this greatly increases the memory access cost, although the increase is limited when the actual number of sources is small. However, when performing Reverse Time Migration (RTM) or Full Waveform Inversion (FWI), the number of actual sources during the backward wave simulation equals the number of data traces, which significantly increases memory access costs. Similarly, in forward wave simulation, when a receiver is not on a grid point, the wavefield recorded at the receiver must be obtained from its neighboring grid points. In one embodiment of the invention, the computer-implemented method aims to overcome these memory access costs by using GPU texture memory. Although the GPU threads for source injection or wavefield retrieval must read non-contiguous memory, the memory access pattern exhibits a large amount of spatial locality; in computing terms, this roughly means that a thread reads from addresses "near" the addresses read by nearby threads. GPU texture memory is designed to increase the effective bandwidth of such spatially localized memory reads. By using texture memory, the efficiency of source injection and wavefield retrieval can be increased by a factor of about ten.
1. Deployment of techniques to increase GPU bandwidth
In order to compute acoustic wave propagation, the present computer-implemented method uses the scalar acoustic wave equation in simplified form:

$$\frac{\partial^2 P(\vec{x},t)}{\partial t^2} = V_P^{2}\,\nabla^2 P(\vec{x},t) + s(t)\,\delta(\vec{x}-\vec{x}_s), \qquad (1)$$

where $V_P$ is the acoustic velocity, $P(\vec{x},t)$ is the pressure wavefield due to a point excitation at location $\vec{x}_s = (x_s, y_s, z_s)$, and $s(t)$ is the source time wavelet.
In the finite difference numerical solution, equation (1) is discretized in space and time, and the grid sizes in the x, y, z and t dimensions are $(N_x, N_y, N_z, N_t)$. The pressure wavefield has values only at these grid points:

$$P^{n}_{i,j,k} = P(i\Delta x,\; j\Delta y,\; k\Delta z,\; n\Delta t).$$

Thus, in a finite difference model of $P^{n}_{i,j,k}$, the reading from and writing to memory resources is always the computational bottleneck.
When the source location $\vec{x}_s = (x_s, y_s, z_s)$ is on a grid point $(i_s, j_s, k_s)$, the source loading at time step n in the finite difference model can be computed simply as:

$$P^{n}_{i_s, j_s, k_s} \leftarrow P^{n}_{i_s, j_s, k_s} + s(n\Delta t). \qquad (2)$$

The memory cost of this source loading is just one read and one write of a memory resource for each source. Most practitioners therefore ignore this cost compared to other operations.
A. When the receiver position is on a grid point
On the other hand, when the receiver position $\vec{x}_r = (x_r, y_r, z_r)$ is on a grid point $(i_r, j_r, k_r)$, the pressure wavefield recorded by the receiver at time step n can be simply retrieved as:

$$R(x_r, y_r, z_r, n) = P^{n}_{i_r, j_r, k_r}. \qquad (3)$$

The memory cost of this retrieval is likewise only one read and one write in memory for each receiver, so most practitioners ignore this cost compared to other operations.
However, when the source point $\vec{x}_s = (x_s, y_s, z_s)$ is not on a finite difference grid point, a set of grid points (i, j, k) must be found whose combination is equivalent to a seismic source located at $(x_s, y_s, z_s)$. Therefore, the following loading must be implemented:

$$P^{n}_{i_s+i,\, j_s+j,\, k_s+k} \leftarrow P^{n}_{i_s+i,\, j_s+j,\, k_s+k} + w_x(i)\, w_y(j)\, w_z(k)\, s(n\Delta t), \qquad (4)$$

where $i_s = \lfloor x_s/\Delta x \rfloor$, $j_s = \lfloor y_s/\Delta y \rfloor$ and $k_s = \lfloor z_s/\Delta z \rfloor$ are the grid points nearest to the off-grid source, $w_x(i)$, $w_y(j)$ and $w_z(k)$ are weight functions, and $\Delta x$ ($\Delta y$, $\Delta z$) is the grid step in the x (y, z) direction.

Theoretically, the set of grid points (i, j, k) for each actual point source spreads over the entire grid space, and each grid point source in expression (4) is loaded according to expression (2), which significantly increases the cost. In practice, however, for accuracy reasons a windowed set of grid point sources is used in expression (4), so that each point source at $(x_s, y_s, z_s)$ uses the set $-n_x/2+1 \le i \le n_x/2$, $-n_y/2+1 \le j \le n_y/2$, $-n_z/2+1 \le k \le n_z/2$. Thus, the effective number of grid point sources mentioned above is $n_x n_y n_z$. Also for accuracy reasons, $2 n_{x(y,z)}$ is in practice typically taken equal to the order N of the finite difference operator, so for a 16th order finite difference $n_{x(y,z)} = N/2 = 8$ and the effective number of combined grid point source terms reaches 512, which increases the loading cost by a factor of up to 512. In addition, since the expanded grid points $-n_x/2+1 \le i \le n_x/2$, $-n_y/2+1 \le j \le n_y/2$, $-n_z/2+1 \le k \le n_z/2$ for each point source are not in contiguous memory, the memory access cost increases further. This increased memory access cost becomes even worse during the backward wave simulation phase of RTM or FWI, because there are $N_x N_y$ actual point sources and therefore $n_x n_y n_z N_x N_y$ expanded grid point sources.
B. When the receiver position is not on a grid point
Similarly, when the receiver position $\vec{x}_r = (x_r, y_r, z_r)$ is not on a finite difference grid point, a set of grid point wavefield terms (i, j, k) must be found whose combination gives the wavefield at $(x_r, y_r, z_r)$, using the following expression:

$$R(x_r, y_r, z_r, n) = \sum_{i=-n_x/2+1}^{n_x/2}\; \sum_{j=-n_y/2+1}^{n_y/2}\; \sum_{k=-n_z/2+1}^{n_z/2} w_x(i)\, w_y(j)\, w_z(k)\, P^{n}_{i_r+i,\, j_r+j,\, k_r+k}. \qquad (5)$$

Under expression (5), the wavefield $R(x_r, y_r, z_r, n)$ combines $n_x n_y n_z$ grid point wavefield values $P^{n}_{i_r+i, j_r+j, k_r+k}$ that are scattered over the non-contiguous memory of the grid points $-n_x/2+1 \le i \le n_x/2$, $-n_y/2+1 \le j \le n_y/2$, $-n_z/2+1 \le k \le n_z/2$, and therefore also incurs expensive memory access costs.

Although source loading or wavefield retrieval for an off-grid source or receiver requires grid points ($-n_x/2+1 \le i \le n_x/2$, $-n_y/2+1 \le j \le n_y/2$, $-n_z/2+1 \le k \le n_z/2$) that do not lie in a contiguous region of computer memory, these expanded grid points are nevertheless spatially local around the source or receiver location. This spatial locality can be exploited through the texture cache memory on the GPU, which was originally designed for graphics applications whose memory access patterns exhibit a large amount of spatial locality, and whose effective read bandwidth is about 10 times higher than reads from off-chip global memory. Such texture memory is also available for general purpose computing, but has so far not been exploited for this purpose.
2. Loading or injecting sources or points of incidence
In general, the sequence of steps for fast multi-source injection at arbitrary locations in a finite difference method running on a GPU proposed in one of the embodiments of the present invention comprises:
a) Determine the effective number of combined grid point sources for each actual source. When the source $\vec{x}_s = (x_s, y_s, z_s)$ is not on a finite difference grid point, a set of grid point sources (i, j, k) must be found whose combination is equivalent to a seismic source located at $(x_s, y_s, z_s)$, according to expression (4), where $i_s = \lfloor x_s/\Delta x \rfloor$ (and similarly $j_s$, $k_s$) is the grid point nearest to the off-grid source, $\Delta x$ ($\Delta y$, $\Delta z$) is the grid step in the x (y, z) direction, and $w_x(i)$, $w_y(j)$ and $w_z(k)$ are the weight functions. When incorporated into the computer-implemented method, the weight in the x direction for a grid point i with coordinate $x_i = (i_s + i)\Delta x$ is the Kaiser-windowed sinc function

$$w_x(i) = \operatorname{sinc}\!\left(\frac{x_i - x_s}{\Delta x}\right)\, \frac{I_0\!\left(b\sqrt{1 - \left(\dfrac{2(x_i - x_s)}{n_x \Delta x}\right)^{2}}\right)}{I_0(b)}, \qquad -\frac{n_x}{2}+1 \le i \le \frac{n_x}{2},$$

where $n_x$ is the user-input window size, or $n_x = N/2$, half of the finite difference order, $I_0$ is the zero order modified Bessel function, and b = 4.31 is a shape control parameter. Grid points j and k are treated similarly with coordinates $y_j = (j_s + j)\Delta y$ and $z_k = (k_s + k)\Delta z$. Thus, in three dimensions the effective number of combined point sources is $n_x n_y n_z$, and each valid grid point source term is treated as an independent point source with weight $w_x(i)\, w_y(j)\, w_z(k)$. For accuracy reasons, $2 n_{(x,y,z)}$ is typically taken equal to the order N of the finite difference operator; thus, for a 16th order finite difference with $n_{x,y,z} = N/2 = 8$, the number of combined grid point source terms reaches 512. For a total of $N_s$ sources, the effective number of independent sources is $n_x n_y n_z N_s$.
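A minimal host-side sketch of this weight computation follows (added for illustration; the function name, argument list, and use of the C++17 std::cyl_bessel_i routine are assumptions, not the patent's code):

#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative sketch (not the patent's code): Kaiser-windowed sinc weights w_x(i)
// for one off-grid source along x, following the description in step a) above.
static double sinc(double u)
{
    const double PI = 3.14159265358979323846;
    return u == 0.0 ? 1.0 : std::sin(PI*u)/(PI*u);
}

std::vector<double> source_weights_x(double xs, double dx, int half_width /* = n_x/2 */,
                                     double b = 4.31)
{
    const int is = static_cast<int>(std::floor(xs/dx));   // nearest grid index below the source
    const int nx = 2*half_width;                          // window size n_x
    std::vector<double> w(nx);
    for (int i = -half_width + 1; i <= half_width; ++i) {
        double xi = (is + i)*dx;                          // coordinate of grid point i
        double u  = (xi - xs)/dx;                         // offset in grid units (sinc argument)
        double r  = 2.0*(xi - xs)/(nx*dx);                // normalized offset for the Kaiser window
        double kaiser = std::cyl_bessel_i(0.0, b*std::sqrt(std::max(0.0, 1.0 - r*r)))
                      / std::cyl_bessel_i(0.0, b);
        w[i + half_width - 1] = sinc(u)*kaiser;           // w_x(i)
    }
    return w;
}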
b) Allocate these $n_x n_y n_z N_s$ effective independent grid point source positions, their weights, and the wavefield variables in off-chip GPU global memory, as in the following source code:
// allocate the source positions, whether on grid points or not on grid points
cudaMalloc((void**)&config->srcpos_shot,config->NSRC_shot*sizeof(int4));
// allocate the source weights used when the source is offset from the grid points
cudaMalloc((void**)&config->srcSw8_shot,config->NSRC_shot*sizeof(float));
c) Copy the source locations and weights from the CPU to the GPU, using the CUDA copy API to copy from CPU memory to the off-chip GPU global memory resources:
cudaMemcpy(config->srcpos_shot,&config->srcpos_shot_thread[0],config->NSRC_shot*sizeof(int4),cudaMemcpyHostToDevice);
// copy the source weights from CPU to GPU
cudaMemcpy(config->srcSw8_shot,&config->srcSw8_shot_thread[0],config->NSRC_shot*sizeof(float),cudaMemcpyHostToDevice);
d) Map the allocated off-chip global memory to texture memory using the CUDA API (application programming interface), as shown in the following source code:
cudaBindTexture(0,d_srcSw8_tex,config->srcSw8,&channelDesc,
nx*ny*nz*config->NREC*sizeof(float));
e) Read the independent point source grid positions, the wavefield values $P^{n}_{i_s+i,\, j_s+j,\, k_s+k}$, and the weights $w_x(i)\, w_y(j)\, w_z(k)$ at these grid points using the CUDA texture memory fetch functions and API.
f) Load or inject each effective independent grid point source with its weight at grid point $(i_s+i,\, j_s+j,\, k_s+k)$, as in expression (4):

$$P^{n}_{i_s+i,\, j_s+j,\, k_s+k} \leftarrow P^{n}_{i_s+i,\, j_s+j,\, k_s+k} + w_x(i)\, w_y(j)\, w_z(k)\, s(n\Delta t).$$
Source code:
// fetch the source grid location (slow, middle and fast axes) from texture memory
int nslowsrc=tex1Dfetch(d_srcpos_tex,isrc)-RADIUS;
int nmiddlesrc=tex1Dfetch(d_srcpos_tex,1*nsrc+isrc);
int nfastsrc=tex1Dfetch(d_srcpos_tex,2*nsrc+isrc);
int j=tex1Dfetch(d_srcpos_tex,3*nsrc+isrc);
// linear address of this grid point in the wavefield array
size_t globalAddr=(nslowsrc+RADIUS)*stridemf+nmiddlesrc*stridef+nfastsrc;
// fetch the weighted source sample for this time step from texture memory
float tmp=tex1Dfetch(src,it*ntsrc+j);
// fetch the current wavefield value at this grid point through the texture cache
float pp=tex1Dfetch(d_p_tex,globalAddr);
// inject the weighted source sample into the pressure wavefield
atomicAdd(&p[globalAddr],tmp);
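For context, a minimal sketch of how such a fragment might sit inside an injection kernel is given below; the kernel name, launch configuration, parameter names and the packing of the position texture are assumptions made here for illustration and are not taken from the patent.

// Illustrative wrapper kernel (assumed names and layout): one thread handles one
// expanded grid point source term; positions and weighted wavelet samples are read
// through texture references bound on the host as shown in steps b) to d) above.
texture<int,1> d_srcpos_tex;   // packed source grid indices (4 entries per source term)
texture<float,1> src;          // weighted source wavelet samples
__global__ void inject_sources(float *p, int nsrc, int ntsrc, int it,
                               size_t stridemf, size_t stridef, int RADIUS)
{
    int isrc = blockIdx.x*blockDim.x + threadIdx.x;
    if (isrc >= nsrc) return;
    int nslowsrc   = tex1Dfetch(d_srcpos_tex, isrc) - RADIUS;
    int nmiddlesrc = tex1Dfetch(d_srcpos_tex, 1*nsrc + isrc);
    int nfastsrc   = tex1Dfetch(d_srcpos_tex, 2*nsrc + isrc);
    int j          = tex1Dfetch(d_srcpos_tex, 3*nsrc + isrc);
    size_t globalAddr = (size_t)(nslowsrc + RADIUS)*stridemf
                      + (size_t)nmiddlesrc*stridef + nfastsrc;
    float sample = tex1Dfetch(src, it*ntsrc + j);  // weighted wavelet sample for this term
    atomicAdd(&p[globalAddr], sample);             // accumulate into the pressure wavefield
}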
3. Retrieving the receiver wavefield
The sequence of steps for fast wavefield retrieval at arbitrary positions in the finite difference method running on the GPU generally includes:
a) Determine the effective number of combined grid points for each receiver and calculate the contribution weights. When the receiver $\vec{x}_r = (x_r, y_r, z_r)$ is not on a finite difference grid point, a set of grid point wavefield terms (i, j, k) must be found whose combination gives the wavefield at $(x_r, y_r, z_r)$, using expression (5):

$$R(x_r, y_r, z_r, n) = \sum_{i=-n_x/2+1}^{n_x/2}\; \sum_{j=-n_y/2+1}^{n_y/2}\; \sum_{k=-n_z/2+1}^{n_z/2} w_x(i)\, w_y(j)\, w_z(k)\, P^{n}_{i_r+i,\, j_r+j,\, k_r+k}.$$

Here, $n_{x(y,z)}$ is the window size as commonly understood by one of ordinary skill in the art, with $2 n_{x(y,z)}$ equal to the finite difference order N; $i_r = \lfloor x_r/\Delta x \rfloor$ (and similarly $j_r$, $k_r$) is the grid point nearest to the off-grid receiver; and $w_x(i)$, $w_y(j)$ and $w_z(k)$ are the weight functions, computed in the same Kaiser-windowed sinc form as for the sources, with $w_x(i)$ evaluated for $-n_x/2+1 \le i \le n_x/2$ and $w_y(j)$ and $w_z(k)$ calculated similarly. For a total of $N_r$ receivers, the number of grid points needed to acquire the wavefields at the receiver sites is $n_x n_y n_z N_r$.

b) Allocate these $n_x n_y n_z N_r$ grid point positions, the weights, and the wavefield variables at those grid locations in off-chip global memory;
c) Map the allocated off-chip global memory to texture memory;
d) Read the wavefield values $P^{n}_{i_r+i,\, j_r+j,\, k_r+k}$ and the weights $w_x(i)\, w_y(j)\, w_z(k)$ at these grid points using the CUDA texture memory fetch API;

e) Sum the weighted wavefields $w_x(i)\, w_y(j)\, w_z(k)\, P^{n}_{i_r+i,\, j_r+j,\, k_r+k}$ according to expression (5) to obtain the retrieved wavefield $R(x_r, y_r, z_r, n)$ at position $(x_r, y_r, z_r)$ and time step n; and
f) Copy the retrieved wavefields from the off-chip GPU global memory resources to CPU memory using the CUDA copy API.
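A minimal sketch of steps d) and e) as a single CUDA kernel follows (the names, the packing of precomputed addresses, and the weight layout are assumptions made here for illustration, not the patent's code):

// Illustrative retrieval kernel: one thread per receiver sums the n_x*n_y*n_z
// weighted wavefield values, all fetched through texture references.
texture<int,1> d_recaddr_tex;   // precomputed linear addresses of the expanded grid points
texture<float,1> d_recw_tex;    // corresponding weights w_x(i)*w_y(j)*w_z(k)
texture<float,1> d_p_tex;       // pressure wavefield
__global__ void retrieve_receivers(float *record, int nrec, int npts /* = n_x*n_y*n_z */)
{
    int irec = blockIdx.x*blockDim.x + threadIdx.x;
    if (irec >= nrec) return;
    float sum = 0.0f;
    for (int m = 0; m < npts; ++m) {
        int addr = tex1Dfetch(d_recaddr_tex, irec*npts + m);  // spatially local, non-contiguous
        float w  = tex1Dfetch(d_recw_tex, irec*npts + m);
        sum += w*tex1Dfetch(d_p_tex, addr);                   // weighted wavefield value
    }
    record[irec] = sum;   // R(x_r, y_r, z_r, n) for this receiver at the current time step
}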
4. Detailed description of the drawings
FIG. 1 is a cross-sectional view of a portion of the earth over an exploration area 101 in which a preferred embodiment of the present invention may be used. It is important to note that the survey area 101 of FIG. 1 is a land-based area 102. Those of ordinary skill in the art will recognize that the seismic survey of area 101 produces detailed images of the local geology in order to determine the location and size of possible hydrocarbon (oil and gas) reservoirs, and thereby potential well sites 103. In these exploration areas, acoustic waves generated during a shot bounce off subsurface rock formations at different points of incidence 104; the waves reflected back to the earth's surface are captured by seismic data recording sensors 105, transmitted wirelessly from the sensors 105 by a data transmission system 305, stored in the memory resources of a central processing unit for subsequent processing, and then processed by the central processing unit and a graphics processing unit, typically controlled by computer system devices having output devices such as displays, keyboards, mice, wireless and wired network cards, printers, and the like.
Referring to FIG. 1, a cross-sectional view of a portion of the earth above an exploration area shows different types of formations 102, 103, 104 that will include seismic exploration data in the present invention. In particular, those of ordinary skill in the art will readily appreciate that the present example illustrates a common midpoint gather in which the seismic data traces are classified by surface geometry to simulate a single reflection point in the earth. In this example, the data from several shots and receivers may be combined into one image gather or used separately depending on the type of analysis to be performed. While the present example may illustrate a planar reflector and corresponding image gather class, other types or classes of image gathers known in the art may be used, the selection of which may depend on the presence of various earth conditions or events.
Furthermore, as shown in FIG. 1, seismic energy from the multiple points of incidence, or sources, 104 will reflect from the interfaces between different formations. These reflections are then captured by a plurality of seismic data recording sensors 105 and wells 103, each arranged at a different positional offset 110 from the others. Because all the points of incidence 104 and all the seismic data recording sensors 105 are placed at different offsets 110, the survey seismic data, or traces (also referred to in the art as gathers), are recorded at various angles of incidence 108. Each point of incidence 104 generates downward transmitted rays in the earth, whose upward reflections are captured by the recording sensors 105. In this example, well site 103 is shown with an existing well attached to well bore 109, and multiple measurements may be obtained along the well bore 109 using techniques known in the art. The well bore 109 is used to obtain source information (e.g., wavelets) as well as logging data, including P-wave velocity, S-wave velocity, density, and other seismic data. Other sensors, not shown in FIG. 1, are disposed within the survey area 101 to capture the horizon data information required by interpreters and those of ordinary skill in the art to perform various geophysical analyses. In this example, the gathers will be classified according to the field recordings to check the dependence of amplitude, signal-to-noise ratio, dynamic correction, frequency content, phase and other seismic properties on the angle of incidence 108, offset measurement 110, azimuth, and other geometric attributes that are important to data processing and imaging and known to those of ordinary skill in the art.
Although the points of incidence, or shots, 104 are represented in FIG. 1 as a common midpoint shot geometry with mostly horizontal shot lines, one of ordinary skill in the art will quickly recognize that the pattern may readily be arranged in other ways, such as vertical, diagonal, or a combination of the three, which in turn may result in different shot geometries. However, due to operating conditions, uniform coverage of the receiving sensors 105, for uniform acquisition of the wavelets, input parametric models, and seismic data shown in FIG. 1, is generally not achieved over the entire survey area.
Referring to FIG. 2, a seismic survey area grid 201 is shown in which a preferred embodiment of the invention may be used. It is important to note that the grid of FIG. 2 is a land-based area 102, and that a complete survey plan, including the arrangement of shots 104 at arbitrary locations and receivers 105 also at arbitrary locations, may vary with the survey target, budget, resources, and time.
Those of ordinary skill in the art will recognize that grid 201, for example, may be used to generate detailed images of the local geology in order to determine the location and size of potential hydrocarbon (oil and gas) reservoirs, and thereby potential well sites 103. The land acquisition geometry illustrated by FIG. 2 is typically acquired by swath (strip) shooting, in which the receiver cables are arranged in parallel lines (the inline direction) and the shots are laid out in the perpendicular direction (the crossline direction). In these grids, acoustic waves bounce off the subsurface rock formations at the different points of incidence, or shots, 104, and the waves reflected back to the surface are captured by seismic data recording sensors 105, transmitted wirelessly from the sensors 105 by the data transmission system 305, stored for subsequent processing on a central processing unit having memory resources, and then analyzed by a digital high performance computing system having a graphics processing unit, off-chip global memory resources, and texture memory resources. Although the shots 104 are shown in FIG. 2 as a pattern of nearly horizontal shot lines 206, one of ordinary skill in the art will quickly recognize that the pattern may readily be arranged in other ways, such as vertical, diagonal, or a combination of the three. Similarly, the recording sensors 105 are disposed on receiver lines 207, shown crossing the shot lines 206, but they may also be arranged in a different pattern (e.g., diagonal). Shooting along straight lines produces a large range of source-receiver azimuths, which can be a problem during analysis by the graphics processing unit. The source-receiver azimuth is the angle between a reference line (e.g., a receiver line or dip line) and the line through the source and receiver stations. However, due to operating conditions, uniform coverage as shown in FIG. 2 is generally not achieved over the entire survey area.
FIG. 3 illustrates a flowchart of a computer-implemented method for using a finite difference model to load source shots at high speed at arbitrary locations on a grid, according to some embodiments of the invention. The computer-implemented method 301 uses a combination of a central processing unit and a graphics processing unit, both of which have memory resources for storing some of the generated data. In particular, the computer-implemented method 301 begins with the central processing unit receiving, from an external source, information that needs to be stored. This information is stored at step 302 into the memory resources of the central processing unit and typically includes the source time wavelet, the source points, and the modeling grid, all with source grid points 303 located at arbitrary positions. The central processing unit stores the information 303 into its memory resources using an acoustic velocity model in 3D coordinates. Thereafter, the central processing unit retrieves only the modeling grid from its memory resources at step 304 and begins extracting pressure wavefield variables with arbitrary source grid point locations at step 305. The central processing unit then stores the pressure wavefield variable at step 306 and begins computing, at step 307, the effective source grid point positions for the stored pressure wavefield variable with arbitrary grid point positions.
According to the algorithm of the finite difference model, for each grid point i with coordinate $x_i = (i_s + i)\Delta x$, the weight $w_x(i)$ is computed for $-n_x/2+1 \le i \le n_x/2$, where $n_x = N/2$, i.e. half of the finite difference order. Similarly, the algorithm of the finite difference model computes $w_y(j)$ for grid point j and $w_z(k)$ for grid point k. Thus, in three dimensions the effective number of combined point sources is $n_x n_y n_z$, and each valid point source term is treated by the algorithm as an independent point source with weight $w_x(i)\, w_y(j)\, w_z(k)$. For accuracy, step 307 typically takes $2 n_{(x,y,z)}$ equal to the order N of the finite difference operator; thus, for a 16th order finite difference, the central processing unit takes $n_{x,y,z} = N/2 = 8$ in step 307, which results in up to 512 combined grid point source terms. The central processing unit then stores the valid source grid point locations of the pressure wavefield variable into its memory resources at step 309. After completion of step 309, the central processing unit sends a message to the graphics processing unit to initiate step 310, in which the valid source grid point locations of the stored pressure wavefield variable are copied from the memory resources of the central processing unit to the off-chip global memory resources located on the graphics processing unit, which may use the CUDA copy application programming interface. The graphics processing unit then maps its off-chip global memory resources to its texture memory resources at step 311 using the following CUDA API call:
cudaBindTexture(0,d_srcSw8_tex,config->srcSw8,&channelDesc,
nx*ny*nz*config->NREC*sizeof(float));
Once the mapping step 311 is complete, the graphics processing unit generates, at step 312, an initial time step value computed as the minimum obtained from the source time wavelet, the source points, and the modeling grid, all of which involve the following three variables: a) the stability requirement; b) the input data sampling interval; and c) the imaging condition interval. The graphics processing unit then stores the initial time step value in the off-chip global memory resources at step 313 and sends a graphical user interface to a display monitor connected to the graphics processing unit so that the user (typically one of ordinary skill in the art) can enter a user-defined maximum time step value at step 314. Using the same variables as step 312, the user then defines the maximum time step value as the minimum of: a) a stability requirement of 0.1 ms to 2 ms; b) an input data sampling interval of 1 ms to 4 ms; and c) an imaging condition interval of 2 ms to 8 ms. After entering the maximum time step value, the user confirms it through the graphical user interface, signaling that the maximum time step value is to be stored in the off-chip global memory resources at step 315. The graphics processing unit retrieves the initial time step value and the user-defined maximum time step value at step 316 and begins computing the wave propagation algorithm at step 317, where the valid source grid point locations of the pressure wavefield variable are used at each input time step value between the retrieved initial time step value and the user-defined maximum time step value. The wave propagation calculation performed at step 317 is an iterative process that solves the algorithm given by expression (1). The iterative step 317 essentially uses the efficient computer-implemented method of the present invention to inject the sources that generate the wavefields on the grid, and then propagates them until the last entered time step value equals the user-defined maximum time step value. The graphics processing unit verifies at step 318 whether the last entered time step value is equal to the user-defined maximum time step value. If not, the graphics processing unit repeats the calculation of step 317 until all of the entered time step values are exhausted. Once the graphics processing unit, with its parallel processing capabilities, verifies that the last calculation performed at step 317 corresponds to the user-defined maximum time step value, it completes the iterative wave propagation process of steps 317-318 and loads, or injects, the new source grid points onto the newly generated grid with the valid source pressure wavefield variables at step 319. After step 319 is completed, the graphics processing unit begins, at step 320, to copy the new grid of new source grid points, with the pressure wavefield variables from the valid locations of each computed wave propagation algorithm, into a random access memory device in the central processing unit using the CUDA copy application programming interface. Once the graphics processing unit has completed step 321, it sends a signal to the central processing unit to begin storing the new grid of new source grid points, with the pressure wavefield variables from the valid locations of each computed wave propagation algorithm, into its memory resources (typically a hard disk drive).
A graphical user interface is then displayed in a monitor connected to the central processing unit, indicating to the user that the computer-implemented method has completed its first stage, including using a finite-difference model to load the source-incident point at high speed, that the final data has been stored, and that a second stage of the computer-implemented method is being prepared to begin, including using the finite-difference model to retrieve the receiver location at high speed. These data can be used for further geophysical and seismic imaging analysis.
Turning to FIG. 4, a flowchart is shown of a computer-implemented method for using a finite difference model to retrieve receiver locations at high speed at arbitrary locations on a grid, according to some embodiments of the invention. The computer-implemented method 401 uses a combination of a central processing unit and a graphics processing unit, both of which have memory resources for storing some of the generated data. In particular, the computer-implemented method 401 begins with the central processing unit receiving, from an external source, information that needs to be stored. This information is stored at step 402 into the memory resources of the central processing unit and typically includes the source time wavelet, the receiver points (sometimes just one point), and the modeling grid, all with receiver grid points 403 located at arbitrary positions. The central processing unit stores the information 403 into its memory resources using an acoustic velocity model in 3D coordinates. Thereafter, the central processing unit retrieves only the modeling grid from its memory resources at step 404 and begins extracting the pressure wavefield variable with arbitrary receiver grid point locations at step 405. The central processing unit then stores the pressure wavefield variable at step 406 and begins computing, at step 407, the effective receiver grid point positions for the stored pressure wavefield variable with arbitrary grid point positions.
According to the algorithm of the finite difference model, for each grid point i with coordinate $x_i = (i_r + i)\Delta x$, the weight $w_x(i)$ is computed for $-n_x/2+1 \le i \le n_x/2$, where $n_x = N/2$, i.e. half of the finite difference order. Similarly, the algorithm of the finite difference model computes $w_y(j)$ for grid point j and $w_z(k)$ for grid point k. Thus, in three dimensions the effective number of combined grid points per receiver is $n_x n_y n_z$, and each valid grid point term is treated by the algorithm as an independent contribution with weight $w_x(i)\, w_y(j)\, w_z(k)$. For accuracy, step 407 typically takes $2 n_{(x,y,z)}$ equal to the order N of the finite difference operator; thus, for a 16th order finite difference, the central processing unit takes $n_{x,y,z} = N/2 = 8$ in step 407, which results in up to 512 combined grid point receiver terms. The central processing unit then stores the valid receiver grid point locations of the pressure wavefield variable into its memory resources at step 409. After completion of step 409, the central processing unit sends a message to the graphics processing unit to initiate step 410, in which the valid receiver grid point locations of the stored pressure wavefield variable are copied from the memory resources of the central processing unit to the off-chip global memory resources located on the graphics processing unit, which may use the CUDA copy application programming interface. The graphics processing unit then maps its off-chip global memory resources to its texture memory resources at step 411 using the following CUDA API call:
cudaBindTexture(0,&d_srcSw8_tex,config->srcSw8,&channelDesc,
nx*ny*nz*config->NREC*sizeof(float));
Once the mapping step 411 is complete, the graphics processing unit generates, at step 412, an initial time step value computed as the minimum obtained from the receiver points and the modeling grid, all of which involve the following three variables: a) the stability requirement; b) the input data sampling interval; and c) the imaging condition interval. The graphics processing unit then stores the initial time step value in the off-chip global memory resources at step 413 and sends a graphical user interface to a display monitor connected to the graphics processing unit so that the user (typically one of ordinary skill in the art) can enter a user-defined maximum time step value at step 414. Using the same variables as step 412, the user then defines the maximum time step value as the minimum of: a) a stability requirement of 0.1 ms to 2 ms; b) an input data sampling interval of 1 ms to 4 ms; and c) an imaging condition interval of 2 ms to 8 ms. After entering the maximum time step value, the user confirms it through the graphical user interface, signaling that the maximum time step value is to be stored in the off-chip global memory resources at step 415. The graphics processing unit retrieves the initial time step value and the user-defined maximum time step value at step 416 and begins computing the wave propagation algorithm at step 417, where the valid receiver grid point locations of the pressure wavefield variable are used at each input time step value between the retrieved initial time step value and the user-defined maximum time step value. The wave propagation calculation performed at step 417 is an iterative process that solves the algorithm given by expression (1). The iterative step 417 essentially uses the efficient computer-implemented method of the present invention to retrieve the receiver wavefields generated on the grid, and then propagates them until the last entered time step value equals the user-defined maximum time step value. The graphics processing unit verifies at step 418 whether the last entered time step value is equal to the user-defined maximum time step value. If not, the graphics processing unit repeats the calculation of step 417 until all of the entered time step values are exhausted. Once the graphics processing unit, with its parallel processing capabilities, verifies that the last calculation performed at step 417 corresponds to the user-defined maximum time step value, it completes the iterative wave propagation process of steps 417-418 and retrieves the new receiver grid points onto the newly generated grid with the valid receiver pressure wavefield variables at step 419. After step 419 is completed, the graphics processing unit begins, at step 420, to copy the new grid of new receiver grid points, with the pressure wavefield variables from the valid locations of each computed wave propagation algorithm, into a random access memory device in the central processing unit using the CUDA copy application programming interface. Once the graphics processing unit has completed step 421, it sends a signal to the central processing unit to begin storing the new grid of new receiver grid points, with the pressure wavefield variables from the valid locations of each computed wave propagation algorithm, into its memory resources (typically a hard disk drive).
A graphical user interface is then displayed on a monitor connected to the central processing unit, indicating to the user that the computer-implemented method has been completed, that the final data has been stored, and that it is available for further geophysical and seismic imaging analysis.
Referring to FIG. 5, a graphical user interface of a computer-implemented method according to some embodiments of the invention is shown, displaying the pressure wavefield at different time step intervals 501 together with the respective source and receiver point positions. In particular, the graphical user interface shows a modeling grid 502 having a source shot or point of incidence 503, a receiver 504, and a pressure wavefield 505 captured by the receiver 504 along the subsurface of the earth. When the computer-implemented method performs the wave propagation step from its initial time step value 506 to its maximum user-defined time step value 510, the graphics processing unit loads or injects an off-grid source (represented by an explosion symbol in FIG. 5) into the grid and retrieves an off-grid receiver grid point (represented by an inverted triangle in FIG. 5) from the grid, resulting in clear and accurate wave propagation, as shown by reference numerals 507, 509, and 511.
In accordance with the preferred embodiments of the present invention, certain hardware and software have been described in detail as examples only and not as limitations on the architecture in which the disclosed embodiments may be implemented. For example, while many of the internal and external components of the retrieval system apparatus of FIG. 3 have been described, those of ordinary skill in the art will understand that such components and their interconnections are well known. Additionally, certain aspects of the disclosed invention may be embodied in software that is executed using one or more retrieval systems, computer system devices, or non-transitory computer-readable storage devices. Program aspects of the technology may be regarded as "products" or "articles of manufacture" in the form of executable code and/or associated data, typically carried on or embodied in some type of machine-readable medium. Tangible, non-transitory "storage"-type media and devices include any or all of the memory or other storage for computers, processors, and the like, or related modules thereof, such as various semiconductor memories, tape drives, magnetic disk drives, optical or magnetic disks, and any components that can provide storage for software programming at any time.
The term "survey area" as used herein refers to a region or volume of geological interest and may be associated with the geometry, pose, and arrangement of the region or volume at any measurement scale. One region may have features such as folding, faults, cooling, unloading, and/or breaking that have occurred therein.
The term "computing" as used herein includes a wide variety of actions including calculating, determining, processing, deriving, exploring, looking up (e.g., looking up in a table, database or another data structure), convincing, and the like. It may also include receiving (e.g., receiving information), accessing (e.g., accessing data in memory), etc. Also, "computing" may include parsing, selecting, constructing, and the like.
The terms "subsurface" and "underground" as used herein refer to the surface of any piece of land above any altitude or in the elevation range (whether above, below, or at sea level), and/or below any ground surface (whether above, below, or at sea level).
Unless specifically stated otherwise, terms such as "defining," "creating," "including," "representing," "pre-analyzing," "pre-defining," "selecting," "constructing," "assigning," "creating," "introducing," "eliminating," "re-meshing," "integrating," "finding," "executing," "predicting," "determining," "inputting," "outputting," "identifying," "analyzing," "using," "distributing," "interfering," "adding," "adjusting," "merging," "simulating," "reducing," "distributing," "specifying," "extracting," "displaying," "executing," "implementing," and "managing" may refer to the actions and processes of a retrieval system or other electronic device that transforms data represented as physical quantities (electronic, magnetic, or optical) within some electronic device's memory (e.g., a memory resource or non-transitory computer-readable memory) into other data similarly represented as physical quantities within a memory, a transmission device, or a display device.
Embodiments disclosed herein also relate to computer-implemented systems for use as part of a retrieval system for performing the operations described herein. The system may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program or code stored in memory resources or non-transitory computer readable memory. Thus, the computer program or code may be stored or encoded in a computer readable medium or implemented over some type of transmission medium. Computer-readable media includes any medium or mechanism for storing or transmitting information in a form readable by a machine, such as a computer ("machine" and "computer" are synonymously used herein). By way of one non-limiting example, computer readable media may comprise computer readable storage media (e.g., read only memory ROM, random access memory RAM, magnetic disk storage media, optical storage media, flash memory devices, etc.). A transmission medium may be twisted wire pairs, coaxial cable, optical fiber, or some other suitable wired or wireless transmission medium for transmitting signals, such as electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.).
A retrieval system or sensor 105, as used herein, generally includes at least hardware capable of executing machine-readable instructions, as well as software for performing actions (typically machine-readable instructions) that produce a desired result. In addition, a retrieval system may include a mixture of hardware and software, as well as a computer subsystem.
The hardware typically includes at least processor-capable platforms, such as client computers (also known as personal computers or servers) and hand-held processing devices (e.g., smartphones, personal digital assistants (PDAs), or personal computing devices (PCDs)). Furthermore, the hardware may include any physical device that can store machine-readable instructions, such as memory or other data storage devices. Other forms of hardware include hardware subsystems, including, for example, transfer devices such as modems, modem cards, ports, and port cards.
Software includes any machine code stored in any storage medium (e.g., RAM or ROM), as well as machine code stored on other devices (e.g., non-transitory computer readable media, such as external hard drives or flash memory). The software may include source or object code containing any set of instructions capable of being executed in a client computer, server computer, remote desktop or terminal.
Combinations of software and hardware may also be used to provide enhanced functionality and performance for certain embodiments of the disclosed invention. One example is the direct manufacture of software functions into a silicon chip. Accordingly, it should be understood that combinations of hardware and software are also included within the definition of a retrieval system, and that equivalent structures and equivalent methods are thus contemplated by the present invention.
The computer readable medium or memory resources include passive data storage, such as Random Access Memory (RAM), as well as semi-persistent data storage, such as external hard drives and external databases. In addition, embodiments of the invention may be embodied in the RAM of a computer to convert a standard computer to a new specific computing machine.
A data structure is a defined organization of data in which embodiments of the invention may be implemented. For example, a data structure may provide an organization of data or an organization of executable code. Data signals may be carried across transmission media and stored and transmitted across various data structures, and thus may be used to transmit embodiments of the invention.
A system computer may be designed to operate on any particular architecture. For example, the system may execute on a high-performance computing system that typically includes a collection of individual computers physically connected or connected through local area networks, client-server networks, wide area networks, the Internet, and other portable and wireless devices and networks.
"output device" includes direct behavior that results in production, as well as any indirect behavior that facilitates production. Indirect behavior includes providing software to a user, maintaining websites through which the user can influence a display, hyperlinking to such websites, or cooperating with entities performing such direct or indirect behavior. Thus, the user may operate alone or in cooperation with a third party provider to generate the reference signal on the display device. The display device may be included as an output device and should be adapted to display desired information such as, but not limited to, a CRT monitor, LCD monitor, plasma device, tablet device or printer. The display device may include a device that has been calibrated using any conventional software intended for evaluating, correcting, and/or improving display results (e.g., a color monitor that has been adjusted using monitor calibration software). In addition to or instead of displaying the reference image on the display device, the method according to the invention may comprise providing the reference image to the subject. "providing a reference image" may include creating or distributing the reference image to an object by physical, telephone, or electronic transfer, providing access to the reference image over a network, or creating or distributing software to an object configured to run on an object workstation or a computer containing the reference image. In one example, providing the reference image may involve enabling the subject to obtain the reference image in hard copy form via a printer. For example, information, software, and/or instructions may be transferred (e.g., electronically or physically via a data store or hard copy) and/or otherwise available (e.g., via a network) to facilitate a subject using a printer to print a reference image in hard copy form. In such examples, the printer may be a printer that has been calibrated using any conventional software intended for evaluating, correcting, and/or improving print results (e.g., a color printer that has been adjusted using color correction software).
The database or databases may comprise any standard or proprietary database software, such as Oracle, Microsoft Access, Sybase, or dBase II. The database may have fields, records, data, and other database elements that may be associated through database-specific software. Additionally, data may be mapped. Mapping is the process of associating one data entry with another data entry. For example, the data contained in the location of a character file may be mapped to a field in a second table. The physical location of the database is not limiting, and the database may be distributed. For example, the database may be located remotely from the server and run on a separate platform. Additionally, the database may be accessible over a local area network, a wireless network, and the Internet.
Furthermore, the modules, features, attributes, methodologies and other aspects can be implemented as software, hardware, firmware or any combination thereof. When implemented as software, the components of the present invention may be implemented as separate programs, as part of a larger program, as multiple separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in all other ways known to those of skill in the art of computer programming, now or in the future. In addition, the invention is not limited to implementation in any particular operating system or environment.
The various terms used herein are defined below. To the extent a term used in a claim is not defined below, it should be given the broadest definition persons in the pertinent art have given that term, as reflected in at least one printed publication or issued patent.
As used herein, "and/or" interposed between a first entity and a second entity refers to one of (1) the first entity, (2) the second entity, and (3) the first entity and the second entity. The various elements listed with "and/or" should be interpreted in the same manner, i.e., "one or more" of the elements so connected.
Additionally, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
While in the foregoing specification this invention has been described in relation to certain preferred embodiments thereof, and many details have been set forth for purposes of illustration, the invention is not to be unduly limited to the foregoing, which has been set forth for illustrative purposes. On the contrary, various modifications and alternative embodiments will be apparent to those skilled in the art without departing from the true scope of the invention as defined by the following claims. It should further be appreciated that structural features or method steps shown or described in any one embodiment herein may also be used in other embodiments.

Claims (10)

1. A computer-implemented method for using a finite-difference model to load source shots at high speed at any location on a grid, the method comprising:
using the acoustic velocity model in 3D coordinates to store the source time wavelet, the source points, and the modeled grid with source grid points at arbitrary locations into a memory resource on the central processing unit;
retrieving, by the central processing unit, the modeling grid from the memory resource;
extracting, by the central processing unit, pressure wavefield variables having arbitrary source grid point positions from the retrieved modeling grid;
storing the extracted pressure wavefield variable with arbitrary source grid point positions into a memory resource on the central processing unit;
calculating, by the central processing unit, valid source grid point locations of the stored pressure wave field variable having arbitrary grid point locations;
storing the calculated effective source grid point positions of the pressure wave field variables into a memory resource on the central processing unit;
copying the valid source grid point locations of the stored pressure wavefield variables from memory resources on the central processing unit to off-chip global memory resources located on a graphics processing unit using a CUDA copying application programming interface;
mapping off-chip global memory resources of the graphics processing unit into texture memory resources of the graphics processing unit;
generating, by the graphics processing unit, an initial time step value;
storing the generated initial time step value into off-chip global memory resources of the graphics processing unit;
inputting a user-defined maximum time step value into the graphics processing unit;
storing the input user-defined maximum time step value into off-chip global memory resources of the graphics processing unit;
retrieving the initial time step value and the user-defined maximum time step value from off-chip global memory resources of the graphics processing unit;
calculating, by the graphics processing unit, a wave propagation algorithm using the copied valid source grid point locations of the pressure wavefield variable at each input time step value between the retrieved initial time step value and the user-defined maximum time step value;
repeating the step of calculating a wave propagation algorithm by the graphics processing unit until the last entered time step value equals the user defined maximum time step value;
loading, by the graphics processing unit, a new grid of new source grid points with pressure wavefield variables from each calculated wave propagation algorithm at the effective locations;
copying a new grid of new source grid points having pressure wavefield variables from each computed wave propagation algorithm at an active location from the graphics processing unit into the central processing unit using a CUDA copying application programming interface; and
the copied new grid of new source grid points with pressure wavefield variables from each calculated wave propagation algorithm's active location is stored into the central processing unit's memory resource.
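For the copying and mapping steps recited in claim 1 (copying the valid source grid point locations to off-chip global memory using a CUDA copy application programming interface, and mapping that memory into texture memory), a minimal host-side sketch is shown below. The buffer and texture names h_srcTerms, d_srcTerms, and d_src_tex are hypothetical, and the legacy texture-reference binding is used only because it matches the cudaBindTexture call quoted in the description; newer CUDA releases favor texture objects.

/* Hypothetical sketch: copy the CPU-side array of valid source grid point
 * values to off-chip global memory on the GPU with the CUDA copy API, then
 * map (bind) that buffer to a texture reference. */
#include <cuda_runtime.h>

texture<float, 1, cudaReadModeElementType> d_src_tex;   /* legacy texture reference */

int upload_source_terms(const float *h_srcTerms, size_t nterms)
{
    float *d_srcTerms = NULL;
    size_t bytes = nterms * sizeof(float);

    /* copy from CPU memory to off-chip global memory (cudaMemcpyHostToDevice) */
    if (cudaMalloc((void **)&d_srcTerms, bytes) != cudaSuccess) return -1;
    if (cudaMemcpy(d_srcTerms, h_srcTerms, bytes,
                   cudaMemcpyHostToDevice) != cudaSuccess) return -1;

    /* map the global-memory buffer into texture memory */
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    if (cudaBindTexture(0, &d_src_tex, d_srcTerms, &desc, bytes) != cudaSuccess)
        return -1;
    return 0;
}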
2. The computer-implemented method of claim 1, wherein the step of using the acoustic velocity model in 3D coordinates to store the source time wavelet, the source points, and the modeled grid with source grid points at arbitrary locations into memory resources on the central processing unit further comprises the source wavelet represented in the time domain by the expression s(t).
3. The computer-implemented method of claim 1, wherein the step of using the acoustic velocity model in 3D coordinates to store the source time wavelet, the source points, and the modeled grid with source grid points at arbitrary locations into a memory resource on the central processing unit further comprises individual source points represented by:
δ(x - x_s),
wherein x_s is the position of the seismic source and (x_s, y_s, z_s) are the source coordinates.
4. The computer implemented method of claim 1, wherein the step of using the acoustic velocity model in 3D coordinates to store the source time wavelet, the source point, and the modeled grid with source grid points at arbitrary locations in memory resources on the central processing unit further comprises an acoustic velocity model represented by the expression V_P.
5. The computer-implemented method of claim 1, wherein the step of calculating, by the central processing unit, valid source grid positions for the stored pressure wavefield variable having arbitrary grid positions further comprises calculating a maximum of 512 grid positions for each off-grid source.
6. The computer implemented method of claim 1, wherein the step of inputting a user-defined maximum time step value into the graphics processing unit comprises inputting a minimum of a stability requirement, an input data sampling interval, and an imaging condition interval, wherein the stability requirement is in a range of 0.1ms to 2ms, the input data sampling interval is in a range of 1ms to 4ms, and the imaging condition interval is in a range of 2ms to 8 ms.
7. A computer-implemented method for high-speed retrieval of receiver wavefield locations using finite-difference models at arbitrary locations on a grid, the method comprising:
using the acoustic velocity model in 3D coordinates to store the source time wavelet, receiver locations, and modeled grid with receiver grid points at arbitrary locations into memory resources on the central processing unit;
retrieving, by the central processing unit, the modeling grid from the memory resource;
extracting, by the central processing unit, receiver pressure wavefield variables having arbitrary receiver grid point positions from the retrieved modeling grid;
storing the extracted receiver pressure wavefield variable with arbitrary receiver grid point positions into a memory resource on the central processing unit;
calculating, by the central processing unit, valid receiver grid point positions of the stored receiver pressure wave field variable having arbitrary grid point positions;
storing the calculated valid receiver grid point locations of the receiver pressure wavefield variable into a memory resource on the central processing unit;
copying the valid receiver grid point locations of the stored receiver pressure wave field variables from the memory resources on the central processing unit into off-chip global memory resources located on the graphics processing unit using a CUDA copying application programming interface;
mapping off-chip global memory resources of the graphics processing unit into texture memory resources of the graphics processing unit;
generating, by the graphics processing unit, an initial time step value;
storing the generated initial time step value into off-chip global memory resources of the graphics processing unit;
inputting a user-defined maximum time step value into the graphics processing unit;
storing the input user-defined maximum time step value into off-chip global memory resources of the graphics processing unit;
retrieving the initial time step value and the user-defined maximum time step value from off-chip global memory resources of the graphics processing unit;
calculating, by the graphics processing unit, a wave propagation algorithm using the copied valid receiver grid point locations of the receiver pressure wave field variable at each input time step value between the retrieved initial time step value and the user-defined maximum time step value;
repeating the step of calculating a wave propagation algorithm by the graphics processing unit until the last entered time step value equals the user defined maximum time step value;
loading, by the graphics processing unit, a new grid of new receiver grid points with receiver pressure wave field variables from each calculated wave propagation algorithm at the effective location;
copying a new grid of new receiver grid points having receiver pressure wave field variables from each calculated wave propagation algorithm at an active location from the graphics processing unit into the central processing unit using a CUDA copying application programming interface; and
the copied new grid of new receiver grid points with receiver pressure wave field variables from the effective locations of each calculated wave propagation algorithm is stored into the memory resources of the central processing unit.
8. The computer implemented method of claim 7, wherein the step of using the acoustic velocity model in 3D coordinates to store the source time wavelet, the receiver locations, and the modeled grid with receiver grid points at arbitrary locations into memory resources on the central processing unit further comprises an acoustic velocity model represented by the expression V_P.
9. The computer implemented method of claim 7, wherein the step of calculating, by the central processing unit, valid receiver grid point positions for the stored receiver pressure wave field variable having arbitrary grid point positions further comprises calculating a maximum of 512 grid point positions for each off-grid receiver.
10. The computer implemented method of claim 7, wherein the step of inputting a user-defined maximum time step value into the graphics processing unit includes inputting a minimum of a stability requirement, an input data sampling interval, and an imaging condition interval, wherein the stability requirement is in a range of 0.1ms to 2ms, the input data sampling interval is in a range of 1ms to 4ms, and the imaging condition interval is in a range of 2ms to 8 ms.
CN202110734477.2A 2020-06-30 2021-06-30 Method for high-speed multi-source loading and wave field retrieval by using finite difference model Active CN113945994B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/917,545 US11281825B2 (en) 2020-06-30 2020-06-30 Computer-implemented method for high speed multi-source loading and retrieval of wavefields employing finite difference models
US16/917,545 2020-06-30

Publications (2)

Publication Number Publication Date
CN113945994A CN113945994A (en) 2022-01-18
CN113945994B true CN113945994B (en) 2023-07-04

Family

ID=79030960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110734477.2A Active CN113945994B (en) 2020-06-30 2021-06-30 Method for high-speed multi-source loading and wave field retrieval by using finite difference model

Country Status (2)

Country Link
US (1) US11281825B2 (en)
CN (1) CN113945994B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081267B (en) * 2022-05-18 2023-08-29 内蒙古农业大学 Optimization-based space-time variable step length finite difference seismic wave numerical simulation method
CN117892569B (en) * 2023-12-04 2024-06-25 广东海洋大学 Dynamic wave field propagation simulation method based on first-arrival wave travel time constraint

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110988988A (en) * 2019-11-25 2020-04-10 中国矿业大学(北京) Seismic wave field simulation method and device based on vertical fracture medium

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2381314B (en) 2001-10-26 2005-05-04 Westerngeco Ltd A method of and an apparatus for processing seismic data
US8549500B2 (en) 2007-02-14 2013-10-01 The Mathworks, Inc. Saving and loading graphical processing unit (GPU) arrays providing high computational capabilities in a computing environment
WO2013033651A1 (en) * 2011-08-31 2013-03-07 Spectraseis Ag Full elastic wave equation for 3d data processing on gpgpu
US9995835B2 (en) 2013-07-17 2018-06-12 Chevron U.S.A. Inc. System and method of implementing finite difference time domain models with multiple accelerated processing components (APCS)
CN104570080A (en) * 2013-10-29 2015-04-29 中国石油化工股份有限公司 Multi-GPU-card cooperative and quick calculation method for pre-stack reverse time migration of mass data
CN104570081B (en) * 2013-10-29 2017-12-26 中国石油化工股份有限公司 A kind of integration method pre-stack time migration Processing Seismic Data and system
WO2015155597A2 (en) * 2014-04-07 2015-10-15 Cgg Services Sa Attenuating pseudo s-waves in acoustic anisotropic wave propagation
US9928315B2 (en) * 2014-07-30 2018-03-27 Chevron U.S.A. Inc. Re-ordered interpolation and convolution for faster staggered-grid processing
CN105319581B (en) * 2014-07-31 2018-01-16 中国石油化工股份有限公司 A kind of efficient time-domain full waveform inversion method
WO2016185223A1 (en) * 2015-05-20 2016-11-24 Optasense, Inc. Interferometric microseismic imaging methods and apparatus
CN106199692A (en) * 2015-05-30 2016-12-07 中国石油化工股份有限公司 A kind of wave equation inverse migration method based on GPU
CN105467443B (en) * 2015-12-09 2017-09-19 中国科学院地质与地球物理研究所 A kind of three dimensional anisotropic elastic-wave numerical modeling method and system
CN105717539B (en) * 2016-01-28 2018-01-30 中国地质大学(北京) A kind of three-dimensional TTI media reverse-time migration imaging method calculated based on more GPU
US10345466B2 (en) * 2017-07-25 2019-07-09 Advanced Geophysical Technology Inc. Memory efficient Q-RTM computer method and apparatus for imaging seismic data
GB2565336A (en) * 2017-08-10 2019-02-13 Seismic Apparition Gmbh Method for seismic data acquisition and processing
US10678553B2 (en) 2017-10-10 2020-06-09 Apple Inc. Pro-active GPU hardware bootup
US11105942B2 (en) * 2018-03-27 2021-08-31 Schlumberger Technology Corporation Generative adversarial network seismic data processor


Also Published As

Publication number Publication date
CN113945994A (en) 2022-01-18
US20210406427A1 (en) 2021-12-30
US11281825B2 (en) 2022-03-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant