CN116309921A - Delay summation acoustic imaging parallel acceleration method based on CUDA technology - Google Patents
- Publication number
- CN116309921A (application CN202310446933.2A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- matrix
- delay
- cross spectrum
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
- G06F17/153—Multidimensional correlation or convolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A delay-and-sum acoustic imaging parallel acceleration method based on CUDA technology belongs to the fields of acoustic signal processing and neural networks and is used to solve problems such as the high computational complexity of the traditional delay-and-sum algorithm. The method comprises the following steps: 1) assume the steering vectors are arranged spatially and regard them as a whole as a feature map; 2) perform matrix operations in turn between each strip vector in the feature map and the cross-spectrum matrix, keeping the number of channels unchanged after the operation; 3) multiply the result of the cross-spectrum-matrix operation by the original steering vector to obtain the power of the final strip feature vector. The cross-spectrum matrix and the large-dimension matrix of steering vectors are formulated as a convolution network; using GPU parallel acceleration, the run time improves by nearly 100 times over the traditional delay-and-sum algorithm, meeting the real-time imaging requirements of industrial applications.
Description
Technical Field
The invention belongs to the fields of acoustic imaging and deep neural networks, and particularly relates to a delay-and-sum acoustic imaging parallel acceleration method based on CUDA (Compute Unified Device Architecture) technology.
Background
In daily life, the human ear hears various sounds and can recognize and localize them, colloquially "telling position by sound". When a person speaks, a listener can easily tell what direction the speaker is in; a driver can easily judge the direction from which another car approaches, and even roughly how far away it is. However, the innate sound-localization ability of the human ear evolved only to solve problems of daily living and survival, and its accuracy is very limited. To receive and localize sound signals more accurately, beam-imaging techniques based on microphone arrays began to attract attention, and acoustic imaging techniques that make "sound visible" have come into wide use.
Acoustic imaging, also called an acoustic camera, refers to collecting multichannel audio data with a microphone array, computing the sound-source distribution on a scan plane at a specified frequency using a beamforming algorithm, then fusing that distribution with a live-action image so that the spatial position and origin of the sound source can be quickly determined on the intuitive fused image (Oliver Lylloff, Efren Fernandez-Grande, Finn Agerkvist, Jørgen Hald, Elisabet Tiana Roig, and Martin S. Andersen. 2015. Improving the efficiency of deconvolution algorithms for sound source localization. The Journal of the Acoustical Society of America 138, 1 (2015), 172–180).
Acoustic imaging technology converts acoustic information into image signals to visualize the distribution of spatial sound sources, and is now widely used in transportation, noise detection, industrial anomaly detection, and other fields (Roberto Merino-Martínez, Pieter Sijtsma, Mirjam Snellen, Thomas Ahlefeldt, Jérôme Antoni, Christopher J. Bahr, Daniel Blacodon, Daniel Ernst, Arthur Finez, Stefan Funke, et al. 2019. A review of acoustic imaging methods using phased microphone arrays. CEAS Aeronautical Journal 10, 1 (2019), 197–230). Boeing and NASA use acoustic imaging to detect and localize airframe or jet noise in various acoustic wind-tunnel tests, and the technique is quite mature; the beamforming system developed by OptiNav integrates several acoustic imaging algorithms, uniformly named OptiNav BF beamforming; the company has also developed a leak detector, a compressed-air leak detection and identification system that clearly provides images and videos identifying the leak source.
The Delay-And-Sum algorithm (DAS) (Don H. Johnson and Dan E. Dudgeon. 1992. Array Signal Processing: Concepts and Techniques. Simon & Schuster, Inc.; Barry D. Van Veen and Kevin M. Buckley. 1988. Beamforming: A versatile approach to spatial filtering. IEEE ASSP Magazine 5, 2 (1988), 4–24) provides robust, fast, and intuitive imaging results through delay compensation and weighted summation. The delay-and-sum algorithm embodies the core idea of acoustic imaging: invert the sound-source distribution using beamforming and time differences, and output an image result. However, the algorithm suffers from a large computational load and low computation speed, leading to excessive parameter counts and redundant computing resources (Austin Reiter and Muyinatu A. Lediju Bell. 2017. A machine learning approach to identifying point source locations in photoacoustic data. In Photons Plus Ultrasound: Imaging and Sensing 2017, Vol. 10064. International Society for Optics and Photonics, 100643J), so it cannot meet the demands of everyday industrial production.
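To make the computational burden of the serial scan concrete, the following is a minimal NumPy sketch of a conventional frequency-domain delay-and-sum scan. It is an illustration, not the patent's CUDA implementation; the array sizes and random data are assumptions.

```python
import numpy as np

def das_scan(csm, steering):
    """Conventional delay-and-sum scan: for every grid point n and
    frequency bin f, evaluate the quadratic form g^H (P P^H) g serially.

    csm      : (F, M, M) cross-spectrum matrices, one per frequency bin
    steering : (N, F, M) steering vectors, one per grid point and bin
    returns  : (N,) beamformer power per grid point
    """
    n_pts, n_freq, _ = steering.shape
    power = np.zeros(n_pts)
    for n in range(n_pts):            # serial loop over grid points ...
        for f in range(n_freq):       # ... and over frequency bins
            g = steering[n, f]
            power[n] += np.real(g.conj() @ csm[f] @ g)
    return power

# toy sizes: 16 mics, 3 bins, 25 grid points (illustrative only)
rng = np.random.default_rng(0)
M, F, N = 16, 3, 25
p = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))
csm = np.einsum('fi,fj->fij', p, p.conj())          # P P^H per bin
g = rng.standard_normal((N, F, M)) + 1j * rng.standard_normal((N, F, M))
out = das_scan(csm, g)
```

With the patent's real sizes (41×41 grid points, 64 microphones, 21 bins) the two nested loops are exactly the redundancy the convolutional reformulation removes.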
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a delay-and-sum acoustic imaging parallel acceleration method based on CUDA technology. It avoids complex models and long computation times, is convenient for practical deployment, solves the high computational complexity of the traditional delay-and-sum algorithm, and meets the needs of real production applications, while converting sound information into image signals to visualize the distribution of spatial sound sources and thereby localize sound accurately.
The invention provides a delay-and-sum acoustic imaging parallel acceleration network based on CUDA technology, comprising a frequency-point screening unit, a convolution unit, and a grouped-convolution unit;
the frequency-point screening unit is used for selecting the maximum-value points in the scanning frequency range, so that valueless points need not be introduced into the calculation, reducing unnecessary computation time;
the convolution unit is used for taking the steering-vector feature map as input and performing a convolutional-neural-network convolution operation between the cross-spectrum matrix and the large-dimension matrix of steering vectors to generate a convolution data matrix, improving operational efficiency;
the grouped-convolution unit is used for dividing the whole input feature into N groups along the channel direction for convolution when performing convolution processing, reducing the number of operation parameters and improving operational efficiency.
Furthermore, the frequency-point screening unit obtains the sampled values within the scanning frequency range, determined by the number of sampling points and the sampling resolution, as the input signals of the microphone channels, and selects the several sampling points with the largest values to participate in the subsequent matrix operations, improving computational efficiency.
The invention provides a delay-and-sum acoustic imaging parallel acceleration method based on CUDA technology, comprising the following steps:
1) Assume the steering vectors are arranged spatially and regard them as a whole as a feature map;
2) Take each strip vector in the feature map of step 1) as the input of the convolution unit and perform matrix operations with the cross-spectrum matrix in turn, keeping the number of channels unchanged after the operation;
3) Perform a matrix operation between the result of step 2) and the original steering vector through the grouped-convolution unit to obtain the power of the final strip feature vector.
In step 1), the steering vectors are regarded as a whole as a feature map of 41×41×64×21, wherein 41×41 is the number of grid points of the scanned sound plane, with the scan plane set to [-2, 2] meters and the scanning resolution set to 0.1 meter; 64 is the number of array elements of the microphone array; and 21 is the number of frequency points with the larger spectral peaks screened from each microphone channel within the scanning frequency range. The 21 largest frequency points in the scanning frequency range are computed by the frequency-point screening unit, and the steering vectors serve as the feature-map input to the subsequent convolution operations.
In step 2), the convolution unit convolves the feature map obtained in step 1) with the cross-spectrum matrix. For every pixel of the feature map the matrix-operation steps are identical, so the process has convolutional potential. Because the cross-spectrum matrix is the same for all grid points while only the steering vectors differ, the cross-spectrum matrix can be regarded as 1344 feature vectors of size 64×1×1 (1344 = 64 × 21), i.e., as 1344 1×1 convolution kernels with 64 channels. After the convolution of the steering-vector feature map with the cross-spectrum matrix, the result is weighted per output channel, and the convolution layer finally yields features of size 1×1344×41×41.
In step 3), the feature map obtained in step 2) is convolved in turn with the steering vectors to finally obtain the power of each grid point; through the convolution of the grouped-convolution unit, the microphone multichannel signals are processed in parallel, finally yielding the sound power of all grid points on the whole scan plane. Unlike the traditional DAS algorithm, which processes different microphone signals and frequency points serially in for-loops and then sums them, convolution achieves parallel processing of the multichannel microphone signals and frequency points, greatly shortening computation time and improving computational efficiency.
The grouped-convolution unit divides the input feature matrix into 21 groups, and the convolution kernels are likewise divided into 21 groups; each group is then convolved separately, and the per-group results are concatenated to form the final result. Grouped convolution achieves the same result as standard convolution with a smaller number of parameters, raising model operation speed; at the same time, grouped convolution enables a multi-branch parallel learning mode across multiple GPUs, making model training more efficient and the model better.
Compared with the prior art, the invention has the following beneficial effects:
1. According to the invention, acoustic signal processing is combined with a convolutional neural network; multichannel parallel processing is adopted to extract effective acoustic-imaging features, raising the model's processing speed, simplifying the model, and reducing computational redundancy.
2. The invention reduces running time, making the model convenient to put into practical industrial application.
3. The invention localizes acoustic signals accurately, localizes even more accurately in poor environments, and has good robustness and applicability.
Drawings
Fig. 1 is a flow chart of simulated microphone-array time-domain signal localization.
Fig. 2 is a diagram illustrating a convolution operation process according to an embodiment of the present invention.
Detailed Description
To make the technical problems to be solved, the technical solutions, and the beneficial effects clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments.
The embodiment of the invention comprises the following steps:
step 1, modeling a sound source: the sound waves generated by a given sound source will exhibit different phases depending on the propagation distance. Thus, each microphone in the array will "perceive" a different phase, thereby determining the sound source location.
Most standard beamforming techniques assume that the distance between source and observer is large enough, and the microphone-array diameter small enough, that the directionality of the sound source can be ignored, so a complex sound source can be well approximated as a point (monopole) source.
The propagation of sound waves in a continuous medium is described exactly by the Navier-Stokes equations. In most daily and industrial applications, however, sound propagation can be treated as a linear, isentropic phenomenon, so the complex Navier-Stokes equations simplify considerably to the wave equation while still accurately reflecting acoustic propagation:

$$\frac{1}{c_0^2}\frac{\partial^2 p}{\partial t^2} - \nabla^2 p = q(t)\,\delta(x - x_0)$$

wherein $c_0$ is the speed of sound, $\nabla^2$ is the Laplacian, $q(t)$ represents a monopole source at position $x_0$, the Dirac function $\delta(x - x_0)$ provides the geometric position of the sound source, $x$ is the position of the microphone array, and $p$ is the acoustic signal received by the microphone array. In the absence of a sound source, the right-hand side of the above equation equals zero.
The inhomogeneous wave equation has a free-field solution (no reflections or solid boundaries):

$$p(x, t) = \frac{q\!\left(t - |x - x_0|/c_0\right)}{4\pi\,|x - x_0|}$$
this illustrates some important aspects of acoustic wave propagation.
1. The sound-pressure amplitude decays inversely with the observation distance $|x - x_0|$.
2. The sound pressure observed at time $t$ relates to the source at the earlier time $t - |x - x_0|/c_0$. That is, sound-source information propagates through the medium at the finite speed $c_0$. Thus, one key concept of acoustic beamforming is the delay time $t_0$, formally defined as:

$$t_0 = \frac{|x - x_0|}{c_0}$$
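As a minimal numeric illustration of the delay time $t_0 = |x - x_0|/c_0$ (the positions and the 343 m/s sound speed below are illustrative assumptions, not values from the patent):

```python
import numpy as np

C0 = 343.0  # speed of sound in air, m/s (illustrative)

def delay_time(mic_pos, src_pos, c0=C0):
    """Propagation delay t0 = |x - x0| / c0 from source to microphone."""
    return np.linalg.norm(np.asarray(mic_pos) - np.asarray(src_pos)) / c0

# a microphone 3.43 m from the source receives the wavefront 10 ms later
t0 = delay_time([0.0, 0.0, 0.0], [0.0, 0.0, 3.43])
```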
step 2, time domain signal processing: as shown in fig. 1, there is an array of M microphone elements, a search grid is provided on a plane on which a sound source is expected to lie, the scan plane is divided into n×n grid points, and the distance between the microphone array and the scan plane is z. The positioning process is as follows:
1. Determine the plane where the sound source may lie and divide it into a (rectangular) grid;
2. Scan each grid node: for each node, delay the time signal measured by each microphone by its respective delay time $t_0 = |x - x_0|/c_0$;
3. Sum the signals of all microphones and divide by the number of microphones M to obtain the beamformer output map.
The mathematical expression of the beam output is:

$$b(x_0, t) = \frac{1}{M}\sum_{m=1}^{M} p_m\!\left(t - \frac{|x_m - x_0|}{c_0}\right)$$

wherein $p_m$ is the measured signal of each microphone, $M$ is the number of microphones, $x_0$ is the sound-source position, and $x_m$ is the position of microphone $m$.
Step 2.1, obtaining microphone-array time-domain data: the number of sampling points per microphone channel, 2400, is obtained from the sampling rate and the duration and used for framing; the distance from the sound-source point to the array center is obtained, and the received sound pressure follows the free-field decay:

$$p(x) = \frac{\mathrm{amp}}{4\pi\,|x - x_0|}$$

where amp is the signal power, scaled by the distance from the source to the array center so that a given SPL is obtained at the center; sampling then yields a microphone time-domain signal of dimension 2400×64.
Step 3, frequency-domain signal processing: the frequency-domain beamforming formulation starts from the Fourier transform of the free-field monopole solution:

$$P(x, x_0, \omega) = \frac{Q(\omega)\, e^{-i\omega |x - x_0|/c_0}}{4\pi\,|x - x_0|}$$

In the frequency domain, the propagation delay $t_0$ is represented by the complex exponential $e^{-i\omega t_0}$, wherein $P(x, x_0, \omega)$ is the frequency-domain representation of the signal measured by each microphone and $Q(\omega)$ is the frequency-domain representation of the sound-source signal.
The fourier transform of the beamforming is:
wherein, g (x, x 0 ω) is the steering vector:
fourier transform of the microphone received signal:
the beamforming Z (ω) may be represented as a matrix form:
the power of the output signal may be represented by L (x) = |z| 2 And (3) calculating:
defining a Cross-spectrum Matrix (CSM):
defining a weight vector w of grid points n :
Conventional beamforming is located at x 0 The expression of the grid points is:
Step 3.1, Fourier-transform the acquired microphone time-domain signals to obtain the frequency-domain signals. Assume each acquired time-domain block has dimension 2400×64 and the microphone-array sampling frequency is fs = 240 kHz, so the frequency resolution after the Fourier transform is 100 Hz. If the scanning frequency range is 5 kHz to 20 kHz, 151 of the 2400 sample points fall inside the scanning range; most of these 151 points have near-zero spectral peaks, so frequency-point screening is added and only the 21 points with the largest values are selected for subsequent operations. Valueless frequency points are thus never computed, reducing unnecessary computation time and improving operational efficiency.
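The screening step above can be sketched as follows. The helper `screen_bins` is a hypothetical name for illustration; the ranking criterion (summed magnitude across channels) is an assumption, since the patent only says the largest-valued points are kept.

```python
import numpy as np

def screen_bins(x, fs, f_lo, f_hi, k):
    """Keep the k strongest frequency bins inside [f_lo, f_hi].

    x : (T, M) multichannel time signals (T samples, M microphones)
    returns the selected bin indices and the (k, M) spectra at them.
    """
    X = np.fft.rfft(x, axis=0)                      # (T//2+1, M) spectra
    freqs = np.fft.rfftfreq(x.shape[0], d=1 / fs)
    band = np.where((freqs >= f_lo) & (freqs <= f_hi))[0]
    score = np.abs(X[band]).sum(axis=1)             # rank bins by magnitude
    top = band[np.argsort(score)[-k:]]
    return top, X[top]

# the figures from the text: 2400 samples at fs = 240 kHz -> 100 Hz bins;
# 151 bins fall in 5-20 kHz, of which the 21 strongest are kept
fs = 240_000
x = np.random.default_rng(1).standard_normal((2400, 64))
freqs = np.fft.rfftfreq(2400, 1 / fs)
band = np.where((freqs >= 5_000) & (freqs <= 20_000))[0]
idx, spec = screen_bins(x, fs, 5_000, 20_000, 21)
```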
Step 3.2, screen the steering vectors of the designated frequency points according to the sound-source signal. The steering vectors are arranged spatially and regarded as a whole as a feature map of 41×41×64×21, where 41×41 is the size of the scanned sound plane, 64 is the number of microphones, and 21 is the number of maximum sampling points screened for each microphone channel; the steering vectors serve as the feature-map input to the subsequent convolution operations.
As shown in Fig. 2, each strip vector $g$ in the feature map and the cross-spectrum matrix $pp^{H}$ undergo matrix operations in turn, with the number of channels unchanged after the operation. For every pixel of the feature map the matrix-operation steps are identical, so the process has convolutional potential: because the cross-spectrum matrix is the same for all grid points while only the steering vectors differ, it can be regarded as 1344 feature vectors of size 64×1×1, i.e., as 1344 1×1 convolution kernels with 64 channels; after the convolution of the steering-vector feature map with the cross-spectrum matrix, the result is weighted per output channel, and the convolution layer finally yields features of size 1×1344×41×41.
The result of the cross-spectrum-matrix operation is then multiplied by the original steering vector $g$ to obtain the power of the final strip feature vector; the whole calculation can be expressed as $g^{H} \times PP^{H} \times g$. Through successive convolution operations, the microphone multichannel signals are processed in parallel, greatly shortening computation time and improving computational efficiency.
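The equivalence between the serial DAS loops and the batched "convolutional" view of $g^{H} P P^{H} g$ can be checked numerically. A minimal NumPy sketch with toy sizes (the einsum stands in for the 1×1 convolutions / batched GEMMs the GPU would run):

```python
import numpy as np

rng = np.random.default_rng(2)
M, F, N = 8, 4, 9            # mics, screened bins, grid points (toy sizes)
P = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))
csm = np.einsum('fi,fj->fij', P, P.conj())              # P P^H per bin
g = rng.standard_normal((N, F, M)) + 1j * rng.standard_normal((N, F, M))

# serial reference: for-loops over grid points and bins (traditional DAS)
ref = np.zeros(N)
for n in range(N):
    for f in range(F):
        ref[n] += np.real(g[n, f].conj() @ csm[f] @ g[n, f])

# batched view: the same quadratic form as one einsum over all grid
# points and bins at once, which is what maps onto GPU parallelism
par = np.real(np.einsum('nfi,fij,nfj->n', g.conj(), csm, g))
```

Both paths compute the same grid-point powers; only the batched form exposes the parallelism.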
Step 3.3, grouped convolution is adopted in the convolution of the steering vectors with the cross-spectrum matrix: the grouped-convolution unit divides the input feature matrix into 21 groups and likewise divides the convolution kernels into 21 groups; each group is then convolved separately, and the per-group results are concatenated to form the final result. Grouped convolution achieves the same result as standard convolution with a smaller number of parameters, raising network training speed; at the same time, grouped convolution enables a multi-branch parallel learning mode across multiple GPUs, making model training more efficient and the model better.
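The grouped 1×1 convolution can be sketched as per-group matrix products followed by concatenation. This is an illustration of the mechanism only; the toy channel counts below (21 groups of 4 channels, one output channel per group) are assumptions standing in for the 1344-channel feature map of the text.

```python
import numpy as np

def grouped_conv1x1(x, kernels, groups):
    """Grouped 1x1 convolution on a (C, H, W) feature map.

    x       : (C, H, W) input; C must be divisible by groups
    kernels : (groups, C_out_g, C // groups) per-group 1x1 kernels
    returns : (groups * C_out_g, H, W), per-group outputs concatenated
    """
    C, H, W = x.shape
    cg = C // groups
    outs = []
    for gi in range(groups):
        xg = x[gi * cg:(gi + 1) * cg].reshape(cg, -1)   # (cg, H*W)
        outs.append((kernels[gi] @ xg).reshape(-1, H, W))
    return np.concatenate(outs, axis=0)

# 21 groups mirror the 21 screened frequency points in the text
rng = np.random.default_rng(3)
x = rng.standard_normal((21 * 4, 5, 5))        # 4 channels per group here
k = rng.standard_normal((21, 1, 4))            # one output channel/group
y = grouped_conv1x1(x, k, groups=21)
```

Each group only mixes its own channels, which is why the parameter count drops by a factor of `groups` relative to a dense 1×1 convolution over all channels.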
Table 1. Comparison of GPU and CPU run-time results

| | CPU | GPU |
|---|---|---|
| Run time (s / 1000 runs) | 487.1 | 4.414 |
As can be seen from Table 1, the invention formulates the cross-spectrum matrix and the large-dimension matrix of steering vectors as a convolution network; using GPU parallel acceleration, the run time improves by nearly 100 times over the traditional delay-and-sum algorithm, meeting the real-time imaging requirements of industrial applications.
In the above embodiments, the units of each step are divided only according to functional logic and are not limited to the division above, so long as the corresponding functions can be realized; in addition, the units are named only for ease of distinction, and the names do not limit the protection scope of the invention.
The present invention is not limited to the preferred embodiments described above; various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the invention, and such modifications and substitutions fall within the scope defined by the appended claims.
Claims (7)
1. A delay-and-sum acoustic imaging parallel acceleration network based on CUDA technology, characterized by comprising a frequency-point screening unit, a convolution unit, and a grouped-convolution unit;
the frequency-point screening unit is used for selecting the maximum-value points in the scanning frequency range, so that valueless points need not be introduced into the calculation, reducing unnecessary computation time;
the convolution unit is used for taking the steering-vector feature map as input and performing a convolutional-neural-network convolution operation between the cross-spectrum matrix and the large-dimension matrix of steering vectors to generate a convolution data matrix, improving operational efficiency;
the grouped-convolution unit is used for dividing the whole input feature into N groups along the channel direction for convolution when performing convolution processing, reducing the number of operation parameters and improving operational efficiency.
2. The delay-and-sum acoustic imaging parallel acceleration network based on CUDA technology of claim 1, wherein the frequency-point screening unit obtains the sampled values within the scanning frequency range, determined by the number of sampling points and the sampling resolution, as the input signals of the microphone channels, and selects the several sampling points with the largest values to participate in subsequent matrix operations, improving computational efficiency.
3. A delay-and-sum acoustic imaging parallel acceleration method based on CUDA technology, characterized by comprising the following steps:
1) Assume the steering vectors are arranged spatially and regard them as a whole as a feature map;
2) Take each strip vector in the feature map of step 1) as the input of the convolution unit and perform matrix operations with the cross-spectrum matrix in turn, keeping the number of channels unchanged after the operation;
3) Perform a matrix operation between the result of step 2) and the original steering vector through the grouped-convolution unit to obtain the power of the final strip feature vector.
4. The delay-and-sum acoustic imaging parallel acceleration method of claim 3, wherein in step 1) the steering vectors are regarded as a whole as a feature map of 41×41×64×21, wherein 41×41 is the number of grid points of the scanned sound plane, with the scan plane set to [-2, 2] meters and the scanning resolution set to 0.1 meter; 64 is the number of array elements of the microphone array; and 21 is the number of frequency points with the larger spectral peaks screened from each microphone channel within the scanning frequency range; the 21 largest frequency points in the scanning frequency range are computed by the frequency-point screening unit, and the steering vectors serve as the feature-map input to the subsequent convolution operations.
5. The delay-and-sum acoustic imaging parallel acceleration method based on CUDA technology of claim 3, wherein in step 2) the convolution unit convolves the feature map obtained in step 1) with the cross-spectrum matrix; for every pixel of the feature map the matrix-operation steps are identical, so the process has convolutional potential; because the cross-spectrum matrix is the same for all grid points while only the steering vectors differ, the cross-spectrum matrix can be regarded as 1344 feature vectors of size 64×1×1, i.e., as 1344 1×1 convolution kernels with 64 channels; after the convolution of the steering-vector feature map with the cross-spectrum matrix, the result is weighted per output channel, and the convolution layer finally obtains features of size 1×1344×41×41.
6. The delay-sum acoustic imaging parallel acceleration method based on CUDA technology as claimed in claim 3, wherein in step 3), the feature map obtained in step 2) is convolved in turn with the steering vector to obtain the power of each final grid point; parallel processing of the microphone multichannel signals is realized through the convolution operation of the grouped convolution unit, finally obtaining the acoustic power of all grid points on the entire scanning plane. Through the convolution operations, the microphone multichannel signals and all frequency bins are processed in parallel, shortening the calculation time and improving calculation efficiency.
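Steps 2) and 3) together evaluate the standard cross-spectral delay-and-sum power, P = Σ_f w_f^H C_f w_f, for every grid point at once. A sketch with the same placeholder data shows that the batched two-step pipeline matches the classical point-by-point beamformer:

```python
import numpy as np

rng = np.random.default_rng(2)
H, W, M, F = 41, 41, 64, 21
w = rng.standard_normal((H, W, M, F)) + 1j * rng.standard_normal((H, W, M, F))
A = rng.standard_normal((F, M, M)) + 1j * rng.standard_normal((F, M, M))
C = A + np.conj(np.swapaxes(A, 1, 2))        # placeholder Hermitian cross-spectra

# Step 2 output: C w for every pixel and frequency bin
cw = np.einsum("fij,xyjf->xyif", C, w)

# Step 3: inner product with the original steering vector, summed over frequency
power = np.einsum("xyif,xyif->xy", np.conj(w), cw).real   # (41, 41) power map

# Reference: classical delay-and-sum power w^H C w at a single grid point
p00 = sum((np.conj(w[0, 0, :, f]) @ C[f] @ w[0, 0, :, f]).real for f in range(F))
print(np.allclose(power[0, 0], p00))   # True
```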
7. The delay-sum acoustic imaging parallel acceleration method based on the CUDA technique as set forth in claim 3, wherein in step 3), said grouped convolution unit divides the input feature matrix into 21 groups, each group having an input dimension of 64×41×41; the convolution kernels are likewise divided into 21 groups, each group of dimension 64×1×1; each group is then convolved separately, the 1×41×41 results of the groups are concatenated, and the per-group results together form the final result. Grouped convolution achieves the same result as a standard convolution with a smaller parameter count, improving the model's running speed; at the same time, grouped convolution enables a multi-branch parallel learning mode of the model across multiple GPUs, making model training more efficient and the model better optimized.
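The parameter saving claimed for grouped convolution can be checked directly: splitting the 1344 = 64×21 input channels into 21 independent groups reproduces exactly what a full 1×1 convolution with a block-diagonal kernel would compute, at 1/21 of the parameters. The data below are placeholders; only the group structure follows the claim.

```python
import numpy as np

rng = np.random.default_rng(3)
H, W, M, F = 41, 41, 64, 21                 # 64 * 21 = 1344 input channels
x = rng.standard_normal((H, W, F * M))      # feature map, channels ordered by frequency

# 21 groups of 1x1 kernels, each mapping its 64 channels to 64 channels
K = rng.standard_normal((F, M, M))

# Grouped 1x1 convolution: convolve each group independently, then concatenate
groups = [x[..., f * M:(f + 1) * M] @ K[f].T for f in range(F)]
y = np.concatenate(groups, axis=-1)         # (41, 41, 1344)

# The same map as a full (ungrouped) 1x1 convolution needs a block-diagonal
# 1344 x 1344 kernel -- 21x the parameters for an identical result
K_full = np.zeros((F * M, F * M))
for f in range(F):
    K_full[f * M:(f + 1) * M, f * M:(f + 1) * M] = K[f]
print(np.allclose(y, x @ K_full.T))         # True
print(K.size, K_full.size)                  # 86016 vs 1806336 parameters
```

The 21 groups also map naturally onto parallel CUDA streams or multiple GPUs, since no data crosses group boundaries.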
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310446933.2A CN116309921A (en) | 2023-04-24 | 2023-04-24 | Delay summation acoustic imaging parallel acceleration method based on CUDA technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116309921A true CN116309921A (en) | 2023-06-23 |
Family
ID=86816927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310446933.2A Pending CN116309921A (en) | 2023-04-24 | 2023-04-24 | Delay summation acoustic imaging parallel acceleration method based on CUDA technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116309921A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117031397A (en) * | 2023-10-07 | 2023-11-10 | 成都流体动力创新中心 | Quick calculation method for positioning and evaluating noise source of moving object |
CN117031397B (en) * | 2023-10-07 | 2023-12-12 | 成都流体动力创新中心 | Quick calculation method for positioning and evaluating noise source of moving object |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chiariotti et al. | Acoustic beamforming for noise source localization–Reviews, methodology and applications | |
KR101238362B1 (en) | Method and apparatus for filtering the sound source signal based on sound source distance | |
AU2015292238B2 (en) | Planar sensor array | |
US8988970B2 (en) | Method and system for dereverberation of signals propagating in reverberative environments | |
JP6789690B2 (en) | Signal processing equipment, signal processing methods, and programs | |
EP3025130B1 (en) | Wide-band acoustic holography | |
CN107153172B (en) | Cross-spectrum generalized inverse beam forming method based on cross-spectrum optimization | |
Ginn et al. | Noise source identification techniques: simple to advanced applications | |
Yang et al. | Functional delay and sum beamforming for three-dimensional acoustic source identification with solid spherical arrays | |
Malgoezar et al. | On the use of global optimization methods for acoustic source mapping | |
Lee et al. | Deep learning-enabled high-resolution and fast sound source localization in spherical microphone array system | |
CN116309921A (en) | Delay summation acoustic imaging parallel acceleration method based on CUDA technology | |
Borra et al. | Soundfield reconstruction in reverberant environments using higher-order microphones and impulse response measurements | |
CN110515034B (en) | Acoustic signal azimuth angle measurement system and method | |
JP2005531016A (en) | Method and system for representing a sound field | |
JP5997007B2 (en) | Sound source position estimation device | |
Hosseini et al. | Time difference of arrival estimation of sound source using cross correlation and modified maximum likelihood weighting function | |
TW201323838A (en) | Method for visualizing sound source energy distribution in reverberant environment | |
Merino-Martínez et al. | Three-dimensional acoustic imaging using asynchronous microphone array measurements | |
CN110751946A (en) | Robot and voice recognition device and method thereof | |
Zhao et al. | Large region acoustic source mapping using movable arrays | |
Nascimento et al. | Acoustic imaging using the Kronecker array transform | |
Carneiro et al. | Three-dimensional sound source diagnostic using a spherical microphone array from multiple capture positions | |
JP2010206449A (en) | Speech direction estimation device and method, and program | |
Hoffmann et al. | Theoretical study of acoustic circular arrays with tangential pressure gradient sensors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||