CN113658579B - Audio signal processing method, device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN113658579B
CN113658579B (application CN202111112448.9A)
Authority
CN
China
Prior art keywords
audio signal
aligned
audio
microphone
target
Prior art date
Legal status
Active
Application number
CN202111112448.9A
Other languages
Chinese (zh)
Other versions
CN113658579A (en)
Inventor
张娟 (Zhang Juan)
Current Assignee
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202111112448.9A priority Critical patent/CN113658579B/en
Publication of CN113658579A publication Critical patent/CN113658579A/en
Application granted granted Critical
Publication of CN113658579B publication Critical patent/CN113658579B/en


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 — Speech synthesis; Text to speech systems
    • G10L 13/02 — Methods for producing synthetic speech; Speech synthesisers


Abstract

The application provides an audio signal processing method and apparatus, an electronic device, and a readable storage medium, relating to the field of computer technology. The method comprises: obtaining a first audio signal and a second audio signal captured respectively by dual microphones, where the dual microphones comprise a first microphone and a second microphone whose axes are parallel and point in the same direction; and superposing the first audio signal and the second audio signal to obtain a target audio signal. In this way, a target audio signal with a higher signal-to-noise ratio can be obtained, improving the quality of the audio signal.

Description

Audio signal processing method, device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of computer technology, and in particular to an audio signal processing method, an audio signal processing apparatus, an electronic device, and a readable storage medium.
Background
When a microphone collects audio in a noisy environment, it picks up not only the effective signal but also noise. The presence of noise degrades the quality of the speech signal and lowers its signal-to-noise ratio. How to improve the signal-to-noise ratio of the speech signal is therefore a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
Embodiments of the present application provide an audio signal processing method, an audio signal processing apparatus, an electronic device, and a readable storage medium, which acquire two audio signals from two microphones whose axes are parallel and point in the same direction, and obtain a target audio signal with a higher signal-to-noise ratio through superposition, thereby improving the quality of the speech signal.
Embodiments of the present application may be implemented as follows:
in a first aspect, an embodiment of the present application provides an audio signal processing method, including:
the method comprises the steps of obtaining a first audio signal and a second audio signal which are obtained by audio acquisition of two microphones respectively, wherein the two microphones comprise a first microphone and a second microphone, and the axes of the first microphone and the second microphone are parallel and have the same direction;
and according to the first audio signal and the second audio signal, obtaining a target audio signal through superposition.
In a second aspect, an embodiment of the present application provides an audio signal processing apparatus, including:
a signal acquisition module, configured to obtain a first audio signal and a second audio signal captured respectively by dual microphones, where the dual microphones comprise a first microphone and a second microphone whose axes are parallel and point in the same direction;
and the processing module is used for obtaining a target audio signal through superposition according to the first audio signal and the second audio signal.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, the memory storing machine-executable instructions that can be executed by the processor to implement the audio signal processing method of the foregoing embodiment.
In a fourth aspect, the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the audio signal processing method according to the previous embodiments.
With the audio signal processing method, apparatus, electronic device, and readable storage medium of the embodiments, a target audio signal is obtained by superposing the first and second audio signals captured by dual microphones whose axes are parallel and point in the same direction. The signal-to-noise ratio of the audio signal is thus effectively improved from the perspective of enhancing the effective signal energy, improving the clarity of the effective signal in the audio signal.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be regarded as limiting its scope; a person skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a schematic block diagram of an electronic device according to an embodiment of the present application;
fig. 2 is a flow chart of an audio signal processing method according to an embodiment of the present application;
FIG. 3 is a flow chart illustrating the sub-steps included in step S200 in FIG. 2;
FIG. 4 is a flow chart of aligning two audio signals;
FIG. 5 is a flow chart illustrating the sub-steps included in step S220 in FIG. 3;
FIG. 6 is a flow chart of the sub-steps included in step S223 of FIG. 5;
fig. 7 is a block schematic diagram of an audio signal processing apparatus according to an embodiment of the present application.
Icon: 100-an electronic device; 110-memory; a 120-processor; 130-a communication unit; 200-an audio signal processing device; 210-a signal acquisition module; 220-a processing module.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
Dual-microphone applications fall into two main directions. One is active noise reduction, for example in mobile phones, which requires the two microphones to differ in gain or sensitivity. The other is sound source localization or speech enhancement. Speech enhancement is a technique for extracting the useful speech signal from a noise background, suppressing and reducing noise interference when the speech signal is disturbed or even submerged by various kinds of noise. That is, speech enhancement extracts speech as close to the clean original as possible from noisy speech. Mapped to a practical microphone application, speech enhancement corresponds to an increased pickup distance.
The current approach to speech enhancement with dual microphones is to suppress noise in a noisy speech signal to obtain a purer, clearer effective signal, for example by performing a differential operation on the two noisy speech signals obtained from the dual microphones. In this approach, the two microphones point in different directions. For example, a mobile phone may have one microphone at the back of its upper end and another at its lower end; the upper microphone mainly collects noise, the lower microphone collects both voice and noise, and a differential operation on the two signals suppresses the noise. The main research direction of this approach is therefore how to perform speech enhancement through noise reduction algorithms.
At present, the related art generally applies differential operations or similar methods to the two channels of dual-microphone data purely for noise reduction. However, improving speech quality only by reducing noise through signal differencing is a single, narrow direction that does not fully exploit the advantages of dual microphones.
Based on this, the embodiment of the application provides an audio signal processing method, an audio signal processing device, an electronic device and a readable storage medium, which can fully utilize the advantages of two microphones, improve the signal-to-noise ratio and improve the quality of voice signals.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a block diagram of an electronic device 100 according to an embodiment of the disclosure. The electronic device 100 may be, but is not limited to, a smart phone, a computer, a server, etc. The electronic device 100 may include a memory 110, a processor 120, and a communication unit 130. The memory 110, the processor 120, and the communication unit 130 are electrically connected directly or indirectly to each other to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 110 is used to store programs or data. The memory 110 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
The processor 120 is used to read/write data or programs stored in the memory 110 and perform corresponding functions. For example, the memory 110 stores therein an audio signal processing device 200, and the audio signal processing device 200 includes at least one software functional module that may be stored in the memory 110 in the form of software or firmware (firmware). The processor 120 executes various functional applications and data processing, i.e., implements the audio signal processing method in the embodiments of the present application, by running software programs and modules stored in the memory 110, such as the audio signal processing device 200 in the embodiments of the present application.
The communication unit 130 is configured to establish a communication connection between the electronic device 100 and other communication terminals through a network, and is configured to transmit and receive data through the network.
It should be understood that the structure shown in fig. 1 is merely a schematic diagram of the structure of the electronic device 100, and that the electronic device 100 may further include more or fewer components than those shown in fig. 1, or have a different configuration than that shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, fig. 2 is a flowchart of an audio signal processing method according to an embodiment of the present application. The method is applicable to the electronic device 100 described above. The specific flow of the audio signal processing method is described in detail below. In this embodiment, the method may include step S100 and step S200.
Step S100, a first audio signal and a second audio signal obtained by audio acquisition of the two microphones are obtained.
In this embodiment, an audio acquisition device performs audio capture to obtain the first audio signal and the second audio signal. The audio acquisition device is a dual-microphone device comprising a first microphone and a second microphone.
The first microphone and the second microphone can simultaneously capture audio from the same sound source (i.e., the same pickup target), so that the first audio signal is obtained with the first microphone and the second audio signal is obtained with the second microphone. Naturally, the sound source must be within a reasonable range of the audio acquisition device, not at infinity or in an extreme case such as an ineffectively low volume, so that a target audio signal with a clear effective signal (i.e., speech signal) can be obtained later. For example, if the pickup distance of a single microphone of the audio acquisition device is 5 m, the device may be placed 8 m from the sound source during acquisition.
A microphone is directional. Once the first microphone and the second microphone are fixed, their axes at the fixed positions are parallel and point in the same direction, so that the target audio signal can be obtained later. Optionally, the distance between the first microphone and the second microphone is between 10 and 20 cm. The distance between the two microphones may of course be set according to the specific situation, as long as a target audio signal with a clear effective signal can still be obtained.
As an alternative embodiment, the first microphone and the second microphone may have the same acquisition-related parameters such as gain. Alternatively, they may be two microphones of the same model.
The audio acquisition device and the electronic device 100 may be the same device or different devices, as determined by the actual situation. If they are not the same device, the audio acquisition device may send the first audio signal and the second audio signal to the electronic device 100 after capturing them with the dual microphones, so that the electronic device 100 obtains both signals.
Step S200, obtaining a target audio signal by superposition according to the first audio signal and the second audio signal.
In this embodiment, since the first audio signal and the second audio signal are obtained by simultaneously capturing audio from the same sound source with the dual microphones, the two signals are strongly correlated with each other; in particular, they contain the same human voice. A target audio signal with an improved signal-to-noise ratio can therefore be obtained from them through superposition. For example, the first and second audio signals may be superposed directly, so that the human voice in the two channels is superposed; or they may be preprocessed first and then superposed. Superposing the human voice in the two dual-microphone channels enhances the energy of the effective signal, so the signal-to-noise ratio is effectively improved from the perspective of enhancing effective signal energy, improving the quality of the audio signal. Compared with the conventional approach of improving the signal-to-noise ratio through noise reduction, this approach is both novel and efficient.
Optionally, in this embodiment, the first microphone and the second microphone use the same sampling frequency for audio capturing, so that a subsequent superposition effect is ensured.
The first sampling frequency used by the first microphone and the second microphone may be a preset sampling frequency. The first sampling frequency may be set according to an actual audio playing frequency. The first sampling frequency may be greater than or equal to the actual audio playback frequency.
For example, if no downsampling is performed later, the first sampling frequency may be set to the actual audio playback frequency; if that frequency is 8000 Hz, the first sampling frequency may be set to 8000 Hz.
If downsampling is to be performed subsequently, the first sampling frequency may be set to be greater than the actual audio playback frequency, so that after downsampling, the frequency of the signal can meet the actual audio playback frequency requirement.
Alternatively, as an alternative embodiment, once the first audio signal and the second audio signal are obtained, they may be superimposed directly and the target audio signal determined from the result. In this way, the target audio signal can be obtained quickly. The superposition result can be expressed as X(t) + Y(t), where t ∈ [0, T−1] and T is the total number of sampling points in each of the first and second audio signals; X(t) is the signal strength of the t-th sampling point of the first audio signal, i.e., it represents the first audio signal, and Y(t) is the signal strength of the t-th sampling point of the second audio signal, i.e., it represents the second audio signal.
Alternatively, the superposition result may be taken directly as the target audio signal, or it may be multiplied by a preset weight and the product used as the target audio signal. This avoids data anomalies after superposition. The preset weight may be less than 1, with the specific value set according to the actual application.
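As a sketch only, the direct weighted superposition described above can be written as follows; the sample values and the 0.5 weight are illustrative assumptions, not values from the application.

```python
# Direct superposition of two captured signals, scaled by a preset
# weight to avoid data anomalies (e.g. clipping) after addition.
def superpose(x, y, weight=0.5):
    """Return weight * (X(t) + Y(t)) for each sampling point t."""
    return [weight * (xt + yt) for xt, yt in zip(x, y)]

# Toy signals standing in for the first and second audio signals.
x = [1.0, 2.0, 1.0, 0.0]
y = [1.0, 2.0, 0.0, 1.0]
print(superpose(x, y))  # [1.0, 2.0, 0.5, 0.5]
```

A weight below 1 keeps the summed samples inside the representable range when both channels peak simultaneously.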
Because the sound source is fixed in position, but the positions of the first microphone and the second microphone are different, the distances, angles and the like between the sound source and the two microphones are likely to be different, and further the time and the phase when the two microphones acquire the same signal are also different. Thus, the data may be aligned and then superimposed.
Alternatively, as another alternative embodiment, the target audio signal may be obtained by means of fig. 3. Referring to fig. 3, fig. 3 is a flow chart illustrating the sub-steps included in step S200 in fig. 2. In this embodiment, the step S200 may include a sub-step S210 and a sub-step S220.
In the substep S210, the alignment processing is performed on the first audio signal and the second audio signal, so as to obtain the first audio signal and the second audio signal after the alignment processing.
And sub-step S220, superposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
Optionally, in a first optional implementation manner, any alignment manner may be directly adopted to directly align the first audio signal and the second audio signal, and the obtained result is used as the first audio signal and the second audio signal after the alignment processing. And then the aligned first audio signal and the aligned second audio signal are overlapped to obtain the target audio signal. Thereby, it is convenient to ensure the quality of the target audio signal.
Usually, aligning two audio signals involves converting from the time domain to the frequency domain, analyzing the frequency-domain characteristics, and finally converting back to the time domain. However, this approach is computationally intensive. To reduce the amount of computation, the alignment may instead be performed directly on the time-domain signals in the manner shown in fig. 4.
Referring to fig. 4, fig. 4 is a flow chart of aligning two audio signals. The alignment may include sub-step S2201 to sub-step S2203.
Sub-step S2201, calculating in the time domain a cross-correlation function of the two audio signals to be aligned.
Sub-step S2202, determining the maximum value of the cross-correlation function, and taking the shift value corresponding to the maximum value as the target offset.
Substep S2203 aligns the two audio signals to be aligned according to the target offset.
In this embodiment, since the first audio signal and the second audio signal are directly aligned, the two audio signals to be aligned are the first audio signal a and the second audio signal B. The cross-correlation function of the first audio signal a and the second audio signal B may be calculated directly in the time domain.
Because the first audio signal A and the second audio signal B are discrete audio data, their cross-correlation can be calculated with the discrete cross-correlation formula:

R1(n) = Σ_{m=0}^{N1−1} A(m) · B(m + n)

where R1(n) is the cross-correlation sequence of the first audio signal A and the second audio signal B, N1 is the number of sampling points common to A and B, m is the sample index of A with value range 0 to N1−1, and n is both the sequence index and the shift value.
The maximum value MAX of the cross-correlation function R1(n) can be found, and the index x for which R1(x) = MAX in the sequence R1(n) used as the target offset. The first audio signal A and the second audio signal B can then be aligned according to the target offset x, and the target audio signal obtained by superposing from the aligned sampling points A(0) and B(x) onward.
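The shift search of sub-steps S2201–S2203 can be sketched in pure Python as below; the toy signals (B is the same waveform as A delayed by two samples) and the max_shift bound are assumptions for illustration, and the summation range is truncated so that B(m + n) stays in bounds.

```python
# Time-domain cross-correlation: R1(n) = sum over m of A(m) * B(m + n).
def cross_correlation(a, b, max_shift):
    return [sum(a[m] * b[m + n] for m in range(len(a) - max_shift))
            for n in range(max_shift + 1)]

def target_offset(a, b, max_shift):
    """Shift value n at which R1(n) reaches its maximum MAX."""
    r = cross_correlation(a, b, max_shift)
    return max(range(len(r)), key=lambda n: r[n])

a = [1.0, 2.0, 1.0, 0.0, 0.0, 0.0]  # first audio signal A
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # B: same waveform, delayed two samples
print(target_offset(a, b, max_shift=2))  # 2
```

For two identical signals the maximum of R1(n) sits at n = 0, so the computed offset is zero, as expected.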
Optionally, the target audio signal may be obtained from the aligned first and second audio signals according to the superposition formula G(m) = β[E(m) + F(m+x)], where G(m) is the target audio signal; β is a preset coefficient, i.e., an attenuation factor; E(m) and F(m+x) are the aligned first and second audio signals; and x is the target offset.
With the above superposition formula, the target audio signal is β[A(m) + B(m+x)], where the attenuation factor β is determined by the actual situation and may be less than 1, and A(m) and B(m+x) are the aligned first and second audio signals.
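A minimal sketch of this aligned superposition, assuming a target offset x already found by the alignment step and an illustrative attenuation factor β = 0.5:

```python
# G(m) = beta * (A(m) + B(m + x)) over the overlapping sample range.
def superpose_aligned(a, b, x, beta=0.5):
    return [beta * (a[m] + b[m + x]) for m in range(len(a) - x)]

a = [1.0, 2.0, 1.0, 0.0]
b = [0.0, 1.0, 2.0, 1.0]  # same waveform delayed by x = 1 sample
print(superpose_aligned(a, b, x=1))  # [1.0, 2.0, 1.0]
```

Note that after alignment the common waveform adds coherently: each output sample equals the shared waveform value, since β halves the doubled amplitude.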
Alternatively, in a second alternative embodiment, the first audio signal and the second audio signal may be first subjected to noise reduction processing, then the noise-reduced signals are aligned, and the obtained result is used as the first audio signal and the second audio signal after the alignment processing. And then the aligned first audio signal and the aligned second audio signal are overlapped to obtain the target audio signal. The specific manner of the noise reduction process may be set according to actual requirements, and is not specifically limited herein. The superposition can be performed according to the superposition formula described above. Therefore, the alignment effect can be ensured, and the definition of effective signals in the subsequent target audio signals is further ensured.
Optionally, any single-microphone noise reduction algorithm may be directly adopted to process the first audio signal and the second audio signal respectively, and the noise-reduced first audio signal and the noise-reduced second audio signal are used as two paths of audio signals to be aligned. Then, the first audio signal and the second audio signal after noise reduction can be aligned in the manner shown in fig. 4, so as to obtain the first audio signal and the second audio signal after the alignment processing, and further obtain the target audio signal through superposition. The single microphone noise reduction algorithm may be SPEEX, WEBRTC, or the like.
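The application leaves the single-microphone algorithm open (SPEEX, WebRTC, etc.). As a crude stand-in only, a magnitude noise gate can show where such a step slots into the flow; the 0.05 threshold is an assumption, and real noise suppressors are far more sophisticated than this.

```python
# Naive stand-in for single-microphone noise reduction: zero out samples
# whose magnitude falls below a threshold. Production systems would use a
# real suppressor such as SPEEX or WebRTC noise suppression instead.
def noise_gate(signal, threshold=0.05):
    return [s if abs(s) >= threshold else 0.0 for s in signal]

print(noise_gate([0.2, 0.01, -0.3, 0.02]))  # [0.2, 0.0, -0.3, 0.0]
```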
Optionally, the first sampling frequency is greater than the actual audio playing frequency, and noise reduction and alignment can be performed in the manner shown in fig. 5 under the condition that the playing frequency can be ensured. Referring to fig. 5, fig. 5 is a flow chart illustrating the sub-steps included in step S220 in fig. 3. Sub-step S220 may include sub-steps S221-S223.
In a substep S221, the first audio signal is downsampled to obtain a third audio signal.
In the substep S222, the second audio signal is downsampled, so as to obtain a fourth audio signal.
In this embodiment, the first audio signal and the second audio signal may be downsampled at a second sampling frequency to obtain the third audio signal and the fourth audio signal; that is, the sampling frequency of the third and fourth audio signals is the second sampling frequency. The second sampling frequency is lower than the first sampling frequency used by the dual microphones during capture, and its specific value can be set according to actual requirements. Through this "sample high, use low" approach, the noise carried in the dual-microphone audio signals is reduced, and the amount of computation is reduced as well.
For example, the first sampling frequency may be 48000 Hz and the second sampling frequency 8000 Hz. A sample taken every 1/48000 s carries less noise than one taken every 1/8000 s.
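As a sketch of the rate-reduction step, keeping every sixth sample maps 48000 Hz to 8000 Hz. A real implementation would low-pass filter before decimating to avoid aliasing; this toy version omits that and only illustrates the rate change.

```python
# "Sample high, use low": decimate by keeping every (in_rate // out_rate)-th
# sample. Anti-aliasing filtering is omitted in this illustrative sketch.
def downsample(signal, in_rate=48000, out_rate=8000):
    step = in_rate // out_rate
    return signal[::step]

x = [float(i) for i in range(12)]
print(downsample(x))  # [0.0, 6.0]
```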
In step S223, the third audio signal and the fourth audio signal are aligned to obtain the aligned first audio signal and the aligned second audio signal.
Alternatively, the third audio signal and the fourth audio signal may be aligned directly in any manner, and the obtained result may be used as the first audio signal and the second audio signal after the alignment processing. Alternatively, the third audio signal and the fourth audio signal may be used as two audio signals to be aligned in the alignment manner shown in fig. 4, and the third audio signal and the fourth audio signal may be aligned by using the method shown in fig. 4, and then the obtained result may be used as the first audio signal and the second audio signal after the alignment processing. And then, overlapping to obtain the target audio signal. Wherein, superposition can be performed according to the superposition formula.
Optionally, to further reduce the influence of noise, further noise reduction may be performed as shown in fig. 6, so as to obtain the target audio signal. Referring to fig. 6, fig. 6 is a flow chart illustrating the sub-steps included in step S223 in fig. 5. Sub-step S223 may include sub-step S2231 and sub-step S2232.
And step S2231, performing noise reduction processing on the third audio signal and the fourth audio signal, so as to obtain a fifth audio signal and a sixth audio signal.
In the substep S2232, the fifth audio signal and the sixth audio signal are aligned, and the aligned fifth audio signal and sixth audio signal are used as the aligned first audio signal and second audio signal.
Any single-microphone noise reduction algorithm, such as SPEEX or WebRTC, may be used to reduce noise in the third and fourth audio signals respectively, yielding the fifth and sixth audio signals. The fifth and sixth audio signals can then be treated as the two signals to be aligned in the manner of fig. 4, aligned accordingly, and the results used as the aligned first and second audio signals, after which the target audio signal is obtained through superposition. This process can be expressed as G(m) = β[E(m) + F(m+x)], where G(m) is the target audio signal; β is a preset coefficient; E(m) and F(m+x) are the aligned first and second audio signals, that is, the aligned fifth and sixth audio signals; and x is the target offset. E(m) is the signal strength of the m-th sampling point of the fifth audio signal, and F(m+x) is the signal strength of the (m+x)-th sampling point of the sixth audio signal.
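Putting the sub-steps together, a toy end-to-end pipeline (downsample, noise-reduce, align, superpose) might look as follows. The helper implementations, thresholds, and test signals are all illustrative assumptions, with a trivial noise gate standing in for a real single-microphone suppressor such as SPEEX or WebRTC.

```python
def downsample(sig, step=6):            # e.g. 48 kHz -> 8 kHz decimation
    return sig[::step]

def noise_gate(sig, threshold=0.05):    # stand-in for a real suppressor
    return [s if abs(s) >= threshold else 0.0 for s in sig]

def target_offset(a, b, max_shift):     # argmax of R1(n) in the time domain
    def r1(n):
        return sum(a[m] * b[m + n] for m in range(len(a) - max_shift))
    return max(range(max_shift + 1), key=r1)

def process(first, second, max_shift=2, beta=0.5):
    e = noise_gate(downsample(first))    # third -> fifth audio signal
    f = noise_gate(downsample(second))   # fourth -> sixth audio signal
    x = target_offset(e, f, max_shift)   # target offset via cross-correlation
    # G(m) = beta * (E(m) + F(m + x)) over the overlapping range
    return [beta * (e[m] + f[m + x]) for m in range(len(e) - x)]

# Two identical captures: offset 0, output equals the downsampled signal.
mic1 = [1.0] * 6 + [2.0] * 6 + [1.0] * 6 + [0.0] * 6
print(process(mic1, mic1))  # [1.0, 2.0, 1.0, 0.0]
```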
In this embodiment, the first-stage noise reduction is performed by the "sample high, use low" method (oversampling and then downsampling), so that the second-stage noise reduction can achieve a better effect with a general single-microphone noise reduction algorithm. Performing noise reduction twice removes noise that would otherwise degrade the accuracy of the alignment computed directly on the time-domain signal, while the reduced sampling frequency lowers the amount of computation, ensuring accuracy while improving efficiency.
Superimposing the signals after second-stage noise reduction and alignment maximizes the energy of the common signal. The residual noise is random and misaligned, so it is attenuated or only slightly enhanced by the superposition; the overall signal-to-noise ratio of the resulting target audio signal is therefore improved compared with the original signal-to-noise ratios of the first and second audio signals.
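The signal-to-noise-ratio gain from superimposing aligned channels can be illustrated with a minimal numpy sketch. The 440 Hz tone, the noise level, and the 0.5 attenuation factor are illustrative assumptions, not values from the embodiment: the shared signal adds coherently (amplitude doubles) while the independent noise adds incoherently, yielding roughly a 3 dB improvement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a shared "effective" signal reaches both
# microphones, and each channel adds its own independent noise.
n = 8000
t = np.arange(n) / 8000.0
signal = np.sin(2 * np.pi * 440 * t)      # shared speech-like tone
noise1 = 0.5 * rng.standard_normal(n)     # noise at microphone 1
noise2 = 0.5 * rng.standard_normal(n)     # noise at microphone 2

def snr_db(sig, noi):
    # Ratio of signal energy to noise energy, in decibels.
    return 10 * np.log10(np.sum(sig ** 2) / np.sum(noi ** 2))

# Superimposing aligned channels: the shared signal adds coherently,
# the independent noise adds incoherently.
beta = 0.5                                # illustrative attenuation factor
combined_signal = beta * (signal + signal)
combined_noise = beta * (noise1 + noise2)

snr_single = snr_db(signal, noise1)
snr_combined = snr_db(combined_signal, combined_noise)
# snr_combined exceeds snr_single by roughly 3 dB
```
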
Therefore, the embodiment of the application uses the cross-correlation of the two channels of dual-microphone audio data to align the data before superimposing it, thereby enhancing the energy of the effective signal and improving the signal-to-noise ratio.
After the target audio signal is obtained, it can be played back when needed, so that the information content of the sound source can be heard more clearly. Speech collected using the embodiment of the application has higher quality and remains intelligible at a greater distance, increasing the pickup distance of the dual microphones.
The embodiment of the application fully utilizes the correlation between the two channels of dual-microphone data and effectively improves the signal-to-noise ratio of the speech signal by enhancing the effective signal energy, which is novel and efficient compared with prior approaches that only reduce noise. In the data calculation process, the simplicity of the algorithm directly affects the calculation efficiency and the accuracy of real-time results. Compared with the common approach of converting time-domain signals into the frequency domain for analysis, aligning the signals directly in the time domain greatly reduces the complexity and amount of computation and improves the efficiency of the algorithm.
In addition, the embodiment of the application performs multi-stage noise reduction before alignment, which keeps the algorithm conceptually clear and guarantees the accuracy of the correlation calculation. The multi-stage noise reduction used here is an optimization over conventional noise reduction algorithms. The idea of "sampling high and using low" is to increase the sampling frequency and then downsample, so that the resulting sample points carry much less noise than sampling directly at the original frequency. The method is simple and introduces no other adverse factors. A conventional single-microphone noise reduction algorithm is then applied directly to the two channels of microphone data, achieving a good noise reduction effect that is convenient to use and highly stable, which further guarantees the accuracy of subsequent data processing.
The above-described audio signal processing method is exemplified below.
The axes of the two microphones fixed in the audio acquisition device are parallel and point in the same direction. The distance between the two microphones is in the range of 10 cm to 20 cm.
A section of sound-source audio is played at some position relative to the audio acquisition device as the pickup target. The sound source must be within the range the audio acquisition device can reasonably collect; it cannot be infinitely far away or have an infinitesimally small volume. For example, for a microphone with a pickup distance of 5 m, the sound source may be placed 8 m from the audio collection device.
The two microphones of the audio acquisition device are started to collect audio, yielding a first audio signal A and a second audio signal B. The first sampling frequency used here is larger than the sampling frequency actually needed: assuming the required sampling frequency is 8000 Hz, the first sampling frequency may be set to 48000 Hz.
The first audio signal A and the second audio signal B are downsampled to obtain a third audio signal C and a fourth audio signal D.
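A minimal sketch of this "sample high, use low" step, assuming the 48000 Hz and 8000 Hz figures from the example. The block-averaging anti-alias step is a deliberate simplification for illustration; a real implementation would apply a proper low-pass decimation filter.

```python
import numpy as np

def downsample(x, factor):
    """Crude decimation: average each block of `factor` samples.

    The block average acts as a simple anti-aliasing low-pass before
    keeping one value per block; a production implementation would
    apply a proper low-pass filter before discarding samples.
    """
    n = (len(x) // factor) * factor       # drop any ragged tail
    return x[:n].reshape(-1, factor).mean(axis=1)

# First sampling frequency 48000 Hz, required frequency 8000 Hz.
fs_high, fs_low = 48000, 8000
a = np.random.default_rng(1).standard_normal(fs_high)  # 1 s of signal A
c = downsample(a, fs_high // fs_low)                   # third signal C
# len(c) == 8000, i.e. one second at the second sampling frequency
```
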
The third audio signal C and the fourth audio signal D are each subjected to second-stage noise reduction to obtain a fifth audio signal E and a sixth audio signal F. The noise reduction algorithm used here may be any algorithm commonly used with a single microphone.
For the fifth audio signal E and the sixth audio signal F, the offset between the two signals, measured in sampling points, is calculated. The fifth audio signal E and the sixth audio signal F are two channels of discrete audio data, and their cross-correlation function is calculated according to the cross-correlation formula for discrete signals:

R(n) = Σ E(m)·F(m+n), with the sum taken over m from 0 to N−1

where R(n) represents the cross-correlation function sequence of the fifth audio signal E and the sixth audio signal F, n represents the sequence index, N represents the number of sampling points of E and F involved in the correlation, and m represents the sampling-point index of the fifth audio signal E, ranging from 0 to N−1. The value MAX that maximizes the cross-correlation function R(n) is then found; assuming R(x) is the element of the R sequence corresponding to MAX, x is the target offset, in sample points, between the two channels E and F. The fifth audio signal E and the sixth audio signal F can then be aligned by the target offset x.
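The offset search can be sketched in Python as follows. The signal lengths, the maximum shift searched, and the synthetic delayed channel are illustrative assumptions; the formula R(n) = Σ E(m)F(m+n) and the argmax step follow the description above.

```python
import numpy as np

def target_offset(e, f, max_shift):
    """Shift x maximizing R(n) = sum over m of E(m) * F(m + n).

    Time-domain cross-correlation as in the formula above; only
    non-negative shifts of f up to max_shift are searched (an
    assumption -- the embodiment does not fix the search range).
    """
    n_pts = len(e) - max_shift            # points N used at every shift
    r = np.array([np.dot(e[:n_pts], f[s:s + n_pts])
                  for s in range(max_shift + 1)])
    return int(np.argmax(r))              # index where R(n) is maximal

# Synthetic check: f is e delayed by 7 sample points.
rng = np.random.default_rng(2)
e = rng.standard_normal(1000)
f = np.concatenate([np.zeros(7), e])[:1000]
x = target_offset(e, f, max_shift=20)
# x == 7, the recovered target offset
```
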
After alignment, superposition may be performed to obtain the target audio signal. The alignment and superposition process can be expressed as G(m) = β[E(m) + F(m+x)], where G(m) represents the superimposed data, E(m) and F(m+x) represent the aligned fifth audio signal E and sixth audio signal F (that is, the aligned first audio signal and second audio signal), and β represents an attenuation factor.
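A minimal sketch of the alignment-and-superposition formula G(m) = β[E(m) + F(m+x)], using tiny illustrative arrays. Truncating to the overlapping range is an assumption; the embodiment does not specify boundary handling.

```python
import numpy as np

def align_and_superimpose(e, f, x, beta=0.5):
    """G(m) = beta * (E(m) + F(m + x)) over the overlapping samples.

    Assumes x >= 0; keeping only the overlap is an illustrative
    choice, since the embodiment does not specify edge behavior.
    """
    n = len(e) - x                        # samples where both terms exist
    return beta * (e[:n] + f[x:x + n])

e = np.array([1.0, 2.0, 3.0, 4.0])
f = np.array([0.0, 1.0, 2.0, 3.0])       # e delayed by one sample
g = align_and_superimpose(e, f, x=1, beta=0.5)
# the common signal is reinforced over the aligned range
```
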
Thus, a target audio signal with high speech quality can be obtained.
To perform the corresponding steps of the above embodiments and their possible implementations, an implementation of the audio signal processing apparatus 200 is given below. Optionally, the audio signal processing apparatus 200 may adopt the device structure of the electronic device 100 shown in fig. 1 and described above. Further, referring to fig. 7, fig. 7 is a block diagram of an audio signal processing apparatus 200 according to an embodiment of the present application. It should be noted that the basic principle and technical effects of the audio signal processing apparatus 200 provided in this embodiment are the same as those of the above embodiments; for brevity, refer to the corresponding content above. The audio signal processing apparatus 200 may include a signal obtaining module 210 and a processing module 220.
The signal obtaining module 210 is configured to obtain a first audio signal and a second audio signal obtained by audio capturing by the two microphones respectively. The dual microphones comprise a first microphone and a second microphone, and the axes of the first microphone and the second microphone are parallel and have the same direction.
The processing module 220 is configured to obtain a target audio signal by superimposing according to the first audio signal and the second audio signal.
Optionally, in this embodiment, the processing module 220 is specifically configured to: performing alignment processing on the first audio signal and the second audio signal to obtain a first audio signal and a second audio signal after the alignment processing; and superposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
Optionally, in this embodiment, the manner in which the processing module 220 implements alignment includes: calculating in the time domain to obtain the cross-correlation function of two paths of audio signals to be aligned; determining the maximum value of the cross-correlation function, and taking a shift value corresponding to the maximum value as a target offset; and aligning the two paths of audio signals to be aligned according to the target offset.
Optionally, in this embodiment, the processing module 220 is specifically configured to: obtaining the target audio signal according to a superposition formula, the aligned first audio signal and the aligned second audio signal, wherein the superposition formula is as follows:
G(m)=β[E(m)+F(m+x)]
wherein G (m) represents the target audio signal, β represents a preset coefficient, E (m), F (m+x) represent the first audio signal and the second audio signal after alignment processing, and x represents a target offset.
Optionally, in this embodiment, the processing module 220 is specifically configured to: downsampling the first audio signal to obtain a third audio signal; downsampling the second audio signal to obtain a fourth audio signal, wherein the sampling frequencies used in obtaining the third audio signal and the fourth audio signal are both second sampling frequencies, and the second sampling frequencies are smaller than the first sampling frequencies used in audio acquisition by the double microphones; and aligning the third audio signal and the fourth audio signal to obtain the aligned first audio signal and the aligned second audio signal.
Optionally, in this embodiment, the processing module 220 is specifically configured to: respectively carrying out noise reduction processing on the third audio signal and the fourth audio signal to obtain a fifth audio signal and a sixth audio signal; aligning the fifth audio signal and the sixth audio signal, and taking the aligned fifth audio signal and sixth audio signal as the aligned first audio signal and second audio signal.
Alternatively, the above modules may be stored, in the form of software or firmware, in the memory 110 shown in fig. 1, or solidified in the operating system (OS) of the electronic device 100, and may be executed by the processor 120 in fig. 1. The data, program code, and the like required to execute the above modules may also be stored in the memory 110.
The embodiment of the application also provides a readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the audio signal processing method.
In summary, the embodiments of the present application provide an audio signal processing method, an apparatus, an electronic device, and a readable storage medium, in which a target audio signal is obtained by superimposing a first audio signal and a second audio signal obtained by two microphones whose axes are parallel and point in the same direction. The signal-to-noise ratio of the audio signal is thereby effectively improved by enhancing the effective signal energy, improving the clarity of the effective signal in the audio signal.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely an alternative embodiment of the present application and is not intended to limit the present application, and various modifications and variations may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (8)

1. An audio signal processing method, comprising:
the method comprises the steps of obtaining a first audio signal and a second audio signal which are obtained by audio acquisition of two microphones respectively, wherein the two microphones comprise a first microphone and a second microphone, and the axes of the first microphone and the second microphone are parallel and have the same direction;
according to the first audio signal and the second audio signal, a target audio signal is obtained through superposition;
wherein, according to the first audio signal and the second audio signal, the target audio signal is obtained by superposition, including:
downsampling the first audio signal to obtain a third audio signal;
downsampling the second audio signal to obtain a fourth audio signal, wherein the sampling frequencies used in obtaining the third audio signal and the fourth audio signal are both second sampling frequencies, and the second sampling frequencies are smaller than the first sampling frequencies used in audio acquisition by the double microphones;
respectively carrying out noise reduction processing on the third audio signal and the fourth audio signal to obtain a fifth audio signal and a sixth audio signal;
and based on the fifth audio signal and the sixth audio signal, obtaining the target audio signal through alignment and superposition, wherein the alignment is realized through cross-correlation calculation in a time domain.
2. The method of claim 1, wherein the obtaining the target audio signal by superposition from the first audio signal and the second audio signal comprises:
performing alignment processing on the first audio signal and the second audio signal to obtain an aligned first audio signal and an aligned second audio signal, wherein the aligned first audio signal and the aligned second audio signal are an aligned fifth audio signal and an aligned sixth audio signal;
and superposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
3. The method of claim 2, wherein the performing an alignment process comprises:
calculating in the time domain to obtain the cross-correlation function of two paths of audio signals to be aligned;
determining the maximum value of the cross-correlation function, and taking a shift value corresponding to the maximum value as a target offset;
and aligning the two paths of audio signals to be aligned according to the target offset.
4. A method according to claim 2 or 3, wherein the superimposing the aligned first audio signal and second audio signal to obtain the target audio signal comprises:
obtaining the target audio signal according to a superposition formula, the aligned first audio signal and the aligned second audio signal, wherein the superposition formula is as follows:
G(m)=β[E(m)+F(m+x)]
wherein G (m) represents the target audio signal, β represents a preset coefficient, E (m), F (m+x) represent the first audio signal and the second audio signal after alignment processing, and x represents a target offset.
5. An audio signal processing apparatus, comprising:
the signal acquisition module is used for acquiring a first audio signal and a second audio signal obtained by audio acquisition of the dual microphones respectively, wherein the dual microphones comprise a first microphone and a second microphone, and the axes of the first microphone and the second microphone are parallel and have the same direction;
the processing module is used for obtaining a target audio signal through superposition according to the first audio signal and the second audio signal;
the processing module is specifically configured to:
downsampling the first audio signal to obtain a third audio signal;
downsampling the second audio signal to obtain a fourth audio signal, wherein the sampling frequencies used in obtaining the third audio signal and the fourth audio signal are both second sampling frequencies, and the second sampling frequencies are smaller than the first sampling frequencies used in audio acquisition by the double microphones;
respectively carrying out noise reduction processing on the third audio signal and the fourth audio signal to obtain a fifth audio signal and a sixth audio signal;
and based on the fifth audio signal and the sixth audio signal, obtaining the target audio signal through alignment and superposition, wherein the alignment is realized through cross-correlation calculation in a time domain.
6. The apparatus of claim 5, wherein the processing module is specifically configured to:
performing alignment processing on the first audio signal and the second audio signal to obtain an aligned first audio signal and an aligned second audio signal, wherein the aligned first audio signal and the aligned second audio signal are an aligned fifth audio signal and an aligned sixth audio signal;
and superposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
7. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor to implement the audio signal processing method of any of claims 1-4.
8. A readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the audio signal processing method according to any one of claims 1-4.
CN202111112448.9A 2021-09-18 2021-09-18 Audio signal processing method, device, electronic equipment and readable storage medium Active CN113658579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111112448.9A CN113658579B (en) 2021-09-18 2021-09-18 Audio signal processing method, device, electronic equipment and readable storage medium


Publications (2)

Publication Number Publication Date
CN113658579A CN113658579A (en) 2021-11-16
CN113658579B true CN113658579B (en) 2024-01-30

Family

ID=78484098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111112448.9A Active CN113658579B (en) 2021-09-18 2021-09-18 Audio signal processing method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113658579B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117880696A (en) * 2022-10-12 2024-04-12 广州开得联软件技术有限公司 Sound mixing method, device, computer equipment and storage medium

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0398594U (en) * 1990-01-26 1991-10-14
JPH08313659A (en) * 1995-05-16 1996-11-29 Atr Ningen Joho Tsushin Kenkyusho:Kk Signal time difference detector
JP2001100774A (en) * 1999-09-28 2001-04-13 Takayuki Arai Voice processor
CA2381516A1 (en) * 2001-04-12 2002-10-12 Gennum Corporation Digital hearing aid system
US6697494B1 (en) * 1999-12-15 2004-02-24 Phonak Ag Method to generate a predetermined or predeterminable receiving characteristic of a digital hearing aid, and a digital hearing aid
JP2007214913A (en) * 2006-02-09 2007-08-23 Yamaha Corp Sound collection apparatus
JP2009010593A (en) * 2007-06-27 2009-01-15 Yamaha Corp Portable communication terminal
JP2010026361A (en) * 2008-07-23 2010-02-04 Internatl Business Mach Corp <Ibm> Speech collection method, system and program
JP2011114623A (en) * 2009-11-27 2011-06-09 Teac Corp Sound recorder
CN102403022A (en) * 2010-09-13 2012-04-04 三洋电机株式会社 Recording apparatus, recording condition setting method, and recording condition setting program
CN105191345A (en) * 2013-03-29 2015-12-23 日产自动车株式会社 Microphone support device for sound source localization
CN107040856A (en) * 2016-02-04 2017-08-11 北京卓锐微技术有限公司 A kind of microphone array module
CN110400571A (en) * 2019-08-08 2019-11-01 Oppo广东移动通信有限公司 Audio-frequency processing method, device, storage medium and electronic equipment
CN111010646A (en) * 2020-03-11 2020-04-14 恒玄科技(北京)有限公司 Method and system for transparent transmission of earphone and earphone
CN111968667A (en) * 2020-08-13 2020-11-20 杭州芯声智能科技有限公司 Double-microphone voice noise reduction device and noise reduction method thereof
CN113270106A (en) * 2021-05-07 2021-08-17 深圳市友杰智新科技有限公司 Method, device and equipment for inhibiting wind noise of double microphones and storage medium


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
" 麦克风阵列语音增强方法研究";苏龙;《中国优秀硕士学位论文全文数据库(信息科技辑)》;第37-40,45-48,60页 *
"麦克风阵列语音增强的算法研究";董晓娟;《中国优秀硕士学位论文全文数据库(信息科技辑)》;全文 *
The delay & sum algorithm applied to microphone array measurements: Numerical analysis and experimental validation;Alfredo Cigada;《Mechanical Systems and Signal Processing》;全文 *
余刘昌."基于手机麦克风阵列的语音增强方法研究".《中国优秀硕士学位论文全文数据库(信息科技辑)》.2016,全文. *
阵列语音增强算法的研究及实现;王琪;《中国优秀硕士学位论文全文数据库(信息科技辑)》;全文 *


Similar Documents

Publication Publication Date Title
JP2776848B2 (en) Denoising method, neural network learning method used for it
US9130526B2 (en) Signal processing apparatus
JP4660578B2 (en) Signal correction device
US20110051956A1 (en) Apparatus and method for reducing noise using complex spectrum
CN113658579B (en) Audio signal processing method, device, electronic equipment and readable storage medium
CN110706719A (en) Voice extraction method and device, electronic equipment and storage medium
CN110875056B (en) Speech transcription device, system, method and electronic device
CN112216295B (en) Sound source positioning method, device and equipment
CN110970051A (en) Voice data acquisition method, terminal and readable storage medium
CN115359771B (en) Underwater sound signal noise reduction method, system, equipment and storage medium
CN108957392A (en) Sounnd source direction estimation method and device
CN111599372B (en) Stable on-line multi-channel voice dereverberation method and system
CN101587712A (en) A kind of directional speech enhancement method based on minitype microphone array
CN111540365B (en) Voice signal determination method, device, server and storage medium
CN112185405B (en) Bone conduction voice enhancement method based on differential operation and combined dictionary learning
JP2003533109A (en) Receiving system for multi-sensor antenna
WO2024017110A1 (en) Voice noise reduction method, model training method, apparatus, device, medium, and product
CN112420079A (en) Voice endpoint detection method and device, storage medium and electronic equipment
JP6182169B2 (en) Sound collecting apparatus, method and program thereof
CN104424954A (en) Noise estimation method and device
CN113660578B (en) Directional pickup method and device with adjustable pickup angle range for double microphones
CN113270108B (en) Voice activity detection method, device, electronic equipment and medium
CN111968627B (en) Bone conduction voice enhancement method based on joint dictionary learning and sparse representation
CN115662394A (en) Voice extraction method, device, storage medium and electronic device
CN111192569B (en) Double-microphone voice feature extraction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant