CN117912462A - Voice gain control method, device, terminal and storage medium


Info

Publication number
CN117912462A
Authority
CN
China
Legal status
Pending
Application number
CN202311622161.XA
Other languages
Chinese (zh)
Inventor
王江
张家源
崔斌
王鑫
林友钦
Current Assignee
Leedarson Lighting Co Ltd
Original Assignee
Leedarson Lighting Co Ltd
Application filed by Leedarson Lighting Co Ltd
Priority to CN202311622161.XA
Publication of CN117912462A

Landscapes

  • Control Of Amplification And Gain Control (AREA)

Abstract

The invention provides a voice gain control method, a voice gain control device, a terminal and a storage medium. The method comprises the following steps: performing a Fourier transform on the original voice signal of the current frame to obtain its frequency spectrum, and determining the amplitude value of each frequency point in the spectrum; calculating the signal-to-noise ratio of each frequency point in the spectrum; determining the gain coefficient of each frequency point according to its signal-to-noise ratio and amplitude value; and enhancing the amplitude value of each frequency point based on its gain coefficient to obtain the target voice signal. The method processes the original voice signal frame by frame and determines an adaptive gain for each frequency point of the current frame in the frequency domain from that point's signal-to-noise ratio and amplitude value, thereby reducing noise interference during gain control and improving both the gain control effect and the response speed.

Description

Voice gain control method, device, terminal and storage medium
Technical Field
The present invention relates to the field of signal processing technologies, and in particular, to a method and apparatus for controlling speech gain, a terminal, and a storage medium.
Background
In real-time speech interaction scenarios, speech quality is degraded by ambient noise. To improve speech quality, the prior art commonly uses the following methods.
First, a DAGC (Delayed Automatic Gain Control) algorithm based on signal amplitude determines the gain by measuring the amplitude of the input signal. It typically performs root mean square (RMS) or peak detection on the input signal and then adjusts the gain according to the detected amplitude. This approach is simple, but its performance may suffer when the signal is nonlinear or noise is present.
Second, a DAGC algorithm based on the statistical properties of the signal adjusts the gain using statistics such as the mean and variance. For example, if the mean level of the input signal is too low, the gain is increased to boost the signal. Because it considers the overall characteristics of the signal rather than a single instantaneous value, this approach copes better with nonlinearity and noise. However, it needs time to accumulate statistics, so its response may be slow.
Existing speech processing methods therefore achieve poor gain control in real-time speech interaction scenarios.
Disclosure of Invention
The embodiments of the present invention provide a voice gain control method, a voice gain control device, a terminal and a storage medium, to solve the prior-art problem of poor voice gain control in real-time speech interaction scenarios.
In a first aspect, an embodiment of the present invention provides a method for controlling a speech gain, including:
performing a Fourier transform on the original voice signal of the current frame to obtain the frequency spectrum of the original voice signal of the current frame, and determining the amplitude value of each frequency point in the frequency spectrum;
calculating the signal-to-noise ratio of each frequency point in the frequency spectrum;
determining the gain coefficient of each frequency point according to the signal-to-noise ratio and the amplitude value of that frequency point; and
enhancing the amplitude value of each frequency point based on the gain coefficient of that frequency point to obtain a target voice signal.
In a second aspect, an embodiment of the present invention provides a speech gain control apparatus, including:
a spectrum acquisition module, configured to perform a Fourier transform on the original voice signal of the current frame to obtain the frequency spectrum of the original voice signal of the current frame, and to determine the amplitude value of each frequency point in the frequency spectrum;
a signal-to-noise ratio calculation module, configured to calculate the signal-to-noise ratio of each frequency point in the frequency spectrum;
a gain coefficient determination module, configured to determine the gain coefficient of each frequency point according to the signal-to-noise ratio and the amplitude value of that frequency point; and
a voice enhancement module, configured to enhance the amplitude value of each frequency point based on the gain coefficient of that frequency point to obtain a target voice signal.
In a third aspect, an embodiment of the present invention provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any one of the possible implementations of the first aspect above when the computer program is executed.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method as described in any one of the possible implementations of the first aspect above.
In the voice gain control method, device, terminal and storage medium provided by the embodiments of the present invention, a Fourier transform is first performed on the original voice signal of the current frame to obtain its frequency spectrum, and the amplitude value of each frequency point in the spectrum is determined; the signal-to-noise ratio of each frequency point is then calculated; the gain coefficient of each frequency point is determined from its signal-to-noise ratio and amplitude value; and finally the amplitude value of each frequency point is enhanced based on its gain coefficient to obtain the target voice signal. The real-time voice signal collected in a real-time speech interaction scenario is processed frame by frame, and an adaptive gain is determined for each frequency point of the current frame in the frequency domain from its signal-to-noise ratio and amplitude value. This reduces noise interference during gain control and improves both the gain control effect and the response speed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an implementation of a voice gain control method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a voice gain control device according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the following description will be made by way of specific embodiments with reference to the accompanying drawings.
Fig. 1 shows a flowchart of an implementation of the voice gain control method according to an embodiment of the present invention. Details are as follows:
S101: Perform a Fourier transform on the original voice signal of the current frame to obtain the frequency spectrum of the original voice signal of the current frame, and determine the amplitude value of each frequency point in the frequency spectrum.
Specifically, in a real-time speech interaction scenario, the audio acquisition device captures the original voice signal and stores it in an audio library. An audio acquisition thread of the terminal reads the original voice signal from the audio library and stores it in a buffer, from which the terminal extracts the original voice signal for subsequent voice gain control.
Because audio data can be collected from the audio library either in real time or from historical recordings already stored there, the method suits both real-time and non-real-time speech interaction scenarios.
The audio acquisition thread of the terminal reads the original voice signal from the buffer frame by frame, and a Fourier transform is performed on the original voice signal of the current frame to obtain its frequency spectrum. The abscissa of the spectrum is frequency and the ordinate is the amplitude value of the speech at each frequency point, so once the spectrum of the current frame is obtained, the amplitude value of every frequency point can be read from it. There may be one or more audio acquisition threads.
In one possible implementation manner, before S101, the method provided by this embodiment further includes:
The original voice signal of the current frame is extracted from the buffered original voice signal using a window function, where the moving step (hop size) of the window function is smaller than its window length.
In this embodiment, the data processing thread of the terminal reads the original voice signal from the buffer through a window function to reduce spectral leakage. The window function may be a Hamming window, a Hann (Hanning) window, or the like.
Specifically, the terminal windows the original voice signal and extracts the original voice signal of the current frame from the buffer. Because the moving step of the window function is smaller than its window length, consecutive frames of the original voice signal overlap, which further improves the speech processing effect.
Preferably, the moving step of the window function is set to half of its window length.
For example, the data processing thread reads 256 new samples each time, splices them with the 256 samples read for the previous frame to form 512 samples, and then applies the window to obtain the original voice signal of the current frame.
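The 256-sample hop and 512-sample frame in the example above can be sketched as follows. This is a minimal illustration, assuming a periodic Hann window (one of the window types the text allows) and float samples; the `Framer` class, `hann_window` and their parameters are hypothetical names, not taken from the patent.

```python
import math

FRAME_LEN = 512          # window length from the example above
HOP = FRAME_LEN // 2     # moving step: 256 new samples per frame (50% overlap)


def hann_window(n: int) -> list[float]:
    """Periodic Hann window of length n (assumed choice; Hamming would also work)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


class Framer:
    """Splices each hop of new samples onto the retained tail and windows the result."""

    def __init__(self, frame_len: int = FRAME_LEN, hop: int = HOP):
        if not 0 < hop < frame_len:
            raise ValueError("hop must be positive and smaller than the window length")
        self.frame_len = frame_len
        self.hop = hop
        self.window = hann_window(frame_len)
        # Samples carried over from earlier reads; silence before the first read.
        self.history = [0.0] * (frame_len - hop)

    def push(self, new_samples: list[float]) -> list[float]:
        """Accept exactly `hop` new samples and return one windowed frame."""
        if len(new_samples) != self.hop:
            raise ValueError(f"expected {self.hop} new samples")
        raw = self.history + list(new_samples)   # previous samples + new samples
        self.history = raw[self.hop:]            # keep the overlap for the next frame
        return [s * w for s, w in zip(raw, self.window)]
```

With a periodic Hann window at 50% overlap, the shifted windows sum to one, which is what lets the overlap-add step later reconstruct the signal without amplitude ripple.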
After the original voice signal of the current frame is obtained, the original voice signal of the current frame is subjected to fast Fourier transform, and the frequency spectrum of the original voice signal of the current frame is obtained. Then, gain processing is carried out on the original voice signal of the current frame in the frequency domain.
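The FFT step and the amplitude value of each frequency point can be illustrated with a plain recursive radix-2 FFT. This is a sketch in pure Python for readability (a real terminal would use an optimized FFT library); `fft` and `amplitude_spectrum` are hypothetical names, and returning phases alongside amplitudes is an assumption made so that the spectrum can later be inverted.

```python
import cmath
import math


def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def amplitude_spectrum(frame: list[float]) -> tuple[list[float], list[float]]:
    """Amplitude and phase of frequency points 0..N/2 (enough for a real frame)."""
    spectrum = fft([complex(s) for s in frame])
    half = spectrum[: len(frame) // 2 + 1]
    return [abs(c) for c in half], [cmath.phase(c) for c in half]
```

For a real-valued frame of N samples, the N/2 + 1 non-negative frequency points carry all of the information; the remaining bins are complex conjugates of these.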
S102: Calculate the signal-to-noise ratio of each frequency point in the frequency spectrum.
In one possible implementation, the specific implementation procedure of S102 includes:
S201: Perform noise estimation on the frequency spectrum to obtain the noise estimate of each frequency point.
In one possible implementation, the specific implementation procedure of S201 includes:
Noise estimation is performed on the frequency spectrum using the IMCRA (Improved Minima Controlled Recursive Averaging) algorithm to obtain the noise estimate of each frequency point.
Alternatively, noise estimation on the spectrum may use the MCRA (Minima Controlled Recursive Averaging) algorithm or a neural network algorithm.
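The minima-controlled recursive averaging idea can be illustrated with a simplified MCRA-style tracker. This is not the IMCRA algorithm the patent prefers: it omits MCRA's frequency smoothing and all of IMCRA's refinements, and the default parameters are commonly quoted MCRA values that should be treated as assumptions. The class and attribute names are hypothetical.

```python
class SimplifiedMCRA:
    """Simplified MCRA-style noise power tracker (assumption, not the patent's IMCRA)."""

    def __init__(self, n_bins: int, alpha_s: float = 0.8, alpha_d: float = 0.95,
                 alpha_p: float = 0.2, delta: float = 5.0, min_window: int = 125):
        self.alpha_s = alpha_s          # smoothing of the noisy power spectrum
        self.alpha_d = alpha_d          # base smoothing of the noise estimate
        self.alpha_p = alpha_p          # smoothing of the speech presence probability
        self.delta = delta              # smoothed/minimum ratio that signals speech
        self.min_window = min_window    # frames per minimum-search window
        self.frame = 0
        self.smoothed = None
        self.s_min = None
        self.s_tmp = None
        self.presence = [0.0] * n_bins
        self.noise = None

    def update(self, amplitudes: list[float]) -> list[float]:
        """Feed one frame of amplitudes; return the per-bin noise power estimate."""
        power = [a * a for a in amplitudes]
        self.frame += 1
        if self.noise is None:
            # Initialise every statistic from the first frame.
            self.smoothed, self.s_min, self.s_tmp = power[:], power[:], power[:]
            self.noise = power[:]
            return self.noise[:]
        restart = self.frame % self.min_window == 0
        for k, p in enumerate(power):
            s = self.alpha_s * self.smoothed[k] + (1 - self.alpha_s) * p
            self.smoothed[k] = s
            if restart:   # start a new minimum-search window
                self.s_min[k] = min(self.s_tmp[k], s)
                self.s_tmp[k] = s
            else:
                self.s_min[k] = min(self.s_min[k], s)
                self.s_tmp[k] = min(self.s_tmp[k], s)
            speech = 1.0 if s > self.delta * self.s_min[k] else 0.0
            self.presence[k] = (self.alpha_p * self.presence[k]
                                + (1 - self.alpha_p) * speech)
            # Speech presence slows the noise update so speech is not absorbed as noise.
            a_d = self.alpha_d + (1 - self.alpha_d) * self.presence[k]
            self.noise[k] = a_d * self.noise[k] + (1 - a_d) * p
        return self.noise[:]
```

The key design point carried over from MCRA is that the noise estimate is updated quickly only where the smoothed power stays close to its recent minimum, that is, where speech is probably absent.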
S202: Calculate the speech gain of each frequency point according to its noise estimate, and filter the frequency spectrum with the speech gain of each frequency point to obtain a pure voice signal in the frequency domain.
In one possible implementation, the specific implementation procedure of S202 includes:
The speech gain of each frequency point is derived from its noise estimate using the OMLSA (Optimally Modified Log-Spectral Amplitude) estimation algorithm.
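To show how a per-frequency-point speech gain filters the spectrum into a pure voice estimate, the sketch below swaps in a simpler technique than OMLSA: a Wiener gain driven by a decision-directed a priori SNR estimate. OMLSA additionally uses a log-spectral amplitude gain and a speech presence probability, which are omitted here. The class name, `alpha` and `gain_floor` values are assumptions.

```python
class DecisionDirectedWiener:
    """Wiener gain with decision-directed a priori SNR (a stand-in for OMLSA)."""

    def __init__(self, n_bins: int, alpha: float = 0.98, gain_floor: float = 0.1):
        self.alpha = alpha              # weight of the previous frame's clean estimate
        self.gain_floor = gain_floor    # lower bound to limit musical noise
        self.prev_clean_power = [0.0] * n_bins

    def apply(self, amplitudes: list[float], noise_power: list[float]) -> list[float]:
        """Return the estimated pure voice amplitude of each frequency point."""
        clean = []
        for k, (a, lam) in enumerate(zip(amplitudes, noise_power)):
            lam = max(lam, 1e-12)                 # guard against division by zero
            gamma = a * a / lam                   # a posteriori SNR
            xi = (self.alpha * self.prev_clean_power[k] / lam
                  + (1 - self.alpha) * max(gamma - 1.0, 0.0))   # a priori SNR
            gain = max(xi / (1.0 + xi), self.gain_floor)        # Wiener gain
            s = gain * a
            self.prev_clean_power[k] = s * s
            clean.append(s)
        return clean
```

The noise power fed in here would come from the noise estimator of S201; bins whose power barely exceeds the noise estimate are pushed down to the gain floor.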
S203: Subtract the pure voice signal from the frequency spectrum to obtain a noise signal.
Specifically, for each frequency point, the amplitude value of the pure voice signal is subtracted from the amplitude value of the frequency spectrum to obtain the noise signal.
S204: Calculate the signal-to-noise ratio of each frequency point based on the energy spectrum of the pure voice signal and the energy spectrum of the noise signal.
Specifically, for each frequency point, the energy of the pure voice signal is divided by the energy of the noise signal to obtain the signal-to-noise ratio of that frequency point.
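Steps S203 and S204 reduce to a few lines per frequency point. This sketch assumes amplitudes are non-negative magnitudes; clamping negative noise amplitudes to zero and the small `eps` guard are assumptions the patent does not state, and `per_bin_snr` is a hypothetical name.

```python
def per_bin_snr(noisy_amp: list[float], clean_amp: list[float],
                eps: float = 1e-12) -> tuple[list[float], list[float]]:
    """S203: noise amplitude by subtraction; S204: SNR as an energy ratio per bin."""
    # S203: spectrum amplitude minus pure voice amplitude (clamped at zero).
    noise_amp = [max(y - s, 0.0) for y, s in zip(noisy_amp, clean_amp)]
    # S204: energy of the pure voice signal divided by energy of the noise signal.
    snr = [(s * s) / max(n * n, eps) for s, n in zip(clean_amp, noise_amp)]
    return noise_amp, snr
```

For example, a bin with spectrum amplitude 10 and pure voice amplitude 8 has a noise amplitude of 2 and a linear SNR of 64 / 4 = 16.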
S103: Determine the gain coefficient of each frequency point according to its signal-to-noise ratio and amplitude value.
In one possible implementation, the specific implementation procedure of S103 includes:
Determining a first gain coefficient of each frequency point according to its signal-to-noise ratio, where the signal-to-noise ratio and the first gain coefficient are positively correlated;
determining a second gain coefficient of each frequency point according to its amplitude value, where the amplitude value and the second gain coefficient are negatively correlated; and
multiplying the first gain coefficient and the second gain coefficient of the same frequency point to obtain the gain coefficient of that frequency point.
Specifically, the first gain coefficient is exponentially related to the signal-to-noise ratio, and the second gain coefficient is inversely proportional to the amplitude value.
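The patent gives only the shape of the two gain coefficients, not their formulas. The sketch below picks one hypothetical pair that satisfies the stated relations: a first gain that rises exponentially toward `g1_max` with SNR, and a second gain equal to `target_amp / amplitude`, clipped to a range so that silent or very loud bins stay bounded (the clipping is an added assumption). All parameter values are illustrative.

```python
import math


def first_gain(snr: float, g1_max: float = 1.0, snr_scale: float = 4.0) -> float:
    """Hypothetical: increases with SNR, approaching g1_max exponentially."""
    return g1_max * (1.0 - math.exp(-snr / snr_scale))


def second_gain(amplitude: float, target_amp: float = 1000.0,
                g2_min: float = 0.5, g2_max: float = 8.0, eps: float = 1e-12) -> float:
    """Hypothetical: inversely proportional to amplitude, clipped to [g2_min, g2_max]."""
    return min(max(target_amp / max(amplitude, eps), g2_min), g2_max)


def gain_coefficients(amplitudes: list[float], snrs: list[float]) -> list[float]:
    """Gain of each frequency point = first gain x second gain."""
    return [first_gain(s) * second_gain(a) for a, s in zip(amplitudes, snrs)]
```

With these choices a quiet, high-SNR bin receives a large gain and a loud, low-SNR bin a small one, which matches the behaviour the text describes for S104.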
Alternatively, S103 may be implemented as follows:
The gain coefficient corresponding to the amplitude value and signal-to-noise ratio of each frequency point is looked up in a preset gain table, which stores the correspondence between amplitude values, signal-to-noise ratios and gain coefficients.
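A preset gain table can be indexed by banding the amplitude value and the SNR. The band edges and gain entries below are invented purely for illustration (the patent does not publish a table); the lookup itself uses the standard-library `bisect.bisect_right`.

```python
from bisect import bisect_right

# Hypothetical band edges: amplitude bands <100, 100-1000, >=1000;
# linear SNR bands <1, 1-10, >=10.
AMP_EDGES = [100.0, 1000.0]
SNR_EDGES = [1.0, 10.0]

# Hypothetical gains: rows are amplitude bands, columns are SNR bands.
GAIN_TABLE = [
    [0.5, 2.0, 4.0],   # amplitude < 100
    [0.5, 1.5, 2.0],   # 100 <= amplitude < 1000
    [0.3, 1.0, 1.0],   # amplitude >= 1000
]


def lookup_gain(amplitude: float, snr: float) -> float:
    """Return the table gain for one frequency point."""
    row = bisect_right(AMP_EDGES, amplitude)   # an edge value falls into the upper band
    col = bisect_right(SNR_EDGES, snr)
    return GAIN_TABLE[row][col]
```

A table trades the flexibility of a formula for a fixed, very cheap per-bin cost, which may suit embedded terminals.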
S104: Enhance the amplitude value of each frequency point based on its gain coefficient to obtain the target voice signal.
In one possible implementation, the specific implementation procedure of S104 includes:
Multiplying the gain coefficient of each frequency point by the amplitude value of that frequency point to obtain a first voice signal; and
performing an inverse Fourier transform on the first voice signal to obtain the target voice signal.
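The two sub-steps of S104 can be sketched as follows. The patent only mentions amplitudes, so reusing the noisy phase of each frequency point is an assumption, as is rebuilding the negative frequencies by conjugate symmetry so that the output is real. A naive O(N^2) inverse DFT keeps the block self-contained; a real implementation would use an inverse FFT. `apply_gain_and_invert` is a hypothetical name.

```python
import cmath
import math


def apply_gain_and_invert(amps: list[float], phases: list[float],
                          gains: list[float]) -> list[float]:
    """amps/phases/gains cover the N/2+1 non-negative frequency points of an N-point frame."""
    # First voice signal: gained amplitude, original (noisy) phase.
    half = [g * a * cmath.exp(1j * p) for a, p, g in zip(amps, phases, gains)]
    n = 2 * (len(half) - 1)
    # Negative frequencies are the complex conjugates of the positive ones.
    full = half + [half[k].conjugate() for k in range(n // 2 - 1, 0, -1)]
    # Naive inverse DFT back to the time domain (target voice signal).
    return [
        (sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

Applying a uniform gain of 2 to a pure cosine, for instance, simply returns the cosine at twice the amplitude.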
Specifically, this embodiment adaptively enhances the voice signal by considering both the amplitude value and the signal-to-noise ratio: a larger gain is applied to frequency points with a small amplitude value and a high signal-to-noise ratio, and a smaller gain is applied to frequency points with a large amplitude value and a low signal-to-noise ratio. This addresses the susceptibility of speech to noise interference during real-time voice interaction, at low computational cost and with wide applicability.
In one possible implementation manner, after obtaining the target voice signal, the method provided in this embodiment further includes:
The target voice signal of the current frame is superposed on the final voice signal of the previous frame using the overlap-add method to obtain the final voice signal of the current frame.
Specifically, in this embodiment, the overlap-add method adds the overlapping part of the target voice signal of the current frame to the overlapping part of the final voice signal of the previous frame, which improves the signal quality of the final voice signal of the current frame. The second half of the current frame is in turn overlapped with the target voice signal of the next frame, so the overall speech enhancement effect and the overall quality of the voice signal are improved.
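Overlap-add at the 50% overlap used in the framing example can be sketched as below: each call emits `hop` finished samples (the previous frame's tail plus the current frame's head) and keeps the current frame's second half for the next call. The `OverlapAdd` class is a hypothetical name; 50% overlap is the preferred setting from the text, assumed here.

```python
class OverlapAdd:
    """50%-overlap overlap-add: emits `hop` finished samples per frame."""

    def __init__(self, hop: int):
        self.hop = hop
        self.tail = [0.0] * hop   # second half of the previous frame, not yet output

    def push(self, frame: list[float]) -> list[float]:
        """Add the current frame's first half to the stored tail and output it."""
        if len(frame) != 2 * self.hop:
            raise ValueError(f"expected a frame of {2 * self.hop} samples")
        out = [a + b for a, b in zip(self.tail, frame[: self.hop])]
        self.tail = list(frame[self.hop:])   # overlaps with the next frame
        return out
```

When the analysis window is a periodic Hann window at 50% overlap, the overlapping windows sum to one, so frames that pass through unmodified are reconstructed exactly once the pipeline has filled.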
As the above embodiments show, this embodiment applies per-frequency-point gain processing to the original voice signal of the current frame in the frequency domain. Compared with the conventional approach of applying a single gain value over the full band in the time domain, this avoids amplifying noise together with speech in scenarios where high-frequency and low-frequency noise are prominent. A lower gain is used for high- or low-frequency bands with a low signal-to-noise ratio, and a higher gain for mid-frequency bands with a high signal-to-noise ratio, so the voice signal is enhanced without the interference caused by amplified noise, while the processing response speed is maintained and speech quality is improved.
In addition, combining the IMCRA and OMLSA noise reduction algorithms gives good estimates of the noise signal and the pure voice signal, which improves the accuracy of the signal-to-noise ratio calculation and thus further improves the voice signal.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention.
The following are device embodiments of the invention. For details not described in them, refer to the corresponding method embodiments above.
Fig. 2 is a schematic structural diagram of a speech gain control apparatus according to an embodiment of the present invention, and for convenience of explanation, only the portions related to the embodiment of the present invention are shown, which are described in detail below:
as shown in fig. 2, the speech gain control apparatus 100 includes:
The spectrum acquisition module 110 is configured to perform a Fourier transform on the original voice signal of the current frame to obtain the frequency spectrum of the original voice signal of the current frame, and to determine the amplitude value of each frequency point in the frequency spectrum;
The signal-to-noise ratio calculation module 120 is configured to calculate a signal-to-noise ratio of each frequency point in the spectrum;
The gain coefficient determining module 130 is configured to determine a gain coefficient of a corresponding frequency point according to the signal-to-noise ratio and the amplitude value of each frequency point;
The voice enhancement module 140 is configured to enhance the amplitude value of each frequency point based on the gain coefficient of the corresponding frequency point, so as to obtain a target voice signal.
In one possible implementation, the gain coefficient determination module 130 is configured to:
determine a first gain coefficient of each frequency point according to its signal-to-noise ratio, where the signal-to-noise ratio and the first gain coefficient are positively correlated;
determine a second gain coefficient of each frequency point according to its amplitude value, where the amplitude value and the second gain coefficient are negatively correlated; and
multiply the first gain coefficient and the second gain coefficient of the same frequency point to obtain the gain coefficient of that frequency point.
In one possible implementation, the signal-to-noise ratio calculation module 120 includes:
a noise estimation unit, configured to perform noise estimation on the frequency spectrum to obtain the noise estimate of each frequency point;
a pure voice signal extraction unit, configured to calculate the speech gain of each frequency point according to its noise estimate, and to filter the frequency spectrum with the speech gain of each frequency point to obtain a pure voice signal in the frequency domain;
a noise signal extraction unit, configured to subtract the pure voice signal from the frequency spectrum to obtain a noise signal; and
a signal-to-noise ratio calculation unit, configured to calculate the signal-to-noise ratio of each frequency point based on the energy spectrum of the pure voice signal and the energy spectrum of the noise signal.
In one possible embodiment, the noise estimation unit is configured to:
perform noise estimation on the frequency spectrum based on the IMCRA algorithm to obtain the noise estimate of each frequency point.
In one possible embodiment, the pure voice signal extraction unit is configured to:
derive the speech gain of each frequency point from its noise estimate based on the OMLSA algorithm.
In one possible implementation, the voice enhancement module 140 is configured to:
multiply the gain coefficient of each frequency point by the amplitude value of that frequency point to obtain a first voice signal; and
perform an inverse Fourier transform on the first voice signal to obtain the target voice signal.
In one possible implementation, the speech gain control apparatus 100 further includes:
a signal acquisition module, configured to extract the original voice signal of the current frame from the buffered original voice signal using a window function, where the moving step of the window function is smaller than its window length; and
a signal superposition module, configured to superpose the target voice signal of the current frame on the final voice signal of the previous frame using the overlap-add method to obtain the final voice signal of the current frame.
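The module structure of the apparatus 100 can be pictured as a pipeline in which each module is a pluggable stage. The sketch below is hypothetical wiring only: the field names, the signatures, and the choice to pass phases from module 110 to module 140 are assumptions, and any of the stage functions sketched for the method steps could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class SpeechGainControlApparatus:
    """Hypothetical wiring of the modules of Fig. 2; each field is one module."""
    acquire_frame: Callable[[Vector], Vector]                  # signal acquisition module
    spectrum: Callable[[Vector], Tuple[Vector, Vector]]        # module 110: amplitudes, phases
    snr: Callable[[Vector], Vector]                            # module 120
    gain: Callable[[Vector, Vector], Vector]                   # module 130
    enhance: Callable[[Vector, Vector, Vector], Vector]        # module 140
    superpose: Callable[[Vector], Vector]                      # signal superposition module

    def process(self, new_samples: Vector) -> Vector:
        """Run one hop of new samples through every module in order."""
        frame = self.acquire_frame(new_samples)
        amps, phases = self.spectrum(frame)
        snrs = self.snr(amps)
        gains = self.gain(amps, snrs)
        target = self.enhance(amps, phases, gains)
        return self.superpose(target)
```

Keeping each module behind a plain callable mirrors the patent's remark that the division into modules is a logical one and that modules may be merged or split in practice.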
Fig. 3 is a schematic diagram of a terminal according to an embodiment of the present invention. As shown in Fig. 3, the terminal 3 of this embodiment includes a processor 30 and a memory 31. The memory 31 is used to store a computer program 32, and the processor 30 is used to call and run the computer program 32 stored in the memory 31 to execute the steps in the above embodiments of the voice gain control method, such as steps S101 to S104 shown in Fig. 1. Alternatively, the processor 30 is configured to call and run the computer program 32 stored in the memory 31 to implement the functions of the modules/units in the above device embodiments, such as the functions of the modules 110 to 140 shown in Fig. 2.
Illustratively, the computer program 32 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 30 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 32 in the terminal 3. For example, the computer program 32 may be partitioned into modules 110 through 140 shown in FIG. 2.
The terminal 3 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal 3 may include, but is not limited to, the processor 30 and the memory 31. Those skilled in the art will appreciate that Fig. 3 is merely an example of the terminal 3 and does not limit it; the terminal may include more or fewer components than illustrated, combine certain components, or use different components. For example, the terminal may further include input/output devices, network access devices, a bus, and the like.
The processor 30 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 31 may be an internal storage unit of the terminal 3, such as a hard disk or memory of the terminal 3. The memory 31 may also be an external storage device of the terminal 3, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal 3. Further, the memory 31 may include both an internal storage unit and an external storage device of the terminal 3. The memory 31 is used to store the computer program and other programs and data required by the terminal, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts not described or illustrated in detail in one embodiment, refer to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal and method may be implemented in other manners. For example, the apparatus/terminal embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated modules/units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may also be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above embodiments of the voice gain control method. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that what the computer-readable medium may contain can be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals or telecommunications signals.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of protection of the present invention.

Claims (10)

1. A method for controlling speech gain, comprising:
performing a Fourier transform on the original voice signal of the current frame to obtain the frequency spectrum of the original voice signal of the current frame, and determining the amplitude value of each frequency point in the frequency spectrum;
calculating the signal-to-noise ratio of each frequency point in the frequency spectrum;
determining the gain coefficient of each frequency point according to the signal-to-noise ratio and the amplitude value of that frequency point;
and enhancing the amplitude value of each frequency point based on its gain coefficient to obtain a target voice signal.
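The four steps of claim 1 can be illustrated with the following minimal Python sketch (standard library only). It is not the applicant's implementation: a naive O(N²) DFT stands in for an FFT, the per-bin noise power `noise_psd` is assumed to come from a separate estimator, the SNR is a simple power-subtraction ratio, and `example_gain` is a hypothetical rule that merely respects the directions of dependence stated in claim 2.

```python
import cmath
import math


def dft(x):
    """Naive DFT, a stand-in for an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; keeps the real part (assumes symmetric gains)."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def example_gain(snr, amplitude, target=1.0, max_boost=4.0):
    """Hypothetical rule: grows with SNR, shrinks with amplitude (capped boost)."""
    return (snr / (1.0 + snr)) * min(max_boost, target / max(amplitude, 1e-12))


def process_frame(frame, noise_psd, gain_rule=example_gain):
    # Step 1: Fourier transform and amplitude of every frequency point.
    spectrum = dft(frame)
    amplitude = [abs(c) for c in spectrum]
    # Step 2: per-bin SNR from the (assumed, externally supplied) noise power.
    snr = [max(a * a - p, 0.0) / max(p, 1e-12) for a, p in zip(amplitude, noise_psd)]
    # Step 3: per-bin gain coefficient from SNR and amplitude.
    gains = [gain_rule(s, a) for s, a in zip(snr, amplitude)]
    # Step 4: scale each bin (phase unchanged) and return to the time domain.
    return idft([g * c for g, c in zip(gains, spectrum)])
```

For a real frame, bins k and N−k have equal amplitude, so a symmetric `noise_psd` gives symmetric gains and a real-valued output. Passing `gain_rule=lambda s, a: 1.0` returns the input frame unchanged, which is a convenient sanity check.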
2. The method according to claim 1, wherein determining the gain coefficient of each frequency point according to its signal-to-noise ratio and amplitude value comprises:
determining a first gain coefficient of each frequency point according to its signal-to-noise ratio, wherein the first gain coefficient is positively correlated with the signal-to-noise ratio;
determining a second gain coefficient of each frequency point according to its amplitude value, wherein the second gain coefficient is negatively correlated with the amplitude value;
multiplying the first gain coefficient and the second gain coefficient of the same frequency point to obtain the gain coefficient of that frequency point.
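Claim 2 fixes only the direction of each dependence. The sketch below shows one way to satisfy it; the logistic mapping of the SNR in dB, the clipped target/amplitude ratio, and all parameter values (`mid_db`, `slope_db`, `target`, `min_gain`, `max_gain`) are illustrative assumptions, not values taken from the patent.

```python
import math


def first_gain(snr, mid_db=5.0, slope_db=3.0):
    """Hypothetical SNR map: logistic in dB, so it increases with SNR and stays in (0, 1)."""
    snr_db = 10.0 * math.log10(max(snr, 1e-12))
    return 1.0 / (1.0 + math.exp(-(snr_db - mid_db) / slope_db))


def second_gain(amplitude, target=1.0, min_gain=0.25, max_gain=4.0):
    """Hypothetical AGC-style map: target/amplitude, clipped, so it decreases with amplitude."""
    raw = target / max(amplitude, 1e-12)
    return min(max_gain, max(min_gain, raw))


def gain_coefficient(snr, amplitude):
    """Gain of one frequency point: product of the SNR and amplitude factors."""
    return first_gain(snr) * second_gain(amplitude)
```

Bounding the first factor to (0, 1) means noise-dominated bins are attenuated rather than boosted, while clipping the second factor keeps quiet bins from being amplified without limit and loud bins from being suppressed too hard.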
3. The method according to claim 1, wherein calculating the signal-to-noise ratio of each frequency point in the frequency spectrum comprises:
performing noise estimation on the frequency spectrum to obtain a noise estimate for each frequency point;
calculating the voice gain of each frequency point according to its noise estimate, and filtering the frequency spectrum with the voice gain of each frequency point to obtain a clean voice signal in the frequency domain;
subtracting the clean voice signal from the frequency spectrum to obtain a noise signal;
and calculating the signal-to-noise ratio of each frequency point based on the energy spectrum of the clean voice signal and the energy spectrum of the noise signal.
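The SNR computation of claim 3 can be sketched as follows, assuming the per-bin voice gain `speech_gain` is already available (for example from the OMLSA step of claim 5) and interpreting the energy spectrum as the squared magnitude of each bin. The `eps` floor is an added assumption to avoid division by zero.

```python
def snr_per_bin(spectrum, speech_gain, eps=1e-12):
    """Per-bin SNR from a filtered clean estimate and the residual noise."""
    # Filter the noisy spectrum with the voice gain -> clean voice estimate.
    clean = [g * x for g, x in zip(speech_gain, spectrum)]
    # Noise signal = noisy spectrum minus clean estimate.
    noise = [x - s for x, s in zip(spectrum, clean)]
    # Ratio of the two energy spectra, bin by bin.
    return [abs(s) ** 2 / max(abs(n) ** 2, eps) for s, n in zip(clean, noise)]
```

With a real gain G in (0, 1), the clean estimate is G·X and the residual is (1 − G)·X, so the ratio reduces to G²/(1 − G)²: the SNR follows the denoising gain directly and does not depend on the absolute level of the bin.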
4. The method according to claim 3, wherein performing noise estimation on the frequency spectrum to obtain the noise estimate for each frequency point comprises:
performing noise estimation on the frequency spectrum based on the IMCRA (improved minima controlled recursive averaging) algorithm to obtain the noise estimate for each frequency point.
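IMCRA itself (two rounds of smoothing and minimum tracking plus a speech-presence probability) is too long for a short example. The sketch below instead implements a simplified MCRA-style recursion, the earlier approach on which IMCRA builds, to show the core idea: track the minimum of the smoothed periodogram, flag likely speech where the smoothed power exceeds a multiple of that minimum, and slow the noise update where speech is likely. The class name, the omission of frequency smoothing, and all parameter values are assumptions for illustration.

```python
class SimpleMcraNoiseEstimator:
    """Simplified MCRA-style noise tracker (not the full IMCRA of claim 4)."""

    def __init__(self, num_bins, alpha_s=0.8, alpha_d=0.95, alpha_p=0.2,
                 delta=5.0, block_len=50):
        self.alpha_s, self.alpha_d, self.alpha_p = alpha_s, alpha_d, alpha_p
        self.delta, self.block_len = delta, block_len
        self.p = [0.0] * num_bins          # speech-presence probability per bin
        self.noise = None                  # noise power estimate per bin
        self.frame = 0

    def update(self, power):
        """power: |Y(k)|^2 of the current frame; returns the noise estimate."""
        if self.noise is None:             # initialise all trackers on frame 1
            self.smoothed = list(power)
            self.s_min = list(power)
            self.s_tmp = list(power)
            self.noise = list(power)
            self.frame = 1
            return list(self.noise)
        for k, y2 in enumerate(power):
            # Recursive smoothing of the periodogram.
            s = self.alpha_s * self.smoothed[k] + (1 - self.alpha_s) * y2
            self.smoothed[k] = s
            # Minimum tracking over the current search block.
            self.s_min[k] = min(self.s_min[k], s)
            self.s_tmp[k] = min(self.s_tmp[k], s)
            # Speech likely if the smoothed power is well above the minimum.
            speech = 1.0 if s > self.delta * max(self.s_min[k], 1e-12) else 0.0
            self.p[k] = self.alpha_p * self.p[k] + (1 - self.alpha_p) * speech
            # Speech-dependent smoothing factor: near 1 when speech is likely.
            a_d = self.alpha_d + (1 - self.alpha_d) * self.p[k]
            self.noise[k] = a_d * self.noise[k] + (1 - a_d) * y2
        self.frame += 1
        if self.frame % self.block_len == 0:
            # Restart the minimum search so the tracker can follow rising noise.
            self.s_min = [min(t, s) for t, s in zip(self.s_tmp, self.smoothed)]
            self.s_tmp = list(self.smoothed)
        return list(self.noise)
```

In the check below, a three-frame burst raises the estimate in the affected bin only to about 2.2, whereas a fixed smoothing factor of 0.95 would already exceed 10 after two frames.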
5. The method according to claim 3, wherein calculating the voice gain of each frequency point according to its noise estimate comprises:
deriving the voice gain of each frequency point from its noise estimate based on the OMLSA (optimally modified log-spectral amplitude) algorithm.
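The OM-LSA gain referred to in claim 5 combines the log-spectral amplitude (LSA) gain under speech presence, G_H1 = ξ/(1+ξ)·exp(½·E1(v)) with v = γξ/(1+ξ), and a floor G_min under speech absence, as G = G_H1^p · G_min^(1−p). Here γ is the a posteriori SNR (bin power over the noise estimate), ξ the a priori SNR and p the speech-presence probability. The standard-library sketch below computes this; the decision-directed ξ update, the values of `alpha`, `xi_min` and `g_min`, and the cap of G_H1 at 1 are illustrative assumptions, and p is taken as an input (in IMCRA/OMLSA it is supplied by the noise estimator).

```python
import math

EULER_GAMMA = 0.5772156649015329


def expint_e1(x):
    """Exponential integral E1(x) = integral from x to inf of exp(-t)/t dt, x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Series: E1(x) = -gamma - ln x + sum_{k>=1} (-1)^(k+1) x^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 40):
            term *= x / k                      # term == x**k / k!
            total += (-1) ** (k + 1) * term / k
        return -EULER_GAMMA - math.log(x) + total
    # Continued fraction evaluated with the modified Lentz method (x > 1).
    b, c = x + 1.0, 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 300):
        an = -float(i * i)
        b += 2.0
        d = 1.0 / (an * d + b)
        c = b + an / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def a_posteriori_snr(power, noise_estimate, eps=1e-12):
    """gamma = |Y|^2 / lambda_d, using the noise estimate of each bin."""
    return power / max(noise_estimate, eps)


def decision_directed_xi(prev_gain, prev_gamma, gamma, alpha=0.92, xi_min=10 ** (-25 / 10)):
    """A priori SNR via the decision-directed rule, floored at xi_min."""
    xi = alpha * prev_gain ** 2 * prev_gamma + (1.0 - alpha) * max(gamma - 1.0, 0.0)
    return max(xi, xi_min)


def lsa_gain(xi, gamma):
    """Log-spectral amplitude gain under speech presence (capped at 1 here)."""
    v = max(gamma * xi / (1.0 + xi), 1e-10)
    return min(1.0, xi / (1.0 + xi) * math.exp(0.5 * expint_e1(v)))


def omlsa_gain(xi, gamma, p, g_min=10 ** (-25 / 20)):
    """OM-LSA gain: geometric weighting of G_H1 and g_min by speech presence p."""
    return lsa_gain(xi, gamma) ** p * g_min ** (1.0 - p)
```

With p = 1 the gain reduces to the LSA gain, and with p = 0 it falls to the floor g_min, so bins judged to hold no speech are attenuated by a fixed amount rather than zeroed.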
6. The speech gain control method according to claim 1, wherein enhancing the amplitude value of each frequency point based on its gain coefficient to obtain the target voice signal comprises:
multiplying the gain coefficient of each frequency point by the amplitude value of that frequency point to obtain a first voice signal;
and performing an inverse Fourier transform on the first voice signal to obtain the target voice signal.
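Claim 6 multiplies amplitudes by gains and inverts the transform, but does not say how the phase is handled. A common choice, assumed here, is to keep the noisy phase; for a real frame it is then enough to process bins 0 to N/2 and mirror them as complex conjugates so that the inverse transform is real. The function names and the naive DFT below are illustrative only.

```python
import cmath
import math


def dft(x):
    """Naive DFT, a stand-in for an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spec):
    """Naive inverse DFT returning complex samples."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def first_voice_signal(spectrum, half_gains):
    """Scale |X(k)| for k = 0..N/2, keep the phase, mirror to a Hermitian spectrum."""
    n = len(spectrum)
    if n % 2 or len(half_gains) != n // 2 + 1:
        raise ValueError("need an even frame length and N/2 + 1 gains")
    out = [0j] * n
    for k, g in enumerate(half_gains):
        out[k] = cmath.rect(g * abs(spectrum[k]), cmath.phase(spectrum[k]))
    for k in range(1, n // 2):
        out[n - k] = out[k].conjugate()    # bin N-k is the conjugate of bin k
    return out


def target_voice_signal(frame, half_gains):
    """Time-domain target signal of one frame."""
    return [s.real for s in inverse_dft(first_voice_signal(dft(frame), half_gains))]
```

Working on the one-sided spectrum halves the number of gain evaluations and guarantees, by construction, that the imaginary part of the output is only rounding noise.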
7. The method according to claim 6, further comprising, before performing the Fourier transform on the original voice signal of the current frame:
intercepting the original voice signal of the current frame from the buffered original voice signal using a window function, wherein the moving step (hop size) of the window function is smaller than the window length of the window function;
after obtaining the target voice signal, the method further comprises:
superposing the target voice signal of the current frame and the final voice signal of the previous frame using an overlap-add method to obtain the final voice signal of the current frame.
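The framing and overlap-add of claim 7 can be sketched as follows. The sketch assumes a periodic Hann analysis window with a hop of half the window length; with that choice, copies of the window shifted by the hop sum exactly to one, so identity processing reconstructs the input everywhere except the first hop. The window type, the hop value and the names `windowed_frames` and `OverlapAdd` are illustrative assumptions; the claim only requires the hop to be smaller than the window length, and other hops would need a different normalisation.

```python
import math


def periodic_hann(n):
    """Periodic Hann window; copies shifted by n/2 sum exactly to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def windowed_frames(buffer, win_len, hop):
    """Cut overlapping windowed frames from a buffered signal (hop < win_len)."""
    if not 0 < hop < win_len:
        raise ValueError("hop must be positive and smaller than the window length")
    w = periodic_hann(win_len)
    return [[buffer[s + i] * w[i] for i in range(win_len)]
            for s in range(0, len(buffer) - win_len + 1, hop)]


class OverlapAdd:
    """Streaming overlap-add that keeps the unfinished tail of earlier frames."""

    def __init__(self, win_len, hop):
        self.hop = hop
        self.tail = [0.0] * (win_len - hop)

    def push(self, frame):
        """Add the stored tail to the new frame; return the `hop` finished samples."""
        acc = list(frame)
        for i, v in enumerate(self.tail):
            acc[i] += v
        finished, self.tail = acc[:self.hop], acc[self.hop:]
        return finished
```

Usage: feed each processed frame to `push` in order and concatenate the returned blocks; each call emits one hop of final output while the remaining samples wait for the next frame to be superposed on them.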
8. A speech gain control apparatus, comprising:
a frequency spectrum acquisition module, configured to perform a Fourier transform on the original voice signal of the current frame to obtain the frequency spectrum of the original voice signal of the current frame, and to determine the amplitude value of each frequency point in the frequency spectrum;
a signal-to-noise ratio calculation module, configured to calculate the signal-to-noise ratio of each frequency point in the frequency spectrum;
a gain coefficient determining module, configured to determine the gain coefficient of each frequency point according to the signal-to-noise ratio and the amplitude value of that frequency point;
and a voice enhancement module, configured to enhance the amplitude value of each frequency point based on its gain coefficient to obtain a target voice signal.
9. A terminal, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke and run the computer program stored in the memory to perform the speech gain control method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the speech gain control method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311622161.XA CN117912462A (en) 2023-11-29 2023-11-29 Voice gain control method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN117912462A true CN117912462A (en) 2024-04-19

Family

ID=90689978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311622161.XA Pending CN117912462A (en) 2023-11-29 2023-11-29 Voice gain control method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN117912462A (en)

Similar Documents

Publication Publication Date Title
CN109767783B (en) Voice enhancement method, device, equipment and storage medium
CN111341336B (en) Echo cancellation method, device, terminal equipment and medium
CN111583949A (en) Howling suppression method, device and equipment
CN112004177B (en) Howling detection method, microphone volume adjustment method and storage medium
CN113539285B (en) Audio signal noise reduction method, electronic device and storage medium
CN111261148B (en) Training method of voice model, voice enhancement processing method and related equipment
CN110556125B (en) Feature extraction method and device based on voice signal and computer storage medium
CN110634500A (en) Method for calculating prior signal-to-noise ratio, electronic device and storage medium
WO2021007841A1 (en) Noise estimation method, noise estimation apparatus, speech processing chip and electronic device
US20140072132A1 (en) Method and system for reducing impulsive noise disturbance
CN110970051A (en) Voice data acquisition method, terminal and readable storage medium
CN110503973B (en) Audio signal transient noise suppression method, system and storage medium
US20120243702A1 (en) Method and arrangement for processing of audio signals
CN111986694B (en) Audio processing method, device, equipment and medium based on transient noise suppression
CN113241089A (en) Voice signal enhancement method and device and electronic equipment
CN111968620B (en) Algorithm testing method and device, electronic equipment and storage medium
CN117594053A (en) Voice noise reduction method, processing terminal and storage medium
CN117912462A (en) Voice gain control method, device, terminal and storage medium
US20230290367A1 (en) Hum noise detection and removal for speech and music recordings
CN112489669B (en) Audio signal processing method, device, equipment and medium
CN114220451A (en) Audio denoising method, electronic device, and storage medium
CN114360572A (en) Voice denoising method and device, electronic equipment and storage medium
CN113205824A (en) Sound signal processing method, device, storage medium, chip and related equipment
CN113763975A (en) Voice signal processing method and device and terminal
CN110648681A (en) Voice enhancement method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination