CN114464205A - Audio processing method based on howling detection and electronic equipment - Google Patents


Info

Publication number
CN114464205A
CN114464205A (Application CN202210124282.0A)
Authority
CN
China
Prior art keywords: volume, howling, audio frame, frequency point, screening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210124282.0A
Other languages
Chinese (zh)
Inventor
林泽阳
陈凯达
卢博文
秦明
Current Assignee
Hangzhou Lianji Technology Co ltd
Original Assignee
Hangzhou Lianji Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Lianji Technology Co ltd filed Critical Hangzhou Lianji Technology Co ltd
Priority to CN202210124282.0A
Publication of CN114464205A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/0232 — Speech enhancement: noise filtering, processing in the frequency domain
    • G10L21/003 — Changing voice quality, e.g. pitch or formants
    • G10L21/0264 — Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L25/21 — Speech or voice analysis techniques characterised by the extracted parameters being power information

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The embodiment of the application provides an audio processing method based on howling detection, belonging to the technical field of audio processing. The method comprises the following steps: acquiring an audio signal, performing a Fourier transform on it, and obtaining an audio frame in the frequency domain, wherein the audio frame comprises a plurality of frequency points; screening the plurality of frequency points for howling frequency points and obtaining a corresponding screening result; performing target sound detection on the audio frame and obtaining a corresponding detection result, wherein the detection result indicates whether a target sound exists in the audio frame; and dynamically adjusting the volume level of the audio frame according to the screening result and the detection result. By dynamically adjusting the volume based on both the howling frequency point screening result and the target sound detection result, the method solves the problem that existing approaches cannot suppress howling flexibly and efficiently while also guaranteeing sound quality.

Description

Audio processing method based on howling detection and electronic equipment
Technical Field
The present application belongs to the field of audio processing technologies, and in particular, to an audio processing method and an electronic device based on howling detection.
Background
In general, howling is likely to occur when sound is amplified using equipment such as an audio system, particularly indoors when a microphone is used as the sound collector. Howling arises as follows: echo exists in the room, and when the reflected sound of the original sound reaches the microphone it is picked up again and fed into amplification equipment such as a loudspeaker to be amplified. When the echo and the original sound are in phase, the two sounds superpose and reinforce each other; the enhanced sound and the newly generated echo then re-enter the amplification equipment through the microphone, the echo is strengthened once more, and as this cycle repeats a loud, harsh howling is produced.
Howling seriously degrades the user experience and, in severe cases, may damage the equipment, so it needs to be suppressed. Existing howling suppression typically uses notch filters (wave traps) to suppress a small number of howling frequency points, on the premise that the number of such points is limited or even very small. However, the existing basis for screening howling frequency points is too narrow, and false detections easily occur, damaging useful sound information such as human voice and music; the resulting howling suppression is poor and the user experience suffers.
Therefore, how to flexibly and effectively perform audio processing based on howling detection to achieve both sound quality and howling suppression effect is a problem to be solved urgently.
Disclosure of Invention
The application provides an audio processing method based on howling detection and an electronic device, which dynamically adjust the volume according to the screening result for howling frequency points and the detection result for target sounds, solving the problem that howling cannot be suppressed flexibly and efficiently while sound quality is guaranteed.
In a first aspect, an audio processing method based on howling detection is provided, and is applied to an electronic device, and the method includes:
acquiring an audio signal, and acquiring an audio frame of a frequency domain space according to the audio signal, wherein the audio frame comprises a plurality of frequency points;
carrying out howling frequency point screening on the plurality of frequency points, and acquiring corresponding screening results;
performing target sound detection on the audio signal, and acquiring a corresponding detection result, wherein the detection result is used for indicating whether the target sound exists in the audio frame, and the target sound comprises human voice and/or music voice;
according to the screening result and the detection result, dynamically adjusting the volume level of the audio frame; wherein, when the screening result indicates that the howling frequency point exists,
if the detection result indicates that the target sound exists, the volume of the audio frame is adjusted downwards according to a preset volume adjustment level, and the steps are repeated until the volume reaches a preset first volume threshold value; or,
if the detection result indicates that the target sound does not exist, the volume of the audio frame is adjusted downwards according to a preset volume adjustment level, and the steps are repeated until the volume reaches a preset second volume threshold, wherein the volume corresponding to the second volume threshold is lower than the volume corresponding to the first volume threshold.
Wherein, repeating the above steps may specifically comprise repeating the following steps: repeatedly acquiring an audio signal, and acquiring an audio frame of a frequency domain space according to the audio signal, wherein the audio frame comprises a plurality of frequency points; carrying out howling frequency point screening on the plurality of frequency points, and acquiring corresponding screening results; detecting a target sound of the audio signal, and acquiring a corresponding detection result, wherein the detection result is used for indicating whether the target sound exists in the audio frame, and the target sound comprises a human sound and/or a music sound; and dynamically adjusting the volume level of the audio frame according to the screening result and the detection result.
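Although the publication gives no code, the stepped down-regulation described above can be sketched as follows. The 50 dB and 40 dB floors follow the example given later in the text; the 5 dB step size and all names are illustrative assumptions.

```python
# Hypothetical sketch of the stepped down-regulation while a howling
# frequency point exists. Thresholds follow the 50 dB / 40 dB example
# in the text; the step size is an assumed value.

FIRST_VOLUME_THRESHOLD_DB = 50.0   # floor when a target sound (voice/music) is present
SECOND_VOLUME_THRESHOLD_DB = 40.0  # lower floor when no target sound is present
STEP_DOWN_DB = 5.0                 # one preset volume down-adjustment level

def down_adjust(volume_db, target_sound_present):
    """One round of down-adjustment while a howling frequency point exists."""
    floor = (FIRST_VOLUME_THRESHOLD_DB if target_sound_present
             else SECOND_VOLUME_THRESHOLD_DB)
    return max(volume_db - STEP_DOWN_DB, floor)

# Repeating the detection loop keeps stepping the volume down until the
# floor that matches the target sound detection result is reached.
volume = 60.0
for _ in range(5):
    volume = down_adjust(volume, target_sound_present=True)
```

Because the floor depends on the target sound detection result, the same loop protects voice and music (higher floor) while suppressing pure howling more aggressively (lower floor).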
In one implementation, the first volume threshold may be the lowest volume to which the audio frame can be adjusted downward when a howling frequency point exists and a target sound exists; the second volume threshold may be the lowest volume to which it can be adjusted downward when a howling frequency point exists but no target sound exists. In order to highlight the target sound, the first volume threshold may be higher than the second volume threshold. For example, for an audio frame with a howling frequency point, if there is no target sound, the lowest volume (the second volume threshold) may be set to 40 dB in order to suppress howling; if there is a target sound, howling still needs to be suppressed, but the lowest volume is limited in order to protect sound quality, for example to 50 dB (the first volume threshold), which is 10 dB higher than the 40 dB floor used when no target sound is present.
It should be understood that specific values of the first volume threshold and the second volume threshold may be flexibly set as needed, for example, the first volume threshold may be set to a volume corresponding to complete howling suppression, the second volume threshold may be set to a volume corresponding to partial howling suppression, and the like, which is not limited in this embodiment of the application.
In an implementation manner, the step of down-adjusting the volume of the audio frame may specifically include down-adjusting the current volume of the audio frame according to preset volume down-adjustment levels, where each preset volume down-adjustment level may correspond to a certain decibel number, such as 5 decibels, 10 decibels, and the like.
It should be understood that the method of this implementation may adopt a stepped volume adjustment. Specifically, each time a round of howling is detected, the volume level may be reduced by one preset down-regulation level, attenuating the volume by a certain number of decibels. Since howling can be periodic, continued detection of howling leads to continued down-regulation, so the volume drops rapidly until the set lowest volume is reached and the howling disappears.
This volume adjustment mode has a small and stable computational load, requires no filtering operation, and its cost does not surge as the number of howling frequency points increases.
According to the audio processing method based on howling detection provided by the implementation mode, the target sound is detected simultaneously in the process of screening the howling frequency points, and when the howling frequency points exist, the volume is dynamically adjusted based on the detection result of the target sound, and the volume is adjusted to the lowest volume level corresponding to the detection result of the target sound, so that howling can be effectively and flexibly suppressed, effective sound information is not lost, the tone quality is guaranteed, and the user experience is improved.
With reference to the first aspect, in certain implementations of the first aspect, when the screening result indicates that the howling frequency point does not exist, the method further includes:
adjusting the volume of the audio frame according to a preset volume up-regulation level, and repeating the steps until the volume reaches a preset third volume threshold; or,
directly up-regulating the volume of the audio frame to a third volume threshold.
The third volume threshold may be the highest volume that can be adjusted up and is preset without the howling frequency point.
For example, the volume corresponding to the third volume threshold may be the highest volume preset by the user, or may be another volume suitable for the user to listen to the audio and the like set initially (or by default). The specific value (decibel) corresponding to the third volume threshold may be flexibly set according to the needs, which is not limited in the embodiment of the present application.
It should be understood that when the screening result indicates that there is no howling frequency point, the volume of the audio frame may be adjusted upward in several ways, including: raising the volume of the audio frame step by step based on the detection result of the target sound; or directly adjusting the volume of the audio frame to the corresponding preset volume without the stepped mode.
It should be understood that the method provided by this implementation manner can perform targeted adjustment on howling suppression when it is recognized that the target sound exists in the audio frame (for example, when there is human voice and/or musical sound, the maximum volume attenuation degree is limited, or when there is human voice and/or musical sound, the audio level is adjusted to directly recover the normal volume, etc.), thereby avoiding loss of important information and ensuring the sound quality.
According to the method of this implementation, when no howling is detected, the volume level can be raised step by step until the normal volume is restored, which controls the volume level more precisely while howling remains absent; alternatively, directly adjusting the volume when no howling is detected restores the volume quickly by raising it a certain number of decibels at once, improving the efficiency of volume adjustment.
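The two recovery strategies above might be sketched as follows; the 70 dB third volume threshold, the 5 dB step, and the function name are assumed values not given in the text.

```python
# Illustrative sketch of volume recovery when no howling frequency point
# is detected; the threshold and step values are assumptions.

THIRD_VOLUME_THRESHOLD_DB = 70.0  # highest volume that may be restored
STEP_UP_DB = 5.0                  # one preset volume up-regulation level

def recover_volume(volume_db, stepped=True):
    """Up-regulate when the screening result shows no howling frequency point."""
    if stepped:
        # raise one level per detection round for finer control
        return min(volume_db + STEP_UP_DB, THIRD_VOLUME_THRESHOLD_DB)
    # or jump straight back to the preset volume for fast recovery
    return THIRD_VOLUME_THRESHOLD_DB
```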
With reference to the first aspect, in some implementation manners of the first aspect, the performing howling frequency point screening on the multiple frequency points and obtaining corresponding screening results specifically includes:
acquiring power spectrum energy corresponding to the multiple frequency points according to the audio frame;
selecting at least one first candidate frequency point with the highest power spectrum energy from the plurality of frequency points;
and screening the howling frequency point from the at least one first candidate frequency point according to a peak-to-threshold power ratio (PTPR), a peak-to-average power ratio (PAPR), a peak-to-harmonic power ratio (PHPR) and a short-time autocorrelation function (STACF).
According to the method provided by the implementation mode, the howling frequency points are comprehensively screened by utilizing multiple algorithms, the accuracy of screening the howling frequency points can be improved, the problem that the howling frequency points are missed or mistakenly selected due to single screening basis is avoided, and the howling can be conveniently and effectively inhibited subsequently.
With reference to the first aspect, in certain implementation manners of the first aspect, the performing target sound detection on the audio signal and obtaining a corresponding detection result specifically includes:
when the howling frequency point screening is carried out on the plurality of frequency points, the number of harmonic waves in the audio frame is counted;
and acquiring the detection result of the target sound according to the power spectrum energy and the harmonic quantity corresponding to the multiple frequency points.
With reference to the first aspect, in some implementation manners of the first aspect, the obtaining a detection result of the target sound according to the power spectrum energy and the harmonic number corresponding to the multiple frequency points specifically includes:
calculating identification characteristics corresponding to the target sound according to the power spectrum energy corresponding to the multiple frequency points, wherein the identification characteristics comprise at least one of a harmonicity characteristic, a frequency spectrum flux characteristic and a sub-band energy ratio balance characteristic;
performing historical smoothing processing on the identification features to obtain target identification features;
and acquiring the detection result of the target sound according to the target identification feature and the harmonic quantity.
According to the audio processing method based on howling detection provided by this implementation, during the detection of howling frequency points, human voice and/or music is detected using features such as the harmonicity feature, the spectral flux feature and the sub-band energy ratio balance feature, so that the presence of human voice and/or music in the audio can be determined more accurately. This facilitates subsequent flexible volume adjustment according to the recognition result: howling is suppressed while human voice and music are protected, the loss of important information is avoided, and sound quality is ensured.
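The text names the identification features but does not define them; one plausible set of definitions, with all formulas and parameters as assumptions, is:

```python
# Minimal sketches of the identification features and historical smoothing
# named above. The exact formulas are not published in the text; these are
# common textbook-style definitions used here for illustration only.

def spectral_flux(prev_power, curr_power):
    """Sum of positive power increases between consecutive frames."""
    return sum(max(c - p, 0.0) for p, c in zip(prev_power, curr_power))

def subband_energy_balance(power, n_bands=4):
    """Spread between the strongest and weakest sub-band energy shares
    (0 means perfectly balanced energy across sub-bands)."""
    width = len(power) // n_bands
    energies = [sum(power[i * width:(i + 1) * width]) for i in range(n_bands)]
    total = sum(energies) or 1.0
    shares = [e / total for e in energies]
    return max(shares) - min(shares)

def smooth(previous, current, alpha=0.9):
    """Historical (exponential) smoothing of a feature value."""
    return alpha * previous + (1.0 - alpha) * current
```

A detection rule would then threshold the smoothed features, together with the harmonic count, to decide whether voice or music is present.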
With reference to the first aspect, in certain implementation manners of the first aspect, the acquiring an audio signal and acquiring an audio frame of a frequency domain space according to the audio signal specifically includes:
acquiring the audio signal from a buffer area, wherein the audio signal received by the electronic equipment is stored in the buffer area;
performing framing processing on the audio signal according to a preset sampling rate and the number of samples corresponding to the length of an audio frame to obtain the audio frame;
windowing the audio frame according to a window function to obtain the audio frame after windowing;
and carrying out Fourier transform on the audio frame subjected to windowing processing to obtain the audio frame in a frequency domain space.
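The framing, windowing and transform steps above might look as follows; the Hann window choice and the direct DFT (rather than an FFT) are illustrative assumptions.

```python
import cmath
import math

def frame_to_frequency_domain(samples):
    """Window one audio frame with a Hann window and take a one-sided DFT.
    A production system would use an FFT; this direct DFT is only a sketch."""
    n = len(samples)
    # windowing: Hann window is one common choice of window function
    windowed = [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    spectrum = []
    for k in range(n // 2 + 1):  # n//2 + 1 frequency points per frame
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc))  # amplitude spectrum; square for power
    return spectrum
```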
With reference to the first aspect, in certain implementations of the first aspect, the screening the howling frequency point from the at least one candidate frequency point according to a peak-to-threshold power ratio PTPR, a peak-to-average power ratio PAPR, a peak-to-harmonic power ratio PHPR, and a short-time autocorrelation function STACF specifically includes:
carrying out primary screening on the first candidate frequency point by utilizing a peak threshold power ratio criterion PTPR and a peak-to-average power ratio criterion PAPR to obtain a second candidate frequency point;
carrying out secondary screening on the second candidate frequency point by utilizing a peak harmonic power ratio PHPR to obtain a third candidate frequency point;
and screening the third candidate frequency point for three times by using a short-time autocorrelation function STACF to obtain the howling frequency point.
According to the audio processing method based on howling detection provided by the implementation mode, howling frequency points are screened by using multiple basis criteria, and when the howling frequency points exist, the volume level is dynamically adjusted by screening results based on the howling frequency points, so that howling can be efficiently and flexibly suppressed on the basis of ensuring the tone quality, and the user experience is improved.
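The multi-stage screening could be sketched as below. The dB thresholds are assumptions (the text publishes no values), and the STACF stage is omitted for brevity.

```python
import math

# Simplified sketch of the PTPR / PAPR / PHPR screening stages; all
# thresholds are assumed values, and the STACF stage is not shown.

def screen_howling_points(power, candidates, noise_floor,
                          ptpr_db=10.0, papr_db=10.0, phpr_db=10.0):
    avg_power = sum(power) / len(power)
    kept = []
    for k in candidates:
        # primary screening: peak-to-threshold and peak-to-average power ratios
        ptpr = 10.0 * math.log10(power[k] / noise_floor)
        papr = 10.0 * math.log10(power[k] / avg_power)
        if ptpr < ptpr_db or papr < papr_db:
            continue
        # secondary screening: peak-to-harmonic power ratio at 2f and 3f;
        # strong harmonics suggest voice/music rather than howling
        harmonics = [power[m] for m in (2 * k, 3 * k) if m < len(power)]
        if any(10.0 * math.log10(power[k] / max(h, 1e-12)) < phpr_db
               for h in harmonics):
            continue
        kept.append(k)  # a full system would apply the STACF stage here too
    return kept
```

The intent of the combined criteria is that a howling point must stand far above both the noise floor and the frame average, yet lack the harmonic structure of voice or music.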
With reference to the first aspect, in certain implementations of the first aspect, the method further includes:
carrying out periodic timing operation through a timer, wherein the timing operation comprises timing starting and timing ending;
and when the timing is finished each time, acquiring the screening result and the detection result corresponding to the timing period.
According to the method provided by the implementation mode, the timer is used for setting the starting time and the ending time of the howling frequency point detection, so that the periodic repeated detection of the howling frequency point can be realized, the howling detection is more flexible, and the obtained howling frequency point detection result is more timely.
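As a toy illustration of the periodic timing operation, a detection callback can be invoked once per timing period; the function name and period handling are assumptions.

```python
import time

def run_periodic_detection(period_s, rounds, detect_fn):
    """Call detect_fn() once per timing period and collect its results.
    Placeholder for the timer-driven howling detection described above."""
    results = []
    for _ in range(rounds):
        start = time.monotonic()
        results.append(detect_fn())  # screening + detection result for this period
        elapsed = time.monotonic() - start
        time.sleep(max(period_s - elapsed, 0.0))  # wait out the rest of the period
    return results
```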
With reference to the first aspect, in certain implementations of the first aspect, the method further includes:
when the timing is finished, the corresponding detection result during the timing indicates that the target sound does not exist, and the screening result indicates that the howling frequency point exists, the audio frame is adjusted downwards by one volume level according to the preset volume down-adjustment level; or,
and when the timing is finished and the screening result indicates that the howling frequency point does not exist, adjusting the volume level of the audio frame according to the preset volume up-regulation level.
According to the method provided by the implementation mode, the beneficial effects that the calculated amount is small and stable, the filtering operation is not needed, and the calculated amount is not increased suddenly due to the increase of the howling frequency can be realized by adjusting the volume level by level according to the volume level.
In a second aspect, an electronic device for howling suppression is provided, including:
one or more processors;
one or more memories;
the memory stores computer-readable program instructions that, when executed by the processor, cause the electronic device to perform a howling detection-based audio processing method as described in any of the implementations of the first aspect.
In a third aspect, a computer-readable storage medium is provided, which comprises computer instructions that, when executed, cause a method for audio processing based on howling detection as described in any of the implementations of the first aspect above to be implemented.
In a fourth aspect, a computer product is provided, which includes computer instructions that, when executed in a computer, enable the method for audio processing based on howling detection described in any one of the implementations of the first aspect to be implemented.
Drawings
Fig. 1 is a schematic structural diagram of an electronic device for suppressing howling according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of an audio processing method based on howling detection according to an embodiment of the present application.
Fig. 3 is a schematic flowchart of processing an audio signal according to an embodiment of the present application.
Fig. 4 is a schematic flowchart of another audio processing method based on howling detection according to an embodiment of the present application.
Fig. 5 is a schematic flowchart of another audio processing method based on howling detection according to an embodiment of the present application.
Fig. 6 is a schematic flowchart of another audio processing method based on howling detection according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a hardware structure of an electronic device for suppressing howling according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings.
It is noted that the terminology used in the description of the embodiments of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application. In the description of the embodiments of the present application, "/" means "or" unless otherwise specified, for example, a/B may mean a or B; "and/or" herein is merely an association describing an associated object, and means that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, in the description of the embodiments of the present application, "a plurality" means two or more, and "at least one", "one or more" means one, two or more, unless otherwise specified.
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a definition of "a first" or "a second" feature may explicitly or implicitly include one or more of the features.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The most common criterion for judging whether a system will howl is the Nyquist stability criterion: when the overall loop gain of the system at a frequency point is greater than or equal to 1 and the overall phase difference is an integer multiple of 2π, that frequency point is considered capable of forming howling, the system is unstable, and howling suppression is needed. In a real scene the room or environment may change and the position of the device may also change, so the phase condition is uncertain; in the extreme case, every frequency point may satisfy the phase condition for howling, and the howling frequency points may be very numerous. Therefore, howling suppression is required whenever the overall loop gain of the system may reach 1 or more.
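The instability condition just stated can be expressed directly in code; the phase tolerance parameter is an assumption added to accommodate measurement error.

```python
import math

def can_howl(loop_gain, phase_rad, phase_tol=0.1):
    """Nyquist-style check for one frequency point: the loop can sustain
    howling when |gain| >= 1 and the round-trip phase is (near) an
    integer multiple of 2*pi."""
    nearest_multiple = round(phase_rad / (2.0 * math.pi))
    phase_error = abs(phase_rad - 2.0 * math.pi * nearest_multiple)
    return loop_gain >= 1.0 and phase_error <= phase_tol
```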
As introduced in the background, a notch filter (trap) is usually used for howling suppression at present. However, when there are many howling frequency points, the number of required notch filters grows and the computational load rises rapidly; if the number of notch filters is limited instead, multi-frequency or broadband howling cannot be handled and the howling cannot be fully suppressed. Other existing techniques usually divide the frequency bands according to empirical conditions of a fixed scene, assuming that the frequency ranges of human voice, music and howling each lie within some limited range. In practice, however, the effective frequency ranges of the three overlap; the overall loop gain depends not only on the environment but also on the frequency response of the device itself, and when the device changes, the commonly howling frequency points change as well, so such suppression methods generalize poorly and easily lose important sound information. In addition, existing howling suppression generally cannot cope with network delay and cannot effectively detect the periodic howling that delay causes.
In view of the above-mentioned deficiencies in howling detection and suppression performance, embodiments of the present application provide an audio processing method based on howling detection, which identifies human voice, music voice, and the like simultaneously in the screening process of the howling frequency points, and then dynamically adjusts the volume according to the screening result of the howling frequency points and the detection result of the sound, so that howling can be flexibly and stably suppressed on the basis of avoiding the loss of important sound information, and howling suppression and protection of important sound information are both considered.
The audio processing method based on howling detection provided by the embodiment of the application can be applied to various scenes that require howling suppression, such as improving sound-amplification quality in an indoor conference scene, a KTV scene, or karaoke sung using a portable mobile terminal. It can likewise be applied to various electronic devices, such as smartphones, computers, microphones and audio equipment.
Exemplarily, as shown in fig. 1, a schematic structural diagram of an electronic device for suppressing howling provided in an embodiment of the present application is shown. The howling suppression electronic device 100 may include a frequency domain transformation module 101, a howling frequency point detection module 102, a human voice and music detection module 103, and an energy control logic module 104.
In some embodiments, the frequency domain transform module 101 mainly performs basic processing on the input audio signal, including buffering, framing, windowing and the discrete Fourier transform, so as to convert the audio signal from the time domain to the frequency domain; finally it takes the amplitude spectrum and the power spectrum and statistically computes the total energy and average energy of the current audio frame.
In some embodiments, the howling frequency point detecting module 102 is mainly configured to detect whether howling exists, including selecting a candidate frequency point, performing a primary screening on the candidate frequency point, performing a secondary screening on the candidate frequency point, performing a tertiary screening on the candidate frequency point, and the like. After multiple screening, the finally reserved frequency point is the howling frequency point.
In some embodiments, the human voice and music detection module 103 is mainly configured to detect whether human voice or music is present, including calculating identification features such as the harmonicity feature, the spectral flux feature, and the subband energy ratio balance feature, together with the corresponding determination logic.
In some embodiments, the energy control logic module 104 is mainly configured to control energy, and adjust the current volume level according to the detection results of the howling frequency point detection module and the human voice and music detection module, so as to achieve fast suppression of howling and fast recovery of volume.
It should be noted that the structure of the howling suppression electronic device described above is merely an example, and in practical applications, the howling suppression electronic device may further include other modules (such as an audio receiving module) and have other functions, which is not limited in this embodiment of the present application.
In order to more clearly describe the audio processing method based on howling detection provided by the embodiment of the present application, an implementation process of the audio processing method based on howling detection provided by the embodiment of the present application is described below with reference to the accompanying drawings.
Fig. 2 shows a schematic flowchart of an audio processing method based on howling detection provided by an embodiment of the present application. The execution subject of the method is an electronic device (such as the howling suppression electronic device shown in fig. 1 above), and the process may include the following steps:
S201, acquiring an audio signal, and acquiring an audio frame of a frequency domain space according to the audio signal, wherein the audio frame comprises a plurality of frequency points.
In some embodiments, the specific process by which the electronic device acquires the audio signal and obtains an audio frame in the frequency domain may include: the electronic device receives the audio signal through an audio acquisition module such as a microphone and stores it in a buffer; it then takes an audio signal (such as the most recently stored one) from the buffer and frames it according to a preset sampling rate and the number of samples corresponding to the frame length to obtain an audio frame; next, the audio frame is windowed according to a window function; finally, a Fourier transform is applied to the windowed audio frame to obtain an audio frame in the frequency domain.
It should be noted that, a frequency point in the embodiment of the present application may refer to a center frequency corresponding to a frequency band, and one frequency point may be used to represent one frequency band. In addition, each frequency point has a corresponding power spectrum energy, and thus the meaning of the frequency point in the embodiment of the present application may also include its corresponding power spectrum energy.
In some embodiments, the electronic device may further obtain the total energy and the average energy corresponding to the audio frame according to the frequency points in the audio frame.
The specific implementation manner of the process may refer to the following description in the embodiment of fig. 3, which is not described in detail here.
S202, howling frequency point screening is carried out on the plurality of frequency points, and corresponding screening results are obtained.
In some embodiments, the process of performing howling frequency point screening on a plurality of frequency points may include: acquiring the power spectrum energy corresponding to the plurality of frequency points according to the audio frame; selecting at least one first candidate frequency point with the highest power spectrum energy from the plurality of frequency points; and screening howling frequency points from the at least one first candidate frequency point according to the peak-to-threshold power ratio (PTPR), the peak-to-average power ratio (PAPR), the peak-to-harmonic power ratio (PHPR), and the short-time autocorrelation function (STACF).
For example, the process of screening the howling frequency points from the at least one first candidate frequency point according to the PTPR, PAPR, PHPR, and STACF criteria may include: performing primary screening on the first candidate frequency points using the peak-to-threshold power ratio criterion PTPR and the peak-to-average power ratio criterion PAPR to obtain second candidate frequency points; performing secondary screening on the second candidate frequency points using the peak-to-harmonic power ratio PHPR to obtain third candidate frequency points; and performing tertiary screening on the third candidate frequency points using the short-time autocorrelation function STACF to obtain the howling frequency points.
It should be noted that the execution sequence of the criteria for screening the howling frequency points is only an example; in practical applications, the specific order of the screening stages is not limited. For example, the criteria used in the primary screening are not limited to PTPR and PAPR; the PHPR or STACF criterion may be used instead. In addition, other combinations of criteria may be used in practice, for example, only two of the above-mentioned screening criteria may be applied to determine the howling frequency points, and the embodiment of the present application is not limited thereto.
The specific process of performing howling frequency point screening on multiple frequency points by using a multi-layer algorithm in this step will be described in more detail in the embodiment of fig. 4 below, and will not be described in detail here.
S203, performing target sound detection on the audio signal, and acquiring a corresponding detection result, where the detection result is used to indicate whether a target sound exists in the audio frame, and the target sound includes human sound and/or music sound.
In some embodiments, the process of performing target sound detection on the audio signal and obtaining a corresponding detection result may include: when a plurality of frequency points are screened, the number of harmonic waves in an audio frame is counted; and acquiring a detection result of the target sound according to the power spectrum energy and the harmonic quantity corresponding to the multiple frequency points.
Further, the process of acquiring, by the electronic device, the detection result of the target sound according to the power spectrum energy and the number of harmonics corresponding to the multiple frequency points may include: calculating identification characteristics corresponding to the target sound according to the power spectrum energy corresponding to the multiple frequency points, wherein the identification characteristics comprise at least one of harmonicity characteristics, spectrum flux characteristics and sub-band energy ratio balance characteristics; performing historical smoothing processing on the identification features to obtain target identification features; and acquiring a detection result of the target sound according to the target identification feature and the harmonic quantity.
The specific implementation process of obtaining the target identification feature and obtaining the detection result of the target sound according to the target identification feature and the number of harmonics may refer to the following description in the embodiment of fig. 5, which will not be described in detail here.
S204, dynamically adjusting the volume level of the audio frame according to the screening result and the detection result, wherein, when the screening result indicates that a howling frequency point exists, the method comprises: if the detection result indicates that the target sound exists, adjusting the volume of the audio frame downward according to a preset volume down-adjustment level, and repeating the above steps until the volume reaches a preset first volume threshold; or, if the detection result indicates that the target sound does not exist, adjusting the volume of the audio frame downward according to the preset volume down-adjustment level, and repeating the above steps until the volume reaches a preset second volume threshold, wherein the volume corresponding to the second volume threshold is lower than the volume corresponding to the first volume threshold.
The step of repeating may specifically be to repeat the steps S201 to S203. Specifically, an audio signal is repeatedly acquired, and an audio frame of a frequency domain space is acquired according to the audio signal, wherein the audio frame comprises a plurality of frequency points; carrying out howling frequency point screening on the plurality of frequency points, and acquiring corresponding screening results; detecting a target sound of the audio signal, and acquiring a corresponding detection result, wherein the detection result is used for indicating whether the target sound exists in the audio frame, and the target sound comprises a human sound and/or a music sound; and dynamically adjusting the volume level of the audio frame according to the screening result and the detection result. That is, the process of the howling frequency point screening and target sound detection for the audio signal of the next round is performed again.
In some embodiments, the process by which the electronic device adjusts the volume level of the audio frame according to the previously acquired detection result and screening result of the target sound may include: when the screening result indicates that a howling frequency point exists, if the detection result indicates that the target sound exists, adjusting the volume of the audio frame downward according to the preset volume down-adjustment level, and repeating steps S201 to S203 until the volume reaches the preset first volume threshold; or, if the detection result indicates that the target sound does not exist, adjusting the volume of the audio frame downward according to the preset volume down-adjustment level, and repeating steps S201 to S203 until the volume reaches the preset second volume threshold, where the volume corresponding to the second volume threshold is lower than the volume corresponding to the first volume threshold.
In some embodiments, the preset volume down level may correspond to a certain decibel number, such as 5 decibels, 10 decibels, and the like. The volume down-regulation level can be flexibly set according to needs, and the embodiment of the application does not limit the volume down-regulation level.
In some embodiments, the first volume threshold may be the lowest volume to which the audio can be turned down when a howling frequency point exists and the target sound exists; the second volume threshold may be the lowest volume to which the audio can be turned down when a howling frequency point exists but no target sound exists. In order to highlight the target sound, the first volume threshold may be higher than the second volume threshold. For example, for an audio frame with a howling frequency point, if there is no target sound, the lowest volume (second volume threshold) may be set to 40 dB in order to suppress howling; if the target sound exists, howling still needs to be suppressed, but the lowest volume is limited in order to guarantee sound quality, for example to 50 dB (first volume threshold), which is 10 dB higher than the 40 dB floor that applies when there is no target sound.
It should be understood that specific values of the first volume threshold and the second volume threshold may be flexibly set as needed, for example, the first volume threshold may be set to a volume corresponding to complete howling suppression, the second volume threshold may be set to a volume corresponding to partial howling suppression, and the like, which is not limited in this embodiment of the application.
It should be understood that in the method of this implementation, a stepwise volume adjustment manner may be adopted. Specifically, each time a round of howling is detected, the volume level may be decreased to attenuate the volume by a certain number of decibels. Since howling may persist periodically, if howling is detected continuously, the volume level is decreased continuously, so that the volume drops rapidly until it reaches the set lowest volume and the howling disappears.
This volume adjustment approach has a small and stable computation load, requires no filtering operations, and its cost does not surge as the number of howling frequencies increases.
In other embodiments, when the screening result indicates that no howling frequency point exists, the method further includes: adjusting the volume of the audio frame upward according to a preset volume up-adjustment level, and repeating steps S201 to S203 until the volume reaches a preset third volume threshold; alternatively, directly adjusting the volume of the audio frame up to the third volume threshold.
The third volume threshold may be a preset highest volume to which the audio can be adjusted up when no howling frequency point exists.
For example, the volume corresponding to the third volume threshold may be the highest volume preset by the user, or may be another volume suitable for the user to listen to the audio and the like set initially (or by default). The specific value (decibel) corresponding to the third volume threshold may be flexibly set according to the needs, which is not limited in the embodiment of the present application.
In some embodiments, the preset volume up level may correspond to a certain decibel number, such as 5 decibels, 10 decibels, and the like. The volume up-regulation level can be flexibly set in advance according to needs, and the embodiment of the application does not limit the volume up-regulation level.
It should be understood that when the screening result indicates that no howling frequency point exists, the volume of the audio frame may be adjusted upward in various ways, including: gradually increasing the volume of the audio frame, based on the detection result of the target sound, according to the preset volume up-adjustment level as mentioned above, more specifically, raising the volume by one (or more) up-adjustment levels in each round of adjustment; or directly adjusting the volume of the audio frame up to the corresponding preset volume without gradual adjustment.
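For illustration only, one round of the down- and up-adjustment logic described above can be sketched as a control function in Python. All names and numeric values (a 5 dB step, 50/40 dB floors, an 80 dB normal level) are assumptions for the sketch, not values specified by this application.

```python
def next_volume(volume, howling, target_sound,
                step_db=5.0, first_threshold=50.0, second_threshold=40.0,
                normal_level=80.0):
    """One round of volume control (illustrative values only).

    howling:      screening result (True if a howling frequency point exists)
    target_sound: detection result (True if human voice/music is present)
    """
    if howling:
        # Step down, but never below the floor selected by the
        # target-sound detection result (first threshold > second).
        floor = first_threshold if target_sound else second_threshold
        return max(volume - step_db, floor)
    # No howling: step back up toward the normal listening level.
    return min(volume + step_db, normal_level)
```

Calling this once per timing period matches the stepwise adjustment described above: repeated howling detections walk the volume down to the appropriate floor, and howling-free periods walk it back up.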
It should also be understood that the method provided by this implementation manner can perform targeted adjustment on howling suppression when it is recognized that the target sound exists in the audio frame (for example, when there is human voice and/or musical sound, the maximum volume attenuation degree is limited, or when there is human voice and/or musical sound, the audio level is adjusted to directly recover the normal volume, etc.), so as to avoid loss of important information and ensure the sound quality.
It should be noted that, the howling frequency point screening process in the embodiment of the present application may be performed in a loop. For example, the electronic device may perform a periodic timing operation by using a timer, where the timing operation includes a timing start and a timing end, and during each timing cycle, the electronic device may perform the filtering of the howling frequency point and the detection of the target sound according to the above process; when each time counting is finished, the screening result and the detection result corresponding to the time counting period can be obtained.
In some embodiments, adjusting the volume level of the audio frame according to the filtering result and the detection result may include: and after each time of timing is finished, adjusting the volume level of the audio frame by one volume level according to the screening result and the detection result. For example, after the timing is finished, if the screening result corresponding to the timing period indicates that the howling frequency point exists and the detection result indicates that the target sound does not exist, the volume level of the audio frame is adjusted downwards by one volume adjustment level; or after the timing is finished, when the detection result indicates that the target sound exists, the volume level of the audio frame is adjusted up by one volume up-adjustment level.
In some embodiments, after the volume level of the audio frame is adjusted, a timer may be reset for a new round of howling screening and target sound identification.
The specific implementation manner of this process can be referred to in the embodiments of fig. 4 and fig. 6, and will not be described in detail here.
According to the audio processing method based on howling detection provided by this implementation, the target sound is detected while the howling frequency points are screened. When howling frequency points exist, the volume is dynamically adjusted based on the detection result of the target sound, down to the lowest volume level corresponding to that result. Howling can thus be suppressed effectively and flexibly without losing useful sound information, guaranteeing the sound quality and improving the user experience.
The audio processing method based on howling detection provided by the embodiments of the present application has wide applicability: it remains effective even under a large overall loop gain, and its effect is not limited by the number of howling frequencies or the effective frequency range; the frequency bands can be divided arbitrarily, with different parameters or thresholds set for different bands to adapt to different scenes, enabling more efficient detection and suppression for howling-prone bands in fixed scenes; the method is also applicable under network communication conditions and remains effective for periodic howling.
Fig. 3 is a schematic flowchart illustrating processing of an audio signal according to an embodiment of the present application. The process is used for carrying out frequency domain conversion on a time domain audio signal received by the electronic equipment and calculating total energy and average energy corresponding to an audio frame in a frequency domain space. The process may be performed by an electronic device, such as an audio signal receiving module (not shown in fig. 1) and a frequency domain transformation module in the electronic device, and specifically may include the following steps:
S301, receiving an audio signal.
Wherein, the electronic device can receive the audio signal through an audio receiving module (such as a corresponding microphone).
S302, the audio signal is buffered to a buffer and is divided into frames, and audio frames are obtained.
In some embodiments, the electronic device may buffer the acquired audio signal to a buffer. Then, the electronic device may obtain the audio frame from the buffered audio signal of the buffer according to a preset sampling rate, so as to process the taken audio frame in the following. For example, assuming that a frame length of one audio frame is N, when performing framing based on an audio signal, the sampling frequency may be set to 8000Hz, and the frame length N may be 1024 sampling points, that is, one audio frame includes 1024 sampling points.
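A minimal sketch of this framing step, assuming the frame length N = 1024 from the example above and a non-overlapping hop (the hop size is an assumption, since the application does not specify frame overlap):

```python
def frames_from_buffer(buffer, frame_len=1024, hop=1024):
    """Split a buffered signal into fixed-length frames.

    frame_len=1024 matches the example above; hop=frame_len gives
    non-overlapping frames (an illustrative assumption).
    """
    return [buffer[i:i + frame_len]
            for i in range(0, len(buffer) - frame_len + 1, hop)]
```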
The electronic device may process the current audio signal while acquiring subsequent audio signals. For example, in a public address scene, the audio signal may be processed synchronously in real time to obtain audio frames.
S303, perform windowing on the audio frame signal.
It should be noted that before the discrete Fourier transform is performed on the audio signal, the obtained audio frame needs to be windowed, i.e., multiplied by a window function. The purpose of windowing is to taper the amplitude of the frame gradually to 0 at both ends, which reduces spectral leakage and yields discrete Fourier transform data that are easier to analyze.
In some embodiments, the electronic device may perform the windowing operation on the audio frame obtained in step S302, or perform the windowing after the frame obtained in step S302 is truncated.
Wherein the windowing operation can be performed by the following formula (1-1):
y[i]=x[i]×w[i],0≤i≤N-1 (1-1)
wherein x [ i ] represents the ith sample point of the audio frame before windowing; w [ i ] represents the ith sample point of the window function; y [ i ] represents the ith sample point of the windowed audio frame; n denotes a frame length.
It should be noted that the window function used in the windowing operation in the embodiment of the present application is not limited, and may be implemented by using various existing window functions, such as hamming window, hanning window, and kaiser window. Illustratively, when a Hanning window is employed, its expression may be as shown in equations (1-2):
w[i] = 0.5 × (1 - cos(2πi / (N-1))), 0 ≤ i ≤ N-1 (1-2)
wherein w [ i ] represents the ith sample point of the window function; n denotes a frame length.
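The windowing of equations (1-1) and (1-2) can be sketched as follows (function names are illustrative):

```python
import math

def hanning(N):
    # Hanning window, equation (1-2): w[i] = 0.5 * (1 - cos(2*pi*i/(N-1)))
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (N - 1)))
            for i in range(N)]

def apply_window(x):
    # Windowing, equation (1-1): y[i] = x[i] * w[i]
    w = hanning(len(x))
    return [xi * wi for xi, wi in zip(x, w)]
```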
S304, carrying out Fourier transform on the audio frame signal after windowing to obtain the audio frame of the frequency domain space.
In some embodiments, the windowed audio frame is Fourier transformed to obtain an audio signal in the frequency domain. Illustratively, the electronic device may use a fast Fourier transform algorithm to accelerate the computation of the following equation (1-3):
X[k] = Σ_{i=0}^{N-1} y[i] × e^(-j2πki/N), 0 ≤ k ≤ N-1 (1-3)
wherein X[k] represents the complex spectrum value of the k-th frequency point of the audio frame spectrum; i is the index of the sampling point; N is the frame length; j is the imaginary unit.
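Equation (1-3) can be illustrated with a direct discrete Fourier transform (a sketch for clarity; as noted above, a real implementation would use a fast Fourier transform):

```python
import cmath

def dft(y):
    """Direct DFT per equation (1-3): X[k] = sum_i y[i] * e^(-j*2*pi*k*i/N).

    O(N^2) reference implementation; an FFT computes the same result
    in O(N log N).
    """
    N = len(y)
    return [sum(y[i] * cmath.exp(-2j * cmath.pi * k * i / N)
                for i in range(N))
            for k in range(N)]
```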
S305, calculating a corresponding amplitude spectrum and a corresponding power spectrum according to the audio frame of the frequency domain space.
Here, the magnitude spectrum may be a magnitude spectrum energy, and the power spectrum may be a power spectrum energy.
In some embodiments, the process of calculating the corresponding amplitude spectrum of an audio frame in the frequency domain may include: taking the modulus of the complex spectrum sequence to obtain the amplitude spectrum energy corresponding to the audio frame in the frequency domain. Illustratively, the amplitude spectrum may be calculated using the following equation (1-4):
Y[k] = |X[k]|, 0 ≤ k ≤ P-1 (1-4)
where P is the number of frequency points in the audio frame.
In some embodiments, the process of calculating the corresponding power spectrum of an audio frame in the frequency domain may include: after the amplitude spectrum is obtained as above, squaring it to obtain the corresponding power spectrum. Illustratively, the power spectrum may be calculated using the following equation (1-5):
Z[k] = |X[k]|², 0 ≤ k ≤ P-1 (1-5)
wherein Y[k] represents the amplitude of the k-th frequency point of the audio frame amplitude spectrum; Z[k] represents the power spectrum energy value of the k-th frequency point of the audio frame power spectrum; P represents the number of frequency points in the audio frame.
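Equations (1-4) and (1-5) can be sketched together (the input X is the complex spectrum from the transform step):

```python
def spectra(X):
    # Magnitude spectrum, equation (1-4): Y[k] = |X[k]|
    Y = [abs(xk) for xk in X]
    # Power spectrum, equation (1-5): Z[k] = |X[k]|^2
    Z = [yk * yk for yk in Y]
    return Y, Z
```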
Optionally, after obtaining the audio frame of the frequency domain space, the total energy and the average energy corresponding to the audio frame may also be calculated, which may specifically be implemented by the following steps:
and S306, counting the total energy and the average energy corresponding to the audio frame according to the amplitude spectrum and the power spectrum.
In some embodiments, the electronic device may count the total energy and the average energy in the frequency domain corresponding to the audio frame according to the amplitude spectrum and the power spectrum calculated in step S305. Specifically, the total energy in the frequency domain may be calculated according to the following formulas (1-6), and the average energy corresponding to the audio frame may be calculated according to the following formulas (1-7):
E = Σ_{k=0}^{P-1} Z[k] (1-6)

Ave = E / P (1-7)

wherein E represents the total frequency-domain energy corresponding to the audio frame; Z[k] represents the power value corresponding to the k-th frequency point of the audio frame power spectrum; Ave represents the average energy corresponding to the audio frame; P is the number of frequency points in the audio frame.
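Equations (1-6) and (1-7) reduce to a sum and a division over the power spectrum Z:

```python
def frame_energy(Z):
    # Total energy, equation (1-6): E = sum over k of Z[k]
    E = sum(Z)
    # Average energy, equation (1-7): Ave = E / P
    return E, E / len(Z)
```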
It should be noted that, according to the processes shown in the foregoing step S301 to step S306, the electronic device may sequentially process a plurality of audio frames to obtain total energy of frequency domain space corresponding to the plurality of audio frames and average energy corresponding to each audio frame.
According to the audio processing method based on howling detection provided by the embodiments of the present application, the time-domain audio signal is framed, windowed, and Fourier-transformed into the frequency domain, after which the howling frequency points are screened using multiple criteria. This improves the accuracy of howling frequency point screening, so that howling can be suppressed accurately and efficiently, improving the user experience.
For convenience of understanding, an exemplary process of how to perform howling frequency point screening in the audio processing method based on howling frequency point detection provided by the embodiment of the present application is described below with reference to fig. 4.
Exemplarily, as shown in fig. 4, a schematic flowchart of another audio processing method based on howling detection provided by an embodiment of the present application is shown. The process may be executed by a howling suppression electronic device (e.g., the howling suppression electronic device shown in fig. 1, hereinafter referred to as an electronic device), and specifically may include the following steps:
S401, receiving an audio signal, and acquiring audio energy corresponding to an audio frame in a frequency domain space based on the audio signal.
The audio energy of the audio frame corresponding to the frequency domain space may include distribution of each frequency point in the audio frame in the frequency domain space, that is, distribution of the frequency domain energy of the audio frame corresponding to the frequency domain space. The audio energy may include total energy in the frequency domain and average energy corresponding to the audio frame, where the average energy corresponding to the audio frame is the total energy in the frequency domain of the audio frame divided by the number of frequency points.
It should be noted that the frequency point referred to in the embodiments of the present application may also be used to refer to energy, for example, the frequency point (or the energy of the frequency point) in the following embodiments may be used to represent the energy of the power spectrum corresponding to the frequency point. It should be understood that each frame of audio may correspond to multiple frequency bands (characterized by frequency points), each frequency point corresponding to a power spectral energy.
In some embodiments, after receiving an audio signal, the electronic device buffers and frames the audio signal, converts the audio frame from a time domain to a frequency domain by using discrete fourier transform, acquires each frequency point included in the audio frame, and may calculate the total energy and the average energy corresponding to the frequency-domain spatial audio frame according to the power spectrum energy corresponding to each frequency point. The specific implementation of the calculation process may be specifically described in the embodiment of fig. 3 above, and is not described here again.
S402, according to the audio energy, selecting the frequency points corresponding to the n highest audio energies as first candidate frequency points, wherein n is an integer greater than or equal to 1.
Here, the frequency points corresponding to the n highest audio energies correspond to at least one first candidate frequency point in the foregoing.
It should be noted that if the current audio frame is a silent frame, it contains no howling, and howling frequency point detection and howling suppression need not be performed. Therefore, the embodiments of the present application are mainly described for the case where the current audio frame is a non-silent frame.
The first candidate frequency point in the embodiment of the present application may be at least one frequency point with the largest energy selected from the frequency points of the audio frame based on an energy value (e.g., power spectrum energy).
Specifically, when the audio frame is a non-silent frame, the first n frequency points with the largest energy may be selected from all frequency points of the audio frame as the first candidate frequency points, according to the energy corresponding to each frequency point. That is, a specific number of frequency points are selected as first candidate frequency points according to their energies. For example, in one implementation, the number n of first candidate frequency points may be preset, and the n frequency points with the largest energy are then selected from the frequency points of the audio frame.
Alternatively, in other embodiments, the first candidate frequency points may be selected according to the energy of the frequency points, instead of the predetermined number. For example, frequency points with energy greater than a first threshold may be selected from the audio frame as first candidate frequency points, that is, frequency points with energy greater than the first threshold are all selected as the first candidate frequency points. The first threshold may be flexibly set as needed, which is not limited in the embodiment of the present application.
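Both selection strategies described above can be sketched in one helper (the parameter names, the default n, and the threshold are illustrative; Z is the per-frequency-point power spectrum):

```python
def first_candidates(Z, n=5, energy_floor=None):
    """Select first candidate frequency points from the power spectrum Z.

    By default returns the indices of the n highest-energy points; if
    energy_floor is given, returns all points above that threshold
    instead (matching the alternative strategy above).
    """
    if energy_floor is not None:
        return [k for k, z in enumerate(Z) if z > energy_floor]
    return sorted(range(len(Z)), key=lambda k: Z[k], reverse=True)[:n]
```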
S403, the first candidate frequency point is subjected to primary screening by using a peak threshold power ratio criterion PTPR and a peak-to-average power ratio criterion PAPR to obtain a second candidate frequency point.
In some embodiments, after the first candidate frequency point is obtained, a first filtering of the howling frequency point may be performed by using a peak-to-threshold power ratio (PTPR) criterion and a peak-to-average power ratio (PAPR) criterion to obtain a second candidate frequency point.
Specifically, the PTPR criterion relates to the formula shown in (1-8), and the PAPR criterion relates to the formula shown in (1-9):
PTPR: Z[k] < thres1 (1-8)

PAPR: Z[k] < thres2 × Ave (1-9)
where Ave represents the average energy of the audio frame; thres1 and thres2 respectively represent preset thresholds, and specific values thereof can be flexibly set according to needs, which is not limited in the present application.
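The primary screening can be sketched as follows. Note an interpretive assumption: the inequalities in (1-8) and (1-9) are read here as rejection conditions, so a candidate survives only when its power is not below either threshold; the threshold values are also illustrative.

```python
def primary_screen(candidates, Z, ave, thres1=1e4, thres2=5.0):
    """PTPR/PAPR screening (assumed reading of (1-8)/(1-9)).

    A candidate k is dropped when Z[k] < thres1 (PTPR) or
    Z[k] < thres2 * ave (PAPR); survivors are the second candidates.
    """
    return [k for k in candidates
            if Z[k] >= thres1 and Z[k] >= thres2 * ave]
```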
S404, secondary screening is performed on the second candidate frequency points using the peak-to-harmonic power ratio (PHPR) criterion to obtain third candidate frequency points.
In some embodiments, after the second candidate frequency points are obtained, a second screening may be performed on the candidate frequency points using the peak-to-harmonic power ratio (PHPR) criterion to obtain third candidate frequency points.
Specifically, the PHPR criterion is given by formula (1-10):

PHPR: Z[2k] > thres3 × Z[k] and Z[3k] > thres4 × Z[k] (1-10)

wherein Z[2k] and Z[3k] represent the energies at the second and third harmonics of the k-th frequency point; thres3 and thres4 represent preset thresholds, whose specific values can be set flexibly as needed, which is not limited in the present application.
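The PHPR check of formula (1-10) can be sketched as follows; the inequality directions are as printed, bins whose harmonic indices fall outside the spectrum are skipped, and the returned count stands in for the har_num statistic. All names are illustrative assumptions.

```python
import numpy as np

def phpr_screen(power, candidates, thres3, thres4):
    """Second screening with the PHPR criterion of formula (1-10):
    bin k satisfies the condition when its 2nd and 3rd harmonics
    (bins 2k and 3k) are both strong relative to Z[k]. Returns the
    bins satisfying it and their count. Illustrative names only.
    """
    power = np.asarray(power, dtype=float)
    n = len(power)
    hits = [k for k in candidates
            if 2 * k < n and 3 * k < n
            and power[2 * k] > thres3 * power[k]
            and power[3 * k] > thres4 * power[k]]
    return hits, len(hits)

z = np.array([0.0, 1.0, 4.0, 0.5, 6.0, 0.0, 9.0, 0.0, 0.0, 0.0])
hits, har_num = phpr_screen(z, [2, 3], thres3=1.0, thres4=1.0)
print(hits, har_num)  # -> [2] 1
```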
In some embodiments, the number of harmonics of the frequency points meeting the screening condition may be counted, and history smoothing may be performed on the count. Specifically, the history smoothing may be performed using the following formula (1-11):
acc_har_num[n+1]=a×acc_har_num[n]+(1-a)×har_num (1-11)
wherein har_num represents the counted number of harmonics; acc_har_num[n] represents the history smoothed value; acc_har_num[n+1] represents the updated smoothed value; a is a constant, for example between 0.90 and 0.99.
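The exponential history smoothing of formula (1-11) (and of the identical formula (1-25) used later for the identification features) reduces to a one-line update. A minimal sketch; a = 0.5 in the example is chosen only to make the convergence visible, whereas the patent suggests 0.90 to 0.99.

```python
def smooth(prev, x, a=0.95):
    """One step of the history smoothing in (1-11)/(1-25):
    new = a * prev + (1 - a) * x."""
    return a * prev + (1 - a) * x

acc = 0.0
for har_num in [4, 4, 4]:        # three rounds with 4 harmonics counted
    acc = smooth(acc, har_num, a=0.5)
print(acc)  # -> 3.5
```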
S405, tertiary screening is performed on the third candidate frequency points using the short-time autocorrelation function (STACF) criterion to obtain the howling frequency points.
In some embodiments, after the third candidate frequency points are obtained, a third screening may be performed on the candidate frequency points using the short-time autocorrelation function (STACF) criterion, where the screening condition is the following formula (1-12):
STACF:max{ρ[k][t],delay_min≤t≤delay_max}<thres5 (1-12)
wherein max{ } denotes the maximum function; ρ[k][t] represents the normalized autocorrelation value; delay_min and delay_max represent the preset minimum and maximum numbers of delay frames, respectively; t represents the autocorrelation time shift in frames; thres5 represents a preset threshold, whose specific value can be set flexibly as needed, which is not limited in this application.
Specifically, ρ[k][t] in the above formula (1-12) can be calculated according to the following formulas (1-13) to (1-16):

Mean[k] = (1/L) × Σ_{l=1..L} Y[k][l] (1-13)

Ŷ[k][l] = Y[k][l] - Mean[k] (1-14)

R[k][t] = Σ_{l=1..L-t} Ŷ[k][l] × Ŷ[k][l+t] (1-15)

ρ[k][t] = R[k][t] / R[k][0] (1-16)

wherein L represents the maximum tracking frame number; Mean[k] represents the historical amplitude spectrum mean of the k-th frequency point; Y[k][l] represents the amplitude spectrum value of the k-th frequency point in the l-th frame; Ŷ[k][l] represents the zero-averaged amplitude spectrum value; R[k][t] represents the short-time autocorrelation function.
In some embodiments, the computation over the zero-averaged amplitude spectrum values may be accelerated, for example by binarizing the values.
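The STACF computation can be sketched as below: zero-mean the per-bin magnitude trajectory over the last L frames, autocorrelate it, and normalize by the zero-lag value. This follows formulas (1-13) to (1-16) as understood here; the function name and the assumption delay_min ≥ 1 are not from the patent. A steady (howling-like) trajectory yields a low value, while a modulated (speech-like) trajectory yields a high one.

```python
import numpy as np

def stacf(Y_k, delay_min, delay_max):
    """Max normalized short-time autocorrelation of one bin's magnitude
    trajectory Y_k over the allowed delay range. Assumes delay_min >= 1.
    Sketch of formulas (1-13)-(1-16); names are illustrative."""
    y = np.asarray(Y_k, dtype=float)
    yhat = y - y.mean()                 # zero-averaged trajectory, (1-13)/(1-14)
    r0 = float(np.dot(yhat, yhat))      # zero-lag autocorrelation R[k][0]
    if r0 == 0.0:
        return 0.0                      # perfectly flat trajectory
    rhos = [float(np.dot(yhat[:-t], yhat[t:])) / r0   # (1-15)/(1-16)
            for t in range(delay_min, delay_max + 1)]
    return max(rhos)

steady = np.full(32, 5.0)                        # howling-like: constant level
modulated = np.sin(np.linspace(0, 4 * np.pi, 32))  # speech-like: modulated level
print(stacf(steady, 1, 16))  # -> 0.0
```

Under the criterion of formula (1-12), the steady trajectory (max ρ below thres5) would be retained as a howling candidate, while the modulated one would be rejected.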
In some embodiments, after the three rounds of screening, the frequency points finally retained in the screening result are taken as the howling frequency points.
It should be noted that the execution sequence of the above steps S403 to S405 is only an example; in practical applications, the specific implementation sequence of these steps is not limited. For example, the criteria used in the primary screening are not limited to PAPR and PTPR but may also be the PHPR criterion and/or the STACF criterion, and the criteria need not be executed in the sequence illustrated above. In addition, in practical applications, other combinations of criteria may be used to screen the howling frequency points, for example using only two of the above screening criteria to determine the howling frequency points, and the embodiment of the present application is not limited thereto.
Optionally, in some embodiments, after the howling frequency point is screened out, howling suppression may be performed according to the screening result. The howling suppression process may include the steps of:
S406, after the howling frequency points are screened out, the current volume of the audio signal is gradually reduced according to a preset volume down-adjustment level.
In some embodiments, the process of screening the howling frequency points may be performed periodically. For example, the electronic device may use a timer to perform cyclic timing, where each timing operation includes a timing start and a timing end. If a howling frequency point is screened out when one round of timing ends, the volume may be turned down according to the preset volume down-adjustment level to suppress the howling; if no howling frequency point is screened out when one round of timing ends, the volume may be turned up according to a preset volume up-adjustment level to improve the audio playing effect.
In some embodiments, a plurality of volume down-adjustment levels and volume up-adjustment levels may be preset; for example, each level may correspond to a certain number of decibels, such as 5 decibels or 10 decibels. The level used for turning the volume down and the level used for turning it up may be the same or different, which is not limited in the embodiment of the present application.
In some embodiments, if howling frequency points are screened out during the multi-round howling frequency point screening process, the volume may be turned down step by step according to the preset volume down-adjustment level. After the volume is lowered by one level, the timer may be reset for another round of timing, and the next round of howling frequency point screening is performed; if a howling frequency point is still screened out when the next round ends, the volume is lowered by one more level. This process is repeated until the volume reaches a preset first volume threshold, or until the screening result indicates that no howling frequency point exists.
In some embodiments, when no howling frequency point is screened out by the above screening process, the volume may be turned up step by step according to a preset volume up-adjustment level. After the volume is raised by one level, the timer may be reset for another round of timing, and the next round of howling frequency point screening is performed; if no howling frequency point is screened out when the next round ends, the volume is raised by one more level. This process is repeated until the volume reaches a preset third volume threshold, or until a howling frequency point is detected. The volume corresponding to the preset third volume threshold may be a preset maximum volume, or another initially (or by default) set volume suitable for the user to listen to. The specific value (in decibels) corresponding to the third volume threshold can be set flexibly as needed, which is not limited in the embodiment of the present application.
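The stepwise down/up adjustment described in the last two paragraphs amounts to one bounded update per timer period. A minimal sketch; the step sizes, bounds, and names are illustrative assumptions, not values from the patent.

```python
def adjust_volume(volume, howling, vol_min, vol_max, step_down=10, step_up=5):
    """One timer period of the stepwise control: drop one volume
    down-adjustment level when a howling bin was screened out this round,
    otherwise raise one volume up-adjustment level, clamped to the
    [vol_min, vol_max] thresholds. Values in decibels, illustrative."""
    if howling:
        return max(vol_min, volume - step_down)
    return min(vol_max, volume + step_up)

vol = 60
for howl in [True, True, False, False, False]:  # simulated screening results
    vol = adjust_volume(vol, howl, vol_min=20, vol_max=60)
print(vol)  # -> 55
```

The example shows the asymmetry the text describes: howling drives the volume down quickly (two rounds, 20 dB), while recovery climbs back in smaller steps once the screening comes up empty.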
Optionally, if a howling frequency point is screened out in the next round of screening after the volume has been turned up, the volume may be turned down according to the preset volume down-adjustment level; for the specific process, reference may be made to the related descriptions in other embodiments of the present application, which are not repeated here.
It should be noted that taking the screening result of the howling frequency points as the basis for volume adjustment is only one possible example provided in the embodiment of the present application; in other embodiments, the screening result may be combined with other detection results (e.g., the detection result of the target sound) as the basis for volume adjustment.
According to the audio processing method based on howling detection provided by the embodiment of the application, the howling frequency points are screened using multiple criteria, which improves the accuracy of howling frequency point detection and reduces problems such as false detection and missed detection.
It should be noted that, in the audio processing method based on howling detection provided in the embodiment of the present application, while the howling frequency points are being screened, the number of harmonics corresponding to an audio frame may be counted; identification features such as a harmonicity feature, a spectral flux feature, and a sub-band energy ratio equalization feature may then be calculated and used to identify human voice, music, and the like. The following describes, with reference to the accompanying drawings, the process of recognizing human voice and/or music in the audio processing method based on howling detection provided by the embodiment of the present application.
With reference to the drawings, a specific process of how to perform target sound detection in the audio processing method based on howling detection provided in the embodiment of the present application is described below.
Exemplarily, as shown in fig. 5, a schematic flowchart of yet another audio processing method based on howling detection provided by the embodiment of the present application is shown. The process may include the steps of:
S501, the identification features corresponding to the target sound are calculated according to the average energy corresponding to the audio frame.
The target sound includes a human voice and/or a musical sound.
The average energy corresponding to an audio frame is the total frequency-domain energy of the audio frame divided by the number of frequency points. The identification features here may include a harmonicity feature, a spectral flux feature, a sub-band energy ratio equalization feature, and other features usable for voice recognition.
In some embodiments, the harmonicity feature may be calculated using formulas (1-17) and (1-18). [Formulas (1-17) and (1-18) appear only as images in the source text.] In these formulas, h represents the harmonicity value (harmonicity); R[k] represents the power ratio of the frequency point; floor() represents rounding down; i represents the i-th sample point in the audio frame; j represents the j-th frequency point; P represents the number of frequency points.
In some embodiments, the spectral flux feature may be calculated using formula (1-19). [Formula (1-19) appears only as an image in the source text.] In this formula, f represents the spectral flux value (spectral flux); M represents the statistical window length; P represents the number of frequency points; i represents the i-th frequency point in the audio frame; j represents the j-th frequency point.
In some embodiments, the equalization degree of the sub-band energy ratio of the current frame may be calculated using formula (1-20). The sub-bands may be divided freely, for example into K sub-bands, and the energy ratio of each sub-band is calculated:

sub_r[i] = sub_e[i] / (sub_e[1] + sub_e[2] + … + sub_e[K]) (1-20)

wherein sub_r[i] represents the energy ratio of the i-th sub-band, and sub_e[i] represents the total energy of the i-th sub-band. sub_e[i] in formula (1-20) can be calculated according to the following formula (1-21):

sub_e[i] = Σ_{kmin_i ≤ k ≤ kmax_i} Z[k] (1-21)

wherein kmin_i represents the lower frequency limit of the i-th sub-band, and kmax_i represents the upper frequency limit of the i-th sub-band.
In some embodiments, the equalization degree may be measured numerically using a flatness feature or a standard deviation. For example, the flatness feature may be calculated by the following formula (1-22):

flatness = (sub_r[1] × sub_r[2] × … × sub_r[K])^(1/K) / ((1/K) × Σ_{i=1..K} sub_r[i]) (1-22)

The standard deviation can be calculated by the following formula (1-23):

std = sqrt((1/K) × Σ_{i=1..K} (sub_r[i] - sub_r_mean)²) (1-23)

wherein std represents the standard deviation; sub_r[i] represents the energy ratio of the i-th sub-band; sub_r_mean represents the mean of the sub-band energy ratios, which can be calculated according to the following formula (1-24):

sub_r_mean = (1/K) × Σ_{i=1..K} sub_r[i] (1-24)
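The sub-band energy-ratio features can be sketched as one function: split the power spectrum at given bin boundaries, compute each band's energy share, then its flatness (taken here as the standard geometric-to-arithmetic-mean ratio, an assumption since the original formula is only an image) and standard deviation. Names and the boundary convention are illustrative; all sub-bands are assumed to have nonzero energy.

```python
import numpy as np

def subband_features(power, edges):
    """Sub-band energy ratios sub_r[i], flatness, and standard deviation,
    a sketch of formulas (1-20)-(1-24). `edges` lists bin boundaries,
    so band i covers bins edges[i] .. edges[i+1]-1."""
    power = np.asarray(power, dtype=float)
    sub_e = np.array([power[lo:hi].sum()                      # (1-21)
                      for lo, hi in zip(edges[:-1], edges[1:])])
    sub_r = sub_e / sub_e.sum()                               # (1-20)
    flatness = np.exp(np.mean(np.log(sub_r))) / sub_r.mean()  # (1-22), assumed form
    std = np.sqrt(np.mean((sub_r - sub_r.mean()) ** 2))       # (1-23)/(1-24)
    return sub_r, flatness, std

# A perfectly flat spectrum: equal shares, flatness 1, zero deviation,
# i.e. the "uniform energy distribution" case the text associates with music.
sub_r, flat, std = subband_features(np.ones(8), [0, 2, 4, 6, 8])
print(round(float(flat), 6), float(std))  # -> 1.0 0.0
```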
S502, history smoothing is performed on the identification features corresponding to the target sound to obtain the target identification features.
In some embodiments, after the identification features (the harmonicity feature, the spectral flux feature, and the sub-band energy ratio equalization feature) are obtained through the above step S501, history smoothing may further be performed on the identification features to obtain the history-smoothed target identification features. Specifically, the formula of the history smoothing is shown in (1-25):
y[n+1]=a×y[n]+(1-a)×x (1-25)
wherein x represents a feature value of the current frame; y [ n ] represents a historical smoothed value; y [ n +1] represents the updated smoothed value; a is usually 0.90 to 0.99.
S503, judging whether the target sound exists in the audio frame according to the target identification characteristics.
In some embodiments, after obtaining the target identification features, the electronic device may detect whether human voice and/or music is present in the audio frame according to the target identification features and the number of harmonics. The number of harmonics may be counted during the screening of the audio frame for howling frequency points.
For example, the process of determining whether there is human voice or music in the current frame according to the smoothed feature value and the number of harmonics counted in the howling frequency point detection module may include:
First, an existence level is preset for the human voice and/or music, for example with an initial level of 0, a minimum level of 0, and a maximum level of 3. A higher existence level means a higher probability that human voice and/or music exists, that is, the sound in the audio needs more protection; the existence level can therefore also be described as a protection level.
Then, each target feature value is compared with its corresponding preset threshold, and the protection level (or existence level) is determined according to the comparison results. For example, if the harmonicity value is greater than a preset threshold thres6, the harmonics are considered rich and the probability that human voice or music exists is high, so the protection level is increased by 1. If the counted number of harmonics is greater than a preset threshold thres7, the protection level is likewise increased by 1. If the spectral flux is greater than a preset threshold thres8, the probability of howling is considered high, and the protection level is decreased by 1. If the equalization degree of the sub-band energy ratio satisfies the condition (the flatness is greater than a preset threshold thres9 and the standard deviation is smaller than a preset threshold), the power spectrum energy distribution is considered uniform and the probability that music exists is high, so the protection level is increased by 1.
Then, whether human voice and/or music exists in the audio frame is determined according to the protection level (or existence level). For example, history smoothing is performed on the final protection level while keeping it within the allowed range; if the smoothed protection level is greater than a preset threshold thres10, human voice or music is considered to exist.
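The level-update rules above can be sketched as a single clamped update. The threshold values below merely stand in for thres6 to thres9 and are purely illustrative, as are the names; the history smoothing of the final level is omitted for brevity.

```python
def protection_level(level, harmonicity, har_num, flux, flatness, std,
                     thres6=0.5, thres7=3, thres8=0.2, thres9=0.8,
                     std_thres=0.1, lo=0, hi=3):
    """One update of the existence/protection level: rich harmonics or a
    balanced sub-band energy ratio raise it, high spectral flux lowers it,
    and the result is clamped to [lo, hi]. Illustrative thresholds."""
    if harmonicity > thres6:              # rich harmonics: likely voice/music
        level += 1
    if har_num > thres7:                  # many counted harmonics
        level += 1
    if flux > thres8:                     # high flux: likely howling
        level -= 1
    if flatness > thres9 and std < std_thres:  # balanced sub-band energy
        level += 1
    return max(lo, min(hi, level))

lvl = protection_level(0, harmonicity=0.9, har_num=5, flux=0.1,
                       flatness=0.9, std=0.05)
print(lvl)  # -> 3
```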
In the embodiment of the present application, the identification features are exemplified only by the harmonicity feature, the spectral flux feature, and the sub-band energy ratio equalization feature, but the present application is not limited thereto. For example, in practical applications, other identification features such as a spectral centroid feature and a zero-crossing rate frame difference feature may also be used to detect whether human voice and/or music exists in the audio frame. For another example, the audio processing method based on howling detection provided by the embodiment of the present application may also use machine learning or deep learning methods such as a support vector machine, a decision tree, or a neural network to detect whether human voice and/or music exists in an audio frame. The embodiments of the present application do not limit this.
According to the audio processing method based on howling detection provided by the embodiment of the application, the human voice and/or music in the audio frame is identified during the detection of the howling frequency points, so that the volume can subsequently be adjusted flexibly according to the corresponding recognition result, suppressing the howling while protecting the human voice and/or music, avoiding the loss of important information and ensuring the sound quality.
The following describes how to perform the volume adjustment process according to the detection result of the target sound and the howling frequency point screening result, with reference to the accompanying drawings.
Exemplarily, as shown in fig. 6, a schematic flowchart of yet another audio processing method based on howling detection provided by the embodiment of the present application is shown. The process comprises the following steps:
S601, the screening result of the howling frequency points is obtained, and when the screening result indicates that a howling frequency point exists within a preset duration, the historical statistical information of the howling frequency points is updated.
The preset duration may correspond to the time from the start of the timer to the end of the timer, and each preset duration may be used to perform one round of howling frequency point screening, target sound detection, and dynamic volume adjustment based on the screening result (of the howling frequency points) and the detection result (of the target sound).
Specifically, the electronic device may set a preset duration, such as a periodic preset duration, and perform the howling frequency point screening within each preset duration. For example, the electronic device may perform cyclic timing by setting a timer, start a round of howling frequency point screening each time the timing starts, and obtain the corresponding screening result when the timing ends. The screening result may indicate either that a howling frequency point exists or that no howling frequency point exists.
In some embodiments, when the filtering result indicates that there is a howling frequency point, the filtering result may be recorded in the howling frequency point history statistical information, for example, the number of howling frequency points in the howling frequency point history statistical information is increased by 1, and the identification information corresponding to the howling frequency point is recorded.
S602, obtaining the detection result of the target sound, and dynamically adjusting the volume of the audio frame according to the detection result and the screening result.
It should be noted that whether there is a howling frequency point in the audio frame may correspond to different volume adjustment modes (volume up or volume down). For example, when the filtering result indicates that there is a howling frequency point, the volume may be adjusted down according to a preset volume adjustment level; if the screening result indicates that there is no howling frequency point, the volume can be adjusted up according to a preset volume up-regulation level.
For example, when a howling frequency point exists: if the detection result indicates that the target sound exists, the amplitude of the volume attenuation may be limited, and the corresponding lowest volume to which the volume can be turned down is recorded as a first volume threshold; if the detection result indicates that the target sound does not exist, the corresponding lowest volume is recorded as a second volume threshold. The first volume threshold may be higher than the second volume threshold, so that sound information is not lost while the howling is suppressed.
For another example, when no howling frequency point exists, if the detection result indicates that the target sound exists in the audio frame, the volume may be turned up according to the preset volume up-adjustment level, and the highest volume to which it can be turned up is recorded as a third volume threshold. The third volume threshold may be higher than the first volume threshold and the second volume threshold, so that the sound quality of the audio is improved when howling does not need to be suppressed.
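The choice of volume floor described above reduces to a small helper; the decibel values merely stand in for the first and second volume thresholds and are purely illustrative.

```python
def volume_floor(target_sound_present, first_thres=40, second_thres=20):
    """Lowest volume the attenuation loop may reach: with voice/music
    present, stop at the higher first threshold to protect the sound;
    otherwise the lower second threshold applies. Illustrative values."""
    return first_thres if target_sound_present else second_thres

print(volume_floor(True))   # -> 40
print(volume_floor(False))  # -> 20
```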
It should be understood that, by means of the volume adjustment method, when the target sound exists in the audio frame, the maximum attenuation degree of the volume can be limited, and loss of important information caused by too much attenuation of the volume can be avoided.
Optionally, both the volume attenuation process and the volume recovery process may be performed within the preset timing duration, and the volume down-adjustment and volume recovery processes may use unequal timing durations.
The volume attenuation includes, but is not limited to, amplitude scaling of the time-domain waveform signal, which may be accelerated by shift operations and the like. The volume attenuation adopts stepwise adjustment.
In some embodiments, when the volume is being turned down, the timer may be reset for another round of timing each time the volume is lowered by one level; if a howling frequency point still exists when that round of timing ends, the volume is lowered by one more level. This process may be repeated until no howling frequency point is detected, or the volume reaches a preset minimum volume level.
In some embodiments, when the volume is being turned up, the timer may be reset for another round of timing each time the volume is raised by one level; if no howling frequency point is detected when that round of timing ends, the volume may be raised by one more level. This process may be repeated until a howling frequency point is detected, the volume reaches a preset maximum volume level, or the originally (or by default) set volume is restored. If a howling frequency point is screened out in the detection performed after the volume is turned up, the volume may again be reduced step by step according to the preset volume down-adjustment level.
It should be noted that, because stepwise volume adjustment control is adopted, when howling is detected the volume level is lowered to attenuate the volume by a certain number of decibels. Since howling is detected periodically for as long as it persists, continued detection keeps lowering the volume level, so the volume can be reduced rapidly until the set minimum volume is reached and the howling is completely suppressed. Conversely, when no howling is detected, the volume level is raised to restore the volume by a certain number of decibels; once the howling disappears, the volume level keeps being adjusted upward until the normal volume is restored, so the volume recovers quickly after the howling disappears. In addition, when the timer finishes timing, the human voice and music detection results within the timing period are counted: if human voice or music exists, the maximum volume attenuation is limited, and if the protection level is high, the volume level is adjusted upward to protect the human voice and music. This volume adjustment approach has a small and stable computational load, requires no filtering operations, and its cost does not surge as the number of howling frequencies increases.
S603, windowed overlap-add is performed on the volume-adjusted audio frames, the result is stored in a buffer, and the corresponding audio signal is output.
In some embodiments, after the volume of the audio frame is adjusted, the overlapping portion of adjacent frames may be windowed and overlap-added, then placed in the output buffer, and the audio data is subsequently fetched from the output buffer for output.
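The windowed overlap-add of step S603 can be sketched as follows. This is a minimal illustration assuming a Hann window and a fixed hop size; the patent does not specify the window or overlap, so both are assumptions.

```python
import numpy as np

def overlap_add(frames, hop):
    """Windowed overlap-add of processed frames into one output stream:
    apply a Hann window to each frame and add it into the output buffer
    at hop-sized offsets. Window choice and hop are assumptions."""
    n = len(frames[0])
    win = np.hanning(n)
    out = np.zeros(hop * (len(frames) - 1) + n)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n] += win * np.asarray(frame, dtype=float)
    return out

# Two constant frames with 50% overlap: the windows cross-fade the
# overlapping region so adjacent frames join without discontinuities.
out = overlap_add([np.ones(4), np.ones(4)], hop=2)
print(out)
```

With a 50% overlap the window tapers suppress the clicks that abrupt per-frame gain changes would otherwise produce at frame boundaries.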
According to the audio processing method based on howling detection provided by the embodiment of the application, the howling frequency points are screened using multiple criteria, and when howling frequency points exist, the volume level is dynamically adjusted based on the screening result, so that howling can be suppressed efficiently and flexibly while the sound quality is guaranteed, improving the user experience. The method has wide applicability: it remains effective even under a large overall loop gain, and its effect is not limited by the number of howling frequencies or the effective frequency range. The frequency bands can be divided at will, and different parameters or thresholds can be set for different bands to adapt to different scenes; for bands that are prone to howling in fixed scenes, more efficient detection and suppression can be performed specifically. The method is also applicable under network communication conditions and is effective against periodic howling.
In addition, the audio processing method based on howling detection provided by the embodiment of the application achieves high howling detection accuracy with few false detections. Even when a false detection occurs, its negative influence is very small: it causes only a small volume fluctuation that is barely perceptible to the human ear, and the normal volume is quickly restored. At the same time, human voice and music are protected, the loss of important information is avoided, and the sound quality is guaranteed. The computational load is small and stable, no filtering operation is needed, and the load does not surge as the number of howling frequencies increases.
Exemplarily, as shown in fig. 7, a schematic diagram of a hardware structure of an electronic device for suppressing howling provided in the embodiment of the present application is shown. The electronic device 700 may include at least one processor 701 and at least one memory 702 storing computer-readable program instructions, the at least one processor 701 and the at least one memory 702 may be communicatively coupled via a universal serial bus 703. When the processor 701 executes the computer-readable program instructions, the howling suppression electronic device 700 is caused to execute the audio processing method based on howling detection provided by the embodiment of the present application.
Based on the same technical concept, the embodiment of the present application also provides a computer-readable storage medium, which includes computer instructions, when the computer instructions are executed in a computer, the method is implemented.
Based on the same technical concept, the embodiment of the present application further provides a computer product, which includes computer instructions, and when the computer instructions are executed in a computer, the method is implemented.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer commands. The procedures or functions described in accordance with the embodiments of the present application are generated in whole or in part when the computer program commands are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer commands may be stored in or transmitted through a computer-readable storage medium. The computer commands may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program that can be executed by associated hardware, the computer program can be stored in a computer-readable storage medium, and the processes when executed can include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.
The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the embodiments of the present application should be covered by the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. An audio processing method based on howling detection is applied to an electronic device, and the method comprises the following steps:
acquiring an audio signal, and acquiring an audio frame of a frequency domain space according to the audio signal, wherein the audio frame comprises a plurality of frequency points;
carrying out howling frequency point screening on the plurality of frequency points, and acquiring corresponding screening results; and
performing target sound detection on the audio frame, and acquiring a corresponding detection result, wherein the detection result is used for indicating whether the target sound exists in the audio frame, and the target sound comprises human voice and/or music voice;
dynamically adjusting the volume of the audio frame according to the screening result and the detection result; when the screening result indicates that the howling frequency point exists, the method includes:
if the detection result indicates that the target sound exists, the volume of the audio frame is adjusted downwards according to a preset volume adjustment level, and the steps are repeated until the volume reaches a preset first volume threshold value; or,
if the detection result indicates that the target sound does not exist, the volume of the audio frame is adjusted downwards according to a preset volume adjustment level, and the steps are repeated until the volume reaches a preset second volume threshold, wherein the volume corresponding to the second volume threshold is lower than the volume corresponding to the first volume threshold.
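The down-adjustment logic of claim 1 can be sketched as follows; this is a minimal illustration, and the level step and the two floor values are hypothetical numbers, not taken from the patent:

```python
def adjust_volume_down(volume, step, first_floor, second_floor, target_present):
    """Lower the volume by one preset level per call, clamped at the
    threshold chosen by the target-sound detection result."""
    # With a target sound present, stop at the higher first threshold;
    # without one, the volume may fall further, down to the second threshold.
    floor = first_floor if target_present else second_floor
    return max(volume - step, floor)

# Repeatedly lowering from level 100 in steps of 10 stops at 60 when a
# target sound is present, and at 20 when it is not.
```

The two floors implement the claim's distinction: audible speech or music is protected from being attenuated too far, while pure howling can be suppressed much more aggressively.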
2. The method according to claim 1, wherein, when the screening result indicates that no howling frequency point exists, the method further comprises:
up-adjusting the volume of the audio frame by a preset volume up-adjustment level, and repeating this step until the volume reaches a preset third volume threshold; or
directly up-adjusting the volume of the audio frame to the third volume threshold.
3. The method according to claim 1 or 2, wherein the performing howling frequency point screening on the plurality of frequency points to obtain a corresponding screening result specifically comprises:
acquiring power spectrum energy corresponding to the plurality of frequency points according to the audio frame;
selecting, from the plurality of frequency points, at least one first candidate frequency point with the highest power spectrum energy; and
screening the howling frequency point from the at least one first candidate frequency point according to a peak-to-threshold power ratio (PTPR), a peak-to-average power ratio (PAPR), a peak-to-harmonic power ratio (PHPR), and a short-time autocorrelation function (STACF).
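The candidate-selection step of claim 3 amounts to picking the highest-energy bins of the frame's power spectrum. A minimal sketch, with an illustrative frame length and candidate count not specified by the claims:

```python
import numpy as np

def candidate_peaks(frame, k=5):
    """Return the k frequency-bin indices with the highest power-spectrum
    energy in one audio frame (the first screening step of claim 3)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    return np.argsort(power)[-k:][::-1]  # highest-energy bins first
```

A pure 1 kHz tone sampled at 8 kHz in a 256-sample frame lands in bin 32, so that bin should come out as the top candidate.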
4. The method according to claim 1 or 2, wherein the performing target sound detection on the audio frame to obtain a corresponding detection result specifically comprises:
counting the number of harmonics in the audio frame while performing the howling frequency point screening on the plurality of frequency points; and
obtaining the detection result of the target sound according to the power spectrum energy corresponding to the plurality of frequency points and the number of harmonics.
5. The method according to claim 4, wherein the obtaining the detection result of the target sound according to the power spectrum energy corresponding to the plurality of frequency points and the number of harmonics specifically comprises:
calculating an identification feature corresponding to the target sound according to the power spectrum energy corresponding to the plurality of frequency points, wherein the identification feature comprises at least one of a harmonicity feature, a spectral flux feature, and a sub-band energy ratio balance feature;
performing historical smoothing on the identification feature to obtain a target identification feature; and
obtaining the detection result of the target sound according to the target identification feature and the number of harmonics.
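The "historical smoothing" of claim 5 is commonly realized as first-order recursive averaging across frames. The sketch below pairs it with a simple threshold decision; the smoothing factor and both thresholds are hypothetical values chosen for illustration:

```python
def smooth_feature(previous, current, alpha=0.9):
    """First-order recursive ("historical") smoothing of an identification
    feature across frames; alpha is a hypothetical smoothing factor."""
    return alpha * previous + (1 - alpha) * current

def detect_target_sound(smoothed_feature, harmonic_count,
                        feature_threshold=0.5, harmonic_threshold=3):
    """Combine the smoothed identification feature with the harmonic count
    to decide whether a target sound (speech/music) is present."""
    return smoothed_feature > feature_threshold and harmonic_count >= harmonic_threshold
```

Smoothing suppresses frame-to-frame jitter in the feature, so a single noisy frame does not flip the detection result.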
6. The method according to claim 1 or 2, wherein the acquiring an audio signal and obtaining an audio frame in a frequency domain space according to the audio signal specifically comprises:
acquiring the audio signal from a buffer, wherein the buffer stores the audio signal received by the electronic device;
framing the audio signal according to a preset sampling rate and a number of samples corresponding to an audio frame length to obtain the audio frame;
windowing the audio frame according to a window function to obtain a windowed audio frame; and
performing a Fourier transform on the windowed audio frame to obtain the audio frame in the frequency domain space.
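The framing, windowing, and transform steps of claim 6 can be sketched as below. The Hann window, frame length, and hop size are illustrative choices; the claim does not fix a particular window function or frame size:

```python
import numpy as np

def frames_to_frequency_domain(signal, frame_len=256, hop=128):
    """Frame the signal, apply a Hann window to each frame, and Fourier-
    transform the windowed frames into the frequency domain."""
    window = np.hanning(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spectra.append(np.fft.rfft(signal[start:start + frame_len] * window))
    return np.array(spectra)
```

For a real input frame of 256 samples, `rfft` returns 129 complex bins per frame, which are the "frequency points" screened in the subsequent claims.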
7. The method according to claim 3, wherein the screening the howling frequency point from the at least one first candidate frequency point according to the peak-to-threshold power ratio (PTPR), the peak-to-average power ratio (PAPR), the peak-to-harmonic power ratio (PHPR), and the short-time autocorrelation function (STACF) specifically comprises:
performing a first screening on the first candidate frequency point by using the PTPR criterion and the PAPR criterion to obtain a second candidate frequency point;
performing a second screening on the second candidate frequency point by using the PHPR to obtain a third candidate frequency point; and
performing a third screening on the third candidate frequency point by using the STACF to obtain the howling frequency point.
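The first screening stage of claim 7 can be sketched as follows: a candidate bin survives if its power exceeds a fixed reference by the PTPR margin and the frame's average power by the PAPR margin. The dB margins and the reference power are hypothetical tuning values, not taken from the patent:

```python
import numpy as np

def first_screening(power, candidates, fixed_threshold=1.0,
                    ptpr_min_db=10.0, papr_min_db=20.0):
    """Keep candidate bins passing both the PTPR criterion (power over a
    fixed threshold) and the PAPR criterion (power over the frame average)."""
    avg_power = power.mean()
    kept = []
    for k in candidates:
        ptpr = 10 * np.log10(power[k] / fixed_threshold)
        papr = 10 * np.log10(power[k] / avg_power)
        if ptpr >= ptpr_min_db and papr >= papr_min_db:
            kept.append(k)
    return kept
```

The later PHPR and STACF stages would then reject tonal bins that carry harmonics (speech/music partials) or lack the sustained periodicity characteristic of howling.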
8. The method according to claim 1 or 2, further comprising:
performing a periodic timing operation through a timer, wherein the timing operation comprises starting timing and ending timing; and
acquiring, each time the timing ends, the screening result and the detection result corresponding to the timing period.
9. The method of claim 8, further comprising:
when the timing ends, if the detection result corresponding to the timing period indicates that the target sound does not exist and the screening result indicates that the howling frequency point exists, down-adjusting the volume of the audio frame by one volume level according to the preset volume down-adjustment level; or
when the timing ends, if the screening result indicates that no howling frequency point exists, up-adjusting the volume of the audio frame by one volume level according to the preset volume up-adjustment level.
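The per-period decision of claims 8 and 9 can be sketched as a single function called each time the timer fires; the step size and volume bounds are illustrative values, not specified by the claims:

```python
def periodic_adjustment(volume, howl_detected, target_present,
                        step=10, floor=20, ceiling=100):
    """One timing-period decision: lower the volume one level when howling
    is detected without a target sound, raise it one level when no howling
    frequency point is found, otherwise leave it unchanged."""
    if howl_detected and not target_present:
        return max(volume - step, floor)
    if not howl_detected:
        return min(volume + step, ceiling)
    # Howling together with a target sound is governed by claim 1's
    # gentler down-adjustment toward the first (higher) threshold.
    return volume
```

Running one such step per timer period yields the gradual ramp-down/ramp-up behavior the claims describe, instead of abrupt muting.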
10. An electronic device for howling suppression, comprising:
one or more processors; and
one or more memories;
wherein the memories store computer-readable program instructions that, when executed by the processors, cause the electronic device to perform the howling detection-based audio processing method of any one of claims 1 to 9.
11. A computer-readable storage medium comprising computer instructions that, when executed, cause the howling detection-based audio processing method of any one of claims 1 to 9 to be performed.
CN202210124282.0A 2022-02-10 2022-02-10 Audio processing method based on howling detection and electronic equipment Pending CN114464205A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210124282.0A CN114464205A (en) 2022-02-10 2022-02-10 Audio processing method based on howling detection and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210124282.0A CN114464205A (en) 2022-02-10 2022-02-10 Audio processing method based on howling detection and electronic equipment

Publications (1)

Publication Number Publication Date
CN114464205A true CN114464205A (en) 2022-05-10

Family

ID=81414429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210124282.0A Pending CN114464205A (en) 2022-02-10 2022-02-10 Audio processing method based on howling detection and electronic equipment

Country Status (1)

Country Link
CN (1) CN114464205A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114724575A (en) * 2022-06-09 2022-07-08 广州市保伦电子有限公司 Howling detection method, device and system
CN118016042A (en) * 2024-04-09 2024-05-10 成都启英泰伦科技有限公司 Howling suppression method and device
CN118016042B (en) * 2024-04-09 2024-05-31 成都启英泰伦科技有限公司 Howling suppression method and device

Similar Documents

Publication Publication Date Title
US8284947B2 (en) Reverberation estimation and suppression system
US9967661B1 (en) Multichannel acoustic echo cancellation
CN114464205A (en) Audio processing method based on howling detection and electronic equipment
US9524735B2 (en) Threshold adaptation in two-channel noise estimation and voice activity detection
CN111445920B (en) Multi-sound source voice signal real-time separation method, device and pickup
US11404073B1 (en) Methods for detecting double-talk
CN106612482B (en) Method for adjusting audio parameters and mobile terminal
CN112004177B (en) Howling detection method, microphone volume adjustment method and storage medium
CN112485761B (en) Sound source positioning method based on double microphones
EP3011757A1 (en) Acoustic feedback canceller
CN109637552A (en) A kind of method of speech processing for inhibiting audio frequency apparatus to utter long and high-pitched sounds
US11380312B1 (en) Residual echo suppression for keyword detection
AU2024200622A1 (en) Methods and apparatus to fingerprint an audio signal via exponential normalization
JP2001309483A (en) Sound pickup method and sound pickup device
CN111128167A (en) Far-field voice awakening method and device, electronic product and storage medium
CN113614828A (en) Method and apparatus for fingerprinting audio signals via normalization
EP3066842B1 (en) Multi-band harmonic discrimination for feedback suppression
WO2016101162A1 (en) Sound feedback detection method and device
JP2000081900A (en) Sound absorbing method, and device and program recording medium therefor
CN115410593A (en) Audio channel selection method, device, equipment and storage medium
WO2022068440A1 (en) Howling suppression method and apparatus, computer device, and storage medium
CN111477246A (en) Voice processing method and device and intelligent terminal
CN114267370A (en) Howling suppression method and device based on frequency domain processing
CN114143667A (en) Volume adjusting method, storage medium and electronic device
CN114520948A (en) Howling detection method, howling detection device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination