CN112652323B - Audio signal screening method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112652323B
Authority
CN
China
Prior art keywords
audio signal
signal
energy
noise ratio
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011549996.3A
Other languages
Chinese (zh)
Other versions
CN112652323A (en)
Inventor
刘鲁鹏
元海明
李贝
王晓红
陈佳路
高强
夏龙
郭常圳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ape Power Future Technology Co Ltd
Original Assignee
Beijing Ape Power Future Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ape Power Future Technology Co Ltd
Priority to CN202011549996.3A
Publication of CN112652323A
Application granted
Publication of CN112652323B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application relates to an audio signal screening method, an audio signal screening device, an electronic device and a storage medium. The method comprises: performing noise reduction on a first audio signal to obtain a second audio signal; determining a signal-to-noise ratio of the first audio signal from the first audio signal and the second audio signal; and determining whether the first audio signal is a target audio signal according to the result of comparing the signal-to-noise ratio with a set signal-to-noise ratio threshold. The scheme screens out target audio signals with low background noise simply and effectively, and is broadly applicable.

Description

Audio signal screening method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of speech recognition technologies, and in particular to an audio signal screening method and apparatus, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, audio processing technology has also advanced continuously and is now widely used in people's daily life and work; for example, speech recognition is used in a wide range of intelligent terminals.
In speech recognition, machine learning requires a large number of audio signal samples, and the quality of these samples directly affects the accuracy of the trained model. Audio signals collected in daily life often contain considerable noise, which is detrimental to training speech models, so audio signals with low noise need to be screened out from a large pool of candidates. In a related-art screening method, features of the audio to be screened are compared with features of target audio (audio that meets the noise requirement); if the comparison result satisfies a preset condition, the audio to be screened is accepted as usable audio or as a training sample.
However, the related-art solutions require feature extraction on every audio signal before the comparison. Audio feature extraction is not straightforward, and errors in it can reduce screening accuracy. In addition, different training requirements (by type or function) call for different feature extraction models, so these models generalize poorly and the implementation is complex.
Disclosure of Invention
In order to overcome the problems in the related art, the application provides an audio signal screening method, an audio signal screening device, an electronic device and a storage medium.
A first aspect of the present application provides an audio signal screening method, including:
performing noise reduction on a first audio signal to obtain a second audio signal, the second audio signal being the audio signal from which background noise has been removed by the noise reduction;
determining a signal-to-noise ratio of the first audio signal from the first audio signal and the second audio signal;
determining whether the first audio signal is a target audio signal according to the result of comparing the signal-to-noise ratio with a set signal-to-noise ratio threshold, so as to screen out target audio signals with low background noise as training samples for a training model;
wherein determining the signal-to-noise ratio of the first audio signal from the first audio signal and the second audio signal comprises:
calculating the signal energies of the first audio signal and the second audio signal respectively to obtain a first signal energy and a second signal energy;
calculating the signal-to-noise ratio of the first audio signal from the second signal energy and the difference between the first signal energy and the second signal energy;
wherein calculating the signal energies of the first audio signal and the second audio signal respectively comprises:
determining n sampling points in the first audio signal, and calculating the signal energy of the first audio signal from the sample values at those n sampling points;
determining the n sampling points in the second audio signal that correspond to those of the first audio signal, and calculating the signal energy of the second audio signal from the sample values at those n sampling points;
wherein calculating the signal-to-noise ratio of the first audio signal from the second signal energy and the difference between the first signal energy and the second signal energy comprises:
calculating the difference between the first signal energy and the second signal energy; and
taking the logarithm of the ratio of the second signal energy to the difference to determine the signal-to-noise ratio of the first audio signal.
In one implementation, determining whether the first audio signal is a target audio signal according to the result of comparing the signal-to-noise ratio with a set signal-to-noise ratio threshold comprises:
determining that the first audio signal is a target audio signal when the signal-to-noise ratio is greater than the set signal-to-noise ratio threshold.
In one implementation, the set signal-to-noise ratio threshold is in the range of 15 dB to 25 dB.
The second aspect of the present application provides an audio signal screening apparatus, comprising:
a noise reduction unit, configured to perform noise reduction on a first audio signal to obtain a second audio signal, the second audio signal being the audio signal from which background noise has been removed by the noise reduction;
a calculating unit, configured to determine a signal-to-noise ratio of the first audio signal from the first audio signal and the second audio signal;
a screening unit, configured to determine whether the first audio signal is a target audio signal according to the result of comparing the signal-to-noise ratio with a set signal-to-noise ratio threshold, so as to screen out target audio signals with low background noise as training samples for a training model;
wherein the calculating unit comprises:
a first calculating subunit, configured to calculate the signal energies of the first audio signal and the second audio signal respectively to obtain a first signal energy and a second signal energy;
a second calculating subunit, configured to calculate the signal-to-noise ratio of the first audio signal from the second signal energy and the difference between the first signal energy and the second signal energy;
wherein calculating the signal energies of the first audio signal and the second audio signal respectively comprises:
determining n sampling points in the first audio signal, and calculating the signal energy of the first audio signal from the sample values at those n sampling points;
determining the n sampling points in the second audio signal at positions corresponding to those in the first audio signal, and calculating the signal energy of the second audio signal from the sample values at those n sampling points;
wherein calculating the signal-to-noise ratio of the first audio signal from the second signal energy and the difference between the first signal energy and the second signal energy comprises:
calculating the difference between the first signal energy and the second signal energy; and
taking the logarithm of the ratio of the second signal energy to the difference to determine the signal-to-noise ratio of the first audio signal.
A third aspect of the present application provides an electronic device comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
A fourth aspect of the present application provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform a method as described above.
The technical solution provided by the present application may have the following beneficial effects:
In the technical solution of the present application, noise reduction is performed on the first audio signal (the audio signal to be screened) to obtain the second audio signal (the noise-reduced audio signal), and the signal-to-noise ratio of the first audio signal is calculated from the audio signals before and after noise reduction. Comparing this signal-to-noise ratio with a set signal-to-noise ratio threshold (an empirical threshold) indicates how much background noise the audio signal to be screened contains, so target audio signals with low background noise can be screened out. The method is simple, effective and broadly applicable; it reduces the complexity of audio signal screening and improves screening efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the application.
Fig. 1 is a schematic flowchart of an audio signal screening method according to an embodiment of the present application;
Fig. 2 is another schematic flowchart of an audio signal screening method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an audio signal screening apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Preferred embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In speech recognition, model training requires a large number of audio signal samples, and audio signals collected in daily life contain considerable noise that is detrimental to training speech models, so audio signals with low noise need to be screened out from a large number of candidates. In the related art, features of the audio to be screened are compared with features of target audio (audio meeting the noise requirement), and if the comparison result meets a preset condition, the audio to be screened is used as usable audio or as a training sample. This requires feature extraction on every audio signal before the comparison; audio feature extraction is not straightforward, and errors in it lower both screening accuracy and screening efficiency.
To solve the above problems, an embodiment of the present application provides an audio signal screening method that can screen out target audio signals with low background noise simply and effectively.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of an audio signal screening method according to an embodiment of the present application.
Referring to Fig. 1, the audio signal screening method according to an embodiment of the present application includes the following steps.
Step 101, performing noise reduction processing on the first audio signal to obtain a second audio signal.
The first audio signal is an audio signal to be screened, and the second audio signal is an audio signal obtained by reducing the noise of the first audio signal.
In this embodiment of the application, the noise reduction algorithm may be a minimum-tracking noise estimation algorithm, the Minima Controlled Recursive Averaging (MCRA) algorithm, or the Improved Minima Controlled Recursive Averaging (IMCRA) algorithm, in combination with Wiener filtering.
The noise reduction algorithm is not limited in the embodiments of the present application; any algorithm capable of reducing the background noise in an audio signal may be used.
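The embodiments leave the noise reduction algorithm open. As a rough illustration only, the Python sketch below uses a simplified spectral-gain denoiser whose noise floor is tracked as the per-bin minimum of a recursively smoothed periodogram, in the spirit of minimum tracking. It is not MCRA, not IMCRA, and not the applicant's implementation. NumPy, the function name `denoise_min_tracking`, and the parameters `frame_len`, `alpha`, `bias` and `floor` are all hypothetical choices.

```python
import numpy as np

def denoise_min_tracking(x, frame_len=512, alpha=0.8, bias=2.0, floor=0.1):
    """Hypothetical simplified denoiser (not MCRA/IMCRA, not the patent's code).

    Noise power per frequency bin is estimated as the minimum over time of the
    recursively smoothed periodogram (scaled by `bias`); each STFT bin is then
    attenuated with a spectral-subtraction gain clamped to [floor, 1].
    """
    if frame_len % 2:
        raise ValueError("frame_len must be even")
    x = np.asarray(x, dtype=np.float64)
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by `hop` sum to exactly 1, so
    # overlap-add reconstructs the input when every gain equals 1.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(frame_len) / frame_len)
    # Zero-pad so that each original sample is covered by two overlapping frames.
    n_frames = int(np.ceil(len(x) / hop)) + 1
    padded = np.zeros((n_frames + 1) * hop)
    padded[hop:hop + len(x)] = x

    frames = np.stack([padded[k * hop:k * hop + frame_len] * window
                       for k in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    power = np.abs(spec) ** 2

    # Minimum tracking: smooth each bin over time, then take its minimum.
    smoothed = np.empty_like(power)
    smoothed[0] = power.mean(axis=0)
    for k in range(1, n_frames):
        smoothed[k] = alpha * smoothed[k - 1] + (1.0 - alpha) * power[k]
    noise_psd = bias * smoothed.min(axis=0)

    # Spectral-subtraction gain, clamped so no bin is amplified.
    gain = np.clip(1.0 - noise_psd / np.maximum(power, 1e-12), floor, 1.0)
    clean_frames = np.fft.irfft(gain * spec, n=frame_len, axis=1)

    # Overlap-add the processed frames and strip the padding.
    out = np.zeros_like(padded)
    for k in range(n_frames):
        out[k * hop:k * hop + frame_len] += clean_frames[k]
    return out[hop:hop + len(x)]
```

With `floor=1.0` every gain is clamped to 1 and the overlap-add reproduces the input up to rounding, which is a convenient sanity check. In practice a dedicated estimator such as MCRA or IMCRA would likely track non-stationary noise better than this sketch.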
Step 102, determining a signal-to-noise ratio of the first audio signal according to the first audio signal and the second audio signal.
The signal-to-noise ratio (SNR) is the ratio of signal to noise in an electronic device or electronic system; in the embodiments of the present application, it refers to the ratio of the effective sound signal to the background noise in the first audio signal. Assuming that the second audio signal obtained by noise reduction contains no background noise (i.e., it is the effective sound signal), subtracting the signal energy of the second audio signal from that of the first audio signal gives the signal energy of the background noise in the first audio signal, and the ratio of the signal energy of the second audio signal to the signal energy of the background noise gives the signal-to-noise ratio of the first audio signal.
Step 103, determining whether the first audio signal is a target audio signal according to the result of comparing the signal-to-noise ratio with a set signal-to-noise ratio threshold.
The set signal-to-noise ratio threshold is an empirical threshold for judging the magnitude of background noise in an audio signal. In the embodiments of the present application, this threshold is preset; by default, if the signal-to-noise ratio of a first audio signal is greater than the set threshold, the first audio signal is determined to be a target audio signal, that is, a clean audio signal with low background noise.
In practical applications, the signal-to-noise ratio threshold may be set within the range of 15 dB to 25 dB according to actual requirements, for example 20 dB.
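The subtraction step relies on an assumption the passage leaves implicit. Modelling the first audio signal as the clean signal plus additive background noise $v$ (a modelling assumption, not stated in the source), the noise energy equals $E_x - E_s$ only when the cross term is negligible, for example when speech and noise are uncorrelated and the denoiser recovers $s$ well:

```latex
\begin{aligned}
x_i &= s_i + v_i, \qquad i = 1, \dots, n \\
E_x &= \sum_{i=1}^{n} (s_i + v_i)^2 = E_s + 2\sum_{i=1}^{n} s_i v_i + E_v \\
E_v &\approx E_x - E_s \quad \text{if } \sum_{i=1}^{n} s_i v_i \approx 0 \\
\mathrm{snr}_x &= 10\log_{10}\frac{E_s}{E_v} \approx 10\log_{10}\frac{E_s}{E_x - E_s}
\end{aligned}
```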
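For intuition, a threshold $T$ in dB can be translated into a bound on the residual noise energy relative to the clean-signal energy (assuming $E_x > E_s$). The worked values below follow directly from the definition of $\mathrm{snr}_x$:

```latex
\begin{aligned}
\mathrm{snr}_x > T
  &\iff \frac{E_s}{E_x - E_s} > 10^{T/10}
  \iff E_x - E_s < 10^{-T/10}\,E_s \\
T = 15\ \text{dB} &: \quad E_x - E_s < 0.0316\,E_s \\
T = 20\ \text{dB} &: \quad E_x - E_s < 0.01\,E_s \\
T = 25\ \text{dB} &: \quad E_x - E_s < 0.00316\,E_s
\end{aligned}
```

In other words, the example threshold of 20 dB accepts a clip only if the estimated background-noise energy is below 1% of the estimated clean-speech energy.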
According to the technical solution, noise reduction is performed on the first audio signal (the audio signal to be screened) to obtain the second audio signal (the noise-reduced audio signal), and the signal-to-noise ratio of the first audio signal is calculated from the audio signals before and after noise reduction. Comparing this signal-to-noise ratio with the set threshold (the empirical threshold of the signal-to-noise ratio) indicates the background noise level of the audio signal to be screened, so target audio signals with low background noise can be screened out. The screening method is simple, effective and broadly applicable; it reduces the complexity of audio signal screening and improves screening efficiency.
For ease of understanding, an application example of the audio signal screening method is described below.
In this application example, it is assumed that a speech recognition model is to be trained to recognize a speaker's voice in the presence of environmental sound, and that its training samples must be audio signals of the speaker's voice with low background noise (i.e., meeting a low-background-noise requirement). The background noise in the audio signals to be screened may be environmental sound; that is, the goal is to screen out audio signals whose environmental sound meets the requirement and to use them as training samples for the model.
Fig. 2 is another schematic flow chart of the audio signal screening method according to the embodiment of the present application.
Referring to Fig. 2, the audio signal screening method according to this embodiment of the present application includes the following steps.
Step 201, performing noise reduction processing on the first audio signal to obtain a second audio signal.
In this embodiment, the first audio signal, i.e., the audio signal to be screened, is denoted x, and the second audio signal, i.e., the noise-reduced audio signal, is denoted s. The algorithm used for noise reduction is not limited.
Step 202, calculating the signal energy of the first audio signal and the second audio signal respectively to obtain a first signal energy and a second signal energy.
In the embodiment of the present application, n sampling points in the first audio signal may be determined, and the signal energy of the first audio signal may be calculated according to sampling values corresponding to the n sampling points in the first audio signal. Illustratively, the first signal energy may be calculated according to the following equation:
$$E_x = \sum_{i=1}^{n} x_i^2$$
where $E_x$ is the first signal energy, $n$ is the total number of sampling points in the first audio signal, and $x_i$ is the value of the $i$-th sampling point in the first audio signal $x$.
In this embodiment, the second audio signal is the noise-reduced audio signal. The n sampling points in the second audio signal that correspond to those of the first audio signal may be determined, and the signal energy of the second audio signal is calculated from the sample values at these n sampling points. Illustratively, the second signal energy may be calculated according to the following formula:
$$E_s = \sum_{i=1}^{n} s_i^2$$
where $E_s$ is the second signal energy, $n$ is the total number of sampling points in the second audio signal, and $s_i$ is the value of the $i$-th sampling point in the second audio signal $s$.
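A minimal Python sketch of Step 202, assuming signal energy is the sum of squared sample values as in the formulas above. If an implementation instead normalizes by $n$, the SNR in Step 203 is unchanged because the factor cancels in the ratio. The function name `signal_energy` and the toy sample values are hypothetical.

```python
from typing import Sequence

def signal_energy(samples: Sequence[float]) -> float:
    """Energy over n sampling points: the sum of squared sample values."""
    return sum(v * v for v in samples)

# Toy first audio signal x (to be screened) and second audio signal s
# (after noise reduction), taken at the same n sampling positions.
x = [1.05, -0.95, 1.05, -0.95]
s = [1.0, -1.0, 1.0, -1.0]
E_x = signal_energy(x)  # first signal energy, about 4.01
E_s = signal_energy(s)  # second signal energy, 4.0
```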
It will be appreciated that in practical applications, the calculation of the audio signal energy may be implemented by other methods, and the above algorithm description is only exemplary and should not be taken as the only limitation of the calculation of the audio signal energy.
Step 203, calculating the signal-to-noise ratio of the first audio signal according to the second signal energy and the difference between the first signal energy and the second signal energy.
Calculate the difference between the first signal energy and the second signal energy, then take the logarithm of the ratio of the second signal energy to this difference to determine the signal-to-noise ratio of the first audio signal.
Illustratively, denoting the signal-to-noise ratio by $\mathrm{snr}_x$, it is calculated according to the following formula:
$$\mathrm{snr}_x = 10\log_{10}\frac{E_s}{E_x - E_s}$$
where $\mathrm{snr}_x$ is the signal-to-noise ratio in dB, $E_x$ is the first signal energy, and $E_s$ is the second signal energy.
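The formula can be sketched in Python as follows. The function name `snr_db` is hypothetical, and so is the guard for $E_x \le E_s$: the source does not say what happens when the denoiser removes no energy, so this sketch returns infinity rather than dividing by zero or a negative number.

```python
import math

def snr_db(e_x: float, e_s: float) -> float:
    """snr_x = 10 * log10(E_s / (E_x - E_s)), in dB."""
    noise_energy = e_x - e_s  # energy attributed to background noise
    if noise_energy <= 0.0:
        # Assumption (not from the source): treat a clip from which the
        # denoiser removed nothing as noise-free.
        return math.inf
    return 10.0 * math.log10(e_s / noise_energy)

# Worked example: E_x = 101, E_s = 100 gives 10 * log10(100 / 1) = 20 dB.
example = snr_db(101.0, 100.0)
```

A stricter pipeline might instead reject clips with $E_x \le E_s$, since that can also indicate a failed or mis-aligned denoiser.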
In the embodiment of the present application, it is assumed that the second audio signal is an audio signal without background noise (i.e., an effective sound signal) after the noise reduction processing, and the signal energy of the background noise in the first audio signal can be obtained by performing difference operation on the signal energy of the first audio signal and the signal energy of the second audio signal, and the signal-to-noise ratio of the first audio signal is obtained by calculating the ratio of the signal energy of the second audio signal to the signal energy of the background noise.
Step 204, determining whether the first audio signal is a target audio signal according to the result of comparing the signal-to-noise ratio with a set signal-to-noise ratio threshold.
In this step, the first audio signal is determined to be a target audio signal when its signal-to-noise ratio is greater than the set signal-to-noise ratio threshold.
In this embodiment, the empirical threshold for $\mathrm{snr}_x$, i.e., the set signal-to-noise ratio threshold $\mathrm{snr}_{thresh}$, is assumed to be 20 dB. This value is only an example, not a limitation, and can be adjusted as needed. If $\mathrm{snr}_x > \mathrm{snr}_{thresh}$, the audio signal x has little background noise, so it is determined to be a target audio signal and may be added to the sample library for training the speech recognition model. Otherwise, the first audio signal x is discarded.
In this embodiment, suppose a sample speech library is to be constructed. The library may contain historical speech data uttered by surrounding users at different distances and in different directions relative to a target user, together with the historical text data corresponding to that speech. The historical speech data may include speech of commonly used communication phrases, and the historical text data the corresponding text; these phrases include names, forms of address, and common small-talk phrases between the surrounding users and the target user, as well as phrases the surrounding users use to call the target user. All audio signals in the sample speech library have been screened by the audio signal screening method of this embodiment and therefore have low background noise, which can improve the results of model training on this library.
Corresponding to the foregoing method embodiments, the present application further provides embodiments of an audio signal screening apparatus and an electronic device.
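Step 204 applied to a batch of candidate clips could look like the sketch below, which keeps clips whose SNR strictly exceeds 20 dB and discards the rest. The names `is_target` and `select_samples`, the toy clips, and the "oracle" denoiser (which simply returns a known clean signal) are hypothetical stand-ins for illustration.

```python
import math
from typing import Callable, List, Sequence

Signal = Sequence[float]

def is_target(snr_x: float, snr_thresh: float = 20.0) -> bool:
    """Keep a clip only if its SNR strictly exceeds the threshold (in dB)."""
    return snr_x > snr_thresh

def select_samples(clips: List[Signal], denoise: Callable[[Signal], Signal],
                   snr_thresh: float = 20.0) -> List[Signal]:
    """Return the clips accepted into the sample library; others are discarded."""
    kept = []
    for x in clips:
        s = denoise(x)  # second audio signal (noise-reduced)
        e_x = sum(v * v for v in x)
        e_s = sum(v * v for v in s)
        noise = e_x - e_s
        snr_x = math.inf if noise <= 0.0 else 10.0 * math.log10(e_s / noise)
        if is_target(snr_x, snr_thresh):
            kept.append(x)
    return kept

# Toy usage with an oracle denoiser that returns the known clean signal.
clean = [1.0, -1.0, 1.0, -1.0]
quiet = [1.05, -0.95, 1.05, -0.95]  # noise energy 0.01 -> about 26 dB
loud = [1.5, -0.5, 1.5, -0.5]       # noise energy 1.0  -> about 6 dB
library = select_samples([quiet, loud], lambda x: clean)
```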
Fig. 3 is a schematic structural diagram of an audio signal screening apparatus according to an embodiment of the present application.
Referring to fig. 3, the audio signal screening apparatus includes:
A noise reduction unit 301, configured to perform noise reduction processing on the first audio signal to obtain a second audio signal.
A calculating unit 302, configured to determine a signal-to-noise ratio of the first audio signal according to the first audio signal and the second audio signal.
A screening unit 303, configured to determine whether the first audio signal is a target audio signal according to the result of comparing the signal-to-noise ratio with a set signal-to-noise ratio threshold. The screening unit 303 determines that the first audio signal is a target audio signal when the signal-to-noise ratio is greater than the set threshold.
The set signal-to-noise ratio threshold is an empirical threshold. In practical applications, it may be set within the range of 15 dB to 25 dB according to actual requirements, for example 20 dB.
Further, the calculation unit 302 may include a first calculation subunit (not shown in the figure) and a second calculation subunit (not shown in the figure).
The first calculating subunit is configured to calculate the signal energies of the first audio signal and the second audio signal respectively to obtain a first signal energy and a second signal energy.
The second calculating subunit is configured to calculate the signal-to-noise ratio of the first audio signal according to the second signal energy and the difference between the first signal energy and the second signal energy.
Calculating the signal energies of the first audio signal and the second audio signal respectively comprises: determining n sampling points in the first audio signal, and calculating the signal energy of the first audio signal from the sample values at those n sampling points; and determining the n sampling points in the second audio signal at positions corresponding to those in the first audio signal, and calculating the signal energy of the second audio signal from the sample values at those n sampling points.
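One hypothetical way to map the units of Fig. 3 onto code is sketched below: the noise reduction unit is injected as any callable, the calculating unit is split into the two subunits, and the screening unit applies the strict threshold comparison. The class and method names are illustrative only and do not come from the source; the toy denoiser in the usage lines simply returns a known clean signal.

```python
import math
from typing import Callable, Sequence, Tuple

Signal = Sequence[float]

class AudioSignalScreeningApparatus:
    """Hypothetical mapping of units 301-303 in Fig. 3 onto methods."""

    def __init__(self, denoise: Callable[[Signal], Signal], snr_thresh_db: float = 20.0):
        self.denoise = denoise  # noise reduction unit 301 (any denoiser)
        self.snr_thresh_db = snr_thresh_db

    @staticmethod
    def first_calculating_subunit(x: Signal, s: Signal) -> Tuple[float, float]:
        """Energies of the first and second signals over the same n sampling points."""
        if len(x) != len(s):
            raise ValueError("x and s must have the same number of sampling points")
        return sum(v * v for v in x), sum(v * v for v in s)

    @staticmethod
    def second_calculating_subunit(e_x: float, e_s: float) -> float:
        """SNR in dB from E_s and the difference E_x - E_s."""
        diff = e_x - e_s
        return math.inf if diff <= 0.0 else 10.0 * math.log10(e_s / diff)

    def calculating_unit(self, x: Signal, s: Signal) -> float:  # unit 302
        return self.second_calculating_subunit(*self.first_calculating_subunit(x, s))

    def screening_unit(self, snr_db: float) -> bool:  # unit 303
        return snr_db > self.snr_thresh_db

    def screen(self, x: Signal) -> bool:
        """True if x is a target audio signal (SNR above the threshold)."""
        s = self.denoise(x)
        return self.screening_unit(self.calculating_unit(x, s))

# Toy usage with a stand-in denoiser that returns a known clean signal.
clean = [1.0, -1.0, 1.0, -1.0]
app = AudioSignalScreeningApparatus(lambda x: clean)
accepted = app.screen([1.05, -0.95, 1.05, -0.95])  # about 26 dB
```

Injecting the denoiser keeps the screening logic independent of the noise reduction algorithm, which matches the source's statement that the algorithm is not limited.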
As can be seen, in the technical solution of the present application, noise reduction is performed on a first audio signal (the audio signal to be screened) to obtain a second audio signal (the noise-reduced audio signal), and the signal-to-noise ratio of the first audio signal is then calculated from the audio signals before and after noise reduction. Comparing this signal-to-noise ratio with a set threshold (an empirical threshold) indicates the background noise level of the audio signal to be screened, so target audio signals with low background noise can be screened out. The screening method is simple, effective and broadly applicable; it reduces the complexity of audio signal screening and improves screening efficiency.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 4 is a schematic structural diagram of an electronic device shown in an embodiment of the present application. The electronic device may be a mobile terminal device or a server device, etc.
Referring to fig. 4, the electronic device 400 includes a memory 410 and a processor 420.
The Processor 420 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 410 may include various types of storage units, such as system memory, read Only Memory (ROM), and a persistent storage device. Wherein the ROM may store static data or instructions for processor 420 or other modules of the computer. The persistent storage device may be a read-write storage device. The persistent storage may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, the persistent storage device employs a mass storage device (e.g., magnetic or optical disk, flash memory) as the persistent storage device. In other embodiments, the permanent storage may be a removable storage device (e.g., floppy disk, optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as a dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory 410 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic and/or optical disks, may also be employed. In some embodiments, memory 410 may include a removable storage device that is readable and/or writable, such as a Compact Disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, min SD card, micro-SD card, etc.), a magnetic floppy disc, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 410 stores executable code that, when executed by the processor 420, causes the processor 420 to perform some or all of the methods described above.
The solutions of the present application have been described in detail above with reference to the accompanying drawings. Each of the above embodiments emphasizes different aspects; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. Those skilled in the art should also appreciate that the acts and modules referred to in the specification are not necessarily required by the present application. In addition, the steps of the methods in the embodiments of the present application may be reordered, combined, or deleted according to actual needs, and the modules of the devices in the embodiments of the present application may be combined, divided, or deleted according to actual needs.
Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps of the above-described method of the present application.
Alternatively, the present application may be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or a server, etc.), causes the processor to perform some or all of the steps of the above-described method according to the present application.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (6)

1. An audio signal screening method, comprising:
performing noise reduction processing on a first audio signal to obtain a second audio signal, wherein the second audio signal is the audio signal from which background noise has been removed by the noise reduction processing;
determining a signal-to-noise ratio of the first audio signal from the first audio signal and the second audio signal; and
determining whether the first audio signal is a target audio signal according to a result of comparing the signal-to-noise ratio with a set signal-to-noise ratio threshold, so as to screen out target audio signals with low background noise as training samples for a training model;
wherein the determining a signal-to-noise ratio of the first audio signal from the first audio signal and the second audio signal comprises:
calculating the signal energy of the first audio signal and of the second audio signal, respectively, to obtain a first signal energy and a second signal energy; and
calculating the signal-to-noise ratio of the first audio signal according to the second signal energy and the difference between the first signal energy and the second signal energy;
wherein the calculating the signal energy of the first audio signal and of the second audio signal, respectively, comprises:
determining n sampling points in the first audio signal, and calculating the signal energy of the first audio signal according to the sample values at the n sampling points in the first audio signal; and
determining n sampling points in the second audio signal that correspond to those of the first audio signal, and calculating the signal energy of the second audio signal according to the sample values at the n sampling points in the second audio signal;
and wherein the calculating the signal-to-noise ratio of the first audio signal according to the second signal energy and the difference between the first signal energy and the second signal energy comprises:
calculating the difference between the first signal energy and the second signal energy; and
performing a logarithmic operation on the ratio of the second signal energy to the difference to determine the signal-to-noise ratio of the first audio signal.
2. The audio signal screening method according to claim 1, wherein the determining whether the first audio signal is a target audio signal according to the result of comparing the signal-to-noise ratio with the set signal-to-noise ratio threshold comprises:
determining that the first audio signal is a target audio signal when the signal-to-noise ratio is greater than the set signal-to-noise ratio threshold.
3. The audio signal screening method according to claim 1 or 2, wherein:
the set signal-to-noise ratio threshold is in the range of 15 to 25 dB.
4. An audio signal screening apparatus, comprising:
a noise reduction unit, configured to perform noise reduction processing on a first audio signal to obtain a second audio signal, wherein the second audio signal is the audio signal from which background noise has been removed by the noise reduction processing;
a calculation unit, configured to determine a signal-to-noise ratio of the first audio signal according to the first audio signal and the second audio signal; and
a screening unit, configured to determine whether the first audio signal is a target audio signal according to a result of comparing the signal-to-noise ratio with a set signal-to-noise ratio threshold, so as to screen out target audio signals with low background noise as training samples for a training model;
wherein the calculation unit comprises:
a first calculation subunit, configured to calculate the signal energy of the first audio signal and of the second audio signal, respectively, to obtain a first signal energy and a second signal energy; and
a second calculation subunit, configured to calculate the signal-to-noise ratio of the first audio signal according to the second signal energy and the difference between the first signal energy and the second signal energy;
wherein the calculating the signal energy of the first audio signal and of the second audio signal, respectively, comprises:
determining n sampling points in the first audio signal, and calculating the signal energy of the first audio signal according to the sample values at the n sampling points in the first audio signal; and
determining n sampling points in the second audio signal that correspond to those of the first audio signal, and calculating the signal energy of the second audio signal according to the sample values at the n sampling points in the second audio signal;
and wherein the calculating the signal-to-noise ratio of the first audio signal according to the second signal energy and the difference between the first signal energy and the second signal energy comprises:
calculating the difference between the first signal energy and the second signal energy; and
performing a logarithmic operation on the ratio of the second signal energy to the difference to determine the signal-to-noise ratio of the first audio signal.
5. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1-3.
6. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-3.
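The claims leave two details open: how the signal energy is computed from the n sample values, and which logarithm is applied. The Python sketch below assumes that the energy is the sum of squared sample values and that the logarithm takes the decibel form SNR = 10·log10(E2 / (E1 - E2)), which fits the 15 to 25 dB threshold range of claim 3. Neither assumption is stated in the claims. The noise reduction step itself is not modeled: `denoise` is a hypothetical placeholder for whatever noise reduction unit is used, and the 20 dB default threshold is an illustrative value picked from the claimed range. The function names are likewise illustrative, not taken from the patent.

```python
import math
from typing import Callable, List, Sequence


def signal_energy(samples: Sequence[float]) -> float:
    """Energy over n sampling points, assumed here to be the sum of squared sample values."""
    return sum(s * s for s in samples)


def snr_db(first: Sequence[float], second: Sequence[float]) -> float:
    """SNR of the noisy first signal, treating E1 - E2 as the noise energy (claim 1)."""
    if len(first) != len(second):
        raise ValueError("both signals must cover the same n sampling points")
    e1 = signal_energy(first)   # first signal energy, before noise reduction
    e2 = signal_energy(second)  # second signal energy, after noise reduction
    if e2 <= 0:
        return -math.inf        # assumption: nothing left after denoising, so no usable signal
    diff = e1 - e2              # energy removed by noise reduction
    if diff <= 0:
        return math.inf         # assumption: nothing removed, so no measurable noise
    return 10.0 * math.log10(e2 / diff)


def is_target(first: Sequence[float], second: Sequence[float],
              threshold_db: float = 20.0) -> bool:
    """Claim 2: the clip is a target signal if its SNR exceeds the set threshold."""
    return snr_db(first, second) > threshold_db


def select_training_samples(clips: Sequence[List[float]],
                            denoise: Callable[[List[float]], List[float]],
                            threshold_db: float = 20.0) -> List[List[float]]:
    """Screen a batch: denoise each clip (hypothetical denoiser), keep those above the threshold."""
    return [clip for clip in clips if is_target(clip, denoise(clip), threshold_db)]
```

For example, with the first signal [3.0, 4.0, 0.5] and the noise-reduced signal [3.0, 4.0, 0.0], E1 = 25.25 and E2 = 25, so the SNR is 10·log10(25 / 0.25) = 20 dB; the clip passes a 15 dB threshold and fails a 25 dB one. The handling of the edge cases (a silent output is rejected, an unchanged clip is accepted) is a design choice of this sketch, not something the claims specify.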

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011549996.3A CN112652323B (en) 2020-12-24 2020-12-24 Audio signal screening method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011549996.3A CN112652323B (en) 2020-12-24 2020-12-24 Audio signal screening method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112652323A (en) 2021-04-13
CN112652323B (en) 2023-01-20

Family

ID=75360076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011549996.3A Active CN112652323B (en) 2020-12-24 2020-12-24 Audio signal screening method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112652323B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108597498A (en) * 2018-04-10 2018-09-28 广州势必可赢网络科技有限公司 Multi-microphone voice acquisition method and device
CN110265052A (en) * 2019-06-24 2019-09-20 秒针信息技术有限公司 The signal-to-noise ratio of radio equipment determines method, apparatus, storage medium and electronic device
CN111833895A (en) * 2019-04-23 2020-10-27 北京京东尚科信息技术有限公司 Audio signal processing method, apparatus, computer device and medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
EP3252766B1 (en) * 2016-05-30 2021-07-07 Oticon A/s An audio processing device and a method for estimating a signal-to-noise-ratio of a sound signal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant