CN112216296A - Audio anti-disturbance testing method and device and storage medium - Google Patents


Info

Publication number
CN112216296A
Authority
CN
China
Prior art keywords
band
attack
pass filter
target
alternative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011024815.5A
Other languages
Chinese (zh)
Other versions
CN112216296B (en)
Inventor
黎吉国
许继征
张莉
王悦
马思伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Lemon Inc Cayman Island
Original Assignee
Peking University
Lemon Inc Cayman Island
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University and Lemon Inc Cayman Island
Priority to CN202011024815.5A
Publication of CN112216296A
Application granted
Publication of CN112216296B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
              • G10L21/0208 Noise filtering
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
              • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
            • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
              • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/25 Fusion techniques
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
              • G06N3/08 Learning methods
                • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Noise Elimination (AREA)

Abstract

The embodiments of the disclosure provide a method, a device, and a storage medium for testing audio adversarial perturbations. An initial audio signal is input into the attack network of an adversarial attack model to obtain a first interference signal; the first interference signal is filtered by at least two candidate band-pass filters with different passbands to obtain a second interference signal for each candidate band-pass filter; a target band-pass filter is determined according to each second interference signal, the initial audio signal, and a target audio processing model; and an adversarial attack test is performed on the target audio processing model based on the adversarial attack model, whose band-pass filter is the target band-pass filter. Because the best of the at least two candidate band-pass filters is selected as the target band-pass filter of the adversarial attack model, the adversarial audio samples that the model generates for the target audio processing model achieve the best adversarial attack performance.

Description

Audio anti-disturbance testing method and device and storage medium
Technical Field
The embodiments of the disclosure relate to the technical field of computer and network communications, and in particular to a method, a device, and a storage medium for testing audio adversarial perturbations.
Background
With the rapid development of audio processing technologies such as speech recognition and voiceprint recognition, audio processing models are used ever more widely. They provide many kinds of services, greatly improve the efficiency of human-computer interaction, and bring great convenience to daily life and work. However, audio processing models are vulnerable to covert attacks: adding to the original audio a perturbation that the human ear cannot detect can make the model produce a wrong result. Adversarial perturbation generation aims to produce such perturbations for machine learning models, including audio processing models, so as to deceive well-trained models into producing wrong results; the models can then be optimized on the resulting adversarial audio samples to improve their performance.
In the prior art, for a given audio processing model, an interference signal is generally generated by a specific attack network model, the input audio signal is perturbed with this interference signal to obtain an adversarial audio sample, and the audio processing model is then attacked with that sample. However, audio samples obtained from the attack network alone have poor adversarial attack performance and do not adapt well to different audio processing models.
Disclosure of Invention
The embodiments of the disclosure provide a method, a device, and a storage medium for testing audio adversarial perturbations, so as to improve the adversarial attack performance of audio samples and increase the success rate of adversarial attacks on audio processing models.
In a first aspect, an embodiment of the present disclosure provides a method for testing audio adversarial perturbations, including:
inputting an initial audio signal into the attack network of an adversarial attack model for processing, to obtain a first interference signal;
filtering the first interference signal with each of at least two candidate band-pass filters, to obtain a second interference signal corresponding to each candidate band-pass filter, wherein any two of the at least two candidate band-pass filters have different passbands;
determining a target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and a target audio processing model; and
performing an adversarial attack test on the target audio processing model based on the adversarial attack model, wherein the band-pass filter in the adversarial attack model is the target band-pass filter.
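The four steps of the first aspect can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `select_target_filter` and its parameters are hypothetical, each component is assumed to be a plain callable on lists of samples, and `evaluate` is an assumed scoring function (higher is better) that stands in for fusing a second interference signal with the initial audio and testing the target model.

```python
from typing import Callable, Dict, List, Tuple

Signal = List[float]
Band = Tuple[float, float]  # (lower, upper) cutoff frequency in Hz
Filter = Callable[[Signal], Signal]


def select_target_filter(
    initial: Signal,
    attack_network: Callable[[Signal], Signal],
    candidate_filters: Dict[Band, Filter],
    evaluate: Callable[[Signal, Signal], float],
) -> Tuple[Band, Callable[[Signal], Signal]]:
    """Pick the best candidate band-pass filter and return an adversarial-audio generator."""
    # Step 1: the attack network maps the initial audio to a first interference signal.
    first = attack_network(initial)
    # Step 2: every candidate band-pass filter yields its own second interference signal.
    second = {band: filt(first) for band, filt in candidate_filters.items()}
    # Step 3: score each candidate against the target model and keep the best one.
    scores = {band: evaluate(initial, sig) for band, sig in second.items()}
    target_band = max(scores, key=scores.__getitem__)
    target_filter = candidate_filters[target_band]

    # Step 4: the adversarial attack model now uses only the target filter.
    def adversarial_model(audio: Signal) -> Signal:
        interference = target_filter(attack_network(audio))
        return [a + d for a, d in zip(audio, interference)]  # direct superposition

    return target_band, adversarial_model
```

Keeping the candidate filters in a dictionary keyed by passband makes the selected passband easy to report alongside the attack results.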
According to one or more embodiments of the present disclosure, determining the target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and the target audio processing model includes:
fusing the second interference signal corresponding to each candidate band-pass filter with the initial audio signal, to obtain an adversarial audio signal corresponding to each candidate band-pass filter;
performing an adversarial attack test on the target audio processing model with the adversarial audio signal corresponding to each candidate band-pass filter, and obtaining an attack performance parameter corresponding to each candidate band-pass filter; and
determining the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameter corresponding to each candidate band-pass filter.
According to one or more embodiments of the present disclosure, determining the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameter corresponding to each candidate band-pass filter includes:
ranking the at least two candidate band-pass filters by their corresponding attack performance parameters and determining the target band-pass filter from the ranking; or
determining, as the target band-pass filter, a candidate band-pass filter whose attack performance parameter satisfies a preset condition.
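The two selection options can be illustrated as follows, assuming each candidate's attack performance parameters are an attack success rate and a MOS-style perturbation-scale score. The class and function names are hypothetical, and the ranking key (success rate first, MOS as tie-breaker) and the default thresholds of the preset condition (0.9 and 3.5) are assumptions for illustration; the patent fixes neither.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Band = Tuple[float, float]  # (lower, upper) cutoff frequency in Hz


@dataclass(frozen=True)
class AttackPerformance:
    success_rate: float  # fraction of adversarial samples that fooled the target model
    mos: float           # perturbation-scale parameter; higher means less perceptible


def select_by_ranking(perf: Dict[Band, AttackPerformance]) -> Band:
    """Rank candidates by success rate, breaking ties by MOS, and return the top one."""
    ranked = sorted(perf, key=lambda b: (perf[b].success_rate, perf[b].mos), reverse=True)
    return ranked[0]


def select_by_condition(
    perf: Dict[Band, AttackPerformance],
    min_success: float = 0.9,  # assumed threshold
    min_mos: float = 3.5,      # assumed threshold
) -> Optional[Band]:
    """Return the first candidate (in passband order) meeting both thresholds, else None."""
    for band in sorted(perf):
        p = perf[band]
        if p.success_rate >= min_success and p.mos >= min_mos:
            return band
    return None
```

Ranking always returns a filter, whereas the preset-condition variant may return `None`, which signals that no candidate is good enough and the passbands or the attack network may need revisiting.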
According to one or more embodiments of the present disclosure, the attack performance parameters include an attack success rate and/or a perturbation-scale parameter of the adversarial audio signal.
According to one or more embodiments of the present disclosure, the method further includes:
constructing the at least two candidate band-pass filters based on at least two preset passbands and a Hamming window.
According to one or more embodiments of the present disclosure, constructing the at least two candidate band-pass filters based on at least two preset passbands and a Hamming window includes:
constructing each candidate band-pass filter by the following formulas:
[Equation (1): provided as an image in the original, not reproduced]
[Equation (2): provided as an image in the original, not reproduced]
$$g[n, f_1, f_2] = 2f_2\,\mathrm{sinc}(2\pi f_2 n) - 2f_1\,\mathrm{sinc}(2\pi f_1 n) \qquad (3)$$
where $h(t)$ denotes the first interference signal input to the candidate band-pass filter, and $h'(t)$ denotes the second interference signal output by the candidate band-pass filter; $\mathrm{sinc}(x) = \sin(x)/x$; $f_1$ and $f_2$ are the lower and upper cutoff frequencies of the passband; $w$ denotes the Hamming window; and $n$ is the time index of the audio signal within the Hamming window, $n = -16, -15, -14, \ldots, 14, 15, 16$.
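A minimal sketch of equation (3) with a Hamming window, in plain Python. Because equations (1) and (2) are not reproduced in the source, it assumes they describe the usual windowed-sinc design: the kernel g[n, f1, f2] multiplied by a 33-point Hamming window w[n] (n = -16..16) and convolved with h(t) to give h'(t). It also assumes f1 and f2 are expressed as fractions of the sampling rate. All helper names are hypothetical.

```python
import math
from typing import List


def sinc(x: float) -> float:
    """sinc(x) = sin(x)/x, with the removable singularity filled in as sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def hamming(n: int, half_len: int = 16) -> float:
    """Symmetric Hamming window indexed by n = -half_len..half_len (peak 1.0 at n = 0)."""
    span = 2 * half_len  # window length minus one
    return 0.54 - 0.46 * math.cos(2.0 * math.pi * (n + half_len) / span)


def bandpass_kernel(f1_hz: float, f2_hz: float, sample_rate: float, half_len: int = 16) -> List[float]:
    """Hamming-windowed g[n, f1, f2] from equation (3), for n = -half_len..half_len."""
    f1, f2 = f1_hz / sample_rate, f2_hz / sample_rate  # normalized frequencies (assumption)
    return [
        (2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)) * hamming(n, half_len)
        for n in range(-half_len, half_len + 1)
    ]


def apply_filter(h: List[float], kernel: List[float]) -> List[float]:
    """h'(t) = sum_n g_w[n] * h(t - n): same-length convolution with zero padding."""
    half = len(kernel) // 2
    out = []
    for t in range(len(h)):
        acc = 0.0
        for k, tap in enumerate(kernel):
            idx = t - (k - half)  # k - half is the window index n
            if 0 <= idx < len(h):
                acc += tap * h[idx]
        out.append(acc)
    return out


def frequency_response(kernel: List[float], f_hz: float, sample_rate: float) -> float:
    """Zero-phase response sum_n g_w[n] cos(2*pi*f*n); valid because the kernel is symmetric."""
    half = len(kernel) // 2
    f = f_hz / sample_rate
    return sum(tap * math.cos(2 * math.pi * f * (k - half)) for k, tap in enumerate(kernel))
```

With only 33 taps the transition bands are wide relative to a 1 kHz passband at 16 kHz, so the response in the middle of a band stays somewhat below 1; the window mainly serves to suppress ripple far from the passband.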
According to one or more embodiments of the present disclosure, the method further includes:
acquiring training data for an initial attack network; and
training the initial attack network based on the training data and gradient back-propagation, to obtain the attack network of the adversarial attack model.
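The pre-training step can be illustrated with a deliberately tiny example of gradient back-propagation. Everything in it is an assumption for illustration: the attack network is a single linear layer (`delta = A x`) rather than a deep network, the target model is a fixed linear scorer `w . x` whose positive score means the correct class, and the loss (target score plus `lam` times the squared perturbation norm) is a made-up objective that trades attack strength against perturbation size. The hypothetical `train_attack_network` writes the gradient out by hand so the chain rule is visible.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_attack_network(
    data: List[Vector], w: Vector, lam: float = 0.5, lr: float = 0.1, steps: int = 200
) -> Tuple[Matrix, List[float]]:
    """Gradient-descent training of a toy linear attack network delta = A x.

    Per-sample loss: score(x + delta) + lam * ||delta||^2, with score(v) = w . v.
    Returns the trained matrix A and the mean loss recorded before each update.
    """
    d = len(w)
    A = [[0.0] * d for _ in range(d)]
    history = []
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        total = 0.0
        for x in data:
            # Forward pass through the attack network and the target model.
            delta = [dot(A[i], x) for i in range(d)]
            adv = [xi + di for xi, di in zip(x, delta)]
            total += dot(w, adv) + lam * dot(delta, delta)
            # Backward pass: dL/d(delta_i) = w_i + 2*lam*delta_i and d(delta_i)/dA_ij = x_j.
            for i in range(d):
                upstream = w[i] + 2.0 * lam * delta[i]
                for j in range(d):
                    grad[i][j] += upstream * x[j]
        n = len(data)
        history.append(total / n)
        # Gradient-descent update with the batch-averaged gradient.
        for i in range(d):
            for j in range(d):
                A[i][j] -= lr * grad[i][j] / n
    return A, history
```

In practice the same loop would be run by an automatic-differentiation framework over a neural attack network and the actual target audio processing model.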
According to one or more embodiments of the present disclosure, performing the adversarial attack test on the target audio processing model based on the adversarial attack model includes:
acquiring an audio signal for the adversarial attack test, inputting it into the adversarial attack model whose target band-pass filter has been determined, and obtaining a target interference signal through the attack network and the target band-pass filter;
fusing the target interference signal with the test audio signal, to obtain a target adversarial audio signal; and
performing the adversarial attack test on the target audio processing model with the target adversarial audio signal.
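The test procedure above can be sketched as follows, assuming the attack network, target filter, and target model are plain callables (all names are hypothetical) and counting an attack as successful when the target model's output differs from the reference label. This untargeted success criterion is an assumption; the patent does not define it.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def attack_success_rate(
    test_signals: Sequence[Signal],
    labels: Sequence[int],
    attack_network: Callable[[Signal], Signal],
    target_filter: Callable[[Signal], Signal],
    target_model: Callable[[Signal], int],
) -> float:
    """Fraction of test signals whose fused adversarial version changes the model's output."""
    fooled = 0
    for audio, label in zip(test_signals, labels):
        # Target interference signal: attack network followed by the target band-pass filter.
        interference = target_filter(attack_network(audio))
        # Target adversarial audio signal: direct superposition onto the test audio.
        adversarial = [a + d for a, d in zip(audio, interference)]
        if target_model(adversarial) != label:
            fooled += 1
    return fooled / len(test_signals)
```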
In a second aspect, an embodiment of the present disclosure provides an apparatus for testing audio adversarial perturbations, including:
an interference signal acquisition unit, configured to input an initial audio signal into the attack network of an adversarial attack model for processing to obtain a first interference signal, and to filter the first interference signal with each of at least two candidate band-pass filters to obtain a second interference signal corresponding to each candidate band-pass filter, wherein any two of the at least two candidate band-pass filters have different passbands;
a band-pass filter determining unit, configured to determine a target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and a target audio processing model; and
an adversarial attack unit, configured to perform an adversarial attack test on the target audio processing model based on the adversarial attack model, wherein the band-pass filter in the adversarial attack model is the target band-pass filter.
According to one or more embodiments of the present disclosure, when determining the target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and the target audio processing model, the band-pass filter determining unit is configured to:
fuse the second interference signal corresponding to each candidate band-pass filter with the initial audio signal, to obtain an adversarial audio signal corresponding to each candidate band-pass filter;
perform an adversarial attack test on the target audio processing model with the adversarial audio signal corresponding to each candidate band-pass filter, and obtain an attack performance parameter corresponding to each candidate band-pass filter; and
determine the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameter corresponding to each candidate band-pass filter.
According to one or more embodiments of the present disclosure, when determining the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameter corresponding to each candidate band-pass filter, the band-pass filter determining unit is configured to:
rank the at least two candidate band-pass filters by their corresponding attack performance parameters and determine the target band-pass filter from the ranking; or
determine, as the target band-pass filter, a candidate band-pass filter whose attack performance parameter satisfies a preset condition.
According to one or more embodiments of the present disclosure, the attack performance parameters include an attack success rate and/or a perturbation-scale parameter of the adversarial audio signal.
According to one or more embodiments of the present disclosure, the apparatus further includes a band-pass filter construction module, configured to:
construct the at least two candidate band-pass filters based on at least two preset passbands and a Hamming window.
According to one or more embodiments of the present disclosure, when constructing the at least two candidate band-pass filters based on at least two preset passbands and a Hamming window, the band-pass filter construction module is configured to:
construct each candidate band-pass filter by the following formulas:
[Equation (1): provided as an image in the original, not reproduced]
[Equation (2): provided as an image in the original, not reproduced]
$$g[n, f_1, f_2] = 2f_2\,\mathrm{sinc}(2\pi f_2 n) - 2f_1\,\mathrm{sinc}(2\pi f_1 n) \qquad (3)$$
where $h(t)$ denotes the first interference signal input to the candidate band-pass filter, and $h'(t)$ denotes the second interference signal output by the candidate band-pass filter; $\mathrm{sinc}(x) = \sin(x)/x$; $f_1$ and $f_2$ are the lower and upper cutoff frequencies of the passband; $w$ denotes the Hamming window; and $n$ is the time index of the audio signal within the Hamming window, $n = -16, -15, -14, \ldots, 14, 15, 16$.
According to one or more embodiments of the present disclosure, the apparatus further includes a training module, configured to:
acquire training data for an initial attack network; and
train the initial attack network based on the training data and gradient back-propagation, to obtain the attack network of the adversarial attack model.
According to one or more embodiments of the present disclosure, when performing the adversarial attack test on the target audio processing model based on the adversarial attack model, the adversarial attack unit is configured to:
acquire an audio signal for the adversarial attack test, input it into the adversarial attack model whose target band-pass filter has been determined, and obtain a target interference signal through the attack network and the target band-pass filter;
fuse the target interference signal with the test audio signal, to obtain a target adversarial audio signal; and
perform the adversarial attack test on the target audio processing model with the target adversarial audio signal.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including at least one processor and a memory;
the memory stores computer-executable instructions; and
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the method for testing audio adversarial perturbations according to the first aspect and its various possible designs.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method for testing audio adversarial perturbations according to the first aspect and its various possible designs.
With the method, device, and storage medium for testing audio adversarial perturbations provided by the embodiments, an initial audio signal is input into the attack network of an adversarial attack model to obtain a first interference signal; the first interference signal is filtered by each of at least two candidate band-pass filters with mutually different passbands to obtain a second interference signal for each candidate; a target band-pass filter is determined from the candidates according to each second interference signal, the initial audio signal, and a target audio processing model; and an adversarial attack test is performed on the target audio processing model based on the adversarial attack model, whose band-pass filter is the target band-pass filter. By configuring at least two candidate band-pass filters in the adversarial attack model and determining its target band-pass filter, the embodiments enable the model to generate suitable adversarial audio samples for the target audio processing model and achieve the required adversarial attack performance. This raises the success rate of adversarial attacks on the target audio processing model and provides a basis for optimizing that model.
Drawings
To describe the technical solutions of the embodiments of the present disclosure or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present disclosure, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is an exemplary diagram of an application scenario provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for testing audio adversarial perturbations provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for testing audio adversarial perturbations provided by another embodiment of the present disclosure;
FIG. 4 is a schematic diagram of filtering by candidate band-pass filters provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of a method for testing audio adversarial perturbations provided by another embodiment of the present disclosure;
FIG. 6 is a structural block diagram of an apparatus for testing audio adversarial perturbations provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present disclosure.
In the prior art, for a given audio processing model, an interference signal is generally generated by a specific attack network model, which can be obtained by pre-training; the input audio signal is perturbed with this interference signal to obtain an adversarial audio sample, and the audio processing model is then attacked with that sample. Adversarial audio samples obtained from the attack network alone have poor adversarial attack performance and do not adapt well to different audio processing models.
The reason is that the interference signal generated by the attack network spans a wide frequency range, whereas the human ear's sensitivity differs across frequency ranges; a wideband interference signal therefore makes it hard to synthesize an imperceptible adversarial perturbation, so the adversarial audio samples obtained from the attack network perform poorly. Frequency-domain analysis of the interference signals shows that adversarial audio samples synthesized with interference at different frequencies differ in adversarial attack performance against an audio processing model. For example, for speaker recognition models (speaker recognition systems), adversarial audio samples with added high-frequency interference perform better than those with added low-frequency interference; for other audio processing models, the frequency of the interference signal may likewise lead to different adversarial attack performance.
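One simple way to carry out frequency-domain analysis of this kind is to measure what share of an interference signal's energy falls in each 1 kHz band. The patent does not say how its analysis was performed; the hypothetical helper `band_energy_fractions` below is an assumption that uses a direct DFT (slow, but dependency-free), and its per-band shares are approximate because DC and Nyquist bins are not weighted specially.

```python
import math
from typing import List


def band_energy_fractions(signal: List[float], sample_rate: float, band_hz: float = 1000.0) -> List[float]:
    """Approximate share of spectral energy in each band_hz-wide band from 0 Hz to Nyquist."""
    n = len(signal)
    num_bands = int(round(sample_rate / 2 / band_hz))
    energy = [0.0] * num_bands
    for k in range(n // 2 + 1):  # non-negative DFT bins only
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        # Bin k sits at k * sample_rate / n Hz; the Nyquist bin is folded into the last band.
        band = min(int(k * sample_rate / n // band_hz), num_bands - 1)
        energy[band] += re * re + im * im
    total = sum(energy)
    return [e / total for e in energy]
```

For example, a 6.5 kHz tone sampled at 16 kHz puts essentially all of its energy in band index 6 (6-7 kHz).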
Therefore, when generating adversarial samples, the embodiments add a band-pass filter after the attack network to filter the interference signal it generates, and the two together form an adversarial attack model. To achieve the best adversarial attack performance for each audio processing model, at least two candidate band-pass filters with different passbands are configured in the adversarial attack model. Before adversarial samples are generated, the best candidate is first determined as the target band-pass filter, so that adversarial audio samples can then be generated with the attack network and the target band-pass filter of the adversarial attack model.
An application scenario provided by an embodiment of the disclosure is shown in FIG. 1 and includes an adversarial attack model and a target audio processing model. The two models may be deployed on the same electronic device, such as a server, or on different electronic devices. The adversarial attack model may specifically include an attack network and a band-pass filter (which may include at least two candidate band-pass filters; FIG. 1 is merely an example). The initial audio signal is input into the adversarial attack model: the attack network produces a first interference signal, the band-pass filter filters it into a second interference signal, and the second interference signal is fused with the initial audio signal to obtain an adversarial audio signal. The adversarial audio signal is then input into the target audio processing model; whether the attack succeeds is determined from the audio processing result, and whether the interference is easily perceived is determined from the magnitude of the perturbation-scale parameter of the adversarial audio signal.
Because the adversarial attack model includes at least two candidate band-pass filters, at least two adversarial audio signals can be generated. The target band-pass filter of the adversarial attack model can be determined by comparing the attack success rates derived from the audio processing results and the perturbation-scale parameters of the adversarial audio signals. For example, the adversarial audio signal with the best attack effect is identified, its candidate band-pass filter is taken as the best candidate and used as the target band-pass filter of the adversarial attack model, and subsequent adversarial attack tests are performed on the target audio processing model using the attack network and the target band-pass filter of the adversarial attack model.
The target audio processing model in the embodiments of the present disclosure may be an audio processing model applied in any scenario to implement any task, including but not limited to a speaker recognition model, a speech recognition model, a voiceprint recognition model, a speech-based age recognition model, and a gender recognition model.
The method for testing audio adversarial perturbations is described in detail below with reference to specific embodiments.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a method for testing audio adversarial perturbations according to an embodiment of the present disclosure. The method of this embodiment can be applied to any electronic device, such as a terminal device or a server, and includes the following steps:
S201: input the initial audio signal into the attack network of the adversarial attack model for processing, to obtain a first interference signal.
In this embodiment, the adversarial attack model includes an attack network and a band-pass filter. The attack network produces a first interference signal from the initial audio signal, and the band-pass filter filters the first interference signal to obtain an interference signal within a specific passband. That interference signal is then fused with the initial audio signal to obtain an adversarial audio signal, with which an adversarial attack is performed on the target audio processing model. This achieves the best adversarial attack effect and in turn helps improve the accuracy and robustness of the target audio processing model.
The initial audio signal in this embodiment may be any audio signal. The attack network and the target audio processing model may be any models, such as deep neural networks or convolutional neural networks. The target audio processing model may be an audio processing model applied in any scenario to implement any task, including but not limited to a speaker recognition model, a speech recognition model, a voiceprint recognition model, a speech-based age recognition model, and a gender recognition model.
The attack network may be an attack network targeting the target audio processing model and may be trained in advance, for example by the following process: acquiring training data for an initial attack network; and training the initial attack network based on the training data and gradient back-propagation, to obtain the attack network of the adversarial attack model. The attack network may be a residual network with skip connections; in this embodiment, it may also be trained with other existing optimization methods, and the specific training process is not described further here.
S202: filter the first interference signal with each of at least two candidate band-pass filters, to obtain a second interference signal corresponding to each candidate band-pass filter, wherein any two of the at least two candidate band-pass filters have different passbands.
In this embodiment, for the adversarial audio signal to achieve the required attack effect, an interference signal within a suitable passband needs to be fused with the initial audio signal. The band-pass filter filters the interference signal output by the attack network; if it confines the interference signal to a suitable passband, a suitable adversarial audio signal is obtained and the required attack effect can be achieved. In other words, the second interference signal corresponding to each candidate band-pass filter in this disclosure is the interference signal obtained after that candidate filters the first interference signal. Therefore, at least two candidate band-pass filters, each with a different passband, may be provided. For example, at a sampling rate of 16 kHz, the following candidate passbands (in Hz) may be used: [0, 1k], [1k, 2k], [2k, 3k], [3k, 4k], [4k, 5k], [5k, 6k], [6k, 7k], [7k, 8k]. The first interference signal is input into each candidate band-pass filter, and each candidate filters it to obtain its corresponding second interference signal.
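The example candidate bank at 16 kHz can be sketched as follows. The sketch is self-contained and rests on the same assumptions as a plain windowed-sinc reading of equation (3): a 33-point Hamming window, cutoffs normalized by the sampling rate, and same-length convolution with zero padding. The helper names `windowed_sinc_kernel`, `candidate_bands`, and `second_interference_signals` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Band = Tuple[float, float]  # (lower, upper) cutoff frequency in Hz


def windowed_sinc_kernel(f1_hz: float, f2_hz: float, sample_rate: float, half_len: int = 16) -> List[float]:
    """Equation (3) kernel times a Hamming window, for n = -half_len..half_len."""
    f1, f2 = f1_hz / sample_rate, f2_hz / sample_rate  # normalized frequencies (assumption)

    def lowpass(f: float, n: int) -> float:
        # 2f * sinc(2*pi*f*n) simplifies to sin(2*pi*f*n) / (pi*n), with value 2f at n = 0.
        return 2.0 * f if n == 0 else math.sin(2.0 * math.pi * f * n) / (math.pi * n)

    span = 2 * half_len
    return [
        (lowpass(f2, n) - lowpass(f1, n)) * (0.54 - 0.46 * math.cos(2.0 * math.pi * (n + half_len) / span))
        for n in range(-half_len, half_len + 1)
    ]


def candidate_bands(sample_rate: float = 16000.0, width_hz: float = 1000.0) -> List[Band]:
    """Contiguous passbands of width_hz from 0 Hz up to the Nyquist frequency."""
    count = int(round(sample_rate / 2.0 / width_hz))
    return [(i * width_hz, (i + 1) * width_hz) for i in range(count)]


def second_interference_signals(first: List[float], sample_rate: float = 16000.0) -> Dict[Band, List[float]]:
    """Filter the first interference signal with every candidate band-pass filter."""
    result: Dict[Band, List[float]] = {}
    for band in candidate_bands(sample_rate):
        kernel = windowed_sinc_kernel(band[0], band[1], sample_rate)
        half = len(kernel) // 2
        filtered = []
        for t in range(len(first)):
            # Same-length convolution with zero padding outside the signal.
            acc = 0.0
            for k, tap in enumerate(kernel):
                idx = t - (k - half)
                if 0 <= idx < len(first):
                    acc += tap * first[idx]
            filtered.append(acc)
        result[band] = filtered
    return result
```

Note that the top band [7k, 8k] reaches the Nyquist frequency, so its upper low-pass term reduces to a unit impulse and the filter acts as a high-pass above roughly 7 kHz.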
S203: determining a target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and the target audio processing model.
In this embodiment, after the second interference signal corresponding to each candidate band-pass filter is obtained, the target band-pass filter of the adversarial attack model may be determined from the at least two candidate band-pass filters based on these second interference signals, the initial audio signal, and the target audio processing model.
Optionally, as shown in FIG. 3, the target band-pass filter may be determined in this embodiment through the following process:
S2031: fusing the second interference signal corresponding to each candidate band-pass filter with the initial audio signal to obtain an adversarial audio signal corresponding to each candidate band-pass filter;
S2032: performing an adversarial attack test on the target audio processing model with the adversarial audio signal corresponding to each candidate band-pass filter to obtain attack performance parameters corresponding to each candidate band-pass filter;
S2033: determining the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameters corresponding to each candidate band-pass filter.
In this embodiment, the second interference signal corresponding to each candidate band-pass filter may be fused with the initial audio signal, for example by directly superimposing the second interference signal on the initial audio signal, to obtain the corresponding adversarial audio signal. The adversarial audio signal is then input into the target audio processing model to obtain an audio processing result, and the attack success rate is determined from that result; and/or the perturbation scale parameter of the adversarial audio signal is measured. These values serve as the attack performance parameters of the candidate band-pass filter. The perturbation scale parameter may be the MOS (Mean Opinion Score) value given by PESQ (Perceptual Evaluation of Speech Quality), which characterizes whether the second interference signal in the adversarial audio signal can be perceived: the larger the MOS value, the less perceptible the second interference signal. By comparing the attack performance parameters of the adversarial audio signals of the candidate band-pass filters, the target band-pass filter can be determined. That is, optionally, in this embodiment, an adversarial attack test may be performed on the target audio processing model with each adversarial audio signal to obtain the attack performance parameters corresponding to each adversarial audio signal, and the target band-pass filter may be determined from these parameters, for example by taking the candidate band-pass filter whose adversarial audio signal has the best attack performance parameters as the target band-pass filter.
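As a rough sketch of S2031 and S2032, fusion by direct superposition and the attack success rate can be computed as below. Assumptions not stated in the original: signals are float arrays in [-1, 1] (hence the clipping), the attack is treated as untargeted (success means the model's prediction differs from the true label), and `model` and `quality_fn` are hypothetical callables. `quality_fn` stands in for a PESQ MOS computation, which would come from a separate PESQ implementation (PESQ is specified in ITU-T P.862).

```python
import numpy as np

def fuse(audio, interference):
    """Superimpose the interference on the audio; clip to the assumed [-1, 1] range."""
    return np.clip(audio + interference, -1.0, 1.0)

def attack_performance(model, audios, labels, interferences, quality_fn):
    """Return (attack success rate, mean perceptual score) for one candidate filter.

    model: hypothetical callable mapping a waveform to a predicted label.
    quality_fn: hypothetical callable (reference, degraded) -> score, e.g. a PESQ MOS.
    interferences: the second interference signals produced by this candidate filter.
    """
    successes, scores = 0, []
    for audio, label, delta in zip(audios, labels, interferences):
        adversarial = fuse(audio, delta)
        if model(adversarial) != label:  # untargeted success: the prediction changed
            successes += 1
        scores.append(quality_fn(audio, adversarial))
    return successes / len(audios), float(np.mean(scores))
```

Calling `attack_performance` once per candidate band-pass filter yields the per-filter parameters that S2033 compares.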
Optionally, the attack performance parameters may include, but are not limited to, the attack success rate and/or the perturbation scale parameter of the adversarial audio signal.
In an optional embodiment, the at least two candidate band-pass filters may be ranked according to their corresponding attack performance parameters to determine the target band-pass filter. After the attack performance parameters corresponding to each candidate band-pass filter are obtained, the candidate band-pass filters are ranked by these parameters, and the candidate band-pass filter with the best attack performance parameters, that is, the highest attack success rate and/or the largest perturbation scale parameter of the adversarial audio signal, is selected as the target band-pass filter.
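The ranking option can be sketched as a sort over the per-filter parameters. The ordering rule here (attack success rate first, then the PESQ-style score as a tie-breaker) is an assumption; the original only says the filter with the best parameters is selected. `rank_candidates`, `select_by_ranking` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

Band = Tuple[float, float]

def rank_candidates(params: Dict[Band, Tuple[float, float]]) -> List[Band]:
    """Order passbands from best to worst.

    params maps a passband to (attack_success_rate, perceptual_score).
    A higher success rate wins; a higher score (less perceptible) breaks ties.
    """
    return sorted(params, key=lambda band: params[band], reverse=True)

def select_by_ranking(params: Dict[Band, Tuple[float, float]]) -> Band:
    """Pick the top-ranked passband as the target band-pass filter."""
    return rank_candidates(params)[0]
```

A weighted score combining both parameters would be an equally valid reading of "and/or"; the lexicographic order is simply the easiest to state.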
In one example, the target audio processing model is a speaker recognition model, and the preset passbands of the candidate band-pass filters are [0, 1k], [1k, 2k], [2k, 3k], [3k, 4k], [4k, 5k], [5k, 6k], [6k, 7k], [7k, 8k]. The first interference signal is filtered to obtain the corresponding second interference signals shown in FIG. 4, and the adversarial audio signal corresponding to each candidate band-pass filter is obtained through the fusion process. Each adversarial audio signal is input into the target audio processing model, its attack success rate is determined from the output audio processing result, and its perturbation scale parameter is measured. Ranking and analyzing the attack success rates and perturbation scale parameters shows that, against the speaker recognition model, high-frequency perturbation achieves better attack performance than low-frequency perturbation: for the candidate band-pass filters with passbands [6k, 7k] and [7k, 8k], the attack success rate of the corresponding adversarial audio signals exceeds 75% and their PESQ MOS value exceeds 4.0, so the interference in these adversarial audio signals is hard to perceive. That is, a high-frequency candidate band-pass filter can be the optimal choice for the target band-pass filter. Of course, the target band-pass filter finally determined may differ for different target audio processing models.
In another optional embodiment, a candidate band-pass filter whose attack performance parameters meet a preset condition may be determined as the target band-pass filter. In this embodiment, an attack success rate threshold and/or a perturbation scale parameter threshold for the adversarial audio signal may be set; for example, the attack success rate threshold may be 75% and the perturbation scale parameter threshold may be a PESQ MOS value of 4.0, and both may be set according to actual conditions. When the attack success rate corresponding to a candidate band-pass filter is greater than the attack success rate threshold, and/or the perturbation scale parameter of its adversarial audio signal is greater than the perturbation scale parameter threshold, that candidate band-pass filter is taken as the target band-pass filter.
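The preset-condition option can be sketched as a filter over the same per-filter parameters. The default thresholds mirror the example values above (75% and a PESQ MOS of 4.0); the `require_both` switch is an assumption introduced to make the original "and/or" explicit, and `select_by_threshold` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Band = Tuple[float, float]

def select_by_threshold(
    params: Dict[Band, Tuple[float, float]],
    min_success_rate: float = 0.75,
    min_score: float = 4.0,
    require_both: bool = True,
) -> List[Band]:
    """Return every passband whose parameters exceed the thresholds.

    params maps a passband to (attack_success_rate, perceptual_score).
    require_both=True reads "and/or" as "and"; False reads it as "or".
    """
    selected = []
    for band, (rate, score) in params.items():
        rate_ok = rate > min_success_rate
        score_ok = score > min_score
        passed = (rate_ok and score_ok) if require_both else (rate_ok or score_ok)
        if passed:
            selected.append(band)
    return selected
```

Unlike ranking, this can return several qualifying filters (or none), so a caller still needs a rule for picking one, for instance the first qualifying filter.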
S204: performing an adversarial attack test on the target audio processing model based on the adversarial attack model, where the band-pass filter in the adversarial attack model is the target band-pass filter.
In this embodiment, determining the target band-pass filter is equivalent to determining the adversarial attack model, which at this point includes the attack network and the target band-pass filter. An adversarial attack test can then be performed on the target audio processing model based on the adversarial attack model, providing a basis for making the target audio processing model resistant to adversarial attacks.
When the adversarial attack test is performed on the target audio processing model based on the adversarial attack model, one or more audio signals for the adversarial attack test are acquired and input into the adversarial attack model to obtain adversarial audio signals, and the adversarial attack test is then performed on the target audio processing model with these adversarial audio signals. The audio signals used for the adversarial attack test may differ from the initial audio signal in the above embodiment.
According to the audio adversarial perturbation testing method provided in this embodiment, an initial audio signal is input into the attack network in the adversarial attack model for processing to obtain a first interference signal; the first interference signal is filtered by each of at least two candidate band-pass filters to obtain a second interference signal corresponding to each candidate band-pass filter, where any two of the at least two candidate band-pass filters have different passbands; a target band-pass filter is determined from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and the target audio processing model; and an adversarial attack test is performed on the target audio processing model based on the adversarial attack model, where the band-pass filter in the adversarial attack model is the target band-pass filter. By configuring at least two candidate band-pass filters in the adversarial attack model and determining its target band-pass filter, this embodiment enables the adversarial attack model to generate suitable adversarial audio samples for the target audio processing model, thereby achieving the required adversarial attack performance, improving the success rate of adversarial attacks on the target audio processing model, and providing a basis for optimizing the target audio processing model.
On the basis of any of the above embodiments, the method may further include constructing the candidate band-pass filters in advance, specifically:
constructing the at least two candidate band-pass filters based on at least two preset passbands and a Hamming window.
In this embodiment, at least two preset passbands, such as [0, 1k], [1k, 2k], [2k, 3k], [3k, 4k], [4k, 5k], [5k, 6k], [6k, 7k], [7k, 8k], may be set in advance, and the at least two candidate band-pass filters are then constructed from these preset passbands and a Hamming window. Of course, other types of time windows may also be used in this embodiment; they are not described here.
Further, constructing the at least two candidate band-pass filters based on the at least two preset passbands and the Hamming window may specifically include:
constructing a 33-point band-pass filter with a Hamming window by the following equations:
[Equation (1): image not reproduced]
[Equation (2): image not reproduced]
$$g[n, f_1, f_2] = 2f_2\,\mathrm{sinc}(2\pi f_2 n) - 2f_1\,\mathrm{sinc}(2\pi f_1 n) \qquad (3)$$
where $h(t)$ denotes the first interference signal input to the candidate band-pass filter and $h'(t)$ denotes the second interference signal output by it; $\mathrm{sinc}(x) = \sin(x)/x$; $f_1$ and $f_2$ are the lower and upper cut-off frequencies of the passband; $w$ denotes the Hamming window; and $n$ is the time index of the audio signal within the Hamming window, $n = -16, -15, -14, \ldots, 14, 15, 16$.
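A minimal NumPy sketch of the 33-point construction follows. Because equations (1) and (2) are not reproduced in this text, it assumes the common windowed-sinc form: the kernel g[n, f1, f2] of equation (3) is multiplied point-wise by a symmetric Hamming window and then convolved with the first interference signal h(t). It also assumes f1 and f2 are normalized by the sampling rate (cycles per sample), which equation (3) needs in order to act as an ideal band-pass response. NumPy's `np.sinc(x)` computes sin(πx)/(πx), so sinc(2πfn) in the notation above becomes `np.sinc(2 * f * n)`. The function names are hypothetical.

```python
import numpy as np

def bandpass_taps(f1_hz, f2_hz, fs, half_len=16):
    """Hamming-windowed sinc band-pass kernel with 2 * half_len + 1 taps (33 by default)."""
    n = np.arange(-half_len, half_len + 1)  # n = -16, ..., 16
    f1, f2 = f1_hz / fs, f2_hz / fs  # assumed normalized cut-off frequencies
    # Equation (3) with sinc(x) = sin(x)/x, rewritten via np.sinc(x) = sin(pi x)/(pi x)
    g = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    w = np.hamming(len(n))  # symmetric window: 0.54 - 0.46 cos(2 pi k / (M - 1))
    return g * w

def apply_bandpass(h, taps):
    """Filter the first interference signal h; the output keeps the input length."""
    return np.convolve(h, taps, mode="same")

def gain_at(taps, freq_hz, fs, half_len=16):
    """Magnitude of the filter's frequency response at freq_hz."""
    n = np.arange(-half_len, half_len + 1)
    return abs(np.sum(taps * np.exp(-2j * np.pi * freq_hz / fs * n)))

fs = 16000
passbands = [(lo, lo + 1000) for lo in range(0, 8000, 1000)]
filters = {band: bandpass_taps(band[0], band[1], fs) for band in passbands}
```

With only 33 taps the transition bands are wide relative to a 1 kHz passband, so the in-band gain of a narrow filter stays somewhat below 1; `gain_at` makes that easy to inspect.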
On the basis of any of the above embodiments, as shown in FIG. 5, performing the adversarial attack test on the target audio processing model based on the adversarial attack model may specifically include:
S301: acquiring an audio signal for the adversarial attack test, inputting it into the adversarial attack model in which the target band-pass filter has been determined, and obtaining a target interference signal through the attack network and the target band-pass filter;
S302: fusing the target interference signal with the audio signal for the adversarial attack test to obtain a target adversarial audio signal;
S303: performing the adversarial attack test on the target audio processing model with the target adversarial audio signal.
In this embodiment, once the target band-pass filter is determined, the adversarial attack model is determined. When an adversarial attack is then to be performed on the target audio processing model with this model, an audio signal for the adversarial attack test is first acquired as the input of the adversarial attack model; an initial interference signal is obtained through the attack network of the adversarial attack model, and the target interference signal is obtained through the target band-pass filter. The target interference signal is then fused with the audio signal for the adversarial attack test to obtain a target adversarial audio signal, and the adversarial attack test is performed on the target audio processing model based on the target adversarial audio signal.
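The S301 to S303 flow can be sketched end to end as below. It assumes the target band-pass filter is represented by FIR taps (for instance, a 33-point Hamming-windowed band-pass filter as described in this document) and that fusion is direct superposition with clipping to [-1, 1]. `attack_network` and `model` are hypothetical callables standing in for the trained attack network and the target audio processing model.

```python
import numpy as np

def run_adversarial_test(audio, attack_network, target_taps, model):
    """S301-S303: generate, filter, fuse, and query the target model.

    Returns the target adversarial audio signal and the model's output on it.
    """
    # S301: initial interference from the attack network, then the target band-pass filter
    initial_interference = attack_network(audio)
    target_interference = np.convolve(initial_interference, target_taps, mode="same")
    # S302: fuse by superposition, clipped to the assumed [-1, 1] range
    adversarial = np.clip(audio + target_interference, -1.0, 1.0)
    # S303: query the target audio processing model with the adversarial signal
    return adversarial, model(adversarial)
```

Comparing `model(audio)` with the returned output for each test signal gives the per-signal attack outcome; aggregating those outcomes gives the attack success rate on the test set.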
Corresponding to the audio adversarial perturbation testing method in the above embodiments, FIG. 6 is a structural block diagram of an audio adversarial perturbation testing apparatus according to an embodiment of the present disclosure. For ease of description, only the parts relevant to the embodiments of the present disclosure are shown. Referring to FIG. 6, the audio adversarial perturbation testing apparatus 60 includes an interference signal acquisition unit 601, a band-pass filter determination unit 602, and an adversarial attack unit 603.
The interference signal acquisition unit 601 is configured to input an initial audio signal into an attack network in an adversarial attack model for processing to obtain a first interference signal, and to filter the first interference signal with each of at least two candidate band-pass filters to obtain a second interference signal corresponding to each candidate band-pass filter, where any two of the at least two candidate band-pass filters have different passbands;
the band-pass filter determination unit 602 is configured to determine a target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and a target audio processing model;
the adversarial attack unit 603 is configured to perform an adversarial attack test on the target audio processing model based on the adversarial attack model, where the band-pass filter in the adversarial attack model is the target band-pass filter.
In an embodiment of the present disclosure, when determining the target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and the target audio processing model, the band-pass filter determination unit 602 is configured to:
fuse the second interference signal corresponding to each candidate band-pass filter with the initial audio signal to obtain an adversarial audio signal corresponding to each candidate band-pass filter;
perform an adversarial attack test on the target audio processing model with the adversarial audio signal corresponding to each candidate band-pass filter to obtain attack performance parameters corresponding to each candidate band-pass filter; and
determine the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameters corresponding to each candidate band-pass filter.
In an embodiment of the present disclosure, when determining the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameters corresponding to each candidate band-pass filter, the band-pass filter determination unit 602 is configured to:
rank the at least two candidate band-pass filters according to their corresponding attack performance parameters to determine the target band-pass filter; or
determine a candidate band-pass filter whose attack performance parameters meet a preset condition as the target band-pass filter.
In an embodiment of the present disclosure, the attack performance parameters include the attack success rate and/or the perturbation scale parameter of the adversarial audio signal.
In an embodiment of the present disclosure, the apparatus further includes a band-pass filter construction module configured to:
construct the at least two candidate band-pass filters based on at least two preset passbands and a Hamming window.
In an embodiment of the present disclosure, when constructing the at least two candidate band-pass filters based on the at least two preset passbands and the Hamming window, the band-pass filter construction module is configured to:
construct each candidate band-pass filter by the following equations:
[Equation (1): image not reproduced]
[Equation (2): image not reproduced]
$$g[n, f_1, f_2] = 2f_2\,\mathrm{sinc}(2\pi f_2 n) - 2f_1\,\mathrm{sinc}(2\pi f_1 n) \qquad (3)$$
where $h(t)$ denotes the first interference signal input to the candidate band-pass filter and $h'(t)$ denotes the second interference signal output by it; $\mathrm{sinc}(x) = \sin(x)/x$; $f_1$ and $f_2$ are the lower and upper cut-off frequencies of the passband; $w$ denotes the Hamming window; and $n$ is the time index of the audio signal within the Hamming window, $n = -16, -15, -14, \ldots, 14, 15, 16$.
In an embodiment of the present disclosure, the apparatus further includes a training module configured to:
acquire training data for an initial attack network; and
train the initial attack network on the training data through gradient back-propagation to obtain the attack network of the adversarial attack model.
In an embodiment of the present disclosure, when performing the adversarial attack test on the target audio processing model based on the adversarial attack model, the adversarial attack unit 603 is configured to:
acquire an audio signal for the adversarial attack test, input it into the adversarial attack model in which the target band-pass filter has been determined, and obtain a target interference signal through the attack network and the target band-pass filter;
fuse the target interference signal with the audio signal for the adversarial attack test to obtain a target adversarial audio signal; and
perform the adversarial attack test on the target audio processing model with the target adversarial audio signal.
The audio adversarial perturbation testing apparatus provided in this embodiment can be used to carry out the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.
The audio adversarial perturbation testing device provided in this embodiment inputs an initial audio signal into the attack network in the adversarial attack model for processing to obtain a first interference signal; filters the first interference signal with each of at least two candidate band-pass filters to obtain a second interference signal corresponding to each candidate band-pass filter, where any two of the at least two candidate band-pass filters have different passbands; determines a target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and the target audio processing model; and performs an adversarial attack test on the target audio processing model based on the adversarial attack model, where the band-pass filter in the adversarial attack model is the target band-pass filter. By configuring at least two candidate band-pass filters in the adversarial attack model and determining its target band-pass filter, this embodiment enables the adversarial attack model to generate suitable adversarial audio samples for the target audio processing model, thereby achieving the required adversarial attack performance, improving the success rate of adversarial attacks on the target audio processing model, and providing a basis for optimizing the target audio processing model.
Referring to fig. 7, a schematic structural diagram of an electronic device 700 suitable for implementing the embodiment of the present disclosure is shown, where the electronic device 700 may be a terminal device or a server. Among them, the terminal Device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a Digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), a Portable Multimedia Player (PMP), a car terminal (e.g., car navigation terminal), etc., and a fixed terminal such as a Digital TV, a desktop computer, etc. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 700 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage means 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (Radio Frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Part (ASSP), a System On Chip (SOC), a Complex Programmable Logic Device (CPLD), and so on.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, an audio adversarial perturbation testing method is provided, including:
inputting an initial audio signal into an attack network in an adversarial attack model for processing to obtain a first interference signal;
filtering the first interference signal with each of at least two candidate band-pass filters to obtain a second interference signal corresponding to each candidate band-pass filter, where any two of the at least two candidate band-pass filters have different passbands;
determining a target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and a target audio processing model; and
performing an adversarial attack test on the target audio processing model based on the adversarial attack model, where the band-pass filter in the adversarial attack model is the target band-pass filter.
According to one or more embodiments of the present disclosure, determining the target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and the target audio processing model includes:
fusing the second interference signal corresponding to each candidate band-pass filter with the initial audio signal to obtain an adversarial audio signal corresponding to each candidate band-pass filter;
performing an adversarial attack test on the target audio processing model with the adversarial audio signal corresponding to each candidate band-pass filter to obtain attack performance parameters corresponding to each candidate band-pass filter; and
determining the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameters corresponding to each candidate band-pass filter.
According to one or more embodiments of the present disclosure, determining the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameters corresponding to each candidate band-pass filter includes:
ranking the at least two candidate band-pass filters according to their corresponding attack performance parameters to determine the target band-pass filter; or
determining a candidate band-pass filter whose attack performance parameters meet a preset condition as the target band-pass filter.
According to one or more embodiments of the present disclosure, the attack performance parameters include the attack success rate and/or the perturbation scale parameter of the adversarial audio signal.
According to one or more embodiments of the present disclosure, the method further comprises:
and constructing the at least two alternative band-pass filters based on at least two preset pass bands and a Hamming window.
According to one or more embodiments of the present disclosure, the constructing the at least two candidate band-pass filters based on at least two preset pass bands and a Hamming window includes:
constructing the alternative band pass filter by the following formula:
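[Equation (1): rendered as an image in the original publication]
[Equation (2): rendered as an image in the original publication]
$$g[n, f_1, f_2] = 2f_2\,\mathrm{sinc}(2\pi f_2 n) - 2f_1\,\mathrm{sinc}(2\pi f_1 n) \qquad (3)$$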
wherein $h(t)$ denotes the first interference signal input to the candidate band-pass filter and $h'(t)$ denotes the second interference signal output by the candidate band-pass filter; $\mathrm{sinc}(x) = \sin(x)/x$; $f_1$ and $f_2$ are the lower and upper cut-off frequencies of the pass band; $w$ denotes the Hamming window; and $n$ is the sample index of the audio signal within the Hamming window, $n = -16, -15, -14, \dots, 14, 15, 16$.
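A minimal pure-Python sketch of equation (3) and the filtering step follows. Because equations (1) and (2) survive only as images, it assumes a standard symmetric Hamming window over the 33 taps and ordinary convolution of h(t) with the windowed kernel; the cut-offs are assumed to be normalized by the sampling rate (cycles per sample). The function names are illustrative, not taken from the disclosure.

```python
import math
from typing import List

def sinc(x: float) -> float:
    """sinc(x) = sin(x)/x as defined in the text, with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def hamming(n: int, half_width: int = 16) -> float:
    """Symmetric Hamming window centred on n = 0, length 2*half_width + 1 (assumed standard 0.54/0.46 form)."""
    return 0.54 + 0.46 * math.cos(math.pi * n / half_width)

def bandpass_kernel(f1: float, f2: float, half_width: int = 16) -> List[float]:
    """Windowed taps g[n, f1, f2] * w[n] for n = -half_width..half_width (equation (3)).
    f1 < f2 are assumed to be cut-offs normalized by the sampling rate (cycles/sample)."""
    taps = []
    for n in range(-half_width, half_width + 1):
        g = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        taps.append(g * hamming(n, half_width))
    return taps

def apply_filter(h: List[float], taps: List[float]) -> List[float]:
    """Filter the first interference signal h(t) into h'(t): h'(t) = sum_n taps[n] * h(t - n),
    zero-padded at the edges so the output has the same length as the input."""
    half = len(taps) // 2
    out = []
    for t in range(len(h)):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = t - (k - half)  # k - half is the tap index n
            if 0 <= idx < len(h):
                acc += tap * h[idx]
        out.append(acc)
    return out
```

Under these assumptions, a hypothetical 1 to 4 kHz pass band at a 16 kHz sampling rate becomes `bandpass_kernel(0.0625, 0.25)`, and building one kernel per preset pass band gives the set of candidate filters.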
According to one or more embodiments of the present disclosure, the method further comprises:
acquiring training data for an initial attack network;
and training the initial attack network by gradient backpropagation on the training data to obtain the attack network of the adversarial attack model.
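A minimal sketch of training by gradient backpropagation, under heavily simplified assumptions: the attack network is a hypothetical two-parameter layer, delta_t = eps * tanh(a * x_t + b); the target model is a frozen linear scorer whose score the attack tries to lower; and a squared-norm term keeps the perturbation small. The chain-rule steps are written out by hand so the backward pass is visible. A real attack network would use an autodiff framework, with the fixed band-pass filter (a linear operation) sitting in the differentiable path.

```python
import math
from typing import List, Sequence, Tuple

def attack_network(x: Sequence[float], a: float, b: float, eps: float) -> Tuple[List[float], List[float]]:
    """Hypothetical attack network: delta_t = eps * tanh(a * x_t + b).
    Returns the perturbation and the cached tanh activations for the backward pass."""
    acts = [math.tanh(a * xt + b) for xt in x]
    return [eps * u for u in acts], acts

def train_step(
    x: Sequence[float], w: Sequence[float], a: float, b: float,
    eps: float = 0.1, lam: float = 1.0, lr: float = 0.5,
) -> Tuple[float, float, float]:
    """One gradient-descent step on loss = score(x + delta) + lam * ||delta||^2,
    where score(s) = sum_t w_t * s_t is a frozen (never updated) linear target model.
    Returns the updated (a, b) and the loss before the update."""
    delta, acts = attack_network(x, a, b, eps)
    loss = sum(wt * (xt + dt) for wt, xt, dt in zip(w, x, delta))
    loss += lam * sum(dt * dt for dt in delta)
    grad_a = grad_b = 0.0
    for xt, wt, dt, u in zip(x, w, delta, acts):
        dloss_ddelta = wt + 2.0 * lam * dt       # dL/d(delta_t)
        ddelta_dz = eps * (1.0 - u * u)          # d(eps * tanh(z))/dz
        grad_a += dloss_ddelta * ddelta_dz * xt  # dz/da = x_t
        grad_b += dloss_ddelta * ddelta_dz       # dz/db = 1
    return a - lr * grad_a, b - lr * grad_b, loss
```

Repeating `train_step` over the training data lowers the loss; the learned parameters then define the attack network placed in front of the band-pass filter.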
According to one or more embodiments of the present disclosure, the performing an adversarial attack test on the target audio processing model based on the adversarial attack model includes:
acquiring an audio signal for the adversarial attack test, inputting the audio signal into the adversarial attack model in which the target band-pass filter has been determined, and obtaining a target interference signal through the attack network and the target band-pass filter;
fusing the target interference signal with the audio signal for the adversarial attack test to obtain a target adversarial audio signal;
and performing the adversarial attack test on the target audio processing model using the target adversarial audio signal.
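The test flow above can be sketched as a single function. The attack network, the selected target band-pass filter and the model under test are passed in as hypothetical callables, fusion is again assumed to be sample-wise addition, and the function reports the attack success rate on the test set.

```python
from typing import Callable, List, Sequence

Signal = List[float]

def adversarial_attack_test(
    test_audio: Sequence[Signal],
    labels: Sequence[int],
    attack_network: Callable[[Signal], Signal],
    target_filter: Callable[[Signal], Signal],
    target_model: Callable[[Signal], int],
) -> float:
    """Attack network -> target band-pass filter -> fusion -> target model.
    Returns the fraction of test clips whose label the target model gets wrong."""
    fooled = 0
    for x, y in zip(test_audio, labels):
        interference = target_filter(attack_network(list(x)))  # target interference signal
        adversarial = [a + d for a, d in zip(x, interference)]  # assumed additive fusion
        fooled += target_model(adversarial) != y
    return fooled / len(labels)
```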
According to one or more embodiments of the present disclosure, there is provided an audio adversarial perturbation testing apparatus, including:
an interference signal acquisition unit, configured to input an initial audio signal into an attack network in an adversarial attack model for processing to obtain a first interference signal, and to filter the first interference signal with each of at least two candidate band-pass filters to obtain a second interference signal corresponding to each candidate band-pass filter, wherein any two of the at least two candidate band-pass filters have different pass bands;
a band-pass filter determining unit, configured to determine a target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and a target audio processing model;
and an adversarial attack unit, configured to perform an adversarial attack test on the target audio processing model based on the adversarial attack model, wherein the band-pass filter in the adversarial attack model is the target band-pass filter.
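One way to picture how the three units hand data to one another is a small class with one method per unit. This is a hypothetical arrangement for illustration only; for brevity, the determining unit here scores each candidate on a single labelled clip, whereas the embodiments evaluate an attack performance parameter such as the success rate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Signal = List[float]
Filter = Callable[[Signal], Signal]

@dataclass
class AudioPerturbationTester:
    """Hypothetical mapping of the three apparatus units onto methods."""
    attack_network: Callable[[Signal], Signal]
    candidate_filters: Dict[str, Filter]  # any two have different pass bands
    target_model: Callable[[Signal], int]

    # Interference signal acquisition unit: first -> second interference signals
    def second_interference(self, audio: Signal) -> Dict[str, Signal]:
        first = self.attack_network(audio)
        return {name: f(first) for name, f in self.candidate_filters.items()}

    # Band-pass filter determining unit: pick the candidate that fools the model
    def determine_filter(self, audio: Signal, label: int) -> str:
        def fooled(d: Signal) -> int:
            return int(self.target_model([a + b for a, b in zip(audio, d)]) != label)
        scores = {n: fooled(d) for n, d in self.second_interference(audio).items()}
        return max(scores, key=scores.get)

    # Adversarial attack unit: run the model on audio fused with the filtered perturbation
    def attack(self, audio: Signal, filter_name: str) -> int:
        d = self.candidate_filters[filter_name](self.attack_network(audio))
        return self.target_model([a + b for a, b in zip(audio, d)])
```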
According to one or more embodiments of the present disclosure, when the band pass filter determining unit determines the target band pass filter from the at least two candidate band pass filters according to the second interference signal corresponding to each candidate band pass filter, the initial audio signal, and the target audio processing model, the band pass filter determining unit is configured to:
fusing the second interference signal corresponding to each candidate band-pass filter with the initial audio signal to obtain an adversarial audio signal corresponding to each candidate band-pass filter;
performing an adversarial attack test on the target audio processing model using the adversarial audio signal corresponding to each candidate band-pass filter, and acquiring an attack performance parameter corresponding to each candidate band-pass filter;
and determining the target band-pass filter from the at least two alternative band-pass filters according to the attack performance parameters corresponding to each alternative band-pass filter.
According to one or more embodiments of the present disclosure, when the band pass filter determining unit determines the target band pass filter from the at least two candidate band pass filters according to the attack performance parameter corresponding to each candidate band pass filter, the band pass filter determining unit is configured to:
ranking the at least two candidate band-pass filters according to their respective attack performance parameters, and determining the target band-pass filter; or
determining the candidate band-pass filter whose attack performance parameter meets a preset condition as the target band-pass filter.
According to one or more embodiments of the present disclosure, the attack performance parameters include an attack success rate and/or a perturbation scale parameter of the adversarial audio signal.
In accordance with one or more embodiments of the present disclosure, the apparatus further comprises a band pass filter construction module to:
and constructing the at least two alternative band-pass filters based on at least two preset pass bands and a Hamming window.
According to one or more embodiments of the present disclosure, the band-pass filter construction module, when constructing the at least two candidate band-pass filters based on at least two preset pass bands and a Hamming window, is configured to:
constructing the alternative band pass filter by the following formula:
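[Equation (1): rendered as an image in the original publication]
[Equation (2): rendered as an image in the original publication]
$$g[n, f_1, f_2] = 2f_2\,\mathrm{sinc}(2\pi f_2 n) - 2f_1\,\mathrm{sinc}(2\pi f_1 n) \qquad (3)$$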
wherein $h(t)$ denotes the first interference signal input to the candidate band-pass filter and $h'(t)$ denotes the second interference signal output by the candidate band-pass filter; $\mathrm{sinc}(x) = \sin(x)/x$; $f_1$ and $f_2$ are the lower and upper cut-off frequencies of the pass band; $w$ denotes the Hamming window; and $n$ is the sample index of the audio signal within the Hamming window, $n = -16, -15, -14, \dots, 14, 15, 16$.
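Equations (1) and (2) survive only as images, so the following is a hedged reading rather than a reconstruction. It assumes that (2) is a standard symmetric Hamming window over the 33 taps n = -16, ..., 16, that (1) convolves h(t) with the windowed kernel, and that $f_1$ and $f_2$ are normalized by the sampling rate. Under these assumptions each term of equation (3) is an ideal low-pass impulse response, so their difference passes frequencies with $f_1 < |f| < f_2$:

```latex
2f\,\mathrm{sinc}(2\pi f n) = \frac{\sin(2\pi f n)}{\pi n}
\quad\Rightarrow\quad
g[n, f_1, f_2] = \frac{\sin(2\pi f_2 n) - \sin(2\pi f_1 n)}{\pi n},
\qquad g[0, f_1, f_2] = 2(f_2 - f_1)

w[n] = 0.54 + 0.46\cos\!\left(\frac{\pi n}{16}\right), \quad n = -16, \dots, 16,
\qquad w[0] = 1, \quad w[\pm 16] = 0.08

h'(t) = \sum_{n=-16}^{16} g[n, f_1, f_2]\, w[n]\, h(t - n)
```

Only the pass band $(f_1, f_2)$ changes from one candidate filter to the next; the window and tap count stay fixed.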
In accordance with one or more embodiments of the present disclosure, the apparatus further comprises a training module to:
acquiring training data for an initial attack network;
and training the initial attack network by gradient backpropagation on the training data to obtain the attack network of the adversarial attack model.
According to one or more embodiments of the present disclosure, the adversarial attack unit, when performing an adversarial attack test on the target audio processing model based on the adversarial attack model, is configured to:
acquiring an audio signal for the adversarial attack test, inputting the audio signal into the adversarial attack model in which the target band-pass filter has been determined, and obtaining a target interference signal through the attack network and the target band-pass filter;
fusing the target interference signal with the audio signal for the adversarial attack test to obtain a target adversarial audio signal;
and performing the adversarial attack test on the target audio processing model using the target adversarial audio signal.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the audio adversarial perturbation testing method described in the method embodiments above and in the various possible designs of Figs. 2, 3 and 5.
According to one or more embodiments of the present disclosure, a computer-readable storage medium is provided, in which computer-executable instructions are stored; when executed by a processor, the instructions implement the audio adversarial perturbation testing method described in the method embodiments and the various possible designs of Figs. 2, 3 and 5.
The foregoing description is merely illustrative of the preferred embodiments of the disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by interchanging the above features with technical features having similar functions disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (12)

1. An audio adversarial perturbation testing method, comprising:
inputting an initial audio signal into an attack network in an adversarial attack model for processing to obtain a first interference signal;
respectively filtering the first interference signal by adopting at least two alternative band-pass filters to obtain a second interference signal corresponding to each alternative band-pass filter, wherein any two alternative band-pass filters in the at least two alternative band-pass filters have different pass bands;
determining a target band-pass filter from the at least two alternative band-pass filters according to the second interference signal corresponding to each alternative band-pass filter, the initial audio signal and a target audio processing model;
and carrying out an adversarial attack test on the target audio processing model based on the adversarial attack model, wherein the band-pass filter in the adversarial attack model is the target band-pass filter.
2. The method according to claim 1, wherein the determining a target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and a target audio processing model comprises:
fusing the second interference signal corresponding to each candidate band-pass filter with the initial audio signal to obtain an adversarial audio signal corresponding to each candidate band-pass filter;
performing an adversarial attack test on the target audio processing model using the adversarial audio signal corresponding to each candidate band-pass filter, and acquiring an attack performance parameter corresponding to each candidate band-pass filter;
and determining the target band-pass filter from the at least two alternative band-pass filters according to the attack performance parameters corresponding to each alternative band-pass filter.
3. The method according to claim 2, wherein the determining the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameter corresponding to each candidate band-pass filter comprises:
ranking the at least two candidate band-pass filters according to their respective attack performance parameters, and determining the target band-pass filter accordingly.
4. The method according to claim 2, wherein the determining the target band-pass filter from the at least two candidate band-pass filters according to the attack performance parameter corresponding to each candidate band-pass filter comprises:
and determining the alternative band-pass filter of which the corresponding attack performance parameter meets the preset condition as the target band-pass filter.
5. The method according to any of claims 2-4, wherein the attack performance parameters comprise an attack success rate and/or a perturbation scale parameter of the adversarial audio signal.
6. The method according to any one of claims 1-4, further comprising:
and constructing the at least two alternative band-pass filters based on at least two preset pass bands and a Hamming window.
7. The method according to claim 5, wherein constructing the at least two alternative band pass filters based on the at least two preset pass bands and the Hamming window comprises:
constructing the alternative band pass filter by the following formula:
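[Equation (1): rendered as an image in the original publication]
[Equation (2): rendered as an image in the original publication]
$$g[n, f_1, f_2] = 2f_2\,\mathrm{sinc}(2\pi f_2 n) - 2f_1\,\mathrm{sinc}(2\pi f_1 n) \qquad (3)$$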
wherein $h(t)$ denotes the first interference signal input to the candidate band-pass filter and $h'(t)$ denotes the second interference signal output by the candidate band-pass filter; $\mathrm{sinc}(x) = \sin(x)/x$; $f_1$ and $f_2$ are the lower and upper cut-off frequencies of the pass band; $w$ denotes the Hamming window; and $n$ is the sample index of the audio signal within the Hamming window, $n = -16, -15, -14, \dots, 14, 15, 16$.
8. The method according to any one of claims 1-4, further comprising:
acquiring training data for an initial attack network;
and training the initial attack network by gradient backpropagation on the training data to obtain the attack network of the adversarial attack model.
9. The method according to any of claims 1-4, wherein said performing an adversarial attack test on said target audio processing model based on said adversarial attack model comprises:
acquiring an audio signal for the adversarial attack test, inputting the audio signal into the adversarial attack model in which the target band-pass filter has been determined, and obtaining a target interference signal through the attack network and the target band-pass filter;
fusing the target interference signal with the audio signal for the adversarial attack test to obtain a target adversarial audio signal;
and performing the adversarial attack test on the target audio processing model using the target adversarial audio signal.
10. An audio adversarial perturbation testing apparatus, comprising:
an interference signal acquisition unit, configured to input an initial audio signal into an attack network in an adversarial attack model for processing to obtain a first interference signal, and to filter the first interference signal with each of at least two candidate band-pass filters to obtain a second interference signal corresponding to each candidate band-pass filter, wherein any two of the at least two candidate band-pass filters have different pass bands;
a band-pass filter determining unit, configured to determine a target band-pass filter from the at least two candidate band-pass filters according to the second interference signal corresponding to each candidate band-pass filter, the initial audio signal, and a target audio processing model;
and an adversarial attack unit, configured to perform an adversarial attack test on the target audio processing model based on the adversarial attack model, wherein the band-pass filter in the adversarial attack model is the target band-pass filter.
11. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of any of claims 1-9.
12. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1 to 9.
CN202011024815.5A 2020-09-25 2020-09-25 Audio countermeasure disturbance testing method, device and storage medium Active CN112216296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011024815.5A CN112216296B (en) 2020-09-25 2020-09-25 Audio countermeasure disturbance testing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN112216296A true CN112216296A (en) 2021-01-12
CN112216296B CN112216296B (en) 2023-09-22

Family

ID=74051128

Country Status (1)

Country Link
CN (1) CN112216296B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104144138A (en) * 2013-05-10 2014-11-12 钜泉光电科技(上海)股份有限公司 Self-adaptive single-frequency narrow-band interference trapped wave filtering device and double-frequency filtering equipment
US20180190280A1 (en) * 2016-12-29 2018-07-05 Baidu Online Network Technology (Beijing) Co., Ltd. Voice recognition method and apparatus
CN108711436A (en) * 2018-05-17 2018-10-26 哈尔滨工业大学 Speaker verification's system Replay Attack detection method based on high frequency and bottleneck characteristic
US20190043471A1 (en) * 2018-08-31 2019-02-07 Intel Corporation Ultrasonic attack prevention for speech enabled devices
CN109473091A (en) * 2018-12-25 2019-03-15 四川虹微技术有限公司 A kind of speech samples generation method and device
CN109599109A (en) * 2018-12-26 2019-04-09 浙江大学 For the confrontation audio generation method and system of whitepack scene
CN109887496A (en) * 2019-01-22 2019-06-14 浙江大学 Orientation confrontation audio generation method and system under a kind of black box scene
CN110444208A (en) * 2019-08-12 2019-11-12 浙江工业大学 A kind of speech recognition attack defense method and device based on gradient estimation and CTC algorithm

Non-Patent Citations (2)

Title
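YAKURA H: "Robust Audio Adversarial Example for a Physical Attack", arXiv:1810.11793v4 *
CHEN Jinyin; YE Linhui; ZHENG Haibin; YANG Yitao; YU Shanqing: "Black-box adversarial attack method for speech recognition systems", Journal of Chinese Computer Systems *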

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN114661940A (en) * 2022-01-28 2022-06-24 宁波大学 Method for rapidly acquiring voice countermeasure sample under black box attack
CN114661940B (en) * 2022-01-28 2023-08-08 宁波大学 Method suitable for quickly acquiring voice countermeasure sample under black box attack
CN117877506A (en) * 2024-03-11 2024-04-12 北京建筑大学 Method, device and system for enhancing resistance attack on voice content
CN117877506B (en) * 2024-03-11 2024-05-10 北京建筑大学 Method, device and system for enhancing resistance attack on voice content

Also Published As

Publication number Publication date
CN112216296B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
KR102262686B1 (en) Voice quality evaluation method and voice quality evaluation device
WO2021004247A1 (en) Method and apparatus for generating video cover and electronic device
CN110600059B (en) Acoustic event detection method and device, electronic equipment and storage medium
CN104422922A (en) Method and device for realizing sound source localization by utilizing mobile terminal
CN112216296B (en) Audio countermeasure disturbance testing method, device and storage medium
CN111028845A (en) Multi-audio recognition method, device, equipment and readable storage medium
CN110992963A (en) Network communication method, device, computer equipment and storage medium
CN111582090A (en) Face recognition method and device and electronic equipment
CN110070884B (en) Audio starting point detection method and device
CN111883117B (en) Voice wake-up method and device
CN113192528B (en) Processing method and device for single-channel enhanced voice and readable storage medium
CN113205820B (en) Method for generating voice coder for voice event detection
CN113823293B (en) Speaker recognition method and system based on voice enhancement
CN114187922A (en) Audio detection method and device and terminal equipment
CN116913258B (en) Speech signal recognition method, device, electronic equipment and computer readable medium
CN114495901A (en) Speech synthesis method, speech synthesis device, storage medium and electronic equipment
CN112735466B (en) Audio detection method and device
WO2021212985A1 (en) Method and apparatus for training acoustic network model, and electronic device
CN111128131B (en) Voice recognition method and device, electronic equipment and computer readable storage medium
Gao et al. Device-independent smartphone eavesdropping jointly using accelerometer and gyroscope
CN112382266A (en) Voice synthesis method and device, electronic equipment and storage medium
CN116884402A (en) Method and device for converting voice into text, electronic equipment and storage medium
CN111312223A (en) Training method and device of voice segmentation model and electronic equipment
CN112542157A (en) Voice processing method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant