CN109887496A - Method and system for generating targeted adversarial audio in a black-box scenario - Google Patents
Method and system for generating targeted adversarial audio in a black-box scenario
- Publication number
- CN109887496A CN201910060662.0A
- Authority
- CN
- China
- Prior art keywords
- audio
- orientation
- black box
- particle
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of adversarial example generation, and discloses a method and system for generating targeted adversarial audio in a black-box scenario. The method comprises: (1) selecting a target black-box speech recognition model as the audio recognition model, selecting a source audio and setting an attack target; (2) resampling the source audio according to the audio recognition model's requirement on the input audio sample rate; (3) extracting the MFCC features of the resampled source audio; (4) recognizing the MFCC features with the audio recognition model to obtain a recognition result; (5) setting an objective function, using the particle swarm optimization algorithm to find the optimal noise that minimizes the objective function, and superimposing the optimal noise on the source audio to obtain targeted adversarial audio whose recognition result is the attack target. By adding a small perturbation to the source audio, the method can make the speech recognition model recognize the audio as specific content.
Description
Technical field
The present invention relates to the field of adversarial example generation, and in particular to a method and system for generating targeted adversarial audio in a black-box scenario.
Background technique
Speech recognition is rapidly taking the high ground of the intelligent era. A survey published by Google in the United States shows that about 55% of teenagers between 13 and 18 years old use voice search every day. With the development of big data, machine learning, cloud computing and artificial intelligence, speech recognition is step by step freeing users' hands, and voice input is largely replacing the mouse and keyboard. Together with the popularity of smart mobile devices, voice interaction, as a new mode of human-computer interaction, is attracting growing attention across the IT industry.
Although the development of artificial intelligence has greatly improved the accuracy of speech recognition models, the opaque internal mechanisms of artificial intelligence also bury many security risks in practical applications. When designing a machine learning system, to ensure that the system is safe, reliable and achieves the expected results, we usually consider a specific threat model: a set of assumptions about the capabilities and goals of attackers who attempt to make the machine learning system malfunction. So far, most existing machine learning models have been designed and implemented against a very weak threat model, with little concern for attackers. Although these models can perform very well on natural inputs, recent studies have found that even well-performing models are vulnerable to adversarial examples: after small perturbations imperceptible to humans are added to a sample, the sample can be misclassified with very high confidence. If an adversarial example is classified into a class specified by the attacker, it is called a targeted adversarial example.
Most existing work considers the generation of adversarial images; adversarial audio has rarely been studied, especially targeted adversarial audio in a black-box scenario. In the black-box scenario, the attacker does not know the internal structure or parameters of the attacked model and can only obtain the probability with which an input is classified into each class. Because the information available to the attacker in this scenario is very limited, targeted adversarial audio generation in the black-box scenario has not yet been studied. Since speech recognition models applied in real life are generally black boxes, studying the formation mechanism of black-box adversarial audio examples is necessary for researching corresponding defense methods and enhancing the robustness of speech recognition models in practical applications.
Summary of the invention
The present invention provides a method for generating targeted adversarial audio in a black-box scenario; by adding a small perturbation to the source audio, the method can make a speech recognition model recognize the audio as specific content.
The specific technical solution is as follows:
A method for generating targeted adversarial audio in a black-box scenario, comprising the following steps:
(1) selecting a target black-box speech recognition model as the audio recognition model, selecting a source audio and setting an attack target;
(2) resampling the source audio according to the audio recognition model's requirement on the input audio sample rate;
(3) extracting the MFCC features of the resampled source audio;
(4) recognizing the MFCC features with the audio recognition model to obtain a recognition result;
(5) setting an objective function, using the particle swarm optimization algorithm to find the optimal noise that minimizes the objective function, and superimposing the optimal noise on the source audio to obtain targeted adversarial audio whose recognition result is the attack target.
The black-box speech recognition model is a speech recognition model whose parameters are unknown. The black-box speech recognition model of the present invention classifies speech and outputs a fixed set of classes, for example a command-word recognition model. The attack target is the expected recognition result of the targeted adversarial audio under the black-box speech recognition model. For example, if a targeted adversarial audio sounds like "no" to the human ear but is recognized as "yes" by the black-box speech recognition model, then "yes" is its attack target.
In step (3), the MFCC features are Mel-frequency cepstral coefficients. Since MFCC simulates, to a certain extent, the way the human ear processes speech and applies research results on human auditory perception, using this technique helps improve the performance of speech recognition systems.
Step (3) comprises:
(3-1) pre-emphasizing the pre-processed audio to flatten its spectrum;
(3-2) dividing the audio into frames and multiplying each frame by a Hamming window;
(3-3) applying the fast Fourier transform to each frame to obtain its spectrum, and obtaining the energy spectrum of the audio from the spectrum;
(3-4) passing the energy spectrum of the audio through a bank of Mel-scale triangular filters;
(3-5) computing the log energy output by each triangular filter, and applying the discrete cosine transform to the log energies to obtain the Mel-scale cepstral coefficients up to the MFCC order; extracting the dynamic difference parameters of the audio;
(3-6) obtaining the MFCC features.
Preferably, the parameters of the MFCC feature extraction are: pre-emphasis coefficient 0.97; 512 samples per frame, with an overlap of 171 samples between adjacent frames; windowing parameter 0.46; 512 fast Fourier transform points; 26 triangular filters; MFCC order 16.
In step (5), the goal of the particle swarm optimization algorithm is to find an optimal noise δ such that, after δ is superimposed on the source audio, the result is recognized by the audio recognition model as the attack target.
In step (5), the objective function is:
    g(x + p_i) = max( max_{j≠t} f(x + p_i)_j − f(x + p_i)_t , κ )
where x is the source audio, p_i (i = 1, …, N) is the i-th particle, and N is a positive integer; f(x + p_i)_j is the probability that the audio recognition model classifies the input x + p_i as class j; t is the attack target, and f(x + p_i)_t is the probability that the model classifies x + p_i as t; the parameter κ is a constant less than or equal to 0. κ controls the confidence of the misclassification: a smaller κ means that the generated targeted adversarial audio will be recognized as t with higher confidence, i.e., the attack effect of the generated targeted adversarial audio is better.
In step (5), finding the optimal noise that minimizes the objective function with the particle swarm optimization algorithm comprises:
(5-1) initializing the iteration count to 0, and generating N particles p_i (i = 1, …, N) from a uniform distribution, each particle having the same length as the source audio;
(5-2) superimposing each particle p_i on the source audio x to obtain N audios x + p_i;
(5-3) extracting the MFCC features of each audio x + p_i and recognizing them with the audio recognition model, obtaining the recognition result of each x + p_i and computing its objective value g(x + p_i);
if the recognition result of any x + p_i is the attack target, the attack succeeds and the particle p_i is the optimal noise; otherwise, executing step (5-4);
(5-4) incrementing the iteration count by 1, generating N − 1 particles p_i (i = 1, …, N − 1) from a uniform distribution, and adding the particle with the minimum objective value from the previous round as the seed of the next iteration;
repeating steps (5-2)–(5-3) until the objective function converges; the particle p_i that makes the objective function converge is the optimal noise;
if the objective function has still not converged when the iteration count reaches the set maximum number of iterations, the attack fails.
The present invention also provides a system for generating targeted adversarial audio in a black-box scenario, comprising:
a data pre-processing module, which resamples the source audio data so that the sample rate of the source audio meets the black-box speech recognition model's requirement on the input audio sample rate;
an audio feature extraction module, which extracts the MFCC features of the audio data;
an audio recognition module, which contains the black-box speech recognition model that recognizes the MFCC features of the audio and obtains a recognition result;
a particle swarm optimization module, which contains the objective function, finds the optimal noise with the particle swarm optimization algorithm, and superimposes the optimal noise on the source audio to obtain the targeted adversarial audio.
The system generates targeted adversarial audio using the above targeted adversarial audio generation method.
Compared with the prior art, the present invention has the following beneficial effects:
The targeted adversarial audio generation method of the present invention can generate adversarial audio carrying a small perturbation: to the human ear, the adversarial audio sounds the same as the original audio, yet the speech recognition model recognizes it as specific content. Such adversarial audio provides a basis for an in-depth analysis of the vulnerability of deep-learning-based speech recognition models, and facilitates follow-up research on how to defend against adversarial audio and improve the robustness of speech recognition models.
Description of the drawings
Fig. 1 is an architecture diagram of the targeted adversarial audio generation system;
Fig. 2 is a flow diagram of the targeted adversarial audio generation method;
Fig. 3 is a flow diagram of MFCC feature extraction;
Fig. 4 is a flow diagram of finding the optimal noise with the particle swarm optimization algorithm.
Specific embodiment
The present invention is further described in detail below with reference to the drawings and embodiments. It should be noted that the embodiments described below are intended to facilitate understanding of the present invention and do not limit it in any way.
The system of the present invention for generating targeted adversarial audio in a black-box scenario based on particle swarm optimization comprises four modules: a data pre-processing module, a feature extraction module, an audio recognition module, and an objective function optimization module; the system architecture is shown in Fig. 1.
The process by which the system generates targeted adversarial audio is shown in Fig. 2. Suppose a black-box speech recognition model exists, and the user wants to generate a 1-second audio at a 12 kHz sample rate that sounds like "no" to the human ear (the original audio) but is recognized by this model as "yes" (the target text). The whole process is as follows:
(1) The command-word recognition model provided by Google is taken as the black-box speech recognition model, the audio that sounds like "no" to the human ear as the original audio, and "yes" as the attack target;
(2) The command-word recognition model requires an input sample rate of 16 kHz. According to this input requirement, the data pre-processing module resamples the original audio, i.e., resamples the 12 kHz audio to a 16 kHz audio;
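As an illustration of this resampling step, the following is a minimal NumPy sketch using linear interpolation (a stand-in for the polyphase or FFT resamplers used in practice; the function name is illustrative):

```python
import numpy as np

def resample_audio(audio, orig_sr, target_sr):
    """Resample a 1-D signal by linear interpolation (illustrative only;
    production code would use a polyphase or FFT-based resampler)."""
    n_out = int(round(len(audio) * target_sr / orig_sr))
    t_in = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    t_out = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(t_out, t_in, audio)

original = np.random.randn(12000)                  # 1 s of audio at 12 kHz
resampled = resample_audio(original, 12000, 16000)
print(len(resampled))  # 16000
```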
(3) MFCC features are extracted from the pre-processed audio; the MFCC feature extraction process is shown in Fig. 3. The specific extraction process is as follows:
(i) Pre-emphasis. First, the speech signal is passed through a high-pass filter; the result of pre-emphasis is y(n) = x(n) − a·x(n−1), where x(n) is the speech sample at time n and a is the pre-emphasis coefficient, usually set to 0.97. The purpose of pre-emphasis is to eliminate the effect of the vocal cords and lips during vocalization, compensating for the high-frequency components of the speech signal suppressed by the articulatory system while emphasizing the high-frequency formants.
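The pre-emphasis filter y(n) = x(n) − a·x(n−1) is one line of NumPy; a sketch (the first sample is passed through unchanged):

```python
import numpy as np

def preemphasis(x, a=0.97):
    """High-pass pre-emphasis: y(n) = x(n) - a * x(n-1)."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]                    # no predecessor for the first sample
    y[1:] = x[1:] - a * x[:-1]
    return y

# a constant (purely low-frequency) signal is almost cancelled,
# illustrating the high-pass behaviour
y = preemphasis(np.ones(8))
```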
(ii) Framing and windowing. After pre-emphasis, the audio is divided into frames: every 512 samples form one frame, and the overlap between adjacent frames contains 171 samples. Each frame is then multiplied by a Hamming window to increase the continuity from the left end of the frame to its right end; the windowing parameter is a = 0.46.
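With 512-sample frames, a 171-sample overlap (i.e., a hop of 341 samples) and the Hamming window w(n) = (1 − a) − a·cos(2πn/(N − 1)) with a = 0.46, framing and windowing can be sketched as:

```python
import numpy as np

def frame_and_window(signal, frame_len=512, overlap=171, alpha=0.46):
    """Split into overlapping frames and apply a Hamming window
    w(n) = (1 - alpha) - alpha * cos(2*pi*n / (N - 1))."""
    hop = frame_len - overlap                      # 341 samples
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    n = np.arange(frame_len)
    window = (1 - alpha) - alpha * np.cos(2 * np.pi * n / (frame_len - 1))
    return signal[idx] * window

frames = frame_and_window(np.random.randn(16000))  # 1 s at 16 kHz
print(frames.shape)  # (46, 512)
```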
(iii) Fast Fourier transform. After framing and windowing, a fast Fourier transform is applied to each frame to obtain its spectrum. The energy spectrum of the speech signal is then obtained by taking the squared magnitude of the spectrum (the square of the absolute value) and dividing by the number of Fourier transform points, usually set to 512.
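The spectrum-to-energy-spectrum step (squared magnitude divided by the number of FFT points) can be sketched as:

```python
import numpy as np

def energy_spectrum(frames, nfft=512):
    """|FFT|^2 / nfft per frame; the real FFT keeps the
    nfft // 2 + 1 = 257 non-redundant bins."""
    spectrum = np.fft.rfft(frames, n=nfft, axis=-1)
    return (np.abs(spectrum) ** 2) / nfft

pow_spec = energy_spectrum(np.random.randn(46, 512))
print(pow_spec.shape)  # (46, 257)
```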
(iv) Triangular band-pass filtering. The energy spectrum is passed through a bank of Mel-scale triangular filters, which smooths the energy spectrum, eliminates harmonics, and highlights the formants of the original speech. The number of triangular band-pass filters is 26.
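A minimal sketch of such a Mel-scale triangular filter bank for the stated parameters (26 filters, 512-point FFT, 16 kHz sample rate); exact bin-placement conventions vary slightly between implementations:

```python
import numpy as np

def mel_filterbank(n_filters=26, nfft=512, sr=16000):
    """Triangular filters spaced uniformly on the Mel scale,
    mel(f) = 2595 * log10(1 + f / 700)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / sr).astype(int)

    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):              # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):             # falling edge (peak = 1 at center)
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

fbank = mel_filterbank()
print(fbank.shape)  # (26, 257)
```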
(v) Computing the log energy of the filter outputs. First, the log energy s(m) of each filter output is computed:
    s(m) = ln( Σ_{k=0}^{N−1} |X(k)|² · H_m(k) ), 1 ≤ m ≤ M
The resulting log energies are then substituted into the discrete cosine transform to obtain the MFCC coefficients:
    C(n) = Σ_{m=1}^{M} s(m) · cos( πn(m − 0.5)/M ), n = 1, 2, …, L
where M is the number of triangular filters, 26; N is the number of Fourier transform points; and L is the MFCC coefficient order, 16.
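The DCT step can be sketched directly from C(n) = Σ_m s(m)·cos(πn(m − 0.5)/M), a type-II DCT of the log filter-bank energies (the small constant added before the logarithm is a numerical safeguard, not part of the formula; the function name is illustrative):

```python
import numpy as np

def mfcc_from_filter_energies(filter_energies, order=16):
    """s(m) = ln(energy of filter m); C(n) = sum_m s(m)*cos(pi*n*(m-0.5)/M)
    for n = 1..order."""
    s = np.log(filter_energies + 1e-12)                 # log energies, M per frame
    M = s.shape[-1]
    m = np.arange(1, M + 1)
    n = np.arange(1, order + 1)
    basis = np.cos(np.pi * np.outer(n, m - 0.5) / M)    # (order, M)
    return s @ basis.T                                  # (n_frames, order)

coeffs = mfcc_from_filter_energies(np.random.rand(46, 26) + 1.0)
print(coeffs.shape)  # (46, 16)
```

A useful sanity check on the basis: perfectly flat filter energies produce all-zero coefficients, since each cosine row sums to zero.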
(vi) Extraction of dynamic difference parameters. The standard cepstral parameters MFCC only reflect the static characteristics of the speech; the dynamic characteristics of the speech can be described by extracting dynamic difference (delta) parameters.
The difference parameters are computed as:
    d_t = C_{t+1} − C_t,  t < K
    d_t = ( Σ_{k=1}^{K} k·(C_{t+k} − C_{t−k}) ) / √( 2·Σ_{k=1}^{K} k² ),  otherwise
    d_t = C_t − C_{t−1},  t ≥ Q − K
where d_t denotes the t-th first-order difference parameter, C_t denotes the t-th cepstral coefficient, Q denotes the order of the cepstral coefficients, and K denotes the time span of the first derivative (taking the value 1 or 2). Applying the same formula to d_t yields the second-order difference parameters of the MFCC.
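The piecewise difference computation can be sketched as follows (using the √(2Σk²) normalization shown above; some references normalize by 2Σk² instead):

```python
import numpy as np

def delta(coeffs, K=2):
    """First-order difference of a cepstral sequence; simple one-step
    differences are used for the K boundary frames at each end."""
    T = len(coeffs)
    d = np.zeros_like(coeffs, dtype=float)
    denom = np.sqrt(2.0 * sum(k * k for k in range(1, K + 1)))
    for t in range(T):
        if t < K:
            d[t] = coeffs[t + 1] - coeffs[t]
        elif t >= T - K:
            d[t] = coeffs[t] - coeffs[t - 1]
        else:
            d[t] = sum(k * (coeffs[t + k] - coeffs[t - k])
                       for k in range(1, K + 1)) / denom
    return d

ramp = np.arange(10.0)       # a linear ramp of cepstral values
d1 = delta(ramp)             # first-order differences
d2 = delta(d1)               # applying the formula again gives second order
```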
(4) The extracted MFCC features are recognized with the command-word recognition model provided by Google; the recognition result is "no", with a confidence of 0.9;
(5) The particle swarm optimization algorithm is used to find the perturbation that minimizes the objective function. Specifically:
To move the swarm in the direction that maximizes the probability of the target class, the objective function is set as:
    g(x + p_i) = max( max_{j≠t} f(x + p_i)_j − f(x + p_i)_t , κ )
where x represents the input audio and p_i (i = 1, …, N) represents particle i, with N particles in total. f(x + p_i)_j represents the probability of class j output by the speech recognition model for the input x + p_i; t represents the class specified by the attacker, so f(x + p_i)_t represents the probability with which the speech model classifies the input as t. The parameter κ controls the confidence of the misclassification and takes a value less than or equal to 0; a smaller κ means that the generated adversarial audio will be recognized as t with higher confidence, i.e., the attack effect of the generated adversarial audio is better.
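A minimal sketch of this objective for one particle, assuming the margin form g = max(max_{j≠t} f_j − f_t, κ) given above (the function and argument names are illustrative):

```python
import numpy as np

def objective(probs, target, kappa=0.0):
    """g = max(max_{j != t} f_j - f_t, kappa): minimizing g pushes the
    target class probability above every other class, down to the
    confidence floor kappa."""
    others = np.delete(probs, target)
    return max(others.max() - probs[target], kappa)

p = np.array([0.1, 0.7, 0.2])
print(objective(p, target=1))  # 0.0  (class 1 already dominates)
print(objective(p, target=0))  # 0.6  (class 0 trails the top class by 0.6)
```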
With the given parameter (κ = 0), the objective function is minimized with the particle swarm optimization algorithm, as shown in Fig. 4, specifically:
(a) The iteration count is first initialized to 0, and 25 random particle sequences are generated from a uniform distribution over [−1, 1]; each particle has the same length as the original audio, namely 16000 points;
(b) Each particle p_i is added to the original audio x to obtain 25 new audios x + p_i; steps (3) and (4) are repeated, recording the recognition result f(x + p_i) of each x + p_i and computing its objective value g(x + p_i);
(c) If the recognition result of any x + p_i is "yes", the attack succeeds and the particle p_i is the desired optimal noise δ;
otherwise, step (d) is executed;
(d) The iteration count is incremented by 1, 24 particles are generated from a uniform distribution over [−1, 1], and the particle with the minimum objective value from the previous round is added as the seed of the next iteration;
Steps (b)–(c) are repeated until the objective function converges; the particle p_i that makes the objective function converge is the desired optimal noise δ;
If the objective function has still not converged when the iteration count reaches the set maximum number of iterations, the attack fails;
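The search loop of steps (a)–(d) can be sketched as below; `classify` stands in for the black-box model's probability output, and the toy model at the end is purely illustrative:

```python
import numpy as np

def find_optimal_noise(x, classify, target, n_particles=25,
                       max_iter=100, kappa=0.0, seed=0):
    """Each round draws fresh uniform particles in [-1, 1], carries the
    best particle of the previous round over as a seed, and stops when
    some x + p_i is classified as the target class."""
    rng = np.random.default_rng(seed)

    def g(p):                                   # objective for one particle
        probs = classify(x + p)
        return max(np.delete(probs, target).max() - probs[target], kappa)

    best = None
    for _ in range(max_iter):
        fresh = n_particles if best is None else n_particles - 1
        particles = [rng.uniform(-1.0, 1.0, size=len(x)) for _ in range(fresh)]
        if best is not None:
            particles.append(best)              # seed from the previous round
        scores = [g(p) for p in particles]
        best = particles[int(np.argmin(scores))]
        if np.argmax(classify(x + best)) == target:
            return best                         # attack succeeded
    return None                                 # attack failed

# toy stand-in for the black-box model: class 1 "wins" when the input mean > 0
toy_model = lambda a: np.array([0.0, 1.0]) if a.mean() > 0 else np.array([1.0, 0.0])
noise = find_optimal_noise(np.full(100, -0.01), toy_model, target=1)
```

With a real command-word model, `classify` would wrap the feature extraction of step (3) and the probability output of step (4).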
(6) The optimal noise δ is superimposed on the original audio x to obtain the adversarial audio, i.e., an audio that sounds like "no" to the human ear but is recognized as "yes" by the speech recognition model.
The embodiments described above explain the technical solution and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, supplement or equivalent replacement made within the spirit of the present invention shall fall within its protection scope.
Claims (7)
1. A method for generating targeted adversarial audio in a black-box scenario, characterized by comprising the following steps:
(1) selecting a target black-box speech recognition model as the audio recognition model, selecting a source audio and setting an attack target;
(2) resampling the source audio according to the audio recognition model's requirement on the input audio sample rate;
(3) extracting the MFCC features of the resampled source audio;
(4) recognizing the MFCC features with the audio recognition model to obtain a recognition result;
(5) setting an objective function, using the particle swarm optimization algorithm to find the optimal noise that minimizes the objective function, and superimposing the optimal noise on the source audio to obtain targeted adversarial audio whose recognition result is the attack target.
2. The method for generating targeted adversarial audio in a black-box scenario according to claim 1, characterized in that the black-box speech recognition model is a speech recognition model that classifies speech and outputs a fixed set of classes.
3. The method for generating targeted adversarial audio in a black-box scenario according to claim 1, characterized in that step (3) comprises:
(3-1) pre-emphasizing the pre-processed audio to flatten its spectrum;
(3-2) dividing the audio into frames and multiplying each frame by a Hamming window;
(3-3) applying the fast Fourier transform to each frame to obtain its spectrum, and obtaining the energy spectrum of the audio from the spectrum;
(3-4) passing the energy spectrum of the audio through a bank of Mel-scale triangular filters;
(3-5) computing the log energy output by each triangular filter, and applying the discrete cosine transform to the log energies to obtain the Mel-scale cepstral coefficients up to the MFCC order; extracting the dynamic difference parameters of the audio;
(3-6) obtaining the MFCC features.
4. The method for generating targeted adversarial audio in a black-box scenario according to claim 3, characterized in that the parameters of the MFCC feature extraction are: pre-emphasis coefficient 0.97; 512 samples per frame, with an overlap of 171 samples between adjacent frames; windowing parameter 0.46; 512 fast Fourier transform points; 26 triangular filters; MFCC order 16.
5. The method for generating targeted adversarial audio in a black-box scenario according to claim 1, characterized in that the objective function is:
    g(x + p_i) = max( max_{j≠t} f(x + p_i)_j − f(x + p_i)_t , κ )
where x is the source audio, p_i (i = 1, …, N) is the i-th particle, and N is a positive integer; f(x + p_i)_j is the probability that the audio recognition model classifies the input x + p_i as class j; t is the attack target, and f(x + p_i)_t is the probability that the model classifies x + p_i as t; the parameter κ is a constant less than or equal to 0.
6. The method for generating targeted adversarial audio in a black-box scenario according to claim 5, characterized in that, in step (5), finding the optimal noise that minimizes the objective function with the particle swarm optimization algorithm comprises:
(5-1) initializing the iteration count to 0, and generating N particles p_i (i = 1, …, N) from a uniform distribution, each particle having the same length as the source audio;
(5-2) superimposing each particle p_i on the source audio x to obtain N audios x + p_i;
(5-3) extracting the MFCC features of each audio x + p_i and recognizing them with the audio recognition model, obtaining the recognition result of each x + p_i and computing its objective value g(x + p_i);
if the recognition result of any x + p_i is the attack target, the attack succeeds and the particle p_i is the optimal noise; otherwise, executing step (5-4);
(5-4) incrementing the iteration count by 1, generating N − 1 particles p_i (i = 1, …, N − 1) from a uniform distribution, and adding the particle with the minimum objective value from the previous round as the seed of the next iteration;
repeating steps (5-2)–(5-3) until the objective function converges; the particle p_i that makes the objective function converge is the optimal noise;
if the objective function has still not converged when the iteration count reaches the set maximum number of iterations, the attack fails.
7. A system for generating targeted adversarial audio in a black-box scenario, characterized by comprising:
a data pre-processing module, which resamples the source audio data so that the sample rate of the source audio meets the black-box speech recognition model's requirement on the input audio sample rate;
an audio feature extraction module, which extracts the MFCC features of the audio data;
an audio recognition module, which contains a black-box speech recognition model that recognizes the MFCC features of the audio and obtains a recognition result;
a particle swarm optimization module, which contains the objective function, finds the optimal noise with the particle swarm optimization algorithm, and superimposes the optimal noise on the source audio to obtain the targeted adversarial audio;
wherein the system generates targeted adversarial audio using the targeted adversarial audio generation method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910060662.0A CN109887496A (en) | 2019-01-22 | 2019-01-22 | Orientation confrontation audio generation method and system under a kind of black box scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910060662.0A CN109887496A (en) | 2019-01-22 | 2019-01-22 | Orientation confrontation audio generation method and system under a kind of black box scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109887496A true CN109887496A (en) | 2019-06-14 |
Family
ID=66926610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910060662.0A Pending CN109887496A (en) | 2019-01-22 | 2019-01-22 | Orientation confrontation audio generation method and system under a kind of black box scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109887496A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110444208A (en) * | 2019-08-12 | 2019-11-12 | 浙江工业大学 | A kind of speech recognition attack defense method and device based on gradient estimation and CTC algorithm |
CN110767216A (en) * | 2019-09-10 | 2020-02-07 | 浙江工业大学 | Voice recognition attack defense method based on PSO algorithm |
CN110992951A (en) * | 2019-12-04 | 2020-04-10 | 四川虹微技术有限公司 | Method for protecting personal privacy based on countermeasure sample |
CN111341327A (en) * | 2020-02-28 | 2020-06-26 | 广州国音智能科技有限公司 | Speaker voice recognition method, device and equipment based on particle swarm optimization |
CN111710327A (en) * | 2020-06-12 | 2020-09-25 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device and medium for model training and sound data processing |
CN112216296A (en) * | 2020-09-25 | 2021-01-12 | 脸萌有限公司 | Audio anti-disturbance testing method and device and storage medium |
CN113345420A (en) * | 2021-06-07 | 2021-09-03 | 河海大学 | Countermeasure audio generation method and system based on firefly algorithm and gradient evaluation |
CN113362822A (en) * | 2021-06-08 | 2021-09-07 | 北京计算机技术及应用研究所 | Black box voice confrontation sample generation method with auditory masking |
WO2021212675A1 (en) * | 2020-04-21 | 2021-10-28 | 清华大学 | Method and apparatus for generating adversarial sample, electronic device and storage medium |
CN114627858A (en) * | 2022-05-09 | 2022-06-14 | 杭州海康威视数字技术股份有限公司 | Intelligent voice recognition security defense method and device based on particle swarm optimization |
CN116758899A (en) * | 2023-08-11 | 2023-09-15 | 浙江大学 | Speech recognition model safety assessment method based on semantic space disturbance |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105096955A (en) * | 2015-09-06 | 2015-11-25 | 广东外语外贸大学 | Speaker rapid identification method and system based on growing and clustering algorithm of models |
CN105139857A (en) * | 2015-09-02 | 2015-12-09 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Countercheck method for automatically identifying speaker aiming to voice deception |
US10007498B2 (en) * | 2015-12-17 | 2018-06-26 | Architecture Technology Corporation | Application randomization mechanism |
CN108446700A (en) * | 2018-03-07 | 2018-08-24 | 浙江工业大学 | A kind of car plate attack generation method based on to attack resistance |
CN108520268A (en) * | 2018-03-09 | 2018-09-11 | 浙江工业大学 | The black box antagonism attack defense method evolved based on samples selection and model |
CN109036385A (en) * | 2018-10-19 | 2018-12-18 | 北京旋极信息技术股份有限公司 | A kind of voice instruction recognition method, device and computer storage medium |
- 2019
  - 2019-01-22 CN CN201910060662.0A patent/CN109887496A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139857A (en) * | 2015-09-02 | 2015-12-09 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Countercheck method for automatically identifying speaker aiming to voice deception |
CN105096955A (en) * | 2015-09-06 | 2015-11-25 | 广东外语外贸大学 | Speaker rapid identification method and system based on growing and clustering algorithm of models |
US10007498B2 (en) * | 2015-12-17 | 2018-06-26 | Architecture Technology Corporation | Application randomization mechanism |
CN108446700A (en) * | 2018-03-07 | 2018-08-24 | 浙江工业大学 | A kind of car plate attack generation method based on to attack resistance |
CN108520268A (en) * | 2018-03-09 | 2018-09-11 | 浙江工业大学 | The black box antagonism attack defense method evolved based on samples selection and model |
CN109036385A (en) * | 2018-10-19 | 2018-12-18 | 北京旋极信息技术股份有限公司 | A kind of voice instruction recognition method, device and computer storage medium |
Non-Patent Citations (2)
Title |
---|
MOUSTAFA ALZANTOT: "Did you hear that? Adversarial Examples Against Automatic Speech Recognition", 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA *
PIN-YU CHEN: "ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models", 2017 Association for Computing Machinery *
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110444208A (en) * | 2019-08-12 | 2019-11-12 | Zhejiang University of Technology | Speech recognition attack defense method and device based on gradient estimation and the CTC algorithm |
CN110767216A (en) * | 2019-09-10 | 2020-02-07 | Zhejiang University of Technology | Voice recognition attack defense method based on the PSO algorithm |
CN110992951A (en) * | 2019-12-04 | 2020-04-10 | Sichuan Hongwei Technology Co., Ltd. | Method for protecting personal privacy based on adversarial samples |
CN111341327A (en) * | 2020-02-28 | 2020-06-26 | Guangzhou Guoyin Intelligent Technology Co., Ltd. | Speaker voice recognition method, device and equipment based on particle swarm optimization |
WO2021212675A1 (en) * | 2020-04-21 | 2021-10-28 | Tsinghua University | Method and apparatus for generating adversarial samples, electronic device and storage medium |
CN111710327B (en) * | 2020-06-12 | 2023-06-20 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus, device and medium for model training and sound data processing |
CN111710327A (en) * | 2020-06-12 | 2020-09-25 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus, device and medium for model training and sound data processing |
CN112216296A (en) * | 2020-09-25 | 2021-01-12 | Lemon Inc. | Audio adversarial perturbation testing method, device and storage medium |
CN112216296B (en) * | 2020-09-25 | 2023-09-22 | Lemon Inc. | Audio adversarial perturbation testing method, device and storage medium |
CN113345420A (en) * | 2021-06-07 | 2021-09-03 | Hohai University | Adversarial audio generation method and system based on the firefly algorithm and gradient evaluation |
CN113362822A (en) * | 2021-06-08 | 2021-09-07 | Beijing Institute of Computer Technology and Application | Black-box voice adversarial sample generation method with auditory masking |
CN114627858A (en) * | 2022-05-09 | 2022-06-14 | Hangzhou Hikvision Digital Technology Co., Ltd. | Intelligent voice recognition security defense method and device based on particle swarm optimization |
CN116758899A (en) * | 2023-08-11 | 2023-09-15 | Zhejiang University | Speech recognition model security assessment method based on semantic-space perturbation |
CN116758899B (en) * | 2023-08-11 | 2023-10-13 | Zhejiang University | Speech recognition model security assessment method based on semantic-space perturbation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109887496A (en) | Targeted adversarial audio generation method and system in a black-box scenario | |
CN109599109B (en) | Adversarial audio generation method and system for a white-box scenario | |
CN110767216B (en) | Voice recognition attack defense method based on PSO algorithm | |
Cui et al. | Data augmentation for deep neural network acoustic modeling | |
Yang et al. | Characterizing speech adversarial examples using self-attention u-net enhancement | |
CN111261147B (en) | Music embedding attack defense method for voice recognition system | |
CN105023573B (en) | Speech syllable/vowel/phone boundary detection using auditory attention cues | |
TW201935464A (en) | Method and device for voiceprint recognition based on memorability bottleneck features | |
WO2020043160A1 (en) | Method and system for detecting voice activity in noisy conditions | |
CN109887484A (en) | Speech recognition and speech synthesis method and device based on dual learning | |
CN102779510A (en) | Speech emotion recognition method based on feature space self-adaptive projection | |
CN109887489A (en) | Speech dereverberation method based on deep features of a generative adversarial network | |
CN113362822B (en) | Black-box voice adversarial sample generation method with auditory masking | |
Xu et al. | Cross-language transfer learning for deep neural network based speech enhancement | |
CN103985390A (en) | Method for extracting speech feature parameters based on gammatone correlograms | |
CN107274887A (en) | Speaker secondary feature extraction method based on the fusion feature MGFCC | |
CN114783418B (en) | End-to-end voice recognition method and system based on sparse self-attention mechanism | |
CN112183582A (en) | Multi-feature fusion underwater target identification method | |
Yi et al. | Audio deepfake detection: A survey | |
Shi et al. | Fusion feature extraction based on auditory and energy for noise-robust speech recognition | |
CN104952446A (en) | Digital building presentation system based on voice interaction | |
Sun et al. | A novel convolutional neural network voiceprint recognition method based on improved pooling method and dropout idea | |
Huang et al. | Research on robustness of emotion recognition under environmental noise conditions | |
CN111462737B (en) | Method for training a grouping model for speech grouping, and speech noise reduction method | |
CN113488069B (en) | Method and device for fast extraction of high-dimensional speech features based on a generative adversarial network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190614 |