CN109887496A - Targeted adversarial audio generation method and system in a black-box scenario - Google Patents

Targeted adversarial audio generation method and system in a black-box scenario

Info

Publication number
CN109887496A
CN109887496A (application CN201910060662.0A)
Authority
CN
China
Prior art keywords
audio
orientation
black box
particle
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910060662.0A
Other languages
Chinese (zh)
Inventor
纪守领
杜天宇
李进锋
陈建海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201910060662.0A priority Critical patent/CN109887496A/en
Publication of CN109887496A publication Critical patent/CN109887496A/en
Pending legal-status Critical Current

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of adversarial example generation, and discloses a targeted adversarial audio generation method and system for the black-box scenario. The method comprises: (1) selecting a target black-box speech recognition model as the audio recognition model, selecting a source audio, and setting an attack target; (2) resampling the source audio according to the audio recognition model's required input sample rate; (3) extracting the MFCC features of the resampled source audio; (4) recognizing the MFCC features with the audio recognition model to obtain a recognition result; (5) setting an objective function, using particle swarm optimization to find the optimal noise that minimizes the objective function value, and superimposing the optimal noise on the source audio to obtain a targeted adversarial audio whose recognition result is the attack target. By adding a small perturbation to the source audio, the method causes the speech recognition model to recognize the audio as attacker-specified content.

Description

Targeted adversarial audio generation method and system in a black-box scenario
Technical field
The present invention relates to the field of adversarial example generation, and in particular to a targeted adversarial audio generation method and system for the black-box scenario.
Background technique
Speech recognition is claiming the high ground of the intelligent era. A survey report published by Google in the United States showed that about 55% of teenagers aged 13 to 18 use voice search every day. With the development of technologies such as big data, machine learning, cloud computing, and artificial intelligence, speech recognition is step by step freeing users' hands, and voice input is increasingly replacing the mouse and keyboard. Together with the popularity of smart mobile devices, voice interaction, as a new form of human-computer interaction, is drawing growing attention across the IT industry.
Although advances in artificial intelligence have greatly improved the accuracy of speech recognition models, the opaque internal mechanisms of these models also bury many security risks in practical applications. When designing a machine learning system, to ensure that the system is safe, reliable, and achieves the desired results, one usually considers a specific threat model: a set of assumptions about the capabilities and goals of an attacker who attempts to make the system fail. So far, most existing machine learning models have been designed and implemented against a very weak threat model, essentially without considering attackers at all. Although these models perform well on natural inputs, recent studies have found that even high-performing models are vulnerable to adversarial examples: after small perturbations imperceptible to humans are added to a sample, the sample is misclassified with very high confidence. If an adversarial example is classified as a class specified by the attacker, it is called a targeted adversarial example.
Most existing work considers the generation of adversarial images; adversarial audio is rarely studied, especially targeted adversarial audio in the black-box scenario. In the black-box scenario, the attacker knows nothing about the internal structure and parameters of the attacked model, and can only obtain the probability with which an input is classified into each class. Because the information available to the attacker in this scenario is very limited, no one has yet studied targeted adversarial audio generation in the black-box setting. Since speech recognition models deployed in real-life applications are generally black boxes, studying the generation mechanism of black-box adversarial audio examples is essential for developing corresponding defense methods and enhancing the robustness of speech recognition models in practical applications.
Summary of the invention
The present invention provides a targeted adversarial audio generation method for the black-box scenario, which achieves the goal of making a speech recognition model output attacker-specified content by adding a small perturbation to the source audio.
The specific technical solution is as follows:
A targeted adversarial audio generation method in a black-box scenario, comprising the following steps:
(1) selecting a target black-box speech recognition model as the audio recognition model, selecting a source audio, and setting an attack target;
(2) resampling the source audio according to the audio recognition model's required input sample rate;
(3) extracting the MFCC features of the resampled source audio;
(4) recognizing the MFCC features with the audio recognition model to obtain a recognition result;
(5) setting an objective function, using particle swarm optimization to find the optimal noise that minimizes the objective function value, and superimposing the optimal noise on the source audio to obtain a targeted adversarial audio whose recognition result is the attack target.
The black-box speech recognition model refers to a speech recognition model whose parameters are unknown. In this invention, the black-box speech recognition model is a model that classifies speech into a fixed set of output classes, such as a command word recognition model. The attack target is the intended recognition result of the targeted adversarial audio by the black-box speech recognition model. For example, if a targeted adversarial audio sounds like "no" to the human ear but is recognized by the black-box speech recognition model as "yes", then "yes" is its attack target.
In step (3), the MFCC features are Mel-frequency cepstral coefficients. Since MFCC simulates, to a certain extent, how the human ear processes speech, applying research results on human auditory perception, this technique helps improve the performance of speech recognition systems.
Step (3) comprises:
(3-1) applying pre-emphasis to the preprocessed audio to flatten its spectrum;
(3-2) dividing the audio into frames and multiplying each frame by a Hamming window;
(3-3) applying the fast Fourier transform (FFT) to each frame to obtain its spectrum, and obtaining the energy spectrum of the audio from the spectrum;
(3-4) passing the energy spectrum of the audio through a bank of Mel-scale triangular filters;
(3-5) computing the logarithmic energy output by each triangular filter, applying the discrete cosine transform to the logarithmic energies to obtain the Mel-scale cepstral coefficients up to the MFCC order, and extracting the dynamic difference parameters of the audio;
(3-6) obtaining the MFCC features.
Preferably, the parameters of the MFCC feature extraction are: pre-emphasis coefficient 0.97; 512 sample points per frame, with an overlap of 171 sample points between adjacent frames; Hamming window parameter 0.46; 512 FFT points; 26 triangular filters; MFCC order 16.
In step (5), the goal of particle swarm optimization is to find an optimal noise δ such that, after δ is superimposed on the source audio, the result is recognized by the audio recognition model as the attack target.
In step (5), the objective function is:

g(x + p_i) = max( max_{j≠t} f(x + p_i)_j − f(x + p_i)_t , κ )

where x is the source audio; p_i (i = 1, …, N) is the i-th particle, N being a positive integer; f(x + p_i)_j is the probability that the audio recognition model outputs class j for input x + p_i; t is the attack target, and f(x + p_i)_t is the probability that the model outputs t for input x + p_i; the parameter κ is a constant less than or equal to 0. The parameter κ controls the confidence of the misclassification: a smaller κ means the generated targeted adversarial audio will be recognized as t with higher confidence, i.e., the attack effect of the generated targeted adversarial audio is better.
In step (5), using particle swarm optimization to find the optimal noise that minimizes the objective function value comprises:
(5-1) initializing the iteration count to 0 and generating N particles p_i (i = 1, …, N) from a uniform distribution, each particle having the same length as the source audio;
(5-2) superimposing each particle p_i on the source audio x to obtain N audios x + p_i;
(5-3) extracting the MFCC features of each audio x + p_i, recognizing them with the audio recognition model to obtain the recognition result of each audio x + p_i, and computing its objective function value g(x + p_i);
if the recognition result of any audio x + p_i is the attack target, the attack succeeds and the particle p_i is the optimal noise;
otherwise, executing step (5-4);
(5-4) incrementing the iteration count by 1, generating N − 1 particles p_i (i = 1, …, N − 1) from a uniform distribution, and adding the particle with the minimum objective function value from the previous round as the seed of the next iteration;
repeating steps (5-2)–(5-3) until the objective function converges; the particle p_i that makes the objective function converge is the optimal noise;
if the objective function has still not converged when the iteration count reaches the set maximum number of iterations, the attack fails.
The present invention also provides a targeted adversarial audio generation system for the black-box scenario, comprising:
a data preprocessing module, which resamples the source audio data so that the sample rate of the source audio meets the black-box speech recognition model's input sample rate requirement;
an audio feature extraction module, which extracts the MFCC features of the audio data;
an audio recognition module, which contains the black-box speech recognition model; the black-box speech recognition model recognizes the MFCC features of the audio and outputs a recognition result;
a particle swarm optimization module, which contains the objective function and uses particle swarm optimization to find the optimal noise; the optimal noise is superimposed on the source audio to obtain the targeted adversarial audio.
The targeted adversarial audio generation system generates targeted adversarial audio using the targeted adversarial audio generation method described above.
Compared with prior art, the invention has the benefit that
The targeted adversarial audio generation method of the invention can generate an adversarial audio with a small added perturbation that sounds the same as the original audio to the human ear, yet is recognized by the speech recognition model as specific content. This kind of adversarial audio provides a basis for analyzing in depth the vulnerability of speech recognition models based on deep learning, facilitating follow-up research on how to defend against adversarial audio and improve the robustness of speech recognition models.
Brief description of the drawings
Fig. 1 is the architecture diagram of the targeted adversarial audio generation system;
Fig. 2 is the flow diagram of the targeted adversarial audio generation method;
Fig. 3 is the flow diagram of MFCC feature extraction;
Fig. 4 is the flow diagram of finding the optimal noise with particle swarm optimization.
Specific embodiment
The present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be pointed out that the embodiments described below are intended to facilitate understanding of the invention and do not limit it in any way.
The targeted adversarial audio generation system for the black-box scenario based on particle swarm optimization comprises four modules: a data preprocessing module, a feature extraction module, an audio recognition module, and an objective function optimization module. The system architecture is shown in Fig. 1.
The process by which the targeted adversarial audio generation system generates a targeted adversarial audio is shown in Fig. 2. Suppose there is a black-box speech recognition model, and the user wants to generate a 1-second audio with a 12 kHz sample rate (the original audio) that sounds like "no" to the human ear but is recognized as "yes" by the model. The whole flow is as follows:
(1) The command word recognition model provided by Google serves as the black-box speech recognition model; an audio that sounds like "no" to the human ear serves as the original audio, and "yes" serves as the attack target.
(2) The command word recognition model requires an input sample rate of 16 kHz. According to this input requirement, the data preprocessing module resamples the original audio, i.e., converts the 12 kHz audio into 16 kHz audio.
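The resampling in step (2) can be sketched as follows. Linear interpolation is only a minimal stand-in for the anti-aliased polyphase resampler a production pipeline would use; the function name `resample_linear` and the use of NumPy are illustrative assumptions, not part of the patent.

```python
import numpy as np

def resample_linear(audio, orig_sr, target_sr):
    """Minimal resampler via linear interpolation (a real system would
    use a polyphase resampler with an anti-aliasing filter)."""
    n_out = int(round(len(audio) * target_sr / orig_sr))
    t_in = np.arange(len(audio)) / orig_sr      # original sample times
    t_out = np.arange(n_out) / target_sr        # target sample times
    return np.interp(t_out, t_in, audio)

# 1 second of 12 kHz audio -> 16 kHz: 12000 samples become 16000
rng = np.random.default_rng(0)
src = rng.uniform(-1.0, 1.0, 12000)
out = resample_linear(src, 12000, 16000)
```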
(3) MFCC features are extracted from the preprocessed audio; the MFCC feature extraction process is shown in Fig. 3. The specific extraction process is as follows:
(i) Pre-emphasis. First, the speech signal is passed through a high-pass filter; the result of pre-emphasis is y(n) = x(n) − a·x(n−1), where x(n) is the speech sample at time n and a is the pre-emphasis coefficient, usually set to 0.97. The purpose of pre-emphasis is to eliminate the effect of the vocal cords and lips during vocalization, compensating for the high-frequency components suppressed by the articulatory system while emphasizing the high-frequency formants.
(ii) Framing and windowing. After pre-emphasis, the audio is split into frames: every 512 sample points form one frame, with an overlap of 171 sample points between adjacent frames. Each frame is then multiplied by a Hamming window to increase the continuity between the left and right ends of the frame; the window parameter is a = 0.46.
(iii) Fast Fourier transform. After framing and windowing, the FFT is applied to each frame to obtain its spectrum. The energy spectrum of the speech signal is then obtained by taking the squared magnitude of the spectrum and dividing by the number of FFT points, usually set to 512.
(iv) Triangular band-pass filtering. The energy spectrum is passed through a bank of Mel-scale triangular filters, which smooths the energy spectrum, eliminates harmonics, and highlights the formants of the original speech. The number of triangular band-pass filters is 26.
(v) Computing the logarithmic filter energies. First, the logarithmic energy s(m) output by each filter is computed; the logarithmic energies are then substituted into the discrete cosine transform to obtain the MFCC coefficients:

C(l) = Σ_{m=1}^{M} s(m) · cos( π·l·(m − 0.5) / M ),  l = 1, …, L

where M is the number of triangular filters, 26; N is the number of FFT points; and L is the MFCC order, taken as 16.
(vi) Extraction of dynamic difference parameters. The standard cepstral coefficients (MFCC) only reflect the static characteristics of the speech; the dynamic characteristics can be described by extracting dynamic difference (delta) parameters.
The dynamic difference parameters are computed as:

d_t = ( Σ_{k=1}^{K} k·(C_{t+k} − C_{t−k}) ) / ( 2·Σ_{k=1}^{K} k² )

where d_t is the t-th first-order difference parameter, C_t is the t-th cepstral coefficient, Q is the order of the cepstral coefficients, and K is the time span of the first derivative (usually 1 or 2). Applying the same formula to d_t again yields the second-order difference parameters of the MFCC.
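Steps (i)–(vi) can be sketched with the parameters stated above (pre-emphasis 0.97, 512-point frames with 171-point overlap, Hamming window, 512-point FFT, 26 Mel filters, order 16, plus first-order deltas). This is a minimal NumPy sketch, not the patent's implementation; the exact normalization constants and boundary handling are assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(audio, sr=16000, pre=0.97, frame=512, overlap=171,
         n_fft=512, n_filt=26, n_ceps=16, K=2):
    # (i) pre-emphasis: y(n) = x(n) - 0.97 * x(n-1)
    y = np.append(audio[0], audio[1:] - pre * audio[:-1])
    # (ii) framing (512-point frames, 171-point overlap) + Hamming window
    hop = frame - overlap
    n_frames = 1 + (len(y) - frame) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame)[None, :]
    frames = y[idx] * np.hamming(frame)               # 0.54 - 0.46*cos(...)
    # (iii) FFT -> energy spectrum |FFT|^2 / n_fft
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # (iv) 26-filter Mel-scale triangular filterbank
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filt + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for m in range(1, n_filt + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    # (v) log filterbank energies s(m) -> DCT -> 16 cepstral coefficients
    s = np.log(spec @ fbank.T + 1e-10)
    l = np.arange(1, n_ceps + 1)[:, None]
    m = np.arange(1, n_filt + 1)[None, :]
    dct = np.cos(np.pi * l * (m - 0.5) / n_filt)
    ceps = s @ dct.T
    # (vi) first-order dynamic difference (delta) parameters, K = 2
    pad = np.pad(ceps, ((K, K), (0, 0)), mode="edge")
    num = sum(k * (pad[K + k:K + k + len(ceps)] - pad[K - k:K - k + len(ceps)])
              for k in range(1, K + 1))
    delta = num / (2.0 * sum(k * k for k in range(1, K + 1)))
    return np.hstack([ceps, delta])                   # (frames, 32) features

feats = mfcc(np.random.default_rng(1).uniform(-1, 1, 16000))
```

With a 1-second 16 kHz input, the 341-point hop yields 46 frames of 16 cepstral coefficients plus 16 deltas.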
(4) The extracted MFCC features are recognized with the command word recognition model provided by Google; the recognition result is "no" with a confidence of 0.9.
(5) Particle swarm optimization is used to find the perturbation that minimizes the objective function value. Specifically:
To make the swarm move toward the direction that maximizes the probability of the target class, the objective function is set as:

g(x + p_i) = max( max_{j≠t} f(x + p_i)_j − f(x + p_i)_t , κ )

where x is the input audio and p_i (i = 1, …, N) is particle i, with N particles in total; f(x + p_i)_j is the probability of class j output by the speech recognition model for input x + p_i; t is the class specified by the attacker, so f(x + p_i)_t is the probability that the speech model classifies the input as t; the parameter κ controls the confidence of the misclassification and takes a value less than or equal to 0: a smaller κ means the generated adversarial audio will be recognized as t with higher confidence, i.e., the attack effect of the generated adversarial audio is better.
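The objective's formula image is not reproduced in this text; the description (κ ≤ 0, smaller κ forcing higher-confidence misclassification) matches a Carlini-Wagner-style margin loss, which a sketch might implement as follows. The function name and the toy probability vectors are illustrative, not from the patent.

```python
import numpy as np

def objective(probs, t, kappa=0.0):
    """Assumed C&W-style margin form of the patent's objective:
    g = max(max_{j != t} f_j - f_t, kappa). It bottoms out at
    g = kappa <= 0 exactly when target class t beats every other
    class by at least -kappa."""
    others = np.delete(probs, t)
    return max(float(others.max() - probs[t]), kappa)

# toy softmax outputs of a 3-class black-box model, target class t = 2
g_far = objective(np.array([0.6, 0.1, 0.3]), t=2)    # still positive: attack not done
g_done = objective(np.array([0.2, 0.1, 0.7]), t=2)   # clipped at kappa = 0: attack done
```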
Given the parameter κ = 0, the objective function is minimized using particle swarm optimization, as shown in Fig. 4, specifically:
(a) The iteration count is initialized to 0, and 25 random particle sequences are generated from the uniform distribution on [−1, 1]; each particle has the same length as the original audio, i.e., 16000 points.
(b) Each particle p_i is added to the original audio x to obtain 25 new audios x + p_i; steps (3) and (4) are repeated, recording each recognition result f(x + p_i) and computing its objective function value g(x + p_i).
(c) If the recognition result of any x + p_i is "yes", the attack succeeds and the particle p_i is the desired optimal noise δ;
otherwise, step (d) is executed.
(d) The iteration count is incremented by 1, 24 particles are generated from the uniform distribution on [−1, 1], and the particle with the minimum objective function value from the previous round is added as the seed of the next iteration.
Steps (b)–(c) are repeated until the objective function converges; the particle p_i that makes the objective function converge is the desired optimal noise δ.
If the objective function has still not converged when the iteration count reaches the set maximum number of iterations, the attack fails.
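Steps (a)–(d) amount to an elitist random search: fresh uniform particles each round, plus the best particle from the previous round carried over as a seed. Below is a toy sketch under that reading, with a stand-in two-class "model" in place of the real speech recognizer; all names (`find_noise`, `toy_predict`) are illustrative assumptions.

```python
import numpy as np

def find_noise(x, predict, target, n_particles=25, max_iter=200,
               kappa=0.0, seed=0):
    """Elitist random search following steps (a)-(d): draw fresh uniform
    particles on [-1, 1] each round, keep the best particle from the
    previous round, stop once some x + p is classified as `target`."""
    rng = np.random.default_rng(seed)

    def g(p):                                   # objective from step (5)
        probs = predict(x + p)
        return max(float(np.delete(probs, target).max() - probs[target]), kappa)

    best = None
    for _ in range(max_iter):
        fresh = rng.uniform(-1, 1, (n_particles - (best is not None), len(x)))
        pool = fresh if best is None else np.vstack([fresh, best])
        best = min(pool, key=g)                 # elitism: carry the best over
        if predict(x + best).argmax() == target:
            return best                         # success: recognised as target
    return None                                 # attack failed

# toy 2-class "black-box model": class 1 iff the mean sample is positive
def toy_predict(a):
    s = 1.0 / (1.0 + np.exp(-50.0 * a.mean()))
    return np.array([1.0 - s, s])

x = -0.01 * np.ones(100)                        # originally class 0
delta = find_noise(x, toy_predict, target=1)
```

The real attack would replace `toy_predict` with the MFCC extraction plus black-box model query of steps (3)–(4).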
(6) The optimal noise δ is superimposed on the original audio x to obtain the adversarial audio, i.e., an audio that sounds like "no" to the human ear but is recognized as "yes" by the speech recognition model.
The embodiments described above explain the technical solution and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, supplement, or equivalent replacement made within the spirit of the invention shall fall within its protection scope.

Claims (7)

1. A targeted adversarial audio generation method in a black-box scenario, characterized by comprising the following steps:
(1) selecting a target black-box speech recognition model as the audio recognition model, selecting a source audio, and setting an attack target;
(2) resampling the source audio according to the audio recognition model's required input sample rate;
(3) extracting the MFCC features of the resampled source audio;
(4) recognizing the MFCC features with the audio recognition model to obtain a recognition result;
(5) setting an objective function, using particle swarm optimization to find the optimal noise that minimizes the objective function value, and superimposing the optimal noise on the source audio to obtain a targeted adversarial audio whose recognition result is the attack target.
2. The targeted adversarial audio generation method in a black-box scenario according to claim 1, characterized in that the black-box speech recognition model is a speech recognition model that classifies speech into a fixed set of output classes.
3. The targeted adversarial audio generation method in a black-box scenario according to claim 1, characterized in that step (3) comprises:
(3-1) applying pre-emphasis to the preprocessed audio to flatten its spectrum;
(3-2) dividing the audio into frames and multiplying each frame by a Hamming window;
(3-3) applying the fast Fourier transform to each frame to obtain its spectrum, and obtaining the energy spectrum of the audio from the spectrum;
(3-4) passing the energy spectrum of the audio through a bank of Mel-scale triangular filters;
(3-5) computing the logarithmic energy output by each triangular filter, applying the discrete cosine transform to the logarithmic energies to obtain the Mel-scale cepstral coefficients up to the MFCC order, and extracting the dynamic difference parameters of the audio;
(3-6) obtaining the MFCC features.
4. The targeted adversarial audio generation method in a black-box scenario according to claim 3, characterized in that the parameters of the MFCC feature extraction are: pre-emphasis coefficient 0.97; 512 sample points per frame, with an overlap of 171 sample points between adjacent frames; Hamming window parameter 0.46; 512 FFT points; 26 triangular filters; and MFCC order 16.
5. The targeted adversarial audio generation method in a black-box scenario according to claim 1, characterized in that the objective function is:

g(x + p_i) = max( max_{j≠t} f(x + p_i)_j − f(x + p_i)_t , κ )

where x is the source audio; p_i (i = 1, …, N) is the i-th particle, N being a positive integer; f(x + p_i)_j is the probability that the audio recognition model outputs class j for input x + p_i; t is the attack target, and f(x + p_i)_t is the probability that the audio recognition model outputs t for input x + p_i; and the parameter κ is a constant less than or equal to 0.
6. The targeted adversarial audio generation method in a black-box scenario according to claim 5, characterized in that, in step (5), using particle swarm optimization to find the optimal noise that minimizes the objective function value comprises:
(5-1) initializing the iteration count to 0 and generating N particles p_i (i = 1, …, N) from a uniform distribution, each particle having the same length as the source audio;
(5-2) superimposing each particle p_i on the source audio x to obtain N audios x + p_i;
(5-3) extracting the MFCC features of each audio x + p_i, recognizing them with the audio recognition model to obtain the recognition result of each audio x + p_i, and computing its objective function value g(x + p_i);
if the recognition result of any audio x + p_i is the attack target, the attack succeeds and the particle p_i is the optimal noise;
otherwise, executing step (5-4);
(5-4) incrementing the iteration count by 1, generating N − 1 particles p_i (i = 1, …, N − 1) from a uniform distribution, and adding the particle with the minimum objective function value from the previous round as the seed of the next iteration;
repeating steps (5-2)–(5-3) until the objective function converges; the particle p_i that makes the objective function converge is the optimal noise;
if the objective function has still not converged when the iteration count reaches the set maximum number of iterations, the attack fails.
7. A targeted adversarial audio generation system in a black-box scenario, characterized by comprising:
a data preprocessing module, which resamples the source audio data so that the sample rate of the source audio meets the black-box speech recognition model's input sample rate requirement;
an audio feature extraction module, which extracts the MFCC features of the audio data;
an audio recognition module, which contains the black-box speech recognition model; the black-box speech recognition model recognizes the MFCC features of the audio and outputs a recognition result; and
a particle swarm optimization module, which contains the objective function and uses particle swarm optimization to find the optimal noise; the optimal noise is superimposed on the source audio to obtain the targeted adversarial audio;
wherein the system generates the targeted adversarial audio using the targeted adversarial audio generation method according to any one of claims 1 to 6.
CN201910060662.0A 2019-01-22 2019-01-22 Targeted adversarial audio generation method and system in a black-box scenario Pending CN109887496A (en)


Publications (1)

Publication Number Publication Date
CN109887496A true CN109887496A (en) 2019-06-14



Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096955A (en) * 2015-09-06 2015-11-25 广东外语外贸大学 Speaker rapid identification method and system based on growing and clustering algorithm of models
CN105139857A (en) * 2015-09-02 2015-12-09 广东顺德中山大学卡内基梅隆大学国际联合研究院 Countercheck method for automatically identifying speaker aiming to voice deception
US10007498B2 (en) * 2015-12-17 2018-06-26 Architecture Technology Corporation Application randomization mechanism
CN108446700A (en) * 2018-03-07 2018-08-24 浙江工业大学 A kind of car plate attack generation method based on to attack resistance
CN108520268A (en) * 2018-03-09 2018-09-11 浙江工业大学 The black box antagonism attack defense method evolved based on samples selection and model
CN109036385A (en) * 2018-10-19 2018-12-18 北京旋极信息技术股份有限公司 A kind of voice instruction recognition method, device and computer storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Moustafa Alzantot et al., "Did you hear that? Adversarial Examples Against Automatic Speech Recognition", 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. *
Pin-Yu Chen et al., "ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models", ACM, 2017. *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110444208A (en) * 2019-08-12 2019-11-12 浙江工业大学 A kind of speech recognition attack defense method and device based on gradient estimation and CTC algorithm
CN110767216A (en) * 2019-09-10 2020-02-07 浙江工业大学 Voice recognition attack defense method based on PSO algorithm
CN110992951A (en) * 2019-12-04 2020-04-10 四川虹微技术有限公司 Method for protecting personal privacy based on adversarial samples
CN111341327A (en) * 2020-02-28 2020-06-26 广州国音智能科技有限公司 Speaker voice recognition method, device and equipment based on particle swarm optimization
WO2021212675A1 (en) * 2020-04-21 2021-10-28 清华大学 Method and apparatus for generating adversarial sample, electronic device and storage medium
CN111710327B (en) * 2020-06-12 2023-06-20 百度在线网络技术(北京)有限公司 Method, apparatus, device and medium for model training and sound data processing
CN111710327A (en) * 2020-06-12 2020-09-25 百度在线网络技术(北京)有限公司 Method, apparatus, device and medium for model training and sound data processing
CN112216296A (en) * 2020-09-25 2021-01-12 脸萌有限公司 Audio anti-disturbance testing method and device and storage medium
CN112216296B (en) * 2020-09-25 2023-09-22 脸萌有限公司 Audio countermeasure disturbance testing method, device and storage medium
CN113345420A (en) * 2021-06-07 2021-09-03 河海大学 Adversarial audio generation method and system based on the firefly algorithm and gradient evaluation
CN113362822A (en) * 2021-06-08 2021-09-07 北京计算机技术及应用研究所 Black-box voice adversarial sample generation method with auditory masking
CN114627858A (en) * 2022-05-09 2022-06-14 杭州海康威视数字技术股份有限公司 Intelligent voice recognition security defense method and device based on particle swarm optimization
CN116758899A (en) * 2023-08-11 2023-09-15 浙江大学 Speech recognition model safety assessment method based on semantic space disturbance
CN116758899B (en) * 2023-08-11 2023-10-13 浙江大学 Speech recognition model safety assessment method based on semantic space disturbance

Similar Documents

Publication Publication Date Title
CN109887496A (en) Orientation confrontation audio generation method and system under a kind of black box scene
CN109599109B (en) Confrontation audio generation method and system for white-box scene
CN110767216B (en) Voice recognition attack defense method based on PSO algorithm
Cui et al. Data augmentation for deep neural network acoustic modeling
Yang et al. Characterizing speech adversarial examples using self-attention u-net enhancement
CN111261147B (en) Music embedding attack defense method for voice recognition system
CN105023573B (en) Speech syllable/vowel/phone boundary detection using auditory attention cues
TW201935464A (en) Method and device for voiceprint recognition based on memorability bottleneck features
WO2020043160A1 (en) Method and system for detecting voice activity innoisy conditions
CN109887484A (en) Speech recognition and speech synthesis method and device based on paired-associate learning
CN102779510A (en) Speech emotion recognition method based on feature space self-adaptive projection
CN109887489A (en) Speech dereverberation method based on deep features of generative adversarial networks
CN113362822B (en) Black-box voice adversarial sample generation method with auditory masking
Xu et al. Cross-language transfer learning for deep neural network based speech enhancement
CN103985390A (en) Method for extracting speech feature parameters based on gammatone correlation graphs
CN107274887A (en) Speaker feature extraction method based on the fusion feature MGFCC
CN114783418B (en) End-to-end voice recognition method and system based on sparse self-attention mechanism
CN112183582A (en) Multi-feature fusion underwater target identification method
Yi et al. Audio deepfake detection: A survey
Shi et al. Fusion feature extraction based on auditory and energy for noise-robust speech recognition
CN104952446A (en) Digital building presentation system based on voice interaction
Sun et al. A novel convolutional neural network voiceprint recognition method based on improved pooling method and dropout idea
Huang et al. Research on robustness of emotion recognition under environmental noise conditions
CN111462737B (en) Method for training grouping model for voice grouping and voice noise reduction method
CN113488069B (en) Rapid extraction method and device for high-dimensional speech features based on a generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190614
