CN111046176A - Adversarial sample generation method and device, electronic device and storage medium - Google Patents

Adversarial sample generation method and device, electronic device and storage medium

Info

Publication number
CN111046176A
CN111046176A (application CN201911165138.6A; granted publication CN111046176B)
Authority
CN
China
Prior art keywords
text
modified
word
original
words
Prior art date
Legal status
Granted
Application number
CN201911165138.6A
Other languages
Chinese (zh)
Other versions
CN111046176B (en
Inventor
王文华
吕中厚
刘焱
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911165138.6A priority Critical patent/CN111046176B/en
Publication of CN111046176A publication Critical patent/CN111046176A/en
Application granted granted Critical
Publication of CN111046176B publication Critical patent/CN111046176B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The application discloses an adversarial sample generation method and apparatus, an electronic device, and a storage medium, relating to the field of deep learning. The method may include: acquiring an original text and executing the following first processing: generating S modified texts of the original text to form a text set, wherein S is a positive integer greater than one and each modified text replaces one word of the original text; if the text set meets a predetermined requirement, selecting one modified text from the text set as the generated adversarial sample; otherwise, generating a new text from the modified texts in the text set, taking the new text as the original text, and repeatedly executing the first processing. The scheme saves labor cost, improves processing efficiency, and the like.

Description

Adversarial sample generation method and device, electronic device and storage medium
Technical Field
The present application relates to computer application technologies, and in particular, to a method and an apparatus for generating adversarial samples in the field of deep learning, an electronic device, and a storage medium.
Background
At present, Deep Neural Networks (DNNs) are widely used in fields such as computer vision, audio, and Natural Language Processing (NLP), and have solved a large number of important practical problems.
Many NLP tasks employ DNN models; for a text classification task, the model used may be referred to as a text classification model. Text classification models are susceptible to interference from adversarial samples, which causes classification errors and the like. It is therefore necessary to generate/construct adversarial samples in order to optimize the text classification model and improve its performance.
Most existing adversarial sample generation methods require manual participation, which not only incurs a large labor cost but is also inefficient.
Disclosure of Invention
In view of the above, the present application provides an adversarial sample generation method, apparatus, electronic device and storage medium.
An adversarial sample generation method, comprising:
acquiring an original text, and executing the following first processing:
respectively generating S modified texts of the original text to form a text set, wherein S is a positive integer greater than one, and each modified text replaces one word in the original text;
and if the text set meets a predetermined requirement, selecting one modified text from the text set as the generated adversarial sample; otherwise, generating a new text from the modified texts in the text set, taking the new text as the original text, and repeatedly executing the first processing.
According to a preferred embodiment of the present application, the generating S modified texts of the original text respectively includes:
respectively executing S times of second processing on the original texts, wherein the second processing comprises the following steps:
randomly selecting a word in the original text as a replacement object;
determining the optimal replacement word of the replacement object;
and replacing the replacement object with the optimal replacement word to obtain a modified text.
According to a preferred embodiment of the present application, the determining the optimal replacement word for the replacement object includes:
determining N nearest neighbor words of the replacement object, wherein N is a positive integer greater than one;
selecting, from the N nearest neighbor words, M words with the highest context matching degree with the original text, wherein M is a positive integer greater than one and smaller than N;
and selecting the most aggressive word from the M words as the optimal replacement word.
According to a preferred embodiment of the present application, the method further comprises: filtering out non-synonyms of the replacement object from the N nearest neighbor words.
According to a preferred embodiment of the present application, the selecting M words from the N nearest neighbor words with the highest context matching degree with the original text includes:
for each nearest neighbor word, replacing the replacement object in the original text with the nearest neighbor word, and determining a grammar score of the replaced modified text by using a pre-obtained language model;
and sorting the nearest neighbor words in descending order of their grammar scores, and selecting the top M words after sorting.
According to a preferred embodiment of the present application, the selecting the most aggressive word from the M words as the optimal replacement word includes:
for each of the M words, replacing the replacement object in the original text with the word, and inputting the replaced modified text into a pre-acquired text classification model to obtain a predicted classification result and a confidence;
inputting the original text into the text classification model to obtain a predicted classification result and a confidence;
and selecting, from the M words, the word whose predicted classification result differs from that of the original text and whose confidence is highest as the optimal replacement word.
According to a preferred embodiment of the present application, the method further comprises:
for each modified text in the text set, inputting the modified text into a pre-acquired text classification model to obtain a predicted classification result and a confidence;
inputting the original text into the text classification model to obtain a predicted classification result and a confidence;
and if the predicted classification result of at least one modified text in the text set differs from that of the original text, determining that the text set meets the predetermined requirement.
According to a preferred embodiment of the present application, the selecting one modified text in the text set as the generated adversarial sample includes: selecting, from the text set, the modified text whose predicted classification result differs from that of the original text and whose confidence is highest as the adversarial sample.
According to a preferred embodiment of the present application, the generating a new text according to the modified text in the text set includes:
determining a target classification prediction probability of each modified text in the text set according to its confidence, wherein the target classification prediction probability represents the probability that the predicted classification result of the modified text is not that of the original text;
randomly extracting two modified texts from the text set, taking the target classification prediction probability of each modified text as its extraction probability; and synthesizing a new text from the two extracted modified texts.
According to a preferred embodiment of the present application, the method further comprises: optimizing the pre-acquired text classification model by using the generated adversarial sample.
An adversarial sample generation device, comprising: a sample generation unit;
the sample generation unit is used for acquiring an original text and executing the following first processing:
respectively generating S modified texts of the original text to form a text set, wherein S is a positive integer greater than one, and each modified text respectively replaces one word in the original text;
and if the text set meets a predetermined requirement, selecting one modified text from the text set as the generated adversarial sample; otherwise, generating a new text from the modified texts in the text set, taking the new text as the original text, and repeatedly executing the first processing.
According to a preferred embodiment of the present application, the sample generating unit performs S times of second processing on the original text, respectively, where the second processing includes: randomly selecting a word in the original text as a replacement object; determining the optimal replacement word of the replacement object; and replacing the replacement object with the optimal replacement word to obtain a modified text.
According to a preferred embodiment of the present application, the sample generating unit determines N nearest neighbor words of the replacement object, where N is a positive integer greater than one, selects from the N nearest neighbor words the M words with the highest context matching degree with the original text, where M is a positive integer greater than one and smaller than N, and selects the most aggressive word from the M words as the optimal replacement word.
According to a preferred embodiment of the present application, the sample generating unit is further configured to filter out non-synonyms of the replacement object from the N nearest neighbor words.
According to a preferred embodiment of the present application, for each nearest neighbor word, the sample generating unit replaces the replacement object in the original text with the nearest neighbor word, determines a grammar score of the replaced modified text by using a pre-obtained language model, sorts the nearest neighbor words in descending order of their grammar scores, and selects the top M words after sorting.
According to a preferred embodiment of the present application, the sample generation unit replaces the replacement object in the original text with the word for each word in the M words, inputs the modified text after replacement into a text classification model obtained in advance, obtains a predicted classification result and a confidence level, inputs the original text into the text classification model, obtains a predicted classification result and a confidence level, and selects a word with a highest confidence level and a corresponding predicted classification result different from the predicted classification result of the original text from the M words as the optimal replacement word.
According to a preferred embodiment of the present application, the sample generating unit is further configured to, for each modified text in the text set, respectively input the modified text into a text classification model obtained in advance to obtain a predicted classification result and a confidence level, and input the original text into the text classification model to obtain a predicted classification result and a confidence level, and if the predicted classification result of at least one modified text in the text set is different from the predicted classification result of the original text, determine that the text set meets a predetermined requirement.
According to a preferred embodiment of the present application, the sample generation unit selects, as the adversarial sample, the modified text from the text set whose predicted classification result differs from that of the original text and whose confidence is highest.
According to a preferred embodiment of the present application, the sample generation unit determines a target classification prediction probability of each modified text in the text set according to the confidence, where the target classification prediction probability represents a probability that a prediction classification result of the modified text is not a prediction classification result of the original text, and randomly extracts two modified texts from the text set by using the target classification prediction probability of each modified text as an extraction probability, and synthesizes a new text according to the two extracted modified texts.
According to a preferred embodiment of the present application, the apparatus further comprises: a model optimization unit configured to optimize the pre-acquired text classification model by using the generated adversarial sample.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.
A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method as described above.
One embodiment of the above application has the following advantages or benefits: adversarial samples can be generated automatically without manual participation, saving labor cost and improving processing efficiency; the original text is modified as little as possible, and the replacement object is replaced with a word of similar semantics and grammar, so that the generated adversarial sample is only slightly perturbed relative to the original text, hidden and hard to perceive, will not mislead human judgment, yet can successfully deceive the text classification model; the adversarial sample generation method has strong transferability, is applicable to text classification models of various DNN architectures, and adversarial samples generated against model A remain effective when attacking model B; the method works under black-box conditions, requiring no knowledge of model details, and is therefore better suited to real scenarios; the generated adversarial samples can be used to optimize the text classification model so as to improve its robustness and ensure its safety; other effects of the above alternatives will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flow chart of an embodiment of the adversarial sample generation method described herein;
FIG. 2 is a flow chart of an embodiment of a method for generating modified text of an original text according to the present application;
FIG. 3 is a schematic diagram of an adversarial sample generation apparatus 300 according to the present application;
fig. 4 is a block diagram of an electronic device according to the method of an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these should be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are likewise omitted for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; e.g., A and/or B may mean: A alone, both A and B, or B alone. The character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
FIG. 1 is a flow chart of an embodiment of the adversarial sample generation method described herein. As shown in FIG. 1, the method includes the following detailed implementation.
In 101, the original text is obtained and the process shown in 102-104 is performed.
In 102, S modified texts of the original text are respectively generated to form a text set, S is a positive integer greater than one, and each modified text respectively replaces one word in the original text.
In 103, if the text set meets the predetermined requirement, one modified text in the text set is selected as the generated adversarial sample.
In 104, if the text set does not meet the predetermined requirement, a new text is generated from the modified texts in the text set, the new text is taken as the original text, and the processing shown in 102-104 is repeatedly executed.
In this embodiment, adversarial samples may be generated iteratively using a population-based genetic algorithm. Genetic algorithms are inspired by the process of natural selection: candidate solutions are iteratively evolved into better solutions; in each iteration, a fitness value may be used to evaluate the quality of each population member; parents with higher fitness are more likely to be used to breed the next generation; and the next generation may be produced through a combination of crossover and mutation. Genetic algorithms are well suited to combinatorial optimization problems.
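The iterative first processing described above can be sketched as a simple genetic-algorithm loop. This is a minimal illustration only, not the patented implementation; the helper callables (`generate_modified`, `meets_requirement`, `pick_sample`, `make_new_text`) are hypothetical placeholders for the steps detailed below.

```python
def generate_adversarial_sample(original_text, generate_modified, meets_requirement,
                                pick_sample, make_new_text, s=8, max_rounds=30):
    """Iterate the 'first processing' until the text set meets the
    predetermined requirement or the round limit is reached."""
    text = original_text
    for _ in range(max_rounds):
        # 102: generate S modified texts, each replacing one word of the current text.
        text_set = [generate_modified(text) for _ in range(s)]
        # 103: if the set meets the predetermined requirement, output a sample.
        if meets_requirement(text_set):
            return pick_sample(text_set)
        # 104: otherwise breed a new text and treat it as the original text.
        text = make_new_text(text_set)
    return None  # attack failed within the round limit
```

The loop terminates either by outputting an adversarial sample or, after `max_rounds` generations, by reporting failure.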
The original text may be a sentence or a paragraph (containing a plurality of sentences). S modified texts may be generated for the original text, where S is a positive integer greater than one whose specific value may be determined according to actual needs; compared with the original text, each modified text has one word replaced.
Preferably, S times of second processing may be respectively performed on the original text, and the second processing may include: randomly selecting a word in the original text as a replacement object; determining the optimal replacement word of the replacement object; and replacing the replacement object with the optimal replacement word to obtain a modified text.
When determining the optimal replacement word of the replacement object, N nearest neighbor words of the replacement object may first be determined, where N is a positive integer greater than one. Then M words with the highest context matching degree with the original text may be selected from the N nearest neighbor words, where M is a positive integer greater than one and smaller than N; the specific values of M and N may be determined according to actual needs. Finally, the most aggressive word may be selected from the M words as the optimal replacement word. Preferably, after the N nearest neighbor words of the replacement object are determined, non-synonyms of the replacement object may further be filtered out of the N nearest neighbor words before subsequent processing.
Based on the above description, fig. 2 is a flowchart of an embodiment of a method for generating a modified text of an original text according to the present application. As shown in fig. 2, the following detailed implementation is included.
In 201, a word in the original text is randomly selected as a replacement object.
The original text will contain a plurality of words, one of which can be randomly selected as a replacement object.
At 202, N nearest neighbor words of the replacement object are determined, N being a positive integer greater than one, and non-synonyms of the replacement object are filtered from the N nearest neighbor words.
For example, the N nearest neighbor words of the replacement object, that is, the N words closest to the replacement object, may be determined from the words in a database according to distance in the GloVe word embedding space, with the distance computed as Euclidean distance; determining nearest neighbor words in this way is known in the prior art.
Post-processing may then be carried out using a method such as counter-fitting to filter non-synonyms of the replacement object out of the N nearest neighbor words, ensuring that the remaining nearest neighbor words are synonyms of the replacement object.
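The neighbor search in 202 amounts to ranking vocabulary words by Euclidean distance in the embedding space. A minimal sketch, using a toy embedding dictionary in place of real GloVe vectors:

```python
import math

def nearest_neighbors(word, embeddings, n):
    """Return the n words whose embedding vectors are closest to `word`'s
    vector by Euclidean distance (the GloVe-space neighbor search)."""
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    return sorted(others, key=lambda w: math.dist(target, embeddings[w]))[:n]
```

In practice the synonym filter (e.g. counter-fitting) would then be applied to the returned list; `math.dist` requires Python 3.8+.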
In 203, M words with the highest degree of context matching with the original text are selected from the nearest neighbor words, wherein M is a positive integer greater than one.
For example, for each nearest neighbor word, the nearest neighbor word may be used to replace the replacement object in the original text, and a pre-obtained language model may be used to determine a grammar score of the replaced modified text; the nearest neighbor words may then be sorted in descending order of their grammar scores, and the top M words after sorting selected.
The language model may be a Google language model, i.e., words that do not conform to the contextual grammar may be filtered out using the Google language model.
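The Google language model itself is far too large to reproduce here; purely as an illustration of the scoring step in 203, a toy bigram counter can stand in for it, assigning higher scores to substitutions whose word pairs were seen in a reference corpus:

```python
from collections import Counter

def bigram_scorer(corpus_sentences):
    """Build a toy bigram scorer (a stand-in for a large pretrained
    language model): more familiar word pairs means a higher score."""
    bigrams = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        bigrams.update(zip(tokens, tokens[1:]))

    def score(sentence):
        tokens = sentence.lower().split()
        return sum(bigrams[(a, b)] for a, b in zip(tokens, tokens[1:]))

    return score

def top_m_by_context(original_text, replacement_object, candidates, score, m):
    """Keep the m candidates whose substitution yields the highest grammar score."""
    scored = {c: score(original_text.replace(replacement_object, c)) for c in candidates}
    return sorted(scored, key=scored.get, reverse=True)[:m]
```

Candidates that do not conform to the contextual grammar receive low scores and fall outside the top M, mirroring the filtering described above.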
In 204, the most aggressive word is selected from the M words as the best replacement word.
The most aggressive word is the word that maximizes the prediction probability of the target classification.
The way of selecting the most aggressive word from the M words may include, but is not limited to: for each of the M words, replacing the replacement object in the original text with the word, and inputting the replaced modified text into a pre-obtained text classification model to obtain a predicted classification result and a confidence; inputting the original text into the text classification model to obtain a predicted classification result and a confidence; and then selecting, from the M words, the word whose predicted classification result differs from that of the original text and whose confidence is highest as the optimal replacement word.
Taking an emotion classification model as the text classification model as an example, the predicted classification results include Negative and Positive. If the predicted classification result of the original text is Negative, the word whose predicted classification result is Positive with the highest confidence may be selected from the M words as the optimal replacement word.
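The selection in 204 can be sketched as follows, where `classify` is a hypothetical stand-in for the pre-acquired text classification model, assumed to return a `(label, confidence)` pair:

```python
def most_aggressive_word(original_text, replacement_object, candidates, classify):
    """Among the M candidate words, pick the one whose substitution flips the
    predicted label away from the original's, with the highest confidence."""
    original_label, _ = classify(original_text)
    best_word, best_conf = None, -1.0
    for word in candidates:
        label, conf = classify(original_text.replace(replacement_object, word))
        if label != original_label and conf > best_conf:
            best_word, best_conf = word, conf
    return best_word  # None when no candidate changes the prediction
```

Returning `None` when no candidate flips the label is one possible convention; the patent text does not specify the behavior for that case.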
At 205, the replacement object in the original text is replaced with the best replacement word, resulting in a modified text.
Through the method, words with similar semantics and matched grammar can be selected to replace the replacement object.
As an example, suppose the original text is "Today's weather is sunny, very suitable for a trip to the countryside." The word "appropriate" may be selected to replace "suitable": the change to the original text is small, the semantics and grammar remain correct, and the fitness is high. Accordingly, the resulting modified text is: "Today's weather is sunny, very appropriate for a trip to the countryside."
According to the method shown in fig. 2, S modified texts of the original text may be generated to form a text set. It may then be determined whether the text set meets the predetermined requirement; if so, one modified text in the text set may be selected and output as the generated adversarial sample, ending the iteration; otherwise, a new text may be generated from the modified texts in the text set, the new text taken as the original text, and the processing shown in 102-104 repeated.
Specifically, each modified text in the text set may be input into the text classification model to obtain a predicted classification result and a confidence, and the original text may be input into the text classification model to obtain a predicted classification result and a confidence. If the predicted classification result of at least one modified text in the text set differs from that of the original text, it is determined that the text set meets the predetermined requirement, i.e., the attack has succeeded.
Taking an emotion classification model as the text classification model as an example, the predicted classification results include Negative and Positive. If the text set contains 8 modified texts, modified texts 1 to 8, where the predicted classification results of 3 of them are Positive and the predicted classification result of the original text is Negative, the text set can be determined to meet the predetermined requirement.
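The predetermined-requirement check reduces to a single comparison. A minimal sketch, assuming classifier outputs are represented as `(label, confidence)` tuples:

```python
def text_set_meets_requirement(modified_preds, original_pred):
    """modified_preds: one (label, confidence) per modified text in the set.
    The requirement holds as soon as any modified text is classified
    differently from the original text."""
    original_label = original_pred[0]
    return any(label != original_label for label, _ in modified_preds)
```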
Accordingly, one modified text in the text set can be selected as the generated adversarial sample; preferably, the modified text whose predicted classification result differs from that of the original text and whose confidence is highest can be selected from the text set as the adversarial sample. For example, if the 3 modified texts whose predicted classification result is Positive are modified text 1, modified text 3, and modified text 4, and the confidence of modified text 3 is higher than those of modified texts 1 and 4, then modified text 3 may be output as the adversarial sample.
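The preferred selection rule above can be written compactly; as before, the `(label, confidence)` tuples are an assumed classifier output format:

```python
def select_adversarial_sample(texts, preds, original_label):
    """From the text set, return the modified text whose prediction differs
    from the original's label and whose confidence is highest."""
    flipped = [(text, conf) for text, (label, conf) in zip(texts, preds)
               if label != original_label]
    return max(flipped, key=lambda pair: pair[1])[0] if flipped else None
```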
If the text set is determined not to meet the predetermined requirement, a new text can be generated from the modified texts in the text set. Preferably, the target classification prediction probability of each modified text in the text set can be determined according to its confidence; the target classification prediction probability represents the probability that the predicted classification result of the modified text is not that of the original text. Then, taking the target classification prediction probability of each modified text as its extraction probability, two modified texts can be randomly extracted from the text set, and a new text generated from the two extracted modified texts.
Taking a binary emotion classification model as the text classification model as an example, its predicted classification results include Negative and Positive. Assuming that the predicted classification result of the original text is Negative, the target classification result is Positive, and the target classification prediction probability represents the probability that the predicted classification result of a modified text is Positive. If the predicted classification result of a modified text is Positive, its target classification prediction probability is the confidence corresponding to that predicted classification result, such as 59.8%; if the predicted classification result of a modified text is Negative, its target classification prediction probability is (1 − the confidence of the predicted classification result).
Each modified text in the text set is a population member, and the target classification prediction probability of each modified text is the fitness of the population member.
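The fitness computation described in the two paragraphs above is, for a binary classifier, one line:

```python
def target_class_probability(pred_label, confidence, original_label):
    """Probability mass on the non-original (target) class of a binary
    classifier: 59.8% Positive stays 0.598 when the original is Negative;
    70% Negative (the original's own label) becomes 1 - 0.7 = 0.3."""
    return confidence if pred_label != original_label else 1.0 - confidence
```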
Assuming that the predicted classification result of every modified text in the text set is the same as that of the original text, the target classification prediction probability (i.e., fitness) of each modified text can be taken as its extraction probability, and two modified texts randomly extracted from the text set.
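The fitness-proportional extraction and synthesis step can be sketched as below. The word-level crossover is an assumption for illustration: the patent only states that a new text is synthesized from the two extracted modified texts, without fixing the crossover scheme.

```python
import random

def breed_new_text(population, fitness, rng=random):
    """Draw two parents with probability proportional to their fitness
    (target classification prediction probability), then combine them by
    taking, at each word position, either parent's word at random."""
    parent1, parent2 = rng.choices(population, weights=fitness, k=2)
    words = [rng.choice(pair) for pair in zip(parent1.split(), parent2.split())]
    return " ".join(words)
```

`random.choices` implements exactly the roulette-wheel extraction: members with higher fitness are proportionally more likely to be drawn.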
The new text may be used as the original text, and the processing shown in 102-104 repeated until an adversarial sample is output.
Subsequently, the generated adversarial samples can be used to optimize the text classification model, i.e., adversarial training, so as to further optimize/perfect it. Adversarial training refers to mixing adversarial samples with the original samples as a training data set to train the text classification model, giving the model the ability to defend against adversarial-sample attacks.
It should be noted that, for simplicity of explanation, the foregoing method embodiments are described as a series of acts, but those skilled in the art will understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In summary, the solution described in the method embodiment of the present application has at least the following advantages:
1) Adversarial samples can be generated automatically without manual involvement, which saves labor cost and improves processing efficiency.
2) The original text is modified as little as possible, and each replacement object is replaced with a word of similar semantics and grammar, so the generated adversarial sample carries only a slight, hidden, hard-to-perceive perturbation relative to the original text: it does not mislead human judgment, yet can successfully fool the text classification model.
3) The adversarial sample generation method has strong transferability: it can be applied to text classification models of various DNN architectures, and adversarial samples generated against model A are also effective when attacking model B.
4) The method works in a black-box setting, with no need to know model details, which better matches real-world scenarios; that is, it is a black-box-attack-based adversarial sample generation method. In a black-box attack, the attacker can only query the target model with chosen inputs and observe the output predictions and confidence scores, without full access to the model.
5) The generated adversarial samples can be used to optimize the text classification model, improving its robustness and security.
6) Experiments show that when the generated adversarial samples attack the text classification model, the attack success rate is high, the time consumption is low, and the perturbation perception rate is low. For example, 1000 samples correctly classified by the text classification model can be randomly drawn for evaluation, the maximum percentage of the original text allowed to be modified limited to 20%, and the number of attack rounds set to 30; if an adversarial sample is output within 30 iterations, the attack success count is incremented by one, otherwise the attack failure count is incremented by one, and after the 1000 attack tests finish, the attack success rate is calculated (the higher, the better). In addition, the elapsed time of each attack can be recorded and the average attack time calculated (the lower, the better). The perturbation perception rate of the adversarial samples can also be evaluated manually; for example, 20 volunteers can be asked to take part in a user study on a sentiment analysis task to rate how noticeable or perceptible the changes in the generated adversarial samples are (the lower, the better).
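The attack success rate and average attack time described above can be aggregated as follows (an illustrative sketch; the per-attack result tuple format is an assumption, not part of the described evaluation protocol):

```python
def summarize_attacks(results):
    """Aggregate attack evaluation metrics over e.g. 1000 attack tests.

    `results` is a list of (succeeded, elapsed_seconds) pairs, one per attacked
    sample. Returns the success rate (higher is better) and the average attack
    time (lower is better).
    """
    n = len(results)
    success_rate = sum(1 for ok, _ in results if ok) / n
    avg_time = sum(t for _, t in results) / n
    return success_rate, avg_time
```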
The above is a description of method embodiments, and the embodiments of the present application are further described below by way of apparatus embodiments.
FIG. 3 is a schematic diagram of an example adversarial sample generation apparatus 300 according to the present application. As shown in fig. 3, the apparatus includes: a sample generation unit 301.
A sample generation unit 301, configured to acquire an original text, and perform the following first processing:
respectively generating S modified texts of the original text to form a text set, wherein S is a positive integer greater than one, and each modified text respectively replaces one word in the original text;
and if the text set meets a predetermined requirement, selecting one modified text in the text set as the generated adversarial sample; otherwise, generating a new text from the modified texts in the text set, taking the new text as the original text, and repeating the first processing.
Specifically, the sample generation unit 301 may perform the following second processing S times on the original text: randomly selecting a word in the original text as a replacement object; determining the optimal replacement word for the replacement object; and replacing the replacement object with the optimal replacement word to obtain a modified text.
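The second processing performed S times by the sample generation unit can be sketched as follows (illustrative Python; `best_replacement` is a stand-in for the substitution search described below, not an API defined in this application):

```python
import random

def generate_modified_texts(original_words, best_replacement, s, rng=random):
    """Perform the 'second processing' S times: each pass randomly picks one
    word position in the original text, swaps in the optimal replacement word
    for it, and records the resulting modified text.

    `best_replacement(words, i)` stands in for the substitution search
    (nearest neighbors -> context filter -> most adversarial word); here it is
    simply a caller-supplied function.
    """
    modified_texts = []
    for _ in range(s):
        i = rng.randrange(len(original_words))
        words = list(original_words)
        words[i] = best_replacement(original_words, i)
        modified_texts.append(" ".join(words))
    return modified_texts
```

Each of the S resulting texts differs from the original in exactly one word, forming the text set (population) used by the later steps.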
The sample generation unit 301 may determine N nearest-neighbor words of the replacement object, N being a positive integer greater than one, select from the N nearest-neighbor words the M words with the highest degree of context match with the original text, M being a positive integer greater than one and smaller than N, and select the most adversarial of the M words as the optimal replacement word.
Preferably, after determining the N nearest-neighbor words of the replacement object, the sample generation unit 301 may further filter out from them any words that are not synonyms of the replacement object.
The sample generation unit 301 may replace the replacement object in the original text with each nearest-neighbor word in turn, determine a grammar score for each resulting modified text using a pre-obtained language model, sort the nearest-neighbor words by their grammar scores in descending order, and select the top M words.
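The grammar-score filtering can be sketched as follows, with `grammar_score` standing in for the pre-obtained language model (illustrative; any function that scores sentence fluency, higher meaning more grammatical, would fit):

```python
def top_m_by_grammar(candidates, original_words, position, grammar_score, m):
    """Keep the M nearest-neighbor candidates whose substitution yields the
    most fluent sentence, as judged by a language-model scoring function.

    `grammar_score(text) -> float` is a placeholder for a pretrained language
    model (higher score = more grammatical).
    """
    def substitute(word):
        words = list(original_words)
        words[position] = word
        return " ".join(words)

    # Sort candidates by the fluency of the substituted sentence, best first.
    ranked = sorted(candidates, key=lambda w: grammar_score(substitute(w)),
                    reverse=True)
    return ranked[:m]
```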
The sample generation unit 301 may replace the replacement object in the original text with each of the M words in turn, input each resulting modified text into a pre-obtained text classification model to obtain a predicted classification result and a confidence, input the original text into the text classification model to obtain its predicted classification result and confidence, and then select from the M words, as the optimal replacement word, the word whose predicted classification result differs from that of the original text and whose confidence is highest.
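The selection of the most adversarial candidate can be sketched as follows, with `classify` standing in for a query to the black-box text classification model (illustrative names; returning None when no candidate flips the prediction is an assumption, not specified above):

```python
def select_optimal_replacement(candidate_words, original_words, position,
                               classify, original_label):
    """Among the M context-filtered candidates, pick the word whose
    substitution flips the model's prediction away from the original label
    with the highest confidence; returns None when no candidate flips it.

    `classify(text) -> (label, confidence)` is a placeholder for querying the
    black-box text classification model.
    """
    best_word, best_conf = None, -1.0
    for word in candidate_words:
        words = list(original_words)
        words[position] = word
        label, conf = classify(" ".join(words))
        if label != original_label and conf > best_conf:
            best_word, best_conf = word, conf
    return best_word
```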
The sample generation unit 301 may further input each modified text in the text set into the text classification model to obtain a predicted classification result and a confidence, and input the original text into the text classification model to obtain its predicted classification result and confidence; if the predicted classification result of at least one modified text in the text set differs from that of the original text, the text set is determined to meet the predetermined requirement.
Accordingly, the sample generation unit 301 may select from the text set the modified text whose predicted classification result differs from that of the original text and whose confidence is highest, output it as the adversarial sample, and end the iteration.
If the predicted classification results of all modified texts in the text set are the same as that of the original text, the text set is determined not to meet the predetermined requirement. Accordingly, the sample generation unit 301 may determine, from the confidences, a target classification prediction probability for each modified text in the text set, where the target classification prediction probability represents the probability that the modified text's predicted classification result is not that of the original text; it may then randomly draw two modified texts from the text set, using each modified text's target classification prediction probability as its extraction probability, synthesize a new text from the two drawn texts, and repeat the first processing with the new text as the original text.
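One plausible reading of synthesizing a new text from two extracted modified texts is a word-level genetic-algorithm crossover; the exact synthesis rule is not specified here, so the uniform per-position choice below is an assumption:

```python
import random

def crossover(parent_a, parent_b, rng=random):
    """Synthesize a new text from two extracted modified texts by picking each
    position's word at random from one of the two parents (a standard
    genetic-algorithm crossover; one plausible rule, not mandated above)."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]
```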
The apparatus shown in fig. 3 may further include: a model optimization unit 302, configured to optimize the text classification model with the generated adversarial samples, for example through adversarial training, so as to further improve the model.
For a specific work flow of the apparatus embodiment shown in fig. 3, reference is made to the related description in the foregoing method embodiment, and details are not repeated.
In summary, the solution described in the apparatus embodiment of the present application has at least the following advantages:
1) Adversarial samples can be generated automatically without manual involvement, which saves labor cost and improves processing efficiency.
2) The original text is modified as little as possible, and each replacement object is replaced with a word of similar semantics and grammar, so the generated adversarial sample carries only a slight, hidden, hard-to-perceive perturbation relative to the original text: it does not mislead human judgment, yet can successfully fool the text classification model.
3) The adversarial sample generation method has strong transferability: it can be applied to text classification models of various DNN architectures, and adversarial samples generated against model A are also effective when attacking model B.
4) The method works in a black-box setting, with no need to know model details, which better matches real-world scenarios; that is, it is a black-box-attack-based adversarial sample generation method. In a black-box attack, the attacker can only query the target model with chosen inputs and observe the output predictions and confidence scores, without full access to the model.
5) The generated adversarial samples can be used to optimize the text classification model, improving its robustness and security.
6) Experiments show that when the generated adversarial samples attack the text classification model, the attack success rate is high, the time consumption is low, and the perturbation perception rate is low.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 4 is a block diagram of an electronic device according to the method of the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 4, the electronic device includes: one or more processors Y01, a memory Y02, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 4, one processor Y01 is taken as an example.
The memory Y02 is a non-transitory computer-readable storage medium as provided herein, storing instructions executable by the at least one processor to cause the at least one processor to perform the methods provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the methods provided herein.
The memory Y02, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods of the embodiments of the present application. By running the non-transitory software programs, instructions, and modules stored in the memory Y02, the processor Y01 executes the various functional applications and data processing of the server, i.e., implements the methods in the above method embodiments.
The memory Y02 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Additionally, the memory Y02 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory Y02 may optionally include memory located remotely from processor Y01, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device Y03 and an output device Y04. The processor Y01, the memory Y02, the input device Y03 and the output device Y04 may be connected by a bus or other means, and the bus connection is exemplified in fig. 4.
The input device Y03 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, and may be, for example, a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or another input device. The output device Y04 may include a display device, auxiliary lighting devices, a tactile feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display, a light emitting diode display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific integrated circuits, computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a cathode ray tube or a liquid crystal display monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks, wide area networks, and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present application is not limited in this respect as long as the desired results of the technical solutions disclosed herein can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (22)

1. An adversarial sample generation method, comprising:
acquiring an original text, and executing the following first processing:
respectively generating S modified texts of the original text to form a text set, wherein S is a positive integer greater than one, and each modified text respectively replaces one word in the original text;
and if the text set meets a predetermined requirement, selecting one modified text in the text set as a generated adversarial sample; otherwise, generating a new text according to the modified texts in the text set, taking the new text as the original text, and repeatedly executing the first processing.
2. The method of claim 1,
the generating S modified texts of the original text respectively includes:
respectively executing S times of second processing on the original texts, wherein the second processing comprises the following steps:
randomly selecting a word in the original text as a replacement object;
determining the optimal replacement word of the replacement object;
and replacing the replacement object with the optimal replacement word to obtain a modified text.
3. The method of claim 2,
the determining the optimal replacement word of the replacement object comprises:
determining N nearest neighbor words of the replacement object, wherein N is a positive integer greater than one;
selecting M words with the highest context matching degree with the original text from the N nearest neighbor words, wherein M is a positive integer larger than one, and M is smaller than N;
and selecting the most adversarial word from the M words as the optimal replacement word.
4. The method of claim 3,
the method further comprises the following steps: filtering out non-synonyms of the replacement object from the N nearest neighbors.
5. The method of claim 3,
the selecting, from the N nearest neighbor words, M words having a highest context matching degree with the original text includes:
for each nearest neighbor word, replacing the replacement object in the original text with the nearest neighbor word, and determining a grammar score of the replaced modified text by using a pre-obtained language model;
and sorting the nearest neighbor words in descending order of their grammar scores, and selecting the top M words after sorting.
6. The method of claim 3,
the selecting the most adversarial word from the M words as the optimal replacement word comprises:
for each word in the M words, replacing the replacement object in the original text by the word, and inputting the modified text after replacement into a pre-acquired text classification model to obtain a prediction classification result and a confidence coefficient;
inputting the original text into the text classification model to obtain a prediction classification result and a confidence coefficient;
and selecting, from the M words, the word whose corresponding prediction classification result is different from that of the original text and whose confidence is highest, as the optimal replacement word.
7. The method of claim 1,
the method further comprises the following steps:
for each modified text in the text set, inputting the modified text into a pre-acquired text classification model respectively to obtain a prediction classification result and a confidence coefficient;
inputting the original text into the text classification model to obtain a prediction classification result and a confidence coefficient;
and if the predicted classification result of at least one modified text in the text set is different from the predicted classification result of the original text, determining that the text set meets the preset requirement.
8. The method of claim 7,
the selecting one modified text in the text set as a generated adversarial sample comprises: selecting, from the text set, the modified text whose prediction classification result is different from that of the original text and whose confidence is highest, as the adversarial sample.
9. The method of claim 7,
the generating a new text according to the modified text in the text set comprises:
respectively determining target classification prediction probabilities of all modified texts in the text set according to the confidence degrees, wherein the target classification prediction probabilities represent the probabilities that the prediction classification results of the modified texts are not the prediction classification results of the original texts;
and randomly extracting two modified texts from the text set by taking the target classification prediction probability of each modified text as an extraction probability, and synthesizing a new text according to the two extracted modified texts.
10. The method of claim 1,
the method further comprises the following steps: and optimizing the pre-acquired text classification model by using the generated countermeasure sample.
11. An adversarial sample generation apparatus, comprising: a sample generation unit;
the sample generation unit is used for acquiring an original text and executing the following first processing:
respectively generating S modified texts of the original text to form a text set, wherein S is a positive integer greater than one, and each modified text respectively replaces one word in the original text;
and if the text set meets a predetermined requirement, selecting one modified text in the text set as a generated adversarial sample; otherwise, generating a new text according to the modified texts in the text set, taking the new text as the original text, and repeatedly executing the first processing.
12. The apparatus of claim 11,
the sample generation unit performs the second processing S times on the original text, the second processing comprising: randomly selecting a word in the original text as a replacement object; determining the optimal replacement word of the replacement object; and replacing the replacement object with the optimal replacement word to obtain a modified text.
13. The apparatus of claim 12,
the sample generation unit determines N nearest neighbor words of the replacement object, N being a positive integer greater than one, selects, from the N nearest neighbor words, the M words with the highest degree of context match with the original text, M being a positive integer greater than one and smaller than N, and selects the most adversarial word from the M words as the optimal replacement word.
14. The apparatus of claim 13,
the sample generation unit is further configured to filter out non-synonyms of the replacement object from the N nearest neighbors.
15. The apparatus of claim 13,
the sample generation unit replaces the replacement object in the original text with each nearest neighbor word in turn, determines the grammar score of the replaced modified text by using a pre-obtained language model, sorts the nearest neighbor words in descending order of their grammar scores, and selects the top M words after sorting.
16. The apparatus of claim 13,
the sample generation unit replaces the replacement object in the original text with each of the M words in turn, inputs each replaced modified text into a pre-obtained text classification model to obtain a prediction classification result and a confidence, inputs the original text into the text classification model to obtain a prediction classification result and a confidence, and selects, from the M words, the word whose corresponding prediction classification result is different from that of the original text and whose confidence is highest, as the optimal replacement word.
17. The apparatus of claim 11,
the sample generation unit is further configured to input each modified text in the text set into a pre-obtained text classification model to obtain a prediction classification result and a confidence, input the original text into the text classification model to obtain a prediction classification result and a confidence, and determine that the text set meets the predetermined requirement if the prediction classification result of at least one modified text in the text set is different from that of the original text.
18. The apparatus of claim 17,
the sample generation unit selects, from the text set, the modified text whose prediction classification result is different from that of the original text and whose confidence is highest, as the adversarial sample.
19. The apparatus of claim 17,
the sample generation unit determines, according to the confidences, a target classification prediction probability of each modified text in the text set, the target classification prediction probability representing the probability that the prediction classification result of the modified text is not that of the original text; takes the target classification prediction probability of each modified text as its extraction probability, randomly extracts two modified texts from the text set, and synthesizes a new text from the two extracted modified texts.
20. The apparatus of claim 11,
the apparatus further comprises: a model optimization unit, configured to optimize the pre-acquired text classification model by using the generated adversarial samples.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-10.
CN201911165138.6A 2019-11-25 2019-11-25 Countermeasure sample generation method and device, electronic equipment and storage medium Active CN111046176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911165138.6A CN111046176B (en) 2019-11-25 2019-11-25 Countermeasure sample generation method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111046176A true CN111046176A (en) 2020-04-21
CN111046176B CN111046176B (en) 2023-04-07

Family

ID=70233981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911165138.6A Active CN111046176B (en) 2019-11-25 2019-11-25 Countermeasure sample generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111046176B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783451A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Method and apparatus for enhancing text samples
CN111897964A (en) * 2020-08-12 2020-11-06 腾讯科技(深圳)有限公司 Text classification model training method, device, equipment and storage medium
CN112364641A (en) * 2020-11-12 2021-02-12 北京中科闻歌科技股份有限公司 Chinese countermeasure sample generation method and device for text audit
CN112380845A (en) * 2021-01-15 2021-02-19 鹏城实验室 Sentence noise design method, equipment and computer storage medium
CN113204974A (en) * 2021-05-14 2021-08-03 清华大学 Method, device and equipment for generating confrontation text and storage medium
CN113723506A (en) * 2021-08-30 2021-11-30 南京星环智能科技有限公司 Method and device for generating countermeasure sample and storage medium
CN113935913A (en) * 2021-10-08 2022-01-14 北京计算机技术及应用研究所 Black box image confrontation sample generation method with visual perception concealment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595629A (en) * 2018-04-24 2018-09-28 北京慧闻科技发展有限公司 Data processing method for an answer selection system and application thereof
CN109117482A (en) * 2018-09-17 2019-01-01 武汉大学 Adversarial example generation method for Chinese text sentiment tendency detection
KR20190061446A (en) * 2017-11-28 2019-06-05 Kongju National University Industry-University Cooperation Foundation Apparatus for generating adversarial example in deep learning environment and method thereof, computer program
US20190220605A1 (en) * 2019-03-22 2019-07-18 Intel Corporation Adversarial training of neural networks using information about activation path differentials
CN110378474A (en) * 2019-07-26 2019-10-25 北京字节跳动网络技术有限公司 Adversarial example generation method and apparatus, electronic device and computer-readable medium
CN110427618A (en) * 2019-07-22 2019-11-08 清华大学 Adversarial example generation method, medium, apparatus and computing device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MOUSTAFA ALZANTOT, YASH SHARMA, AHMED ELGOHARY, BO-JHANG HO, MANI SRIVASTAVA, KAI-WEI CHANG: "Generating Natural Language Adversarial Examples" *
SHUHUAI REN, YIHE DENG, KUN HE, WANXIANG CHE: "Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency" *
WANG, Wenqi: "Adversarial example generation method for Chinese text sentiment classification (面向中文文本倾向性分类的对抗样本生成方法)" *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783451A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Method and apparatus for enhancing text samples
CN111783451B (en) * 2020-06-30 2024-07-02 不亦乐乎有朋(北京)科技有限公司 Method and apparatus for enhancing text samples
CN111897964A (en) * 2020-08-12 2020-11-06 腾讯科技(深圳)有限公司 Text classification model training method, device, equipment and storage medium
CN111897964B (en) * 2020-08-12 2023-10-17 腾讯科技(深圳)有限公司 Text classification model training method, device, equipment and storage medium
CN112364641A (en) * 2020-11-12 2021-02-12 北京中科闻歌科技股份有限公司 Chinese adversarial example generation method and device for text review
CN112380845A (en) * 2021-01-15 2021-02-19 鹏城实验室 Sentence noise design method, equipment and computer storage medium
CN113204974A (en) * 2021-05-14 2021-08-03 清华大学 Method, device and equipment for generating adversarial text, and storage medium
CN113204974B (en) * 2021-05-14 2022-06-17 清华大学 Method, device and equipment for generating adversarial text, and storage medium
CN113723506A (en) * 2021-08-30 2021-11-30 南京星环智能科技有限公司 Method and device for generating adversarial examples, and storage medium
CN113723506B (en) * 2021-08-30 2022-08-05 南京星环智能科技有限公司 Method and device for generating adversarial examples, and storage medium
CN113935913A (en) * 2021-10-08 2022-01-14 北京计算机技术及应用研究所 Black-box image adversarial example generation method with visual perception concealment

Also Published As

Publication number Publication date
CN111046176B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN111046176B (en) Countermeasure sample generation method and device, electronic equipment and storage medium
CN111625635B (en) Question-answering processing method, device, equipment and storage medium
CN111639710A (en) Image recognition model training method, device, equipment and storage medium
JP7079311B2 (en) Training methods, devices, electronic devices and storage media for machine reading models
CN111078892B (en) Countermeasure sample generation method, device, electronic equipment and storage medium
CN112001180A (en) Multi-mode pre-training model acquisition method and device, electronic equipment and storage medium
CN109933656A (en) Public sentiment polarity prediction technique, device, computer equipment and storage medium
CN111950291A (en) Semantic representation model generation method and device, electronic equipment and storage medium
CN110738997B (en) Information correction method and device, electronic equipment and storage medium
CN116737908A (en) Knowledge question-answering method, device, equipment and storage medium
CN114416943B (en) Training method and device for dialogue model, electronic equipment and storage medium
CN112507702B (en) Text information extraction method and device, electronic equipment and storage medium
CN110032734B (en) Training method and device for a synonym-expansion generative adversarial network model
CN111079945A (en) End-to-end model training method and device
CN110462638 (en) Training neural networks using posterior sharpening
CN113409898B (en) Molecular structure acquisition method and device, electronic equipment and storage medium
CN111160013A (en) Text error correction method and device
CN114861637B (en) Spelling error correction model generation method and device, and spelling error correction method and device
CN111666771B (en) Semantic tag extraction device, electronic equipment and readable storage medium for document
CN111753761A (en) Model generation method and device, electronic equipment and storage medium
CN111966782A (en) Retrieval method and device for multi-turn conversations, storage medium and electronic equipment
CN117709435B (en) Training method of large language model, code generation method, device and storage medium
CN112232089B (en) Pre-training method, device and storage medium of semantic representation model
CN111860580B (en) Identification model acquisition and category identification method, device and storage medium
CN117649857A (en) Zero-sample audio classification model training method and zero-sample audio classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant