CN111897964A - Text classification model training method, apparatus, device, and storage medium

Text classification model training method, apparatus, device, and storage medium

Info

Publication number: CN111897964A (application CN202010805356.8A; granted as CN111897964B)
Authority: CN (China)
Prior art keywords: text, sample, classification, error, adversarial
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN111897964B (en)
Inventors: 邱耀, 张金超, 牛成, 周杰
Current and original assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application CN202010805356.8A filed by Tencent Technology (Shenzhen) Co., Ltd.; application granted and published as CN111897964B

Classifications

    • G06F16/35: Information retrieval of unstructured textual data; clustering; classification
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06F40/30: Handling natural language data; semantic analysis
    • G06N3/045: Neural networks; architecture; combinations of networks
    • G06N3/047: Neural networks; architecture; probabilistic or stochastic networks
    • G06N3/08: Neural networks; learning methods


Abstract

The application discloses a text classification model training method, apparatus, device, and storage medium, and belongs to the field of artificial intelligence. On one hand, adversarial samples are introduced, and the text classification model is trained with both the text samples and the adversarial samples, so that the model learns how to classify text to which perturbations have been added; this improves the robustness of the text classification model and the accuracy of text classification. On the other hand, the text classification model can reconstruct the text features of the adversarial samples extracted during classification and restore the adversarial samples to text content, which improves the interpretability of the adversarial training method. The model parameters are trained jointly with the error between the reconstructed text content and the text content of the original text sample, so that the text classification model extracts more accurate text features, that is, a more accurate feature representation of the text content, which improves the robustness and accuracy of the model's feature extraction.

Description

Text classification model training method, apparatus, device, and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method, an apparatus, a device, and a storage medium for training a text classification model.
Background
Artificial intelligence is applied in many fields, and using artificial intelligence to take over work previously done manually can greatly improve the efficiency of business processing. For text classification, a text classification model can be trained, and the type of a text can then be predicted by inputting the text to be classified into the trained text classification model.
At present, a text classification model training method generally includes obtaining a text sample, classifying the text sample based on a text classification model to obtain a prediction classification result, and updating model parameters according to the prediction classification result and a target classification result carried by the text sample.
A text classification model trained in this way has poor robustness: when a few small perturbations are added to the input text, the model is prone to misclassifying it.
Disclosure of Invention
The embodiments of the application provide a text classification model training method, apparatus, device, and storage medium, which can improve the robustness of a text classification model as well as the accuracy of its feature extraction and classification. The technical solution is as follows:
in one aspect, a method for training a text classification model is provided, where the method includes:
based on a text classification model, performing feature extraction on a text sample and an adversarial sample of the text sample, performing classification based on the extracted text features, and outputting predicted classification results of the text sample and the adversarial sample, wherein the text sample and its corresponding adversarial sample carry the same target classification result;
acquiring a first classification error and a second classification error, wherein the first classification error is the error between the predicted classification result and the target classification result of the text sample, and the second classification error is the error between the predicted classification result and the target classification result of the adversarial sample;
recognizing the text features of the adversarial sample based on the text classification model, and outputting a text recognition result corresponding to the text features;
acquiring a recognition error based on the text recognition result and the text sample;
updating model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error.
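By way of a non-limiting illustration only, the following Python (PyTorch-style) sketch shows how one training step combining the three errors above might look. The model interface (classify, reconstruct), the helper build_adversarial_sample, and the token_ids attribute are assumptions made for this illustration and are not prescribed by the disclosure.

    import torch
    import torch.nn.functional as F

    def training_step(model, optimizer, text_sample, target, recognition_weight=1.0):
        # 1. Classify the original text sample (first classification error).
        clean_logits, _ = model.classify(text_sample)
        first_error = F.cross_entropy(clean_logits, target)

        # 2. Build an adversarial sample that carries the same target result
        #    (build_adversarial_sample is a hypothetical gradient-based helper).
        adversarial_sample = build_adversarial_sample(model, text_sample, target)

        # 3. Classify the adversarial sample (second classification error).
        adv_logits, adv_features = model.classify(adversarial_sample)
        second_error = F.cross_entropy(adv_logits, target)

        # 4. Reconstruct text from the adversarial features and compare it with
        #    the original text sample (recognition error).
        recognition_logits = model.reconstruct(adv_features)   # (seq_len, vocab_size)
        recognition_error = F.cross_entropy(
            recognition_logits.view(-1, recognition_logits.size(-1)),
            text_sample.token_ids.view(-1))

        # 5. Update the model parameters with the combined error.
        total_error = first_error + second_error + recognition_weight * recognition_error
        optimizer.zero_grad()
        total_error.backward()
        optimizer.step()
        return total_error.item()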
In one aspect, an apparatus for training a text classification model is provided, the apparatus comprising:
the classification module is used for performing feature extraction on a text sample and an adversarial sample of the text sample based on a text classification model, performing classification based on the extracted text features, and outputting predicted classification results of the text sample and the adversarial sample, wherein the text sample and its corresponding adversarial sample carry the same target classification result;
an obtaining module, configured to obtain a first classification error and a second classification error, where the first classification error is the error between the predicted classification result and the target classification result of the text sample, and the second classification error is the error between the predicted classification result and the target classification result of the adversarial sample;
a recognition module, used for recognizing the text features of the adversarial sample based on the text classification model and outputting a text recognition result corresponding to the text features;
the obtaining module is further configured to obtain a recognition error based on the text recognition result and the text sample;
an updating module, used for updating the model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error.
In one possible implementation, the recognition module is configured to:
map the text features of the adversarial sample to the real number domain based on the text classification model to obtain word embedding information corresponding to the text features;
and match the word embedding information against a word list, output at least one matched word, and take the at least one word as the text recognition result corresponding to the text features.
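By way of a non-limiting illustration, the two operations above (mapping text features back to the real number domain and matching them against a word list) could be sketched in Python as follows; the layer sizes and the vocabulary table are assumptions made for this illustration only.

    import torch
    import torch.nn as nn

    class FeatureToTextDecoder(nn.Module):
        # Projects adversarial text features back into the word-embedding space,
        # then matches them against a vocabulary (word list) to recover words.
        def __init__(self, feature_dim, embed_dim, vocab_size):
            super().__init__()
            self.to_embedding = nn.Linear(feature_dim, embed_dim)        # features -> real number domain
            self.vocab_embeddings = nn.Embedding(vocab_size, embed_dim)  # the word list

        def forward(self, adv_features):                  # (batch, seq_len, feature_dim)
            word_embed = self.to_embedding(adv_features)  # word embedding information
            # Score against every vocabulary embedding; the highest score is the matched word.
            scores = word_embed @ self.vocab_embeddings.weight.t()   # (batch, seq_len, vocab_size)
            return scores.argmax(dim=-1)                  # indices of the matched words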
In one possible implementation, the text classification model includes two neural network layers: a first layer for mapping the text features of the adversarial sample to the real number domain, and a second layer for matching the word embedding information against the word list.
In one possible implementation, the recognition module is configured to normalize the text features of the adversarial sample based on the text classification model, and to perform the steps of mapping to the real number domain and matching against the word list on the normalized text features.
In one possible implementation, the update module is configured to perform either of the following:
acquiring the product of the recognition error and its weight, taking the sum of this product, the first classification error, and the second classification error as a total error, and updating the model parameters of the text classification model based on the total error;
weighting the first classification error, the second classification error, and the recognition error by their respective weights to obtain a total error, and updating the model parameters of the text classification model based on the total error.
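A non-limiting illustration of the two ways of combining the errors described above; the weight values are placeholders rather than values taken from this disclosure.

    def total_error_v1(first_error, second_error, recognition_error, recognition_weight=0.5):
        # Only the recognition error carries an explicit weight.
        return first_error + second_error + recognition_weight * recognition_error

    def total_error_v2(first_error, second_error, recognition_error,
                       w1=1.0, w2=1.0, w3=0.5):
        # Each of the three errors carries its own weight.
        return w1 * first_error + w2 * second_error + w3 * recognition_error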
In one possible implementation, the classification module includes a classification unit and a generation unit;
the classification unit is used for inputting the text samples into a text classification model, extracting the features of the text samples by the text classification model, classifying the text samples based on the extracted text features, and outputting the prediction classification result of the text samples;
the generation unit is used for generating a corresponding adversarial sample based on the text sample, the predicted classification result of the text sample, and the target classification result;
the classification unit is further used for performing feature extraction on the adversarial sample, performing classification based on the extracted text features, and outputting the predicted classification result of the adversarial sample.
In one possible implementation, the classification unit includes a mapping subunit and an extraction subunit;
the mapping subunit is configured to map, through the text classification model, the words contained in the text content of the text sample to the real number domain, so as to obtain word embedding information of the text sample;
the extraction subunit is configured to perform feature extraction on the word embedding information of the text sample to obtain a text feature of the text sample.
In one possible implementation, the generating unit includes a determining subunit and an adding subunit;
the determining subunit is configured to determine an adversarial perturbation of the text sample based on the predicted classification result and the target classification result of the text sample;
the adding subunit is configured to add the adversarial perturbation to the text sample to obtain the adversarial sample corresponding to the text sample.
In one possible implementation, the determining subunit is configured to:
acquire a first classification error of the text sample according to the predicted classification result and the target classification result of the text sample;
obtain a candidate adversarial perturbation of the text sample based on the gradient of the first classification error;
add the candidate adversarial perturbation to the text sample to obtain a candidate adversarial sample corresponding to the text sample;
classify the candidate adversarial sample, and obtain the classification error of the candidate adversarial sample from its predicted classification result and target classification result;
and update the candidate adversarial perturbation of the text sample based on the gradient of the classification error of the candidate adversarial sample, until a target condition is reached, to obtain the adversarial perturbation of the text sample.
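By way of a non-limiting illustration, the iterative, gradient-based construction of the adversarial perturbation described above might look as follows when the perturbation is applied to word embedding information. The step size, bound, and iteration count (used here as the target condition) are assumptions, and classify_from_embeddings is a hypothetical model method.

    import torch
    import torch.nn.functional as F

    def build_adversarial_perturbation(model, word_embed, target,
                                       steps=3, step_size=0.01, max_norm=0.05):
        # Grow a small perturbation by following the gradient of the classification error.
        delta = torch.zeros_like(word_embed, requires_grad=True)
        for _ in range(steps):
            logits = model.classify_from_embeddings(word_embed + delta)
            error = F.cross_entropy(logits, target)
            error.backward()
            with torch.no_grad():
                # Move in the direction that increases the classification error.
                delta += step_size * delta.grad.sign()
                # Keep the perturbation small so the sample changes only slightly.
                delta.clamp_(-max_norm, max_norm)
            delta.grad.zero_()
        return delta.detach()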
In one possible implementation, the adding subunit is configured to add the adversarial perturbation to the text content of the text sample to obtain the text content of the adversarial sample corresponding to the text sample;
the feature extraction process for the adversarial sample then comprises the following steps:
mapping the words contained in the text content of the adversarial sample to the real number domain to obtain word embedding information of the adversarial sample;
and performing feature extraction on the word embedding information of the adversarial sample to obtain the text features of the adversarial sample.
In another possible implementation, the adding subunit is configured to add the adversarial perturbation to the word embedding information of the text sample to obtain word embedding information of the adversarial sample corresponding to the text sample;
the feature extraction process for the adversarial sample then comprises the following step:
performing feature extraction on the word embedding information of the adversarial sample to obtain the text features of the adversarial sample.
In one possible implementation, the text classification model includes an encoder, a classifier, and a decoder;
wherein the encoder is used for feature extraction;
the classifier is used for performing classification based on the extracted text features;
the decoder is used for recognizing the text features of the adversarial sample and outputting a text recognition result corresponding to the text features.
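A non-limiting sketch of how the encoder, classifier, and decoder might be assembled into one model; the component types (an LSTM encoder with linear classifier and decoder) and all sizes are assumptions made for this illustration.

    import torch.nn as nn

    class TextClassificationModel(nn.Module):
        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=2):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)             # words -> real number domain
            self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # feature extraction
            self.classifier = nn.Linear(hidden_dim, num_classes)             # classification from features
            self.decoder = nn.Linear(hidden_dim, vocab_size)                 # recognize features back into words

        def forward(self, token_ids):
            word_embed = self.embedding(token_ids)
            features, _ = self.encoder(word_embed)         # (batch, seq_len, hidden_dim)
            logits = self.classifier(features[:, -1, :])   # predicted classification result
            recognition_scores = self.decoder(features)    # text recognition result per position
            return logits, recognition_scores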
In one aspect, an electronic device is provided that includes one or more processors and one or more memories having at least one program code stored therein, the at least one program code being loaded into and executed by the one or more processors to implement various alternative implementations of the text classification model training method described above.
In one aspect, a computer-readable storage medium is provided, in which at least one program code is stored, the at least one program code being loaded and executed by a processor to implement various alternative implementations of the above-described text classification model training method.
In one aspect, a computer program product or computer program is provided that includes one or more program codes stored in a computer-readable storage medium. The one or more program codes can be read from a computer-readable storage medium by one or more processors of the electronic device, and the one or more processors execute the one or more program codes, so that the electronic device can execute the text classification model training method of any one of the above-mentioned possible embodiments.
According to the method and the device provided by the embodiments of the application, on one hand, adversarial samples are introduced, and the text classification model is trained with both the text samples and the adversarial samples, so that the model learns how to classify text to which perturbations have been added; this improves the robustness of the text classification model and the accuracy of text classification. On the other hand, the text classification model can reconstruct the text features of the adversarial samples extracted during classification and restore the adversarial samples to text content, which improves the interpretability of the adversarial training method. The model parameters are trained jointly with the error between the reconstructed text content and the text content of the original text sample, so that the text classification model extracts more accurate text features, that is, a more accurate feature representation of the text content, which improves the robustness and accuracy of the model's feature extraction.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic application flow diagram of an emotion analysis system provided in an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating an application flow of an intention classification system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an implementation environment of a text classification model training method according to an embodiment of the present application;
FIG. 4 is a flowchart of a text classification model training method provided in an embodiment of the present application;
FIG. 5 is a flowchart of a text classification model training method provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a reconstructor or decoder provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a pre-trained language model provided in an embodiment of the present application;
FIG. 8 is a structural diagram of a text classification model provided in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an autoencoder provided in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a text classification model training apparatus provided in an embodiment of the present application;
FIG. 11 is a block diagram of a terminal provided in an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a server provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, the first image can be referred to as a second image, and similarly, the second image can be referred to as a first image without departing from the scope of various such examples. The first image and the second image can both be images, and in some cases, can be separate and distinct images.
The term "at least one" is used herein to mean one or more, and the term "plurality" is used herein to mean two or more, e.g., a plurality of packets means two or more packets.
It is to be understood that the terminology used in the description of the various examples herein is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of the various examples and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The term "and/or" describes an association between associated objects and indicates that three relationships can exist; for example, "A and/or B" can mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" in this application generally indicates that the associated objects before and after it are in an "or" relationship.
It should also be understood that, in the embodiments of the present application, the size of the serial number of each process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should also be understood that determining B from a does not mean determining B from a alone, but can also determine B from a and/or other information.
It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also understood that the term "if" may be interpreted to mean "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined [that ...]" or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining ...", "in response to determining ...", "upon detecting [the stated condition or event]", or "in response to detecting [the stated condition or event]", depending on the context.
The following is a description of terms involved in the present application.
Adversarial training: a way of strengthening the robustness of a model. Adversarial training techniques are based on white-box attacks, in which the attacker knows all the information about the attacked model, including the model structure, the loss function, the parameter values, the architecture, the training method, and in some cases the training data. During adversarial training, small adversarial perturbations are mixed into the original samples to obtain adversarial samples; the adversarial samples are only slightly changed relative to the original samples, but the model is likely to misclassify them. The model is then made to adapt to such changes, so that it becomes robust to adversarial samples.
The term "countermeasure sample" refers to an input sample formed by intentionally adding a small disturbance to the data set. Inputting the challenge sample into a conventional model can cause the model to give a false output with high confidence, resulting in misclassification.
The counterdisturbance refers to an interference factor added to the original sample, for example, in the image field, white noise may be added to the clean image to obtain a countersample, and the white noise is the counterdisturbance. For another example, in the text field, the counterdisturbance may refer to some word modification added to the text, or a modification of word embedded information of the text, and the like.
Autoencoder: abbreviated AE; a kind of Artificial Neural Network (ANN) used in semi-supervised and unsupervised learning. The learning target of an autoencoder is its own input: it performs representation learning on the input information by taking the input information itself as the target, that is, it learns how to express the features of the input information accurately. The autoencoder is a representation-learning algorithm in the general sense and is applied to dimensionality reduction and anomaly (outlier) detection.
An autoencoder consists of an encoder and a decoder. By learning paradigm, autoencoders can be divided into undercomplete autoencoders, regularized autoencoders, and variational autoencoders (VAE); the former two are discriminative models and the latter is a generative model. Depending on how it is constructed, an autoencoder may be a neural network with a feedforward structure or a recurrent structure.
Specifically, given an input space and a feature space, the autoencoder solves for a mapping between the two that minimizes the reconstruction error of the input features, thereby learning a more accurate feature representation.
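A minimal, non-limiting autoencoder sketch in Python that learns by minimizing the reconstruction error of its own input; the layer sizes and training details are arbitrary assumptions.

    import torch
    import torch.nn as nn

    class AutoEncoder(nn.Module):
        def __init__(self, input_dim=300, feature_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(input_dim, feature_dim), nn.ReLU())
            self.decoder = nn.Linear(feature_dim, input_dim)

        def forward(self, x):
            return self.decoder(self.encoder(x))

    # The learning target is the input itself: minimize the reconstruction error.
    model = AutoEncoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(32, 300)                                    # a batch of input features
    reconstruction_error = nn.functional.mse_loss(model(x), x)  # reconstruction error
    reconstruction_error.backward()
    optimizer.step()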
Robustness: a transliteration (in Chinese) of the English word "robust", meaning strong and sturdy. In the computer context, it refers to the ability of a system to survive abnormal and dangerous situations. For example, whether computer software freezes or crashes under input errors, disk failures, network overload, or deliberate attack reflects the robustness of that software. "Robust" also means that a control system maintains certain performance characteristics under perturbation of its parameters (e.g., structure or size).
A text classification model is used to classify input text and determine the type of the text. For example, the type may be the emotion expressed by the text, an attribute of an object described by the text, or the intention expressed by the text. Depending on the specific application of the text classification model, the types of text it is required to determine differ.
The robustness of a text classification model refers to its ability to still classify accurately when the input text is slightly changed.
Word embedding information: word embedding is a general term for Language models and characterization learning techniques in Natural Language Processing (NLP). Conceptually, it refers to embedding a high-dimensional space with dimensions of the number of all words into a continuous vector space with much lower dimensions, each word or phrase being mapped as a vector on the real number domain. The word embedding process is a dimension reduction process, and is used for mapping a word to a real number domain to obtain a vector expression of the word, and the vector expression of the word can be called as word embedding information.
The word Embedding is a process of Embedding, which is a way of converting discrete variables into continuous vectors. Embedding is also the process of mapping source data to another space. In colloquial, word embedding may also be referred to as word embedding, which is to map a word in a space to which X belongs to a multidimensional vector into a space to which Y belongs, the multidimensional vector is equivalent to embedding into the space to which Y belongs. The mapping process is the process of generating an expression on a new space.
The word embedding can be applied to artificial neural networks, dimension reduction of word and expression co-occurrence matrixes, probability models, explicit representation of contexts in which words are located and the like. The method for representing words or phrases by using word embedding information can improve the analysis effect of the text in the NLP.
A vocabulary, which in the embodiments of the present application may also be called a dictionary, contains a large number of words and is used to identify the words that correspond to features given in other forms.
BERT (Bidirectional Encoder Representations from Transformers): a pre-trained language model. BERT can extract the word vectors used in a sample, store these word vectors in a vector file, and provide embedding vectors, that is, word embedding information, for subsequent models.
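As a non-limiting illustration only, BERT word vectors can be extracted with the open-source Hugging Face transformers library; the disclosure does not prescribe any particular toolkit or checkpoint.

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertModel.from_pretrained("bert-base-chinese")

    inputs = tokenizer("这首歌真好听", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # One contextual word vector (word embedding information) per token.
    word_vectors = outputs.last_hidden_state   # shape: (1, seq_len, 768)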
Feedforward Neural Network (FFN): in an FFN, signals propagate unidirectionally from the input layer to the output layer; unlike a recurrent neural network, the interior of an FFN does not form a directed cycle.
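A small, non-limiting illustration of the feedforward structure just described; the layer sizes are arbitrary.

    import torch
    import torch.nn as nn

    # Information flows from the input layer to the output layer in one
    # direction only; there is no directed cycle inside the network.
    ffn = nn.Sequential(
        nn.Linear(128, 256),
        nn.ReLU(),
        nn.Linear(256, 128),
    )

    x = torch.randn(4, 128)   # a batch of 4 input vectors
    y = ffn(x)                # shape: (4, 128)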
Interpretability: the machine learning business application targets output decision making. Interpretability refers to the degree to which a human can understand the cause of a decision. The higher the interpretability of a machine learning model, the easier it is for people to understand why certain decisions or predictions are made. Model interpretability refers to the understanding of mechanisms within a model and the understanding of the results of the model. The important characters are as follows: in the modeling stage, developers are assisted to understand the model, the model is compared and selected, and the model is optimized and adjusted if necessary; and in the operation stage, an internal mechanism of the model is explained to a service party, and the model result is explained. Such as the fund recommendation model, requires an explanation: why a certain fund is recommended for this user.
Data Augmentation: data augmentation may refer to expanding a data set, or to enhancing the diversity and generalization of the data in a data set. Image augmentation is one kind of data augmentation: it expands the size of a training data set by making a series of random changes to the training images so as to produce similar but different training samples. Another way to view image augmentation is that randomly changing the training samples reduces the model's dependence on certain attributes and thereby improves its generalization ability. For example, an image can be cropped in different ways so that the object of interest appears at different positions, which reduces the model's dependence on where the object appears; or the brightness, color, and other factors of the image can be adjusted to reduce the model's sensitivity to color.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The key technologies of Speech Technology are Automatic Speech Recognition (ASR), speech synthesis (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the development direction of future human-computer interaction, and speech has become one of the most promising modes of human-computer interaction.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiment of the application relates to the technologies of text processing, semantic understanding, machine translation, robot question answering and the like in the artificial intelligent natural language processing technology, and is specifically explained by the following embodiment.
An application scenario of the scheme provided in the embodiment of the present application is described below.
The text classification model training method provided by the embodiments of the application can be applied to any scenario that requires text classification, and to products built for such scenarios, for example an emotion analysis system, a prohibited-content detection system (for pornographic and other prohibited content), a commodity classification system, or an intention classification system.
The emotion analysis system is used to classify a text as positive or negative according to the meaning and emotional information it expresses, or into other types, such as happiness, sadness, surprise, and confusion. Sentiment analysis, also called tendency analysis, opinion extraction, opinion mining, sentiment mining, or subjectivity analysis, is the process of analyzing, processing, summarizing, and reasoning about subjective texts with emotional coloring, for example analyzing, from review texts, a user's emotional tendency toward attributes of a digital camera such as zoom, price, size, weight, flash, and ease of use.
For example, as shown in fig. 1, when the emotion expressed by a text needs to be determined, an input text 101 can be input into a text classification model 102, and the text classification model 102 can perform emotion analysis on the input text 101 and output the type 103 to which the input text 101 belongs. For example, when the text "This song is really nice!" is input, the type of the text output after classification is "happy".
The prohibited-content detection system is used to identify whether a text contains pornographic or other prohibited content, referred to herein as prohibited content. For example, a user may publish a text on a website or in an application; the terminal sends the text to be published to the detection system, which classifies the text and determines whether it contains prohibited content. If it does, the system can feed back that the text failed review; if it is determined not to contain such content, the text can be published on the website or in the application. Of course, this is only one example; the detection system has many other application scenarios, which are not limited in the embodiments of the present application.
The commodity classification system can distinguish commodities automatically and efficiently. In particular, in the fields of commodity production and circulation, commodities can be divided by industry and the like, for example into large categories such as food, textiles, general merchandise, hardware, and cultural goods, and each large category can be further subdivided. Of course, the classification can also be based on other factors, such as the properties or composition of the commodity. The commodity classification system can apply a text classification model to classify a commodity based on its text information, for example classifying the commodity according to its name, or, when the commodity is an advertisement, classifying it according to the text information of the advertisement.
An intent classification system is used to classify the intent of text. The intention is to indicate the desired object. The intent classification system is then used to analyze the intent of the text. The intention classification system can be applied to various scenes, for example, in a search scene, the text can be the text input by the user, and after the user inputs the text, the intention classification system analyzes the corresponding intention. For another example, in a speech interaction scenario, the text may be a text obtained by performing speech recognition on the captured speech signal. The user sends out voice, the equipment collects voice signals, the voice signals are recognized to obtain texts, and voice instructions sent out by the user are determined through intention classification.
For example, as shown in fig. 2, a specific speech interaction application scenario is provided, a user 201 sends a speech signal 202, the device can collect the speech signal 202, perform speech recognition on the speech signal 202 through a speech recognition system 203 to obtain a text 204 corresponding to the speech signal 202, perform intent classification on the text 204 through an intent classification system 205, and determine a function 206 that the speech signal 202 is intended to achieve. For example, if the text corresponding to the speech signal is "play music", the intention of the text is determined by the intention classification system to be: the control device plays music. The playing of music is a function of the device.
The following describes an embodiment of the present application.
Fig. 3 is a schematic diagram of an implementation environment of a text classification model training method according to an embodiment of the present application. The implementation environment comprises a terminal 301 or the implementation environment comprises a terminal 301 and a text classification platform 302. The terminal 301 is connected to the text classification platform 302 through a wireless network or a wired network.
The terminal 301 can be at least one of a smartphone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer. An application supporting text classification is installed and runs on the terminal 301; the application can be, for example, a system application, an instant messaging application, a news push application, a shopping application, an online video application, or a social application.
Illustratively, the terminal 301 can obtain text samples, train a text classification model based on the text samples, obtain a text classification model with good classification accuracy and robustness after training is finished, and subsequently classify texts with the trained text classification model to determine their types. The terminal 301 can complete this work independently, or the text classification platform 302 can provide data services to the terminal for it. This is not limited in the embodiments of the present application.
The text classification platform 302 includes at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. The text classification platform 302 is used to provide background services for applications that support text classification. Optionally, the text classification platform 302 undertakes the primary processing work and the terminal 301 undertakes the secondary processing work; or the text classification platform 302 undertakes the secondary processing work and the terminal 301 undertakes the primary processing work; or the text classification platform 302 and the terminal 301 can each undertake the processing work separately. Alternatively, a distributed computing architecture is adopted between the text classification platform 302 and the terminal 301 for collaborative computing.
Optionally, the text classification platform 302 includes at least one server 3021 and a database 3022, where the database 3022 is used to store data, and in this embodiment, the database 3022 can store text samples to provide data services for the at least one server 3021.
Illustratively, at least one server 3021 is capable of extracting text samples from the database 3022, training a text classification model based on the text samples, and using the trained text classification model. When the terminal 301 has a text classification requirement, the text to be classified can be sent to the at least one server 3021, and the at least one server 3021 can call the trained text classification model, classify the received text, determine the type of the text, and feed back the classification result (i.e., the type of the text) to the terminal 301.
The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal can be, but is not limited to, a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like.
Those skilled in the art will appreciate that the number of terminals 301 and servers 3021 can be greater or less. For example, the number of the terminals 301 and the servers 3021 may be only one, or the number of the terminals 301 and the servers 3021 may be several tens or hundreds, or more, and the number of the terminals or the servers and the type of the devices are not limited in the embodiment of the present application.
Fig. 4 is a flowchart of a text classification model training method provided in an embodiment of the present application. The method is applied to an electronic device, which is a terminal or a server. Referring to fig. 4, the method includes the following steps.
401. The electronic device performs feature extraction on a text sample and an adversarial sample of the text sample based on a text classification model, performs classification based on the extracted text features, and outputs predicted classification results of the text sample and the adversarial sample, wherein the text sample and its corresponding adversarial sample carry the same target classification result.
A text sample is a sample in text form; a sample is a part of the individuals under observation or investigation, and in the embodiments of the application the text samples are used to train the text classification model. The adversarial sample corresponding to a text sample is an adversarial sample obtained on the basis of that text sample; for example, adding an adversarial perturbation to a text sample generates the corresponding adversarial sample. The aim of adversarial training is to enable the text classification model to accurately classify the adversarial samples to which adversarial perturbations have been added, so the target classification result carried by an adversarial sample is the same as that carried by the corresponding text sample.
The predicted classification result is the classification result output by the text classification model; the classification process is a process of predicting the type of the text, so the classification result output by the model is a prediction. The target classification result is the true, correct classification result, which may also be referred to as the "true value". The purpose of training the text classification model is, by updating the model parameters, to make the predicted classification result obtained by the model approach the target classification result as closely as possible, that is, to enable the model to classify text accurately.
In step 401, the model parameters of the text classification model are initial values, and model training is performed to obtain better model parameters. During model training, the electronic device can obtain a text sample, input the text sample into the text classification model, and let the text classification model classify the text sample to obtain a predicted classification result.
The classification process comprises two parts: obtaining text features through feature extraction, and classifying based on the text features. A feature is what distinguishes one object from other objects, and a text feature is what distinguishes one text from other texts. Text features express the characteristics of a text in machine form; for example, they can reflect the characteristics of the words in the text content and the relationships between the words and their context. The context refers to the surrounding linguistic environment and semantics, and may be the other words before and after a given word.
402. The electronic device acquires a first classification error and a second classification error, wherein the first classification error is the error between the predicted classification result and the target classification result of the text sample, and the second classification error is the error between the predicted classification result and the target classification result of the adversarial sample.
After the electronic device classifies all the samples (including the text samples and the adversarial samples) to obtain predicted classification results, the classification errors can be obtained from the predicted classification results and the target classification results, and the current classification performance of the text classification model is measured by these classification errors. The classification error corresponding to the text sample is called the first classification error, and the classification error corresponding to the adversarial sample is called the second classification error.
Understandably, if the classification error is larger, the classification effect of the text classification model is poorer; if the classification error is smaller, the classification effect of the text classification model is better.
403. The electronic device recognizes the text features of the adversarial sample based on the text classification model, and outputs a text recognition result corresponding to the text features.
404. The electronic device acquires a recognition error based on the text recognition result and the text sample.
In steps 403 and 404, the electronic device may further reconstruct the text features of the adversarial sample to restore its text content. The text recognition result obtained by reconstruction after the adversarial perturbation has been added to the text sample is compared with the original text sample to obtain a recognition error. The recognition error indicates the difference between the text recognition result and the text sample, and measures the robustness and accuracy of the text classification model in feature extraction. Since the process of recognizing text features as the corresponding text content is a reconstruction process, the recognition error may also be called a reconstruction error.
It can be understood that if the recognition error is small, that is, the difference between the reconstruction result of the adversarial sample and the corresponding text sample is small, then the original text content can be accurately restored even after the adversarial perturbation has been added. This indicates that the text features obtained by feature extraction represent the text sample accurately, that is, the current feature expression is accurate. If the recognition error is large, that is, the difference between the reconstruction result and the corresponding text sample is large, then the original text content cannot be restored once the adversarial perturbation has been added and the original content is changed; this means that the text features obtained by feature extraction are not accurate enough, since a slight change alters the original meaning.
405. The electronic device updates model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error.
The model parameters of the text classification model are updated based on both the classification errors and the recognition error, which improves the robustness and accuracy of the model in classification as well as in feature extraction. On one hand, adversarial samples are introduced into the training process, so the training samples include both the text samples and the adversarial samples, and the text classification model can accurately classify both the original text samples and the text samples to which adversarial perturbations have been added, improving the robustness and accuracy of classification. On the other hand, the recognition error is added to the training process: by means of this error, the text classification model can still accurately restore the original text content after an adversarial perturbation has been added to the input, so the model learns an accurate feature expression and its feature extraction step is also highly robust.
According to the method provided by the embodiments of the application, on one hand, adversarial samples are introduced, and the text classification model is trained with both the text samples and the adversarial samples, so that the model learns how to classify text to which perturbations have been added; this improves the robustness of the text classification model and the accuracy of text classification. On the other hand, the text classification model can reconstruct the text features of the adversarial samples extracted during classification and restore the adversarial samples to text content, which improves the interpretability of the adversarial training method. The model parameters are trained jointly with the error between the reconstructed text content and the text content of the original text sample, so that the text classification model extracts more accurate text features, that is, a more accurate feature representation of the text content, which improves the robustness and accuracy of the model's feature extraction.
Fig. 5 is a flowchart of a text classification model training method provided in an embodiment of the present application. Referring to fig. 5, the method includes the following steps.
501. The electronic device obtains a text sample.
In the embodiment of the application, the electronic device can obtain the text sample and train the text classification model based on the text sample. The text sample carries a target classification result, and the classification capability of the text classification model can be determined according to the prediction classification result of the text sample and the target classification result of the text classification model, so that model parameters are adjusted in a multi-iteration process, and the classification capability of the text classification model is improved.
Specifically, depending on where the text samples are stored, the electronic device may obtain them in multiple ways. In one possible implementation manner, the text samples are stored in a text database, and when the text classification model needs to be trained, the electronic device extracts the text samples from the text database.
In another possible implementation, the text sample may be a text resource in a website from which the electronic device can download the text sample.
In another possible implementation, the text sample may be stored in the electronic device, for example, historical text sent to the electronic device by other devices, or text generated by the electronic device, and the electronic device may retrieve the text sample from local storage.
The foregoing provides several possible implementation manners for obtaining the text sample, and the electronic device may also obtain the text sample in other manners.
502. The electronic equipment inputs the text sample into a text classification model, and the text classification model maps words contained in the text content of the text sample to a real number field to obtain word embedding information of the text sample.
After the electronic device obtains the text sample, the text sample can be used to train a text classification model. The form of the text sample is a text form, that is, the text sample exists in a text content form. The text classification model can reduce the dimension of the text content, convert the text content into a representation form of continuous vectors, and the method for representing words or phrases by using word embedding information can improve the subsequent analysis effect on the text.
In one possible implementation manner, the text sample includes at least one word, and the text classification model is capable of mapping the at least one word in the text content of the text sample to a real number field respectively to obtain corresponding word embedding information, where the word embedding information includes word embedding information corresponding to each word. For example, the text sample may be a sentence, the sentence including one or more words, and the text classification model may map each word to word embedding information, i.e., word embedding information of the one or more words constituting the sentence.
In a specific possible embodiment, the word embedding information may adopt a one-hot representation or a distributed representation, and of course other representation manners may also be adopted. Specifically, the word embedding information may be in vector form, in which case it may be referred to as a word vector, or in matrix form, which is not limited in this embodiment of the present application.
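To make the mapping to the real number field concrete, the following is a minimal sketch of the word embedding step, assuming a PyTorch-style embedding layer; the toy vocabulary, the word indices and the embedding dimension are illustrative assumptions and not part of the embodiment.

```python
# Minimal sketch of the word embedding step, assuming PyTorch; the toy vocabulary,
# the word indices and the embedding dimension are illustrative only.
import torch
import torch.nn as nn

vocab = {"[PAD]": 0, "the": 1, "movie": 2, "was": 3, "great": 4}
embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=128)

# "the movie was great" -> word IDs -> word embedding information (one real-valued vector per word)
token_ids = torch.tensor([[vocab["the"], vocab["movie"], vocab["was"], vocab["great"]]])
word_embedding_information = embedding_layer(token_ids)   # shape: (1, 4, 128)
```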
503. And the electronic equipment performs feature extraction on the word embedded information of the text sample to obtain the text features of the text sample.
After the word embedding information of the text sample is determined, the word embedding information contains the representation of words in the text content, but the association between each word and the context cannot be clearly determined through the word embedding information, and the electronic equipment can extract the features of the word embedding information to obtain text features capable of accurately representing the text sample.
In one possible implementation, the word embedding information includes word embedding information for one or more words, and for each word of word embedding information, the electronic device is capable of determining a text characteristic corresponding to the word for a context of the word.
Specifically, the electronic device can determine the text feature corresponding to a word according to the word embedding information of the word, the word embedding information of a first word, and the word embedding information of a second word, where the first word refers to a word that precedes the word in the text content and the second word refers to a word that follows it. Of course, the first word and the last word of the text content only have a second word and a first word, respectively.
The foregoing provides only one specific example, and the electronic device may also perform feature extraction in other manners, for example, the electronic device may determine, in addition to the context, a text feature corresponding to the word according to the position of the word in the text content, or may determine a corresponding text feature according to the context of the word and the position in the text content, and the like, which is not limited in this embodiment of the application.
It should be noted that, step 502 and step 503 are processes of performing feature extraction on the text sample by the text classification model, in the feature extraction process, the dimension is reduced to obtain word embedding information, and then text features are obtained by extracting the word embedding information, so that text content of the text sample can be analyzed more finely to obtain accurate text features. In a possible implementation manner, the step 502 and the step 503 can be implemented by two neural network layers, which are respectively called a word embedding layer and a feature extraction layer, and the electronic device can process the text sample by the word embedding layer, output word embedding information, input the word embedding information into the feature extraction layer, perform feature extraction on the word embedding information by the feature extraction layer, and output text features. The processing performed by each neural network layer may be convolution processing or other processing, which is not limited in this embodiment of the present application.
Of course, the feature extraction process may also be implemented in other manners, for example, feature extraction may be directly performed on text contents of a text sample to obtain text features, which is not limited in this embodiment of the present application.
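As one possible concrete arrangement of the word embedding layer and the feature extraction layer described above, the following sketch uses a bidirectional LSTM as the context-aware feature extractor; this particular architecture and all dimensions are assumptions made for illustration, since the embodiment does not fix a specific network.

```python
# One possible realization of the word embedding layer plus feature extraction layer;
# the bidirectional LSTM is only an illustrative choice of context-aware extractor,
# and the dimensions are assumed values.
import torch.nn as nn

class TextFeatureExtractor(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.word_embedding = nn.Embedding(vocab_size, embed_dim)               # word embedding layer
        self.feature_extractor = nn.LSTM(embed_dim, hidden_dim,
                                         batch_first=True, bidirectional=True)  # feature extraction layer

    def forward(self, token_ids):
        word_embeddings = self.word_embedding(token_ids)             # (batch, seq_len, embed_dim)
        text_features, _ = self.feature_extractor(word_embeddings)   # (batch, seq_len, 2 * hidden_dim)
        # each position's feature combines the word with its left and right context
        return text_features
```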
504. And the electronic equipment classifies the text samples based on the extracted text features and outputs the prediction classification result of the text samples.
After the electronic device extracts the text features, the text features can be classified based on the text features, and the classification process is used for matching the text features with multiple candidate types of the text sample, determining the matching degree of the text features and each candidate type, and outputting a prediction classification result.
The prediction classification result may include a plurality of forms, and in one possible form, the prediction classification result is in the form of a vector, and the vector includes a plurality of elements, each element corresponding to a candidate type, and the element is used for indicating the matching degree of the text feature and the candidate type.
In another possible form, the prediction classification result is the candidate type with the largest matching degree with the text feature, or the prediction classification result is the candidate type with the largest matching degree with the text feature and the matching degree.
The matching degree may be in a probability form or a matching grade form, and the prediction classification result and the matching degree form are not limited in the embodiment of the present application.
In one possible implementation, the classification process may be implemented by a classification algorithm, which may be any kind of classification algorithm, such as a regression analysis classification algorithm, a bayesian classification algorithm, a decision tree, and so on. The embodiment of the present application does not limit which classification algorithm is specifically adopted.
In a specific possible embodiment, the classification process may be implemented by a Softmax function: the electronic device processes the text feature with the Softmax function to obtain the predicted classification result of the text feature. Softmax is a normalized exponential function, a multi-class generalization of logistic regression that is commonly used for classification problems.
In one possible implementation, the classifying step can be implemented by a classifier, of which there are multiple types, e.g., a binary classifier, a multi-class classifier, and the like. The classifier can implement any classification algorithm; the type of classifier and the classification algorithm adopted can be determined by relevant technical personnel according to requirements, and the embodiment of the application is not limited thereto.
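For illustration, a minimal classification head of the kind described above (a fully connected layer followed by Softmax over the candidate types) might look as follows; the feature dimension and the number of candidate types are assumed values.

```python
# Minimal classification head: a fully connected layer followed by Softmax over the
# candidate types; the dimensions are assumed values.
import torch
import torch.nn as nn

class TextClassifierHead(nn.Module):
    def __init__(self, feature_dim=512, num_candidate_types=2):
        super().__init__()
        self.linear = nn.Linear(feature_dim, num_candidate_types)

    def forward(self, sentence_feature):
        logits = self.linear(sentence_feature)        # (batch, num_candidate_types)
        # prediction classification result: one matching degree (probability) per candidate type
        return torch.softmax(logits, dim=-1)
```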
505. The electronic device determines an opposition perturbation of the text sample based on the predicted classification result and the target classification result of the text sample.
After the electronic equipment carries out classification prediction on the text samples to obtain the prediction classification result, the corresponding countermeasure samples can be generated for the text samples on the basis of the prediction classification result and the target classification result of the text samples, so that the countermeasure samples can be added on the basis of the text samples to train the text classification model. When generating the countermeasure sample, the electronic device may determine the countermeasure disturbance of the text sample, and add the countermeasure disturbance to the text sample to obtain the corresponding countermeasure sample.
In one possible implementation, the determination of the counterdisturbance can be implemented by the following steps one to five.
Step one, the electronic equipment obtains a first classification error of the text sample according to a prediction classification result and a target classification result of the text sample.
After the electronic equipment carries out classification prediction on the text samples to obtain a prediction classification result, the classification capability of the text classification model can be measured by comparing the target classification result, and can be represented by a first classification error.
In one possible implementation, the first classification error may be obtained by a loss function, which may be any one of loss functions, such as a cross-entropy loss function, an L1, an L2 equidistant regression loss function, an exponential loss function, or the like. In a specific possible embodiment, the first classification error may be obtained by a cross entropy Loss Function (CE Loss Function). The embodiment of the present application does not limit the specific obtaining manner of the first classification error.
And step two, the electronic equipment acquires candidate anti-disturbance of the text sample based on the gradient of the first classification error.
After the electronic device determines the first classification error, the countermeasure disturbance of the text sample can be determined based on the first classification error. In the acquisition process, the electronic device first determines a candidate countermeasure disturbance based on the first classification error, adds the candidate countermeasure disturbance to the text sample, and then iteratively determines a new candidate countermeasure disturbance based on the classification error of the candidate countermeasure sample to which the candidate countermeasure disturbance has been added.
And step three, adding the candidate confrontation disturbance into the text sample by the electronic equipment to obtain a candidate confrontation sample corresponding to the text sample.
The process of adding the candidate countermeasure disturbance can be implemented in various ways; two optional ways are provided below, and either way can be adopted in the embodiment of the present application, which is not limited thereto.
In a first way, the candidate countermeasure disturbance is added to the word embedding information of the text sample to obtain the word embedding information of the candidate countermeasure sample corresponding to the text sample.
In this case the candidate countermeasure sample is generated in word embedding form, and in the subsequent step four the electronic device can directly perform feature extraction on the word embedding information of the candidate countermeasure sample to obtain the text features of the candidate countermeasure sample, and then perform classification based on the text features to obtain a prediction classification result.
In a second way, the candidate countermeasure disturbance is added to the text content of the text sample to obtain the text content of the candidate countermeasure sample corresponding to the text sample.
In this case the candidate countermeasure sample is generated in text form, and in the subsequent step four the electronic device may map the words contained in the text content of the candidate countermeasure sample to a real number field to obtain the word embedding information of the candidate countermeasure sample, and then perform feature extraction on the word embedding information to obtain its text features.
And step four, the electronic equipment continues to classify the candidate confrontation sample to obtain its prediction classification result, and obtains the classification error of the candidate confrontation sample based on the prediction classification result and the target classification result.
The classification process for the candidate confrontation sample in step four is the same as that in steps 502 to 504 above, and the classification error is obtained in the same way as in step one, so they are not repeated herein.
And step five, the electronic equipment updates the candidate confrontation disturbance of the text sample based on the gradient of the classification error of the candidate confrontation sample until the target condition is reached, and the confrontation disturbance of the text sample is obtained.
Wherein the target condition is that the number of iterations reaches a target number, or the target condition is that the classification error converges. The target times can be set by related technicians according to requirements, and can be a hyper-parameter of the text classification model, namely an empirical value obtained by performing text classification model training before. For example, the target number may be 3, 4, or 5, which is not limited in the embodiments of the present application.
This is a multi-iteration process. In each iteration, the candidate confrontation disturbance is adjusted so as to increase the classification error; the larger the classification error, the larger the influence of the obtained confrontation sample on the text classification model, and the more robust the text classification model trained with such confrontation samples becomes.
In a possible implementation manner, the electronic device may initialize a candidate counterdisturbance and update it in each subsequent iteration, following the rising gradient of the classification error, so that through multiple iterations the classification error becomes as large as possible and the obtained countersample has a larger influence on the text classification model.
In each iteration, the electronic equipment determines an adjustment value of the candidate counterdisturbance based on the gradient of the classification error, adds the adjustment value to the candidate counterdisturbance of the previous iteration, and projects the adjusted candidate counterdisturbance back into the allowed value range of the counterdisturbance to obtain the candidate counterdisturbance to be added in the next iteration.
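A simplified sketch of this iterative search for the countermeasure disturbance is given below, assuming the disturbance is added to the word embeddings; the helper method classify_from_embeddings, the step size, the number of iterations and the clamping-based projection are illustrative assumptions rather than the exact procedure of the embodiment.

```python
# Simplified sketch of the iterative search for the countermeasure disturbance
# (steps one to five above), assuming PyTorch. classify_from_embeddings is a
# hypothetical helper that runs feature extraction and classification on perturbed
# embeddings; step size, iteration count and the projection are illustrative choices.
import torch
import torch.nn.functional as F

def find_counter_disturbance(model, word_embeddings, target_label,
                             steps=3, step_size=0.1, epsilon=0.5):
    disturbance = torch.zeros_like(word_embeddings, requires_grad=True)   # initial candidate
    for _ in range(steps):
        logits = model.classify_from_embeddings(word_embeddings + disturbance)
        loss = F.cross_entropy(logits, target_label)      # classification error of the candidate
        grad, = torch.autograd.grad(loss, disturbance)
        with torch.no_grad():
            disturbance += step_size * grad / (grad.norm() + 1e-12)   # follow the rising gradient
            disturbance.clamp_(-epsilon, epsilon)                     # project into the allowed range
    return disturbance.detach()

# adversarial word embeddings: word_embeddings + find_counter_disturbance(...)
```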
506. And the electronic equipment adds the confrontation disturbance into the text sample to obtain a confrontation sample corresponding to the text sample.
This step 506 is the process of generating a confrontation sample based on the confrontation disturbance; the electronic device can add the confrontation disturbance to different forms of data of the text sample to generate confrontation samples in different forms. Two ways are provided below, and the electronic device can implement the generation of the confrontation sample in either way.
In a first way, the confrontation disturbance is added to the word embedding information of the text sample to obtain the word embedding information of the confrontation sample corresponding to the text sample.
The countermeasure sample is generated in a word embedding form, and accordingly, in the following step 507, the electronic device may perform feature extraction on the word embedding information of the countermeasure sample to obtain the text features of the countermeasure sample.
In a second way, the confrontation disturbance is added to the text content of the text sample to obtain the text content of the confrontation sample corresponding to the text sample.
Correspondingly, in the following step 507, the electronic device may map words included in the text content of the countermeasure sample to a real number field to obtain word embedding information of the countermeasure sample, and then perform feature extraction on the word embedding information of the countermeasure sample to obtain text features of the countermeasure sample.
It should be noted that, steps 505 to 506 are processes of generating corresponding countermeasure samples based on the text sample, the predicted classification result of the text sample, and the target classification result, and the text sample and the corresponding countermeasure samples both carry the same target classification result.
507. The electronic equipment extracts the features of the confrontation sample, classifies the confrontation sample based on the extracted text features, and outputs a prediction classification result of the confrontation sample.
The step 507 is similar to the contents shown in the above steps 502 to 504, and will not be described herein again.
In a possible implementation manner, in step 505, the electronic device may obtain the first classification error, so that in step 508, the electronic device may not obtain the first classification error any more. In another possible implementation manner, the electronic device may repeat the step of obtaining the first classification error. Alternatively, the countermeasure disturbance may be obtained in other manners in step 505, and the electronic device obtains the first classification error and the second classification error in step 508. The embodiment of the present application does not specifically limit which manner is used.
508. The electronic equipment acquires a first classification error and a second classification error, wherein the first classification error is an error between a prediction classification result and a target classification result of the text sample, and the second classification error is an error between a prediction classification result and a target classification result of the countermeasure sample.
Step 508 is similar to the step 505, and will not be described herein again.
509. And the electronic equipment identifies the text features of the confrontation sample based on the text classification model and outputs a text identification result corresponding to the text features.
The recognition process is used for recognizing the text features as corresponding text contents, and the text contents obtained through recognition are called as text recognition results. The text recognition result may include various forms, for example, text corresponding to the text feature and the probability, or text corresponding to the text feature.
This step 509, which may also be referred to as a reconstruction process, is essentially a decoding process, and accordingly, the above-mentioned feature extraction process is essentially an encoding process, and the decoding process is used to restore the features obtained in the encoding process (i.e., the feature extraction process) to a text form.
Specifically, this step 509 can be realized by the following step one and step two.
Mapping the text features of the confrontation sample to a real number field based on the text classification model to obtain word embedding information corresponding to the text features;
and step two, matching the word embedding information with a word list, outputting at least one matched word, and taking the at least one word as a text recognition result corresponding to the text characteristics.
In one possible implementation, the text classification model includes two layers of neural networks, a first layer of neural networks for mapping the text features of the confrontation samples to the real number domain, and a second layer of neural networks for matching the word embedding information to the vocabulary.
Specifically, in this step 509, the electronic device may map the text features of the confrontation sample to a real number domain based on the first-layer neural network of the text classification model to obtain word embedding information corresponding to the text features. The electronic device then matches the word embedding information with the word list based on the second-layer neural network of the text classification model, outputs at least one matched word, and takes the at least one word as the text recognition result corresponding to the text features; the parameters of this second-layer neural network may be synchronized with the parameters of the neural network layer (the word embedding layer) that performs step 502.
In a possible implementation manner, the electronic device may further perform normalization processing on the text features of the confrontation sample before the subsequent mapping and matching steps. Specifically, the electronic device normalizes the text features of the confrontation sample based on the text classification model, and performs the steps of mapping to the real number field and matching with the word list based on the normalized text features. Normalizing the text features converts them into a data range that the subsequent neural network can process, which improves the reconstruction accuracy, reduces the subsequent amount of calculation, and improves the processing effect.
In a specific possible embodiment, the text classification model includes a normalization layer for performing the step of normalizing, and the two layers of neural networks for performing the steps of mapping to real number domain and matching with vocabulary, respectively.
For example, the step 509 can be implemented by a reconstructor or decoder, which can be as shown in fig. 6. The confrontation sample generated by the above steps can be a word-level semantic representation Ra 601 (i.e., the text features of the confrontation sample). Ra 601 is first normalized by a Normalization Layer (Layer Norm) 602 and processed by a Gaussian Error Linear Unit (GeLU) 603 activation function. Next, it passes through two layers of forward neural networks: the first forward neural network (FFN) 604 maps Ra from the hidden layer dimension to the word embedding dimension, and the second forward neural network (FFN) 605 maps the word embedding dimension to the dimension of the word table size, so that a probability distribution over the word table is obtained. The final loss calculation uses a cross entropy loss function (CE Loss) 606, which calculates the recognition error LR 608 from the output of FFN 605 and the input sentence token IDs. The input sentence token IDs refer to the text sample used to generate the confrontation sample, in which each word is uniquely identified by an identity (ID); the output of FFN 605 is also in the form of word IDs. The second forward neural network (FFN) 605 is capable of sharing parameters with the neural network layer in step 502 (referred to herein as the word embedding layer).
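A compact sketch of such a reconstructor, following the Layer Norm, GeLU, two-FFN and cross entropy arrangement described above, is shown below; the hidden, embedding and vocabulary dimensions are illustrative, and the parameter sharing with the word embedding layer is only noted in a comment rather than implemented.

```python
# Compact sketch of the reconstructor of fig. 6, assuming PyTorch. All dimensions are
# illustrative; sharing of ffn2 parameters with the word embedding layer is not shown.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Reconstructor(nn.Module):
    def __init__(self, hidden_dim=768, embed_dim=128, vocab_size=30000):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hidden_dim)
        self.ffn1 = nn.Linear(hidden_dim, embed_dim)   # hidden dimension -> word embedding dimension
        self.ffn2 = nn.Linear(embed_dim, vocab_size)   # word embedding dimension -> word table size

    def forward(self, adversarial_features, input_token_ids):
        h = F.gelu(self.layer_norm(adversarial_features))   # normalization + GeLU activation
        logits = self.ffn2(self.ffn1(h))                     # scores over the word table per position
        recognition_error = F.cross_entropy(logits.transpose(1, 2), input_token_ids)
        return logits, recognition_error
```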
510. And the electronic equipment acquires a recognition error based on the text recognition result and the text sample.
The recognition error may also be referred to as a reconstruction error, the text recognition result is a predicted value, the text sample is a true value, and the recognition error is used to measure a difference between the predicted value and the true value, which is similar to the step in step 505 above, and may be implemented by using a loss function, which is not described herein in detail. In a specific possible embodiment, the identification error may be obtained by a cross Entropy Loss Function (CE Loss Function).
511. The electronic device updates model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error.
For a text sample, the electronic device obtains three errors: the first classification error obtained from the text sample, the second classification error obtained from the countermeasure sample corresponding to the text sample, and the recognition error obtained by reconstructing the countermeasure sample. Updating the model parameters by combining the three errors takes into account both the robustness and accuracy of the classification of the text classification model and the robustness and accuracy of its feature extraction, so that the performance of the trained text classification model in both aspects can be improved.
The updating process combining the three errors can include two modes, and the embodiment of the present application can implement the updating step in any mode. Two alternatives are provided below.
In the first mode, the electronic device obtains the product of the recognition error and the weight of the recognition error, takes the sum of this product, the first classification error and the second classification error as a total error, and updates the model parameters of the text classification model based on the total error.
In the first mode, a weight is set for the recognition error. The weight may be set by relevant technical personnel as required and may be a hyper-parameter of the text classification model, i.e., an empirical value obtained from previous training of a text classification model; for example, the weight may be set to 0.1. In another possible implementation, the weight may also be updated together with the model parameters during the current model training, which is not limited in the embodiment of the present application.
In the second mode, the electronic device weights the first classification error, the second classification error and the recognition error based on their respective weights to obtain a total error, and updates the model parameters of the text classification model based on the total error.
In the second method, each error is provided with a weight, and the setting of the weight is the same as that in the first method, which is not described herein again.
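The two combination modes can be summarized in a short sketch; the weight of 0.1 for the recognition error follows the example above, while the other weights are illustrative assumptions.

```python
# Sketch of the two ways of combining the three errors into a total error before
# updating the model parameters; 0.1 for the recognition error follows the example
# above, the other weights are illustrative.

def total_error_mode_one(first_cls_error, second_cls_error, recognition_error, w_rec=0.1):
    # mode one: only the recognition error carries a weight
    return first_cls_error + second_cls_error + w_rec * recognition_error

def total_error_mode_two(first_cls_error, second_cls_error, recognition_error,
                         w_first=1.0, w_second=1.0, w_rec=0.1):
    # mode two: each error carries its own weight
    return w_first * first_cls_error + w_second * second_cls_error + w_rec * recognition_error

# The total error is then back-propagated and the model parameters are updated,
# e.g. total_error.backward(); optimizer.step() in a PyTorch training loop.
```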
In one possible implementation, the text classification model includes an encoder, a classifier, and a decoder, wherein the encoder is used for feature extraction; the classifier is used for performing classification based on the extracted text features; the decoder is used for recognizing the text features of the confrontation sample and outputting a text recognition result corresponding to the text features.
The structure of this text classification model differs from that of a traditional text classification model. A traditional text classification model includes only a feature extraction layer and a classifier; here, feature extraction is performed by the encoder, and a decoder is added to decode the text features of the confrontation sample into a corresponding text recognition result. The decoding process reconstructs the text content, so that whether the text features obtained by feature extraction are accurate enough can be analyzed by comparison with the original text sample.
By adding the decoder, the robustness and the accuracy in the aspect of text classification model feature extraction can be estimated, and the robustness and the accuracy in the aspect of text classification model feature extraction are enhanced in the iterative process, so that the robustness and the accuracy of classification and feature extraction of the obtained text classification model are improved.
Optionally, the encoder is configured to perform feature extraction on the word embedding information, and the text classification model further includes a word embedding layer. The word embedding layer is used for mapping the text content to a real number field to obtain word embedding information.
In one possible implementation, the text classification model before training in step 501 may be a pre-trained language model.
The following briefly introduces the pre-trained language model, taking the BERT model as an example. A pre-trained language model usually prepends a [CLS] token to the input single sentence. When the language model is used for a text classification task, the hidden state vector of the [CLS] token at the last layer of the encoder is used as the semantic vector of the whole sentence, and this semantic vector is input into a classifier consisting of a fully connected layer and Softmax. Specifically, as shown in FIG. 7, a Single Sentence 701 may be input into the BERT model with the [CLS] token 702 prepended, and the sentence is decomposed into a plurality of tokens Tok 1, Tok 2, …, Tok N 703, where Tok stands for Token, i.e., a word. The BERT model converts each token into word embedding information 704, denoted here as E[CLS], E1, E2, …, EN. The BERT model then performs feature extraction on the word embedding information through a plurality of hidden layers, and the outputs of the last hidden layer, C, T1, T2, …, TN 705, are input to the classifier as text features, with the classification performed by the Softmax function 706.
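For illustration, the [CLS]-based classification scheme described above might be exercised as follows; the use of the Hugging Face transformers library and the bert-base-uncased checkpoint are assumptions made for this sketch and are not prescribed by the embodiment.

```python
# Illustrative use of a pre-trained language model for single-sentence classification:
# the last-layer hidden state of the [CLS] token is taken as the sentence vector and
# fed to a fully connected layer plus Softmax.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(bert.config.hidden_size, 2)    # e.g. two candidate types

inputs = tokenizer("the movie was great", return_tensors="pt")   # [CLS] is prepended automatically
outputs = bert(**inputs)
cls_vector = outputs.last_hidden_state[:, 0]                     # hidden state of the [CLS] token
probabilities = torch.softmax(classifier(cls_vector), dim=-1)
```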
For example, in one specific example, the text classification model may be built on ALBERT, a pre-trained language model, to which confrontation sample generation and confrontation sample learning modules are added. The structure of the text classification model can be as shown in fig. 8. First, given an input sentence 801 (i.e., a text sample), the word Embedding layer (Embedding Block) 802 of ALBERT obtains its word embedding representation Eo (i.e., the word embedding information); the encoder (Encoding Block) 803 and the Classifier 804 of ALBERT then obtain the semantic representation Ro (i.e., the text features) of the sentence and the classification loss LC (i.e., the first classification error), respectively.
A confrontation sample of the text sample is then generated through LC. Specifically, a candidate countermeasure disturbance P is obtained from LC and added to the word embedding representation Eo of the text sample to obtain the word embedding representation Ea of a candidate countermeasure sample (i.e., the word embedding information of the candidate countermeasure sample). The corresponding semantic representation Ra of Ea is then obtained through the encoder (Encoding Block), Ra is input into the classifier 804 for classification, and a new candidate countermeasure disturbance P is determined from the classification error LC of the candidate countermeasure sample, and so on. Through multiple iterations, the countermeasure disturbance P that maximizes the classification error LC is determined and added to Eo, resulting in the word embedding representation Ea of the confrontation sample. In this particular example, the confrontation sample is generated in word embedding form. After passing through the encoder (Encoding Block) 803 of ALBERT, Ea is mapped into the representation space of the model to obtain the semantic representation Ra (i.e., the text features). Optionally, in each iteration of the iterative process, Ra may be input into a Reconstructor 805 for reconstruction and the reconstruction error LR (i.e., the recognition error) calculated.
At this point a word-level semantic representation Ra of the confrontation sample has been obtained. This representation is calculated from the confrontation sample in word embedding form, and the model does not know which word the word embedding information at each position of the confrontation sample really corresponds to; if the model cannot work this out, its comprehension of the whole sentence is greatly reduced, especially when it misunderstands some keywords. In the embodiment of the application, the model learns to restore the original words from the word embedding information of the confrontation sample, and thereby learns to extract more robust syntactic and lexical knowledge. Specifically, the semantic representation Ra of the confrontation sample is input into the Reconstructor for reconstruction, the reconstruction error LR is determined according to the reconstructed content and the input sentence 801, and the model parameters are updated based on the classification errors LC corresponding to the input sentence and the confrontation sample respectively and the reconstruction error LR.
In this example, the Reconstructor 805 is essentially a decoder, and the encoder (Encoding Block) 803 and the decoder may together constitute an autoencoder, which is explained below. As shown in fig. 9, the autoencoder includes an encoder 901 and a decoder 902. The encoder 901 encodes an input to obtain a code, the decoder 902 reconstructs the code to obtain an estimate of the input, and a reconstruction error 903 is obtained from the input and the estimated input. The autoencoder has two constraints: the dimension of the hidden layer is much smaller than that of the input layer, and the goal is to minimize the reconstruction error 903. The overall optimization objective can be expressed as:
φ, ψ = argmin_{φ,ψ} L(X, (ψ∘φ)X)
where φ and ψ denote the encoder 901 and the decoder 902, respectively, L is the reconstruction error, and X is the input vector. (ψ∘φ)X means that X first passes through the encoder φ and then through the decoder ψ; argmin refers to the values of the variables φ and ψ at which the reconstruction error 903 reaches its minimum.
Accordingly, the objective function of the text classification model shown in fig. 8 above may be as follows:
min_θ E_{(v,y)~D} [ LC(f(E + δ; θ), y) + wr·LR(f(E + δ; θ), v) ],  where δ = argmax_{‖δ'‖ ≤ ε} LC(f(E + δ'; θ), y)
where f is the forward function of the model, θ are the model parameters, LC and LR are the classification loss function and the reconstruction loss function, respectively, v is a text sample, y is the target classification result of the text sample (which may be referred to as the true label if the target classification result is represented in the form of a label), D is the data distribution of the text samples, δ is the counterdisturbance, ε is the maximum value of the norm of the counterdisturbance (which prevents the counterdisturbance from being so large that the sentence meaning changes), E is the word embedding information, and wr is the weight of the reconstruction loss (e.g., set to 0.1). We first obtain the counterdisturbance by maximizing the classification loss function, add it back to the original word embedding to obtain the countersample (in word embedding form), and then minimize the classification loss and reconstruction loss of the countersample; that is, the model should both classify the countersample correctly and restore the countersample back to the original sample.
After the reconstructor is trained, experiments show that the text recognition result can take the form of probability distributions: the reconstructor outputs a probability distribution for each word position of the sentence, where the probability distribution of a word position gives the probability that this position is a certain candidate word. In the probability distribution of each word position, the original word and its synonyms have relatively high probabilities, so during reconstruction the text classification model can, for some keywords, output synonyms of those keywords according to the probability distributions obtained by reconstruction, thereby producing several reconstructed sentences. Therefore, for an input sentence, a plurality of corresponding reconstructed sentences can be obtained. These sentences are in text form, so their semantics can be read directly, unlike the word embedding form from which the semantics cannot be extracted directly, which provides interpretability for the countertraining. Moreover, the sentences obtained through this process can be used as additional samples: combining the reconstructed sentences with the original sentences yields more expressions of the same semantics, which can be used for data enhancement in the training process.
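A small sketch of how reconstructed sentences could be read off the per-position probability distributions is given below; taking the top-k candidates per position and the id_to_word table are illustrative assumptions.

```python
# Sketch of reading reconstructed sentences off the reconstructor's per-position
# probability distributions; the top-k selection and the id_to_word table are
# illustrative assumptions.
import torch

def reconstructed_sentences(logits, id_to_word, k=2):
    # logits: (seq_len, vocab_size) output of the reconstructor for one sentence
    probs = torch.softmax(logits, dim=-1)
    top_probs, top_ids = probs.topk(k, dim=-1)              # k candidate words per position
    best = [id_to_word[int(i)] for i in top_ids[:, 0]]      # most probable word per position
    variant = [id_to_word[int(i)] for i in top_ids[:, 1]]   # e.g. synonyms of some keywords
    return " ".join(best), " ".join(variant)
```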
A specific experimental example is provided below, and the effect of the text classification model training method provided in the present application is exemplarily illustrated by the experimental example.
In this experimental example, we evaluated the above text classification model training method using four data sets. The four data sets were SST-2, Yelp-P, AG's News and Yahoo! Answers. The four data sets and the experimental setup are explained first, and then the text classification model training method provided by the application is analyzed by combining the experimental results.
SST-2: SST (the Stanford Sentiment Treebank) is a sentiment analysis data set released by Stanford University that performs sentiment classification mainly on movie reviews and belongs to single-sentence text classification tasks. Specifically, SST includes SST-2, SST-5 and the like, where SST-2 is the two-class version and SST-5 is the five-class version; the more classes, the more fine-grained the sentiment polarity. SST-2 predicts the sentence-level sentiment of the input text as either positive or negative.
Yelp-P: is a common data set for learning that is derived from comments in the Yelp website. Each review has a rating label from 1 to 5. We can classify it as binary, i.e. choose two scoring labels, and randomly draw 30000 training samples, 1000 validation samples and 1000 test samples from the dataset.
Yahoo! Answers: is a question and answer data set. The data set includes questions and corresponding answers. The data set is sourced from Yahoo! 10 major categories of data in Answers Comprehensive Questions and Answers 1.0, the ten major categories including: society and culture, science and mathematics, health, education and reference, computer and internet, sports, business and finance, entertainment and music, family and relations and politics and governments. Each category contains 140000 training samples and 5000 test samples. In this experimental example, we used five of the categories. For each class, we used 12000 training samples, 400 validation samples, and 400 test samples.
AG's News: is a data set consisting of over 100 million news articles, the news articles in the data set comprise four major categories of world, sports, business and science, each category comprises 30000 training samples and 1900 test samples. In this experimental example we used 15000 training samples, 500 validation samples and test samples.
In this experimental example, the method provided in the examples of the present application was compared with the following methods ALBERT and FreeLB.
ALBERT is used for text classification. For ALBERT, the first token of the sequence is [CLS], and when performing the text classification task, ALBERT takes the final hidden state h of the [CLS] token as the representation of the entire sentence. The classifier consists of a feedforward layer and a Softmax function, and can be expressed as follows:
p(c|h)=softmax(Wh)
where W is a learnable parameter matrix, c is a class, and h is the hidden state; p(c|h) is the probability of class c given the hidden state h. All parameters of ALBERT and W are fine-tuned together during training.
FreeLB: adding the adversarial interference in the output of the ALBERT embedding layer and minimizing the adversarial loss generated around the input sample, it utilizes the "free" training strategy to improve the efficiency of the adversarial training, which makes it possible to apply the PGD (Project Gradient Descence) based adversarial training to the large-scale pre-trained language model.
The experimental setup is explained below.
The method provided by the embodiment of the application is implemented on ALBERT-base-v2; the parameters of the ALBERT embedding layer and encoding layer are loaded from a pre-trained model, and the experiments are then carried out in the fine-tuning stage. The model was trained using an Adam optimizer with a learning rate of 1e-5, a batch size of 16 for AG's News, and 32 for the other three data sets. Since the hyper-parameters of FreeLB are highly dependent on the data set, we perform a hyper-parameter search on each data set; the search results are shown in Table 1.
TABLE 1
        SST-2    Yahoo!Answers    Yelp-P    AG's News
γ       0.6      0                0.5       0
α       0.1      0.01             0.05      0.01
ε       0        0                0         0
n       2        3                3         3
Table 1 shows the hyper-parameters of FreeLB on the 4 data sets: the step size α, the maximum perturbation norm ε (i.e., the maximum value of the counterdisturbance), the number of iteration steps n, and the initial disturbance amplitude γ. These hyper-parameters remain unchanged during the training process.
In this experimental example, we trained the model on two Tesla P40 GPUs. The method provided herein is referred to as RAR (Reconstruction of Adversarial Representations), in which LR is used to update the parameters of the model from the start of training. In addition, m is set to 20000 for Yelp-P and to 16000 for the other three data sets; τ and M are set to 0.07 and 0.5, respectively.
For SST-2, we used the development set for evaluation. To make the results reliable, we used the same hyperparameter, but used different random seeds for three experiments and reported the average score of the three experiments. For the other three datasets, we used the development set to select the best training checkpoint and evaluate on the test set.
The results of the method and ALBERT, FreeLB provided in the examples of the present application are shown in Table 2.
TABLE 2
          SST-2    Yahoo!Answers    Yelp-P    AG's News
ALBERT    92.16    73.93            93.55     89.90
FreeLB    93.23    74.28            93.93     90.85
RAR       93.73    74.88            94.4      91.75
Table 2 compares RAR, ALBERT and FreeLB on the four data sets. ALBERT is the model without any adversarial training. FreeLB uses the classification loss to learn from adversarial examples. RAR is implemented on top of FreeLB and uses an additional optimization objective on the adversarial examples.
As shown in Table 2, FreeLB and RAR both include adversarial samples in model training, so they both perform better than ALBERT; these improvements are mainly due to the effect of data augmentation. The experimental results also show that RAR performs better than FreeLB on all four data sets, indicating that the gradient-based adversarial training method with reconstruction is effective and applies well to various text classification data sets. During model training, the adversarial objectives encourage the model to discover the real underlying knowledge that determines the class labels of both the adversarial and the original samples. Such knowledge is robust to the adversarial perturbations added to the original sample and is not altered by rephrasing the sentence. When the model is able to learn this knowledge, its generalization and robustness improve.
The adversarial and original samples differ in their representations. Table 3 compares the Euclidean distance and the cosine similarity between the sentence-level representations of the adversarial and original samples for the three methods, using the AG's News test set to analyze the models trained by the three methods above. For each sample vi, we first calculate its original representation Ri, then, under the same hyper-parameter setting, obtain its adversarial representation Radv-i by the k-PGD method, and then measure the distance between them by cosine similarity and Euclidean distance. We also compare the results when different maximum perturbation norms α are used in k-PGD.
TABLE 3
Cosine       α=1      α=0.075
ALBERT       0.851    0.871
FreeLB       0.899    0.918
RAR          0.926    0.941

Euclidean    α=1      α=0.075
ALBERT       8.409    7.746
FreeLB       6.477    5.776
RAR          5.121    4.453
The above results are averages over all samples. They show that FreeLB and RAR perform much better than ALBERT on both cosine similarity and Euclidean distance, which indicates that optimizing the classification error of the adversarial samples effectively improves the stability of the model's representation space. Furthermore, RAR performs better than FreeLB, which shows that the method provided by the present application further improves the robustness of the model's representation space.
We used the k-PGD method to attack the models trained on AG's News by the three methods. The experimental results show that FreeLB and RAR perform well, much better than ALBERT. Since RAR operates directly on the sentence-level representation, it performs better in terms of representation robustness.
After an RAR model is obtained by training on SST-2, the k-PGD method is used to attack the RAR model, and reconstructed sentences are then obtained from the output logits of the RAR reconstructor. In this way we can obtain text from the adversarial samples, called reconstructed samples. The semantics of these reconstructed samples are substantially identical to those of the original samples, yet they can successfully fool the ALBERT-trained model into misclassifying them. These reconstructed sentences may be used as adversarial text samples and further as enhancement data.
In this experimental example, we propose a gradient-based adversarial training method, RAR, to improve the performance and robustness of the text classification model. The key to this approach is to narrow the gap between the original and adversarial samples in the representation space: RAR forces the model to reconstruct the original tokens from their adversarial representations. Experiments show that the method is superior to ALBERT and FreeLB, and that both the sentence representations and the performance of the model are more robust, which demonstrates the effectiveness of the method. Furthermore, RAR can be used to generate adversarial text examples.
After the text classification model is obtained through training by the method, the text classification model can provide a text classification function. In one possible implementation manner, in response to a text classification instruction, the electronic device can call the text classification model, input a text to be classified into the text classification model, perform feature extraction on the text by using the text classification model, perform classification based on the extracted text features, and output a type to which the text belongs.
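A minimal sketch of this inference flow, with illustrative names for the model, tokenizer and candidate types, might look as follows.

```python
# Minimal sketch of calling the trained text classification model in response to a
# text classification instruction; the model, tokenizer and candidate type names are
# illustrative assumptions.
def classify_text(model, tokenizer, text, candidate_types):
    token_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    probabilities = model(token_ids)                   # feature extraction + classification
    predicted_index = probabilities.argmax(dim=-1).item()
    return candidate_types[predicted_index]

# e.g. classify_text(model, tokenizer, "the movie was great", ["negative", "positive"])
```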
According to the method provided by the embodiment of the application, on one hand, the countermeasure sample is introduced, and the text classification model is trained by using the text sample and the countermeasure sample, so that the text classification model learns the classification method for the text added with disturbance, the robustness of the text classification model is improved, and the accuracy of text classification is improved. On the other hand, the text classification model can reconstruct the text features of the confrontation samples extracted during classification, and the confrontation samples are restored to text contents, so that the interpretability of the confrontation training method is improved. The model parameters are trained by combining the error between the reconstructed text content and the text content of the text sample, so that the text classification model can extract more accurate text features, namely more accurate feature expression of the text content is obtained, and the robustness and the accuracy of feature extraction of the text classification model are improved.
All the above optional technical solutions can be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
Fig. 10 is a schematic structural diagram of a text classification model training apparatus provided in an embodiment of the present application, and referring to fig. 10, the apparatus includes:
the classification module 1001 is configured to perform feature extraction on a text sample and a countermeasure sample of the text sample based on a text classification model, perform classification based on extracted text features, and output predicted classification results of the text sample and the countermeasure sample, where the text sample and the corresponding countermeasure sample both carry the same target classification result;
an obtaining module 1002, configured to obtain a first classification error and a second classification error, where the first classification error is an error between a predicted classification result and a target classification result of the text sample, and the second classification error is an error between a predicted classification result and a target classification result of the countermeasure sample;
the recognition module 1003 is configured to recognize the text features of the countermeasure sample based on the text classification model, and output a text recognition result corresponding to the text features;
the obtaining module 1002 is further configured to obtain a recognition error based on the text recognition result and the text sample;
an updating module 1004 for updating the model parameters of the text classification model based on the first classification error, the second classification error and the recognition error.
In one possible implementation, the identifying module 1003 is configured to:
mapping the text features of the confrontation sample to a real number field based on the text classification model to obtain word embedding information corresponding to the text features;
and matching the word embedding information with a word list, outputting at least one matched word, and taking the at least one word as a text recognition result corresponding to the text characteristics.
In one possible implementation, the text classification model includes two layers of neural networks, a first layer of neural networks for mapping the text features of the confrontation samples to the real number domain, and a second layer of neural networks for matching the word embedding information to the vocabulary.
In one possible implementation, the identification module 1003 is configured to normalize the text features of the confrontation sample based on the text classification model, and perform the steps of mapping to a real number domain and matching with a vocabulary table based on the normalized text features.
In one possible implementation, the update module 1004 is configured to perform any one of:
acquiring a product of the recognition error and the weight of the recognition error, acquiring the sum of the product and the first classification error and the second classification error as a total error, and updating the model parameters of the text classification model based on the total error;
weighting the first classification error, the second classification error and the identification error based on respective weights of the first classification error, the second classification error and the identification error to obtain a total error, and updating model parameters of the text classification model based on the total error.
In one possible implementation, the classification module 1001 includes a classification unit and a generation unit;
the classification unit is used for inputting the text sample into a text classification model, extracting the characteristics of the text sample by the text classification model, classifying the text sample based on the extracted text characteristics, and outputting a prediction classification result of the text sample;
the generation unit is used for generating corresponding countermeasure samples based on the text samples, the prediction classification results and the target classification results of the text samples;
the classification unit is also used for extracting the characteristics of the confrontation sample, classifying the confrontation sample based on the extracted text characteristics and outputting a prediction classification result of the confrontation sample.
In one possible implementation, the classification unit includes a mapping sub-unit and an extraction sub-unit;
the mapping subunit is used for mapping the words contained in the text content of the text sample to a real number field by the text classification model to obtain word embedding information of the text sample;
the extraction subunit is configured to perform feature extraction on the word embedding information of the text sample to obtain a text feature of the text sample.
In one possible implementation, the generating unit includes a determining subunit and an adding subunit;
the determining subunit is configured to determine, based on the predicted classification result and the target classification result of the text sample, counterdisturbance of the text sample;
the adding subunit is configured to add the confrontation perturbation to the text sample to obtain a confrontation sample corresponding to the text sample.
In one possible implementation, the determining subunit is configured to:
acquiring a first classification error of the text sample according to a prediction classification result and a target classification result of the text sample;
obtaining candidate confrontation disturbance of the text sample based on the gradient of the first classification error;
adding the candidate confrontation disturbance into the text sample to obtain a candidate confrontation sample corresponding to the text sample;
continuously classifying the candidate confrontation sample to obtain a prediction classification result and a target classification result, and obtaining a classification error of the candidate confrontation sample;
and updating the candidate counterdisturbance of the text sample based on the gradient of the classification error of the candidate confrontation sample until a target condition is reached, so as to obtain the counterdisturbance of the text sample.
In a possible implementation manner, the adding subunit is configured to add the countermeasure disturbance to the text content of the text sample to obtain the text content of the countermeasure sample corresponding to the text sample;
the feature extraction process of the confrontation sample comprises the following steps:
mapping words contained in the text content of the confrontation sample to a real number field to obtain word embedding information of the confrontation sample;
and performing feature extraction on the word embedded information of the confrontation sample to obtain the text features of the confrontation sample.
In a possible implementation manner, the adding subunit is configured to add the confrontation disturbance to the word embedding information of the text sample to obtain word embedding information of the confrontation sample corresponding to the text sample;
the feature extraction process of the confrontation sample comprises the following steps:
performing feature extraction on the word embedding information of the confrontation sample to obtain the text features of the confrontation sample.
In one possible implementation, the text classification model includes an encoder, a classifier, and a decoder;
wherein, the encoder is used for feature extraction;
the classifier is used for performing classification based on the extracted text features;
the decoder is used for recognizing the text features of the confrontation sample and outputting a text recognition result corresponding to the text features.
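As a minimal sketch of this encoder/classifier/decoder split, the model could be organised as follows. The class name, the LSTM encoder, and the layer sizes are illustrative assumptions rather than the embodiment's prescribed architecture; the two linear layers of the decoder follow one possible reading of the two-layer recognition structure recited in claim 3 below:

```python
import torch
import torch.nn as nn

class TextClassificationModel(nn.Module):
    """Sketch of the encoder / classifier / decoder split (sizes are illustrative assumptions)."""
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # feature extraction
        self.classifier = nn.Linear(hidden_dim, num_classes)             # classification
        # Decoder: the first layer maps text features back toward the word-embedding space,
        # and the second layer matches against the vocabulary (one reading of claim 3's two layers).
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, embed_dim),
            nn.Linear(embed_dim, vocab_size),
        )

    def forward(self, token_ids):
        embeddings = self.embedding(token_ids)          # word embedding information
        features, _ = self.encoder(embeddings)          # (batch, seq_len, hidden_dim)
        logits = self.classifier(features[:, -1, :])    # prediction classification result
        recon_logits = self.decoder(features)           # per-position scores over the vocabulary
        return embeddings, features, logits, recon_logits
```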
According to the device provided by the embodiment of the application, on one hand, confrontation samples are introduced and the text classification model is trained with both the text samples and the confrontation samples, so that the model learns how to classify text to which disturbances have been added, which improves the robustness of the text classification model and thus the accuracy of text classification. On the other hand, the text classification model can reconstruct the text features of the confrontation samples extracted during classification and restore the confrontation samples to text content, which improves the interpretability of the confrontation training method. Training the model parameters with the error between the reconstructed text content and the text content of the original text sample enables the text classification model to extract more accurate text features, that is, a more accurate feature expression of the text content, which improves the robustness and accuracy of the model's feature extraction.
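Putting these pieces together, one training step could combine the first classification error, the second classification error, and the recognition (reconstruction) error into a single total error, as in the hedged sketch below. It reuses the illustrative TextClassificationModel and generate_confrontation_disturbance from the sketches above, and the recognition_weight hyperparameter is an assumption rather than a value given by the embodiment:

```python
import torch.nn.functional as F

def training_step(model, optimizer, token_ids, labels, recognition_weight=0.1):
    """Sketch of one adversarial training step; assumes the model and disturbance
    generator defined in the earlier sketches (recognition_weight is an assumption)."""
    embeddings, _, logits, _ = model(token_ids)
    first_error = F.cross_entropy(logits, labels)            # classification error on the text sample

    # Hypothetical helper: classify from (possibly disturbed) word embedding information.
    def classify_fn(embs):
        feats, _ = model.encoder(embs)
        return model.classifier(feats[:, -1, :])

    # Build the confrontation sample at the embedding level and classify it.
    disturbance = generate_confrontation_disturbance(classify_fn, embeddings.detach(), labels)
    adv_features, _ = model.encoder(embeddings + disturbance)
    second_error = F.cross_entropy(model.classifier(adv_features[:, -1, :]), labels)

    # Recognition error: the decoder should restore the original words from the
    # confrontation sample's text features.
    recon_logits = model.decoder(adv_features)               # (batch, seq_len, vocab)
    recognition_error = F.cross_entropy(
        recon_logits.reshape(-1, recon_logits.size(-1)), token_ids.reshape(-1))

    total_error = first_error + second_error + recognition_weight * recognition_error
    optimizer.zero_grad()
    total_error.backward()
    optimizer.step()
    return total_error.item()
```

The single recognition_weight corresponds to the first weighting option recited in claim 5 below; the second option there weights all three errors individually.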
It should be noted that, when the text classification model training device provided in the above embodiment trains a text classification model, the division into the above functional modules is merely illustrative; in practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the text classification model training device can be divided into different functional modules to complete all or part of the functions described above. In addition, the text classification model training device provided in the above embodiment and the text classification model training method embodiment belong to the same concept; the specific implementation process is described in the method embodiment and is not repeated here.
The electronic device in the above method embodiment can be implemented as a terminal. For example, fig. 11 is a block diagram of a terminal according to an embodiment of the present application. The terminal 1100 can be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, a desktop computer, an intelligent robot, or a self-service payment device. Terminal 1100 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so forth.
In general, terminal 1100 includes: one or more processors 1101 and one or more memories 1102.
Processor 1101 can include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1101 can be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). Processor 1101 can also include a main processor and a coprocessor; the main processor is a processor for processing data in the awake state, also referred to as a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1101 can be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 can also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1102 can include one or more computer-readable storage media, which can be non-transitory. Memory 1102 can also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1102 is used to store at least one program code for execution by processor 1101 to implement the text classification model training methods provided by the method embodiments herein.
In some embodiments, the terminal 1100 may further include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102 and peripheral interface 1103 can be connected by a bus or signal lines. Various peripheral devices can be connected to the peripheral interface 1103 by buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, display screen 1105, camera assembly 1106, audio circuitry 1107, positioning assembly 1108, and power supply 1109.
The peripheral interface 1103 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, memory 1102, and peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 1101, the memory 1102 and the peripheral device interface 1103 can be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The Radio Frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1104 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electric signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 is capable of communicating with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 can also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1105 is used to display a UI (User Interface). The UI can include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, the display screen 1105 also has the ability to capture touch signals on or over the surface of the display screen 1105. The touch signal can be input to the processor 1101 as a control signal for processing. At this point, the display screen 1105 can also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there can be one display screen 1105, disposed on the front panel of terminal 1100; in other embodiments, there can be at least two display screens 1105, each disposed on a different surface of terminal 1100 or in a folded design; in still other embodiments, display screen 1105 can be a flexible display disposed on a curved surface or on a folded surface of terminal 1100. The display screen 1105 can even be arranged as a non-rectangular irregular figure, that is, a shaped screen. The display screen 1105 can be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
Camera assembly 1106 is used to capture images or video. Optionally, camera assembly 1106 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1106 can also include a flash. The flash lamp can be a monochrome temperature flash lamp and can also be a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp and can be used for light compensation under different color temperatures.
The audio circuitry 1107 can include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1101 for processing or inputting the electric signals to the radio frequency circuit 1104 to achieve voice communication. For stereo capture or noise reduction purposes, multiple microphones can be provided, each at a different location on terminal 1100. The microphone can also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The loudspeaker can be a conventional membrane loudspeaker, but also a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to human, but also the electric signal can be converted into a sound wave inaudible to human for use in distance measurement or the like. In some embodiments, the audio circuitry 1107 can also include a headphone jack.
Positioning component 1108 is used to locate the current geographic position of terminal 1100 for purposes of navigation or LBS (Location Based Service). The positioning component 1108 can be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the GLONASS system of Russia.
Power supply 1109 is configured to provide power to various components within terminal 1100. The power supply 1109 can be alternating current, direct current, disposable or rechargeable batteries. When the power supply 1109 includes a rechargeable battery, the rechargeable battery can be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery can also be used to support fast charge technology.
In some embodiments, terminal 1100 can also include one or more sensors 1110. The one or more sensors 1110 include, but are not limited to: acceleration sensor 1111, gyro sensor 1112, pressure sensor 1113, fingerprint sensor 1114, optical sensor 1115, and proximity sensor 1116.
The acceleration sensor 1111 can detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal 1100. For example, the acceleration sensor 1111 can be configured to detect components of the gravitational acceleration in three coordinate axes. The processor 1101 can control the display screen 1105 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1111. The acceleration sensor 1111 can also be used for acquisition of motion data of a game or a user.
The gyro sensor 1112 can detect the body direction and the rotation angle of the terminal 1100, and the gyro sensor 1112 can acquire the 3D motion of the user to the terminal 1100 in cooperation with the acceleration sensor 1111. The processor 1101 can implement the following functions according to the data collected by the gyro sensor 1112: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensor 1113 can be positioned on a side bezel of terminal 1100 and/or on an underlying layer of display screen 1105. When the pressure sensor 1113 is disposed on the side frame of the terminal 1100, the holding signal of the user to the terminal 1100 can be detected, and the processor 1101 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1113. When the pressure sensor 1113 is disposed at the lower layer of the display screen 1105, the processor 1101 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1105. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1114 is configured to collect a fingerprint of the user, and the processor 1101 identifies the user according to the fingerprint collected by the fingerprint sensor 1114, or the fingerprint sensor 1114 identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the user is authorized by the processor 1101 to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 1114 can be provided on the front, back, or side of the terminal 1100. When a physical button or vendor Logo is provided on the terminal 1100, the fingerprint sensor 1114 can be integrated with the physical button or vendor Logo.
Optical sensor 1115 is used to collect ambient light intensity. In one embodiment, the processor 1101 can control the display brightness of the display screen 1105 based on the ambient light intensity collected by the optical sensor 1115. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1105 is increased; when the ambient light intensity is low, the display brightness of the display screen 1105 is reduced. In another embodiment, processor 1101 can also dynamically adjust the shooting parameters of camera assembly 1106 based on the ambient light intensity collected by optical sensor 1115.
Proximity sensor 1116, also referred to as a distance sensor, is typically disposed on a front panel of terminal 1100. Proximity sensor 1116 is used to capture the distance between the user and the front face of terminal 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front face of the terminal 1100 gradually decreases, the processor 1101 controls the display screen 1105 to switch from the screen-on state to the screen-off state; when the proximity sensor 1116 detects that the distance between the user and the front face of the terminal 1100 gradually increases, the processor 1101 controls the display screen 1105 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 11 does not constitute a limitation of terminal 1100, and can include more or fewer components than shown, or combine certain components, or employ a different arrangement of components.
The electronic device in the above method embodiment can be implemented as a server. For example, fig. 12 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1200 may vary considerably depending on configuration or performance, and can include one or more processors (CPUs) 1201 and one or more memories 1202, where the memory 1202 stores at least one program code, and the at least one program code is loaded and executed by the processors 1201 to implement the text classification model training method provided by the foregoing method embodiments. Of course, the server can also have components such as a wired or wireless network interface and an input/output interface to facilitate input and output, as well as other components for implementing the functions of the device, which are not described here again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, including at least one program code, executable by a processor, is also provided to perform the text classification model training method in the above embodiments. For example, the computer-readable storage medium can be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact disc-Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is also provided, which includes one or more program codes stored in a computer-readable storage medium. One or more processors of the electronic device can read the one or more program codes from the computer-readable storage medium and execute them, so that the electronic device can execute the text classification model training method.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should be understood that determining B according to A does not mean determining B based on A alone; B can also be determined based on A and/or other information.
Those skilled in the art will appreciate that all or part of the steps for implementing the above embodiments can be implemented by hardware, or can be implemented by a program for instructing relevant hardware, and the program can be stored in a computer readable storage medium, and the above mentioned storage medium can be read only memory, magnetic or optical disk, etc.
The above description is intended only to be an alternative embodiment of the present application, and not to limit the present application, and any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method for training a text classification model, the method comprising:
based on a text classification model, performing feature extraction on a text sample and a confrontation sample of the text sample, classifying based on the extracted text features, and outputting predicted classification results of the text sample and the confrontation sample, wherein the text sample and the corresponding confrontation sample both carry the same target classification result;
acquiring a first classification error and a second classification error, wherein the first classification error is an error between the predicted classification result and the target classification result of the text sample, and the second classification error is an error between the predicted classification result and the target classification result of the confrontation sample;
based on the text classification model, recognizing the text features of the confrontation sample, and outputting a text recognition result corresponding to the text features;
acquiring a recognition error based on the text recognition result and the text sample;
updating model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error.
2. The method of claim 1, wherein the recognizing the text features of the confrontation sample based on the text classification model and outputting the text recognition result corresponding to the text features comprises:
mapping the text features of the confrontation samples to a real number field based on the text classification model to obtain word embedding information corresponding to the text features;
and matching the word embedding information with a word list, outputting at least one matched word, and taking the at least one word as a text recognition result corresponding to the text characteristics.
3. The method of claim 2, wherein the text classification model comprises two layers of neural networks, a first layer of neural networks for mapping the text features of the confrontation samples to a real number domain, and a second layer of neural networks for matching the word embedding information to a vocabulary.
4. The method of claim 2, wherein the recognizing the text features of the confrontation sample based on the text classification model and outputting the text recognition result corresponding to the text features comprises:
and normalizing the text features of the confrontation samples based on the text classification model, and executing the steps of mapping to a real number field and matching with a word list based on the text features after normalization processing.
5. The method of claim 1, wherein the updating model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error comprises any one of:
acquiring the product of the recognition error and the weight of the recognition error, taking the sum of the product, the first classification error, and the second classification error as a total error, and updating the model parameters of the text classification model based on the total error;
weighting the first classification error, the second classification error, and the recognition error based on their respective weights to obtain a total error, and updating the model parameters of the text classification model based on the total error.
6. The method of claim 1, wherein the performing feature extraction on a text sample and a confrontation sample of the text sample based on a text classification model, classifying based on the extracted text feature, and outputting a result of predictive classification on the text sample and the confrontation sample comprises:
inputting a text sample into a text classification model, extracting features of the text sample by the text classification model, classifying the text sample based on the extracted text features, and outputting a prediction classification result of the text sample;
generating corresponding confrontation samples based on the text samples, the predicted classification results and target classification results of the text samples;
and performing feature extraction on the confrontation sample, classifying based on the extracted text features, and outputting a prediction classification result of the confrontation sample.
7. The method of claim 6, wherein the feature extraction of the text sample by the text classification model comprises:
mapping words contained in the text content of the text sample to a real number field by the text classification model to obtain word embedding information of the text sample;
and performing feature extraction on the word embedding information of the text sample to obtain the text features of the text sample.
8. The method of claim 6, wherein the generating corresponding confrontation samples based on the text sample, the predicted classification result of the text sample, and the target classification result comprises:
determining the confrontation disturbance of the text sample based on the predicted classification result and the target classification result of the text sample;
and adding the confrontation disturbance to the text sample to obtain a confrontation sample corresponding to the text sample.
9. The method of claim 8, wherein determining the confrontation disturbance of the text sample based on the predicted classification result and the target classification result of the text sample comprises:
acquiring a first classification error of the text sample according to the predicted classification result and the target classification result of the text sample;
obtaining a candidate confrontation disturbance of the text sample based on the gradient of the first classification error;
adding the candidate confrontation disturbance to the text sample to obtain a candidate confrontation sample corresponding to the text sample;
continuing to obtain a prediction classification result obtained by classifying the candidate confrontation sample, and obtaining a classification error of the candidate confrontation sample according to the prediction classification result and the target classification result;
and updating the candidate confrontation disturbance of the text sample based on the gradient of the classification error of the candidate confrontation sample until a target condition is reached, so as to obtain the confrontation disturbance of the text sample.
10. The method of claim 8, wherein the adding the confrontation disturbance to the text sample to obtain a confrontation sample corresponding to the text sample comprises:
adding the confrontation disturbance to the text content of the text sample to obtain the text content of the confrontation sample corresponding to the text sample;
the feature extraction process of the confrontation sample comprises the following steps:
mapping words contained in the text content of the confrontation sample to a real number field to obtain word embedding information of the confrontation sample;
and performing feature extraction on the word embedding information of the confrontation sample to obtain the text features of the confrontation sample.
11. The method of claim 8, wherein the adding the confrontation disturbance to the text sample to obtain a confrontation sample corresponding to the text sample comprises:
adding the confrontation disturbance to the word embedding information of the text sample to obtain the word embedding information of the confrontation sample corresponding to the text sample;
the feature extraction process of the confrontation sample comprises the following steps:
performing feature extraction on the word embedding information of the confrontation sample to obtain the text features of the confrontation sample.
12. The method of claim 1, wherein the text classification model comprises an encoder, a classifier, and a decoder;
wherein the encoder is used for feature extraction;
the classifier is used for performing classification based on the extracted text features;
the decoder is used for recognizing the text features of the confrontation samples and outputting text recognition results corresponding to the text features.
13. An apparatus for training a text classification model, the apparatus comprising:
the classification module is used for performing feature extraction on a text sample and a confrontation sample of the text sample based on a text classification model, classifying based on the extracted text features, and outputting the predicted classification results of the text sample and the confrontation sample, wherein the text sample and the corresponding confrontation sample both carry the same target classification result;
an obtaining module, configured to obtain a first classification error and a second classification error, where the first classification error is an error between the predicted classification result and the target classification result of the text sample, and the second classification error is an error between the predicted classification result and the target classification result of the confrontation sample;
the recognition module is used for recognizing the text features of the confrontation samples based on the text classification model and outputting text recognition results corresponding to the text features;
the obtaining module is further configured to obtain a recognition error based on the text recognition result and the text sample;
an updating module for updating the model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error.
14. An electronic device, comprising one or more processors and one or more memories having at least one program code stored therein, the at least one program code being loaded and executed by the one or more processors to implement the text classification model training method of any one of claims 1 to 12.
15. A computer-readable storage medium having stored therein at least one program code, the at least one program code being loaded into and executed by a processor to implement the text classification model training method according to any one of claims 1 to 12.
CN202010805356.8A 2020-08-12 2020-08-12 Text classification model training method, device, equipment and storage medium Active CN111897964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010805356.8A CN111897964B (en) 2020-08-12 2020-08-12 Text classification model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111897964A true CN111897964A (en) 2020-11-06
CN111897964B CN111897964B (en) 2023-10-17

Family

ID=73228918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010805356.8A Active CN111897964B (en) 2020-08-12 2020-08-12 Text classification model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111897964B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334497A (en) * 2018-02-06 2018-07-27 北京航空航天大学 The method and apparatus for automatically generating text
CN108509596A (en) * 2018-04-02 2018-09-07 广州市申迪计算机系统有限公司 File classification method, device, computer equipment and storage medium
US20200134442A1 (en) * 2018-10-29 2020-04-30 Microsoft Technology Licensing, Llc Task detection in communications using domain adaptation
US20200134415A1 (en) * 2018-10-30 2020-04-30 Huawei Technologies Co., Ltd. Autoencoder-Based Generative Adversarial Networks for Text Generation
CN110119448A (en) * 2019-05-08 2019-08-13 合肥工业大学 Semi-supervised cross-domain texts classification method based on dual autocoder
CN110457701A (en) * 2019-08-08 2019-11-15 南京邮电大学 Dual training method based on interpretation confrontation text
CN111046176A (en) * 2019-11-25 2020-04-21 百度在线网络技术(北京)有限公司 Countermeasure sample generation method and device, electronic equipment and storage medium
CN111241287A (en) * 2020-01-16 2020-06-05 支付宝(杭州)信息技术有限公司 Training method and device for generating generation model of confrontation text
CN111325319A (en) * 2020-02-02 2020-06-23 腾讯云计算(北京)有限责任公司 Method, device, equipment and storage medium for detecting neural network model
CN111340122A (en) * 2020-02-29 2020-06-26 复旦大学 Multi-modal feature fusion text-guided image restoration method
CN111401445A (en) * 2020-03-16 2020-07-10 腾讯科技(深圳)有限公司 Training method of image recognition model, and image recognition method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ALAA KHADDAJ et al.: "Representation Learning for Improved Generalization of Adversarial Domain Adaptation with Text Classification", 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), pages 525 - 531 *
PENG XISHUAI: "Research on Driving Assistance Applications Based on Deep Transfer Learning", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II, no. 06, pages 035 - 16 *
LI WENHUI et al.: "Short text classification method based on improved biLSTM network", Computer Engineering and Design, vol. 41, no. 3, pages 880 - 886 *
WANG TINGYIN: "Emergency communication method for nuclear radiation monitoring based on BeiDou RDSS", Computer Systems & Applications, vol. 28, no. 12, pages 248 - 252 *
YAN MENGXIANG: "Research on multi-domain fake review identification", China Master's Theses Full-text Database, Information Science and Technology, no. 06, pages 138 - 1176 *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101359A (en) * 2020-11-11 2020-12-18 广州华多网络科技有限公司 Text formula positioning method, model training method and related device
CN112364641A (en) * 2020-11-12 2021-02-12 北京中科闻歌科技股份有限公司 Chinese countermeasure sample generation method and device for text audit
CN112528027A (en) * 2020-12-24 2021-03-19 北京百度网讯科技有限公司 Text classification method, device, equipment, storage medium and program product
US11669741B2 (en) 2021-01-12 2023-06-06 Zhejiang Lab Method and platform for meta-knowledge fine-tuning based on domain-invariant features
CN112364945A (en) * 2021-01-12 2021-02-12 之江实验室 Meta-knowledge fine adjustment method and platform based on domain-invariant features
WO2022151553A1 (en) * 2021-01-12 2022-07-21 之江实验室 Domain-invariant feature-based meta-knowledge fine-tuning method and platform
GB2608344A (en) * 2021-01-12 2022-12-28 Zhejiang Lab Domain-invariant feature-based meta-knowledge fine-tuning method and platform
CN112905736A (en) * 2021-01-27 2021-06-04 郑州轻工业大学 Unsupervised text emotion analysis method based on quantum theory
CN112905736B (en) * 2021-01-27 2023-09-19 郑州轻工业大学 Quantum theory-based unsupervised text emotion analysis method
CN113032558B (en) * 2021-03-11 2023-08-29 昆明理工大学 Variable semi-supervised hundred degree encyclopedia classification method integrating wiki knowledge
CN113032558A (en) * 2021-03-11 2021-06-25 昆明理工大学 Variational semi-supervised hundred-degree encyclopedia classification method fusing wiki knowledge
CN113053516A (en) * 2021-03-26 2021-06-29 安徽科大讯飞医疗信息技术有限公司 Countermeasure sample generation method, device, equipment and storage medium
CN113220553B (en) * 2021-05-13 2022-06-17 支付宝(杭州)信息技术有限公司 Method and device for evaluating performance of text prediction model
CN113220553A (en) * 2021-05-13 2021-08-06 支付宝(杭州)信息技术有限公司 Method and device for evaluating performance of text prediction model
CN113313271B (en) * 2021-06-03 2022-09-30 国家电网有限公司客户服务中心 Power system fault repair method and device based on remote customer service
CN113313271A (en) * 2021-06-03 2021-08-27 国家电网有限公司客户服务中心 Power system fault repair method and device based on remote customer service
CN113361611B (en) * 2021-06-11 2023-12-12 南京大学 Robust classifier training method under crowdsourcing task
CN113361611A (en) * 2021-06-11 2021-09-07 南京大学 Robust classifier training method under crowdsourcing task
CN113326356B (en) * 2021-08-03 2021-11-02 北京邮电大学 Natural countermeasure sample generation method for text classifier and related device
CN113326356A (en) * 2021-08-03 2021-08-31 北京邮电大学 Natural countermeasure sample generation method for text classifier and related device
CN113988049A (en) * 2021-10-18 2022-01-28 浙江香侬慧语科技有限责任公司 Interpretation method and device of natural language model and storage medium
CN113946687A (en) * 2021-10-20 2022-01-18 中国人民解放军国防科技大学 Text backdoor attack method with consistent labels
CN114239964A (en) * 2021-12-16 2022-03-25 上海冰鉴信息科技有限公司 User risk prediction model processing method and device and electronic equipment
CN114239964B (en) * 2021-12-16 2024-08-02 上海冰鉴信息科技有限公司 User risk prediction model processing method and device and electronic equipment
CN114780693A (en) * 2022-03-25 2022-07-22 昆明理工大学 Deep migration question classification method based on antagonistic learning
CN114676255A (en) * 2022-03-29 2022-06-28 腾讯科技(深圳)有限公司 Text processing method, device, equipment, storage medium and computer program product
CN114841287A (en) * 2022-05-26 2022-08-02 马上消费金融股份有限公司 Training method of classification model, image classification method and device
CN114841287B (en) * 2022-05-26 2024-07-16 马上消费金融股份有限公司 Training method of classification model, image classification method and device
CN114880478B (en) * 2022-06-07 2024-04-23 昆明理工大学 Weak supervision aspect category detection method based on theme information enhancement
CN114880478A (en) * 2022-06-07 2022-08-09 昆明理工大学 Weak supervision aspect category detection method based on theme information enhancement
CN115983255B (en) * 2023-03-21 2023-06-02 深圳市万物云科技有限公司 Emergency management method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111897964B (en) 2023-10-17

Similar Documents

Publication Publication Date Title
CN111897964B (en) Text classification model training method, device, equipment and storage medium
CN111985240B (en) Named entity recognition model training method, named entity recognition method and named entity recognition device
CN110728298A (en) Multi-task classification model training method, multi-task classification method and device
Zakariah et al. Sign language recognition for Arabic alphabets using transfer learning technique
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN111831826B (en) Training method, classification method and device of cross-domain text classification model
CN113704460B (en) Text classification method and device, electronic equipment and storage medium
CN116861995A (en) Training of multi-mode pre-training model and multi-mode data processing method and device
CN116226785A (en) Target object recognition method, multi-mode recognition model training method and device
CN113392180A (en) Text processing method, device, equipment and storage medium
CN113569607A (en) Motion recognition method, motion recognition device, motion recognition equipment and storage medium
CN113095072B (en) Text processing method and device
CN118035945B (en) Label recognition model processing method and related device
Islam et al. A simple and mighty arrowhead detection technique of Bangla sign language characters with CNN
CN117558270B (en) Voice recognition method and device and keyword detection model training method and device
Sun et al. Rumour detection technology based on the BiGRU_capsule network
CN112131884A (en) Method and device for entity classification and method and device for entity presentation
US20240028952A1 (en) Apparatus for attribute path generation
US20230368003A1 (en) Adaptive sparse attention pattern
US20240152749A1 (en) Continual learning neural network system training for classification type tasks
CN113486260B (en) Method and device for generating interactive information, computer equipment and storage medium
CN115129863A (en) Intention recognition method, device, equipment, storage medium and computer program product
CN114281919A (en) Node adding method, device, equipment and storage medium based on directory tree
Liu et al. A survey of the application of deep learning in computer vision
Malhotra et al. An efficient fake news identification system using A-SQUARE CNN algorithm

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant