WO2022022163A1 - Training method, apparatus, device and storage medium for text classification model - Google Patents

Training method, apparatus, device and storage medium for text classification model

Info

Publication number
WO2022022163A1
WO2022022163A1 PCT/CN2021/101372 CN2021101372W WO2022022163A1 WO 2022022163 A1 WO2022022163 A1 WO 2022022163A1 CN 2021101372 W CN2021101372 W CN 2021101372W WO 2022022163 A1 WO2022022163 A1 WO 2022022163A1
Authority
WO
WIPO (PCT)
Prior art keywords
training sample
semantic representation
sample
training
classification model
Prior art date
Application number
PCT/CN2021/101372
Other languages
English (en)
French (fr)
Inventor
邱耀
张金超
周杰
牛成
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2022022163A1
Priority to US17/948,348, published as US20230016365A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 40/40 - Processing or translation of natural language
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning

Definitions

  • the embodiments of the present application relate to the field of artificial intelligence and computer technology, and in particular, to the training technology of a text classification model.
  • the input to a text classification model can be a sentence, and the model outputs the category to which the sentence belongs.
  • Traditional text classification models are not very robust, and adding some small perturbations to the input sentence can make the model misclassify.
  • the embodiments of the present application provide a training method, apparatus, device, and storage medium for a text classification model, which can improve the robustness of the text classification model.
  • the technical solution is as follows:
  • a method for training a text classification model is provided, the method is executed by a computer device, and the method includes:
  • acquiring a training sample of the text classification model, the training sample being text; determining the semantic representation of the training sample through the text classification model, and determining the predicted classification result of the training sample based on the semantic representation; generating an adversarial sample corresponding to the training sample according to the training sample and the acquired disturbance information; determining the semantic representation of the adversarial sample corresponding to the training sample through the text classification model; determining the classification loss of the text classification model based on the predicted classification result of the training sample; determining the contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample; and training the text classification model according to the classification loss and the contrastive loss.
  • an apparatus for training a text classification model is provided, the apparatus is deployed on computer equipment, and the apparatus includes:
  • a training sample acquisition module used to acquire a training sample of a text classification model, where the training sample is text
  • a classification result prediction module configured to determine the semantic representation of the training sample by the text classification model, and determine the predicted classification result of the training sample based on the semantic representation
  • an adversarial sample generation module configured to generate an adversarial sample corresponding to the training sample according to the training sample and the acquired disturbance information
  • a semantic representation generation module configured to determine the semantic representation of the adversarial sample corresponding to the training sample through the text classification model
  • a classification loss generation module configured to determine the classification loss of the text classification model based on the predicted classification result of the training sample
  • a contrastive loss generation module configured to determine the contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
  • a classification model training module configured to train the text classification model according to the classification loss and the contrastive loss.
  • a computer device includes a processor and a memory, and the memory stores at least one instruction, at least one program, a code set or an instruction set; the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the above-mentioned method for training a text classification model.
  • a computer-readable storage medium stores at least one instruction, at least one program, a code set or an instruction set; the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to implement the above-mentioned training method for a text classification model.
  • a computer program product or computer program where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the above-mentioned training method for a text classification model.
  • in the process of training the text classification model, in addition to calculating the classification loss, the training samples and the obtained disturbance information are used to generate adversarial samples corresponding to the training samples, and the contrastive loss of the model is then calculated based on the semantic representations of the training samples and the adversarial samples; the model parameters are optimized by combining the above classification loss and contrastive loss, so as to achieve the purpose of training the text classification model.
  • in the training process of the text classification model, not only must the training samples and their adversarial samples be correctly classified, but the semantic representations of the training samples and their adversarial samples output by the model must also be made as close as possible by calculating the contrastive loss, so that the encoder of the model is not disturbed by the perturbation information. This method can not only improve the accuracy and robustness of the classifier, but also improve the robustness of the encoder, thereby achieving an overall improvement of the classification effect and robustness of the text classification model.
  • FIG. 1 is a schematic diagram of a solution implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a training method of a text classification model provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of the application of a pre-training model provided by an embodiment of the present application to text classification;
  • FIG. 4 is an architectural diagram of a training method for a text classification model provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of comparative learning provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a text classification method provided by an embodiment of the present application.
  • FIG. 7 is a block diagram of a training device for a text classification model provided by an embodiment of the present application.
  • FIG. 8 is a block diagram of a training device for a text classification model provided by another embodiment of the present application.
  • FIG. 9 is a block diagram of a computer device provided by an embodiment of the present application.
  • Artificial Intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology.
  • the basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Natural Language Processing is an important direction in the field of computer science and artificial intelligence. It studies various theories and methods that can realize effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Therefore, research in this field will involve natural language, the language that people use on a daily basis, so it is closely related to the study of linguistics. Natural language processing technology usually includes text processing, semantic understanding, machine translation, robot question answering, knowledge graph and other technologies.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in how computers simulate or realize human learning behaviors to acquire new knowledge or skills, and to reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications are in all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other techniques.
  • artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, and drones. It is believed that with the development of technology, artificial intelligence technology will be applied in more fields and play an increasingly important role.
  • the solutions provided in the embodiments of the present application relate to artificial intelligence natural language processing and machine learning technologies, use machine learning technologies to train a text classification model, and classify texts through the text classification model.
  • the execution subject of each step may be a computer device, and the computer device refers to an electronic device with data computing, processing and storage capabilities.
  • the computer device may be a terminal such as a PC (Personal Computer, personal computer), a tablet computer, a smart phone, a wearable device, an intelligent robot, etc.; it may also be a server.
  • the server may be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server that provides cloud computing services.
  • the technical solutions provided in the embodiments of the present application can be used in any product or system that requires a text classification function, such as sentiment analysis systems, inappropriate-content detection systems, commodity classification systems, and intention classification systems.
  • the technical solutions provided in the embodiments of the present application can effectively improve the robustness of the text classification model and improve the accuracy of text classification.
  • the system may include a terminal 10 and a server 20 .
  • the terminal 10 may be an electronic device such as a mobile phone, a tablet computer, a PC, a wearable device and the like.
  • the user can access the server 20 through the terminal 10 and perform text classification operations.
  • the client terminal of the target application program can be installed in the terminal 10, and the user can access the server 20 through the client terminal and perform text classification operations.
  • the above-mentioned target application program may be any application program that provides an emotion analysis service, such as an emotion detection application program, an intention recognition application program, and the like, which are not limited in this embodiment of the present application.
  • the server 20 may be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server providing cloud computing services.
  • the server 20 is used to provide background services for the client of the target application in the terminal 10 .
  • the server 20 may be a background server of the above-mentioned target application (eg, an emotion detection application).
  • the user inputs a piece of voice information (such as "the weather is nice today") in the client of the target application; the client sends the voice information to the server 20; the server 20 first converts the voice information into text and then inputs the text into the text classification model as the text to be classified; the text classification model determines the category to which the text to be classified belongs (e.g. an emotion category) and outputs the emotion classification (e.g. happy) corresponding to the text to be classified.
  • the text classification model may also be deployed in the terminal 10, such as the client of the above target application, which is not limited in this embodiment of the present application.
  • FIG. 2 shows a flowchart of a training method for a text classification model provided by an embodiment of the present application.
  • the method may include the following steps (201-207):
  • Step 201 A training sample of a text classification model is obtained, and the training sample is text.
  • a text classification model is a machine learning model used to classify text to be classified.
  • the classification category of the text classification model can be preset, and its classification rules and methods can be optimized through model training.
  • text classification models with different functions can be constructed.
  • a text classification model for judging the target person's mood can be constructed. For example, the text classification model divides the text into two categories: happy and unhappy. If the text to be classified is "The weather is so nice today!", the text classification model determines that the text to be classified belongs to the happy category, and the mood of the target person can be judged as happy.
  • a text classification model for classifying the intent of the target sentence can be constructed.
  • a text classification model for identifying the answer to the target question can be constructed, which is not limited in this application.
  • the training sample is a text
  • the text content is a character string, which includes but is not limited to at least one of text, punctuation, special characters, and the like.
  • the training samples can be Chinese text, such as the Chinese sentence meaning "I love China"; foreign-language text, such as the English "I Love China"; or a mixture of Chinese and a foreign language, such as a sentence stating that "China" means 中国.
  • the text content of the training sample can be a word, a sentence, a paragraph, an article, and the like.
  • the language type used for identification by the text classification model is not limited, for example, it may be Chinese, English, Japanese, Korean, and the like.
  • when a text classification model for classifying Chinese text is required, the training samples can be Chinese; when a text classification model for classifying English text is required, the training samples can be English.
  • Step 202 Determine the semantic representation of the training sample by using the text classification model, and determine the predicted classification result of the training sample based on the semantic representation.
  • semantic representation refers to a carrier used to represent semantics, and the carrier may be a symbol, a graphic, or a number.
  • the carrier is a word embedding available for machine learning.
  • the word embedding (1, 2, 3) is used to represent the Chinese character "我" ("I"); that is, the semantic representation of "我" is the word embedding (1, 2, 3).
  • Word embedding refers to a form of digital vector representation used to replace words in text, and word embedding is also called word vector.
  • FIG. 3 takes the BERT (Bidirectional Encoder Representations from Transformers) model, whose encoder is based on the Transformer, as an example. The input text is "Tok 1, Tok 2, ..., Tok N" (Tok N represents the N-th token in the input text, such as a training sample); the text classification model adds a [CLS] mark in front of the training sample, and the word embedding extraction unit obtains the word embeddings of the tokens in the training sample.
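  • For illustration, the following is a minimal sketch of extracting such a [CLS]-based semantic representation with the Hugging Face transformers library; the checkpoint name and the use of the [CLS] vector as the sentence representation are assumptions for this example, not details stated in the patent.

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumed checkpoint; any BERT-style encoder could stand in for the patent's
# word-embedding extraction unit plus encoder.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("我爱中国", return_tensors="pt")  # the tokenizer prepends [CLS] automatically
with torch.no_grad():
    outputs = encoder(**inputs)
cls_representation = outputs.last_hidden_state[:, 0]  # [CLS] vector used as the semantic representation
```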
  • the predicted classification result refers to the predicted result of the category to which the training sample belongs, which is output by the text classification model.
  • the predicted classification result may be the same as the true classification result of the training sample, or it may be different from the true classification result of the training sample.
  • the true classification result of the training sample is known, indicating the correct classification result of the training sample. For example, if the actual classification result of the training sample is A, and the predicted classification result of the training sample output by the text classification model is A, then the predicted classification result of the text classification model is the same as the real classification result.
  • the text classification model includes a word embedding extraction unit, an encoder, a classifier, and a contrastive loss calculation unit.
  • the word embedding extraction unit is used to extract the word embedding of the training sample
  • the encoder is used to generate the semantic representation of the training sample based on the word embedding of the training sample, and the encoder can also be called a feature extractor, which is used to extract semantic features based on the word embedding.
  • the classifier is used to determine the category to which the training samples belong based on the semantic representation of the training samples
  • the contrastive loss calculation unit is used to calculate the contrastive loss of the text classification model.
  • the training samples are used as the input text for the text classification model.
  • the word embedding extraction unit may be an encoding matrix.
  • the encoder can be a multi-layer neural network.
  • the classifier can be a two-class classifier, a three-class classifier, etc., and its classification category can be designed according to actual needs.
  • the text classification model includes a word embedding extraction unit 41 , an encoder 42 , a classifier 43 , a contrast loss calculation unit 44 and a storage unit 45 .
  • the text classification model obtains a training sample.
  • the text classification model first extracts the word embedding Eo of the training sample through the word embedding extraction unit 41, then generates the semantic representation Ro of the training sample through the encoder 42 based on the word embedding Eo, and finally the classifier 43 determines the predicted classification result of the training sample according to the semantic representation Ro.
  • the introduction and description of the contrast loss calculation unit 44 and the storage unit 45 can be found below.
  • Step 203 Generate adversarial samples corresponding to the training samples according to the training samples and the obtained disturbance information.
  • an adversarial sample refers to a sample newly generated by adding disturbance information to a training sample.
  • the added perturbation information is premised on not changing the semantics of the training samples, that is, the adversarial samples are semantically consistent with the training samples.
  • the process of generating adversarial samples includes acquiring word embeddings of the training samples, and then adding perturbation information to the word embeddings of the training samples to obtain processed word embeddings, which are the adversarial samples corresponding to the training samples .
  • the text classification model obtains a training sample, first extracts the word embedding Eo of the training sample through the word embedding extraction unit 41, then obtains the disturbance information P based on the classification loss Lc, and adds the disturbance information P to the word embedding Eo to obtain the word embedding Ea, which is the word embedding of the adversarial sample; based on Ea, the encoder 42 generates Ra, the semantic representation of the adversarial sample.
  • the process of obtaining the disturbance information P may include: calculating the gradient of the loss function of the text classification model with respect to the training sample, and applying a perturbation in the direction of that gradient; this perturbation is the disturbance information P.
  • perturbation information is added to the word embedding matrix of the training sample; for example, if the word embedding of the training sample is "(1, 2, 3) (4, 5, 6) (7, 8, 9)", the processed word embedding obtained after adding the perturbation information is "(1, 2, 4) (4, 5, 7) (7, 8, 9)".
  • the process of generating the adversarial sample includes adding perturbation information to the text content of the training sample to obtain processed text information, where the processed text information is the adversarial sample corresponding to the training sample.
  • the perturbation information is directly added to the text content of the training sample to obtain an adversarial sample.
  • minor modifications are made to the training samples, such as changing the word order or introducing typos. For example, if the training sample is "我爱中国" ("I love China") and the character "我" ("I") is replaced with the visually similar typo "俄" ("Russia"), the text information after adding the perturbation information is "俄爱中国" ("Russia loves China").
  • the perturbation information at the word embedding level can be more fine-grained. If the adversarial sample corresponding to the training sample is in the form of word embedding, the adversarial sample can be obtained by the FreeLB algorithm, or can be obtained by the FreeAT algorithm, which is not limited in this application.
  • methods of generating and using adversarial examples include:
  • the method can be based on a white-box attack, which means that the attacker knows all the information about the attacked text classification model, including model structure, loss function, etc.
  • the attack method is to add perturbation information to the word embeddings of the training samples to obtain adversarial samples. This perturbation information is calculated by the gradient of the loss function, which can make the text classification model more prone to errors.
  • the text classification model is then optimized to correctly classify the adversarial examples. The whole optimization process can be expressed by the following formula; a code sketch of the gradient-based perturbation is given after the variable definitions below.
  • min_θ E_{(Z,y)~D} [ max_δ L(f(X + δ; θ), y) ]
  • where f is the forward function of the text classification model, θ is the parameter of the text classification model, L is the loss function of the text classification model, δ is the disturbance information, X is the word embedding of the training sample, y is the true category to which the training sample belongs, D is the data distribution, and Z is the word sequence of the training sample.
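  • As a concrete illustration of the gradient-based perturbation described above, the following is a minimal FGM-style sketch in PyTorch; the helper classify_from_embeddings and the epsilon value are hypothetical and stand in for whatever classifier head and perturbation budget an implementation uses.

```python
import torch
import torch.nn.functional as F

def build_adversarial_embeddings(model, embeddings, labels, epsilon=1.0):
    """Perturb word embeddings along the gradient of the classification loss (FGM-style sketch)."""
    embeddings = embeddings.detach().requires_grad_(True)
    logits = model.classify_from_embeddings(embeddings)       # hypothetical helper: encoder + classifier
    loss = F.cross_entropy(logits, labels)                    # classification loss on the clean embeddings
    grad, = torch.autograd.grad(loss, embeddings)             # gradient of the loss w.r.t. the embeddings
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)  # step in the gradient direction
    return (embeddings + delta).detach()                      # word embeddings Ea of the adversarial samples
```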
  • Step 204 Determine the semantic representation of the adversarial sample corresponding to the training sample by using the text classification model.
  • the text classification model obtains the word embedding Ea by adding the perturbation information P to the word embedding Eo, and then, based on the word embedding Ea of the adversarial sample, generates the semantic representation Ra of the adversarial sample through the encoder 42.
  • the semantic representation of the training sample is a mathematical representation used to characterize the semantic features of the training sample, for example, it is mapped in the representation space of the text classification model in the form of a feature vector.
  • the semantic representation of an adversarial example is a mathematical representation used to characterize the semantic features of an adversarial example, for example, it is mapped in the representation space of a text classification model in the form of a feature vector.
  • Step 205 Determine the classification loss of the text classification model based on the predicted classification result of the training sample.
  • the classification loss function is a function used to measure the difference between the predicted classification result and the true classification result.
  • the predicted classification result of the training sample is obtained by the classifier 43, and then the corresponding classification loss Lc is calculated based on the predicted classification result of the training sample and the actual classification result of the training sample through the classification loss function.
  • the classification loss of the text classification model is determined based on the training samples and the predicted classification results of the adversarial samples. That is, when calculating the classification loss, in addition to the classification loss of training samples, the classification loss of adversarial samples is also considered. For an adversarial sample corresponding to a training sample, the true classification result of the training sample is the same as the true classification result of the adversarial sample. In addition, the predicted classification results of the adversarial examples are obtained by the classifier based on the semantic representation output of the adversarial examples.
  • Step 206 Determine the contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample.
  • the contrast loss is used to indicate the degree of difference between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample, and the difference degree may be represented by distance or similarity.
  • the smaller the value of the contrastive loss, the smaller the distance between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample; conversely, the larger the value of the contrastive loss, the larger the distance between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample.
  • by minimizing the contrastive loss and narrowing the distance between the semantic representation of the training samples and the semantic representation of the adversarial samples corresponding to the training samples, the text classification model can enhance the robustness of the encoder so that it outputs high-quality semantic representations, thereby improving the classification effect of the classifier and, in turn, the classification effect and robustness of the text classification model.
  • FIG. 5 shows a schematic diagram of contrastive learning provided by an embodiment of the present application.
  • the ellipse 501 represents the representation space of the text classification model
  • the two circles 502 refer to the semantic representation of two training samples belonging to the same category
  • the triangle 503 refers to the semantic representation of a group of samples belonging to a different category from the circles 502. By minimizing the contrastive loss, the distance between the two circles 502 is reduced and the distance between the two circles 502 and the triangle 503 is increased, so that the model learns a better semantic representation and avoids interference from the disturbance information.
  • the text classification model obtains the contrast loss LD calculated by the contrast loss calculation unit 44 .
  • determining the contrastive loss of the text classification model includes the following steps: determining a first contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and the semantic representations of the different samples, where the different samples refer to samples belonging to a different category from the training sample; determining a second contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and the semantic representations of the adversarial samples corresponding to the different samples; and determining the contrastive loss of the text classification model according to the first contrastive loss and the second contrastive loss.
  • the first contrastive loss refers to a contrastive loss based on the semantic representation of the adversarial samples corresponding to the training samples.
  • the process of determining the first contrastive loss includes: calculating a first similarity, where the first similarity refers to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample; calculating a second similarity, where the second similarity refers to the similarity between the semantic representation of the adversarial sample corresponding to the training sample and the semantic representation of the different sample; and calculating the first contrastive loss according to the first similarity and the second similarity.
  • the text classification model generates adversarial samples corresponding to the different samples by adding perturbation information to the word embeddings of the different samples.
  • the text classification model obtains randomly sampled different samples, first extracts the word embeddings of the different samples through the word embedding extraction unit, then obtains the disturbance information based on the classification loss and adds the disturbance information to the word embeddings of the different samples to obtain the word embeddings of the corresponding adversarial samples; based on the word embeddings of the adversarial samples corresponding to the different samples, the text classification model obtains the semantic representations of these adversarial samples through the encoder.
  • the text classification model can also directly add disturbance information to the text of the different samples to obtain the adversarial samples corresponding to the different samples, which is not limited in this application.
  • the above-mentioned second contrastive loss refers to the contrastive loss based on the semantic representation of the training samples.
  • the process of determining the second contrastive loss includes: calculating a third similarity, where the third similarity refers to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample; calculating a fourth similarity, where the fourth similarity refers to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the different sample; and calculating the second contrastive loss according to the third similarity and the fourth similarity.
  • the different samples may be preset, or may be obtained when calculating the contrastive loss from samples belonging to a different category from the training sample, for example, by randomly sampling m samples as the different samples, where m is a positive integer.
  • the contrastive loss of the text classification model is the sum of the first contrastive loss described above and the second contrastive loss described above.
  • the calculation process of the contrastive loss of the text classification model can be represented by the following formulas:
  • LD_a = -Σ_i log [ h_φ({R_i^adv, R_i}) / ( h_φ({R_i^adv, R_i}) + Σ_{j=1..m} h_φ({R_i^adv, R_{i,j}}) ) ]
  • LD_o = -Σ_i log [ h_φ({R_i, R_i^adv}) / ( h_φ({R_i, R_i^adv}) + Σ_{j=1..m} h_φ({R_i, R_{i,j}^adv}) ) ]
  • where i refers to the i-th training sample; R_i refers to the semantic representation of the training sample; R_i^adv is the semantic representation of the adversarial sample corresponding to the training sample; R_{i,j} refers to the semantic representation of the j-th different sample of the i-th training sample; R_{i,j}^adv refers to the semantic representation of the adversarial sample corresponding to the j-th different sample of the i-th training sample; m refers to the number of randomly sampled different samples, m is a positive integer, and j is a positive integer less than or equal to m.
  • the final contrastive loss LD is the sum of LD_a and LD_o.
  • the h_φ function is a scoring function used to calculate the similarity between the semantic representations of two texts.
  • h_φ({R_i^adv, R_i}) is used to determine the similarity between the semantic representation of the adversarial sample corresponding to the training sample and the semantic representation of the training sample, that is, the first similarity mentioned above; h_φ({R_i^adv, R_{i,j}}) is used to determine the similarity between the semantic representation of the adversarial sample corresponding to the training sample and the semantic representation of the different sample, that is, the second similarity mentioned above; h_φ({R_i, R_i^adv}) is used to determine the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample, that is, the third similarity mentioned above; h_φ({R_i, R_{i,j}^adv}) is used to determine the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the different sample, that is, the fourth similarity mentioned above.
  • the dot product of the vectors is used as the similarity score, and a hyperparameter τ is used to adjust the dynamic range of the score, which can be formulated as h_φ({x_1, x_2}) = exp((x_1 · x_2) / τ), where x_1 and x_2 respectively represent the two vectors used for the similarity calculation. A code sketch of this contrastive loss is given below.
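  • A minimal sketch of this two-part contrastive loss in PyTorch follows; the tensor shapes, the default temperature value, and the exponential form of the scoring function are assumptions made for illustration.

```python
import torch

def similarity(x1, x2, tau=0.1):
    """Dot-product similarity scaled by a temperature hyperparameter tau."""
    return torch.exp(torch.sum(x1 * x2, dim=-1) / tau)

def contrastive_loss(r_orig, r_adv, r_neg, r_neg_adv, tau=0.1):
    """Pull each sample toward its adversarial counterpart and away from m different-class samples.
    Shapes assumed: r_orig, r_adv: (batch, dim); r_neg, r_neg_adv: (batch, m, dim)."""
    pos_a = similarity(r_adv, r_orig, tau)                               # first similarity
    neg_a = similarity(r_adv.unsqueeze(1), r_neg, tau).sum(dim=1)        # second similarity, summed over m
    loss_a = -torch.log(pos_a / (pos_a + neg_a))                         # LD_a terms
    pos_o = similarity(r_orig, r_adv, tau)                               # third similarity
    neg_o = similarity(r_orig.unsqueeze(1), r_neg_adv, tau).sum(dim=1)   # fourth similarity, summed over m
    loss_o = -torch.log(pos_o / (pos_o + neg_o))                         # LD_o terms
    return (loss_a + loss_o).sum()                                       # LD = LD_a + LD_o
```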
  • the present application can reduce the consumption of computing resources by using the NCE (Noise-Contrastive Estimation, Noise Contrastive Estimation) method.
  • the text classification model includes at least two dynamic buffers, such as a first dynamic buffer and a second dynamic buffer; the first dynamic buffer is used to store the semantic representations of the training samples, the second dynamic buffer is used to store the semantic representations of the adversarial samples corresponding to the training samples, and the data stored in the first dynamic buffer and the data stored in the second dynamic buffer are dynamically updated.
  • B_orig[i] is the set of unit vectors that stores the semantic representations of the training samples, and B_adv[i] is the set of unit vectors that stores the semantic representations of the adversarial samples corresponding to the training samples.
  • to update the semantic representation of a training sample and the semantic representation of the adversarial sample corresponding to the training sample, it is only necessary to write them into the buffers at the serial number used at the time of storage.
  • the content corresponding to the sequence number is extracted from the dynamic buffer.
  • the content stored in Borig [i] is the semantic representation of the ith training sample, and the semantic representation of the ith training sample can be obtained only by extracting the stored content corresponding to Borig [i].
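  • A minimal sketch of such a dynamic buffer (a memory bank indexed by sample serial number) is shown below; the class name and methods are hypothetical.

```python
import torch
import torch.nn.functional as F

class RepresentationBuffer:
    """Stores unit-normalized semantic representations by sample index, so that negatives
    can be looked up instead of being recomputed (NCE-style memory bank sketch)."""
    def __init__(self, num_samples, dim):
        self.bank = torch.zeros(num_samples, dim)

    def update(self, indices, representations):
        # Overwrite the entries at the given serial numbers with fresh representations.
        self.bank[indices] = F.normalize(representations.detach(), dim=-1)

    def lookup(self, indices):
        # Retrieve the stored content corresponding to the given serial numbers.
        return self.bank[indices]
```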
  • Step 207 Train the text classification model according to the classification loss and the contrastive loss.
  • the training of the text classification model includes: determining the total loss of the text classification model according to the classification loss and the contrastive loss; and adjusting the parameters of the text classification model to minimize the total loss, so as to obtain the trained text classification model.
  • this training objective can be expressed by the following formula:
  • min_θ E_{(v,y)~D} [ max_δ L_C(f(E + δ; θ), y) + L_D ]
  • where f is the forward function of the text classification model, θ is the parameter of the text classification model, L_C and L_D are the classification loss and the contrastive loss respectively, v is a training sample, y is the true label of the training sample (i.e., the true classification result), D is the data distribution, δ is the perturbation information, and E is the word embedding of the training sample.
  • the disturbance information δ is obtained by maximizing the classification loss and is then added to the word embedding of the training sample to obtain the corresponding adversarial sample in the form of a word embedding; the classification loss L_C and the contrastive loss L_D are then minimized to adjust the parameters of the text classification model and obtain the trained text classification model. That is, in the training process of the text classification model, not only should the adversarial samples corresponding to the training samples be correctly classified, but the semantic representations of the adversarial samples corresponding to the training samples should also be as similar as possible to the semantic representations of the training samples. A sketch of one such training step is shown below.
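  • Tying the pieces together, the following is a sketch of one training step; model.encode, model.classifier, the negative-sample tensors, and the weighting of the two losses are assumptions, and it reuses the hypothetical helpers sketched earlier.

```python
import torch.nn.functional as F

def training_step(model, optimizer, emb, labels, r_neg, r_neg_adv, lambda_d=1.0):
    """One optimization step: classification loss on clean and adversarial samples plus contrastive loss."""
    r_orig = model.encode(emb)                                   # semantic representation Ro
    emb_adv = build_adversarial_embeddings(model, emb, labels)   # perturbed word embeddings Ea
    r_adv = model.encode(emb_adv)                                # semantic representation Ra
    loss_c = F.cross_entropy(model.classifier(r_orig), labels) \
           + F.cross_entropy(model.classifier(r_adv), labels)    # classification loss Lc
    loss_d = contrastive_loss(r_orig, r_adv, r_neg, r_neg_adv)   # contrastive loss LD
    total = loss_c + lambda_d * loss_d                           # total loss to be minimized
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```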
  • in the process of training the text classification model, in addition to calculating the classification loss of the text classification model, the adversarial samples corresponding to the training samples are also generated from the training samples and the obtained disturbance information. Then, based on the semantic representations of the training samples and the adversarial samples, the contrastive loss of the model is calculated, and the above classification loss and contrastive loss are combined to optimize the model parameters to achieve the purpose of training the text classification model.
  • in the training process of the text classification model, not only must the training samples and their adversarial samples be correctly classified, but the semantic representations of the training samples and their adversarial samples output by the model must also be made as close as possible by calculating the contrastive loss, so that the encoder of the model is not disturbed by the perturbation information. This method can not only improve the accuracy and robustness of the classifier, but also improve the robustness of the encoder, thereby achieving an overall improvement of the classification effect and robustness of the text classification model.
  • by minimizing the contrastive loss and narrowing the representation distance between the semantic representation of the training samples and the semantic representation of the adversarial samples corresponding to the training samples, this application also enhances the robustness of the encoder so that it outputs high-quality semantic representations, thereby improving the classification effect of the classifier and, in turn, the classification effect and robustness of the text classification model.
  • the training method of the text classification model is described above.
  • the following describes how the text classification model determines the category of the text to be classified:
  • the process of determining, by the text classification model, the category to which the text to be classified belongs specifically includes the following steps (601-603), referring to FIG. 6:
  • Step 601 Acquire the text to be classified.
  • the category to which the text to be classified belongs is unknown, and there may be one or more texts to be classified.
  • Step 602 Extract the semantic representation of the text to be classified by using the text classification model.
  • the above text classification model includes a word embedding extraction unit, an encoder and a classifier; wherein, the word embedding extraction unit is used to extract the word embedding of the text to be classified; the encoder is used to generate the semantic representation of the text to be classified based on the word embedding of the text to be classified ; The classifier is used to determine the category to which the text to be classified belongs based on the semantic representation of the text to be classified. At this time, the text to be classified is used as the input text of the text classification model.
  • for example, based on the word embeddings of the text to be classified, the text classification model generates the semantic representation of the text to be classified through the encoder.
  • Step 603 Determine the category to which the text to be classified belongs based on the semantic representation of the text to be classified.
  • the text classification model uses the classifier to determine the category to which the text to be classified belongs.
  • the category to which the text to be classified belongs, as output by the text classification model, may be the category with the highest probability among all the categories of the model.
  • the text to be classified is classified by the trained text classification model, so as to determine the category to which the text to be classified belongs.
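  • For illustration, a minimal inference sketch with the trained model might look as follows; model.embed, model.encode, and model.classifier are hypothetical helpers corresponding to the word embedding extraction unit, the encoder, and the classifier.

```python
def classify(model, text):
    """Return the index of the predicted category for a piece of text to be classified."""
    emb = model.embed([text])                  # word embedding extraction unit
    representation = model.encode(emb)         # encoder -> semantic representation
    logits = model.classifier(representation)  # classifier scores over all categories
    return logits.argmax(dim=-1).item()        # category with the highest score
```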
  • FIG. 7 it shows a block diagram of a text classification model training apparatus provided by an embodiment of the present application.
  • the apparatus has the function of implementing the above method example, and the function may be implemented by hardware or by executing corresponding software by hardware.
  • the apparatus may be the computer equipment described above, or may be provided in the computer equipment.
  • the apparatus 700 includes: a training sample acquisition module 701, a classification result prediction module 702, an adversarial sample generation module 703, a semantic representation generation module 704, a classification loss generation module 705, a contrastive loss generation module 706, and a classification model training module 707.
  • the training sample obtaining module 701 is configured to obtain a training sample of a text classification model, where the training sample is text.
  • the classification result prediction module 702 is configured to determine the semantic representation of the training sample through the text classification model, and determine the predicted classification result of the training sample based on the semantic representation.
  • the adversarial sample generation module 703 is configured to generate adversarial samples corresponding to the training samples according to the training samples and the obtained disturbance information.
  • the semantic representation generation module 704 is configured to determine the semantic representation of the adversarial sample corresponding to the training sample by using the text classification model.
  • the classification loss generating module 705 is configured to determine the classification loss of the text classification model based on the predicted classification result of the training sample.
  • the contrast loss generating module 706 is configured to determine the contrast loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample.
  • the classification model training module 707 is configured to train the text classification model according to the classification loss and the contrastive loss.
  • the contrast loss generation module 706 includes: a first loss determination unit 706a, a second loss determination unit 706b, and a contrast loss determination unit 706c.
  • a first loss determining unit 706a, configured to determine a first contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and the semantic representations of the different samples; wherein the different samples refer to samples belonging to a different class from the training samples.
  • the second loss determining unit 706b is configured to determine a second contrast loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and the semantic representation of the adversarial sample corresponding to the different sample.
  • a contrastive loss determining unit 706c configured to determine a contrastive loss of the text classification model according to the first contrastive loss and the second contrastive loss.
  • the first loss determination unit 706a is configured to:
  • calculate a first similarity, where the first similarity refers to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
  • calculate a second similarity, where the second similarity refers to the similarity between the semantic representation of the adversarial sample corresponding to the training sample and the semantic representation of the different sample;
  • calculate the first contrastive loss according to the first similarity and the second similarity.
  • the second loss determining unit 706b is configured to:
  • calculate a third similarity, where the third similarity refers to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
  • calculate a fourth similarity, where the fourth similarity refers to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the different sample;
  • calculate the second contrastive loss according to the third similarity and the fourth similarity.
  • the first loss determination unit 706a or the second loss determination unit 706b is further configured to:
  • obtain the different samples by randomly sampling m samples from samples belonging to a different category from the training samples, where m is a positive integer.
  • the classification model training module 707 is used to:
  • determine the total loss of the text classification model according to the classification loss and the contrastive loss, and adjust the parameters of the text classification model to minimize the total loss, so as to obtain the trained text classification model.
  • the adversarial sample generation module 703 is used to:
  • add perturbation information to the word embeddings of the training samples to obtain processed word embeddings, and use the processed word embeddings as the adversarial samples corresponding to the training samples.
  • the adversarial sample generation module 703 is used to:
  • add perturbation information to the text content of the training sample to obtain processed text information, and use the processed text information as the adversarial sample corresponding to the training sample.
  • the apparatus 700 further includes: a buffer creation module 708 and a buffer update module 709 .
  • the buffer creation module 708 is configured to create a first buffer and a second buffer; wherein the first buffer is used to store the semantic representation of the training sample, and the second buffer is used to store the semantic representation of the adversarial sample corresponding to the training sample.
  • the buffer updating module 709 is configured to dynamically update the data stored in the first buffer.
  • the buffer updating module 709 is further configured to dynamically update the data stored in the second buffer.
  • the text classification model includes a word embedding extraction unit, an encoder, a classifier, and a contrastive loss calculation unit.
  • the word embedding extraction unit is used to extract the word embedding of the input text.
  • the encoder is configured to generate a semantic representation of the input text based on the word embeddings of the input text.
  • the classifier is configured to determine a category to which the input text belongs based on the semantic representation of the input text.
  • the contrastive loss calculation unit is configured to calculate the contrastive loss of the text classification model.
  • in the process of training the text classification model, in addition to calculating the classification loss of the text classification model, the adversarial samples corresponding to the training samples are also generated from the training samples and the obtained disturbance information. Then, based on the semantic representations of the training samples and the adversarial samples, the contrastive loss of the model is calculated, and the above classification loss and contrastive loss are combined to optimize the model parameters to achieve the purpose of training the text classification model.
  • in the training process of the text classification model, not only must the training samples and their adversarial samples be correctly classified, but the semantic representations of the training samples and their adversarial samples output by the model must also be made as close as possible by calculating the contrastive loss, so that the encoder of the model is not disturbed by the perturbation information. This method can not only improve the accuracy and robustness of the classifier, but also improve the robustness of the encoder, thereby achieving an overall improvement of the classification effect and robustness of the text classification model.
  • FIG. 9 shows a structural block diagram of a computer device provided by an embodiment of the present application.
  • the computer device can be used to implement the training method of the text classification model provided in the above embodiments. Specifically:
  • the computer device 900 includes a processing unit 901 (such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an FPGA (Field Programmable Gate Array)), a system memory 904 including a RAM (Random-Access Memory) 902 and a ROM (Read-Only Memory) 903, and a system bus 905 connecting the system memory 904 and the processing unit 901.
  • the computer device 900 also includes a basic input/output system (I/O system) 906 that assists in transferring information between various devices within the server, and a mass storage device 907 for storing an operating system 913, application programs 914 and other program modules 915.
  • the basic input/output system 906 includes a display 908 for displaying information and input devices 909 such as a mouse, keyboard, etc., for user input of information.
  • the display 908 and the input device 909 are both connected to the central processing unit 901 through the input and output controller 910 connected to the system bus 905 .
  • the basic input/output system 906 may also include an input output controller 910 for receiving and processing input from a number of other devices such as a keyboard, mouse, or electronic stylus.
  • input output controller 910 also provides output to a display screen, printer, or other type of output device.
  • the mass storage device 907 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905 .
  • the mass storage device 907 and its associated computer-readable media provide non-volatile storage for the computer device 900 . That is, the mass storage device 907 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
  • Computer-readable media can include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory, Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory, Electrically Erasable Programmable Read-Only Memory), flash memory or Other solid-state storage technologies, CD-ROM, DVD (Digital Video Disc, high-density digital video disc) or other optical storage, cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices.
  • the system memory 904 and the mass storage device 907 described above may be collectively referred to as memory.
  • the computer device 900 may also operate by connecting, through a network such as the Internet, to a remote computer on the network. That is, the computer device 900 can be connected to the network 912 through the network interface unit 911 connected to the system bus 905, or the network interface unit 911 can be used to connect to other types of networks or remote computer systems (not shown).
  • the memory also includes a computer program stored in the memory and configured to be executed by one or more processors to implement the above-described method of training a text classification model.
  • a computer-readable storage medium stores at least one instruction, at least one segment of a program, a code set or an instruction set; the at least one instruction, the at least one segment of the program, the code set or the instruction set, when executed by a processor, implements the above-mentioned training method of the text classification model.
  • the computer-readable storage medium may include: ROM (Read-Only Memory, read-only memory), RAM (Random-Access Memory, random access memory), SSD (Solid State Drives, solid-state hard disk), or an optical disk.
  • the random access memory may include ReRAM (Resistance Random Access Memory, resistive random access memory) and DRAM (Dynamic Random Access Memory, dynamic random access memory).
  • a computer program product or computer program comprising computer instructions stored in a computer readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the above-described training method for a text classification model.
  • references herein to "a plurality" mean two or more.
  • "And/or" which describes the association relationship of the associated objects, means that there can be three kinds of relationships, for example, A and/or B, which can mean that A exists alone, A and B exist at the same time, and B exists alone.
  • the character "/" generally indicates that the associated objects are an "or” relationship.
  • the numbering of the steps described in this document only exemplarily shows a possible execution sequence between the steps. In some other embodiments, the above steps may also be executed in a different order; for example, two steps with different numbers may be performed at the same time, or two steps with different numbers may be performed in an order reverse to that shown in the figure, which is not limited in this embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present application discloses a method, apparatus, device, and storage medium for training a text classification model, relating to the fields of artificial intelligence and computer technology. The method determines a semantic representation of a training sample through the text classification model and determines a predicted classification result of the training sample based on the semantic representation; generates an adversarial sample corresponding to the training sample according to the training sample and perturbation information; determines a semantic representation of the adversarial sample corresponding to the training sample through the text classification model; determines a classification loss of the text classification model based on the predicted classification result of the training sample; determines a contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample; and trains the text classification model according to the classification loss and the contrastive loss. The present application improves the accuracy and robustness of the classifier while also improving the robustness of the encoder, thereby achieving an overall improvement in the classification performance and robustness of the text classification model.

Description

Method, apparatus, device, and storage medium for training a text classification model
This application claims priority to Chinese Patent Application No. 202010753159.6, entitled "Method, apparatus, device, and storage medium for training a text classification model", filed with the China Patent Office on July 30, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the fields of artificial intelligence and computer technology, and in particular, to training technology for a text classification model.
Background
With the research and progress of artificial intelligence technology in text classification models, more and more training methods are applicable to text classification models.
The input to a text classification model can be a sentence, and the model then outputs the category to which the sentence belongs. Traditional text classification models are not very robust; adding some small perturbations to the input sentence can cause the model to misclassify.
Summary
The embodiments of the present application provide a method, apparatus, device, and storage medium for training a text classification model, which can improve the robustness of the text classification model. The technical solution is as follows:
According to one aspect of the embodiments of the present application, a method for training a text classification model is provided. The method is executed by a computer device and includes:
acquiring a training sample of the text classification model, the training sample being text;
determining a semantic representation of the training sample through the text classification model, and determining a predicted classification result of the training sample based on the semantic representation;
generating an adversarial sample corresponding to the training sample according to the training sample and acquired perturbation information;
determining a semantic representation of the adversarial sample corresponding to the training sample through the text classification model;
determining a classification loss of the text classification model based on the predicted classification result of the training sample;
determining a contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
training the text classification model according to the classification loss and the contrastive loss.
According to one aspect of the embodiments of the present application, an apparatus for training a text classification model is provided. The apparatus is deployed on a computer device and includes:
a training sample acquisition module, configured to acquire a training sample of the text classification model, the training sample being text;
a classification result prediction module, configured to determine a semantic representation of the training sample through the text classification model, and determine a predicted classification result of the training sample based on the semantic representation;
an adversarial sample generation module, configured to generate an adversarial sample corresponding to the training sample according to the training sample and acquired perturbation information;
a semantic representation generation module, configured to determine a semantic representation of the adversarial sample corresponding to the training sample through the text classification model;
a classification loss generation module, configured to determine a classification loss of the text classification model based on the predicted classification result of the training sample;
a contrastive loss generation module, configured to determine a contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
a classification model training module, configured to train the text classification model according to the classification loss and the contrastive loss.
According to one aspect of the embodiments of the present application, a computer device is provided. The computer device includes a processor and a memory, the memory storing at least one instruction, at least one segment of a program, a code set or an instruction set, which is loaded and executed by the processor to implement the above method for training a text classification model.
According to one aspect of the embodiments of the present application, a computer-readable storage medium is provided. The readable storage medium stores at least one instruction, at least one segment of a program, a code set or an instruction set, which is loaded and executed by a processor to implement the above method for training a text classification model.
According to one aspect of the embodiments of the present application, a computer program product or computer program is provided. The computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the above method for training a text classification model.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects:
In the process of training the text classification model, in addition to computing the classification loss of the text classification model, an adversarial sample corresponding to the training sample is generated from the training sample and the acquired perturbation information, and the contrastive loss of the model is then computed based on the semantic representations of the training sample and its adversarial sample; the model parameters are optimized by combining the above classification loss and contrastive loss, so as to train the text classification model. In this way, during the training of the text classification model, the model must not only correctly classify the training sample and its adversarial sample, but also, through the computed contrastive loss, make the semantic representations that the model outputs for the training sample and its adversarial sample as close as possible, preventing the model's encoder from being disturbed by the perturbation information. This method not only improves the accuracy and robustness of the classifier, but also improves the robustness of the encoder, thereby achieving an overall improvement in the classification performance and robustness of the text classification model.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings required for describing the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an implementation environment of a solution provided by one embodiment of the present application;
FIG. 2 is a flowchart of a method for training a text classification model provided by one embodiment of the present application;
FIG. 3 is a schematic diagram of the application of a pre-trained model to text classification provided by one embodiment of the present application;
FIG. 4 is an architecture diagram of a method for training a text classification model provided by one embodiment of the present application;
FIG. 5 is a schematic diagram of contrastive learning provided by one embodiment of the present application;
FIG. 6 is a flowchart of a text classification method provided by one embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for training a text classification model provided by one embodiment of the present application;
FIG. 8 is a block diagram of an apparatus for training a text classification model provided by another embodiment of the present application;
FIG. 9 is a block diagram of a computer device provided by one embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the following further describes the implementations of the present application in detail with reference to the accompanying drawings.
Artificial Intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science; it attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include several major directions such as computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field therefore involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing technologies generally include technologies such as text processing, semantic understanding, machine translation, robot question answering, and knowledge graphs.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence. Machine learning and deep learning generally include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, artificial intelligence technology has been researched and applied in many fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. It is believed that, with the development of technology, artificial intelligence technology will be applied in more fields and play an increasingly important role.
The solutions provided by the embodiments of the present application relate to the natural language processing and machine learning technologies of artificial intelligence, using machine learning technology to train a text classification model and classifying text through the text classification model.
In the method provided by the embodiments of the present application, each step may be executed by a computer device, which refers to an electronic device with data computation, processing, and storage capabilities. The computer device may be a terminal such as a PC (Personal Computer), a tablet computer, a smartphone, a wearable device, or an intelligent robot; it may also be a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services.
The technical solutions provided by the embodiments of the present application can be used in any product or system that requires a text classification function, such as a sentiment analysis system, a pornography/abuse detection system, a commodity classification system, or an intent classification system. The technical solutions provided by the embodiments of the present application can effectively improve the robustness of the text classification model and improve the accuracy of text classification.
In one example, as shown in FIG. 1, taking a sentiment analysis system as an example, the system may include a terminal 10 and a server 20.
The terminal 10 may be an electronic device such as a mobile phone, a tablet computer, a PC, or a wearable device. A user can access the server 20 through the terminal 10 and perform text classification operations. For example, a client of a target application may be installed in the terminal 10, and the user can access the server 20 through the client and perform text classification operations. The target application may be any application that provides a sentiment analysis service, such as an emotion detection application or an intent recognition application, which is not limited in the embodiments of the present application.
The server 20 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. The server 20 is configured to provide background services for the client of the target application in the terminal 10. For example, the server 20 may be the background server of the above target application (such as an emotion detection application).
The terminal 10 and the server 20 can communicate with each other through a network.
Exemplarily, the user inputs a piece of speech information (for example, "今天天气真好") in the client of the target application; the client sends the speech information to the server 20; the server 20 first converts the speech information into text, and then inputs the text as the text to be classified into the text classification model; the text classification model determines the category (for example, an emotion category) to which the text to be classified belongs, and outputs the emotion category (such as happy) corresponding to the text to be classified.
Of course, in some other examples, the text classification model may also be deployed in the terminal 10, such as in the client of the above target application, which is not limited in the embodiments of the present application.
Referring to FIG. 2, which shows a flowchart of a method for training a text classification model provided by one embodiment of the present application, the method may include the following steps (201 to 207):
Step 201: Acquire a training sample of the text classification model, the training sample being text.
The text classification model is a machine learning model for classifying text to be classified. The classification categories of the text classification model can be set in advance, and its classification rules and methods can be optimized through model training.
It can be understood that text classification models with different functions can be constructed in different application scenarios. In a sentiment analysis scenario, a text classification model for judging the mood of a target person can be constructed. For example, the text classification model divides text into two categories, happy and unhappy; if the text to be classified is "今天天气真好!" and the text classification model concludes that the text belongs to the happy category, the mood of the target person can be judged to be happy. In an intent classification scenario, a text classification model for classifying the intent of a target sentence can be constructed. In a question-answer matching scenario, a text classification model for identifying the answer to a target question can be constructed, which is not limited in the present application.
The training sample is a text whose content is a character string, including but not limited to at least one of characters, punctuation marks, special characters, and the like. The training sample may be Chinese text, such as "我爱中国", foreign-language text, such as the English "I Love China", or a mixture of Chinese and a foreign language, such as "China是中国的意思". The text content of the training sample may be a word, a sentence, a paragraph, an article, etc. In the embodiments of the present application, the language type that the text classification model is used to recognize is not limited; for example, it may be Chinese, English, Japanese, Korean, etc. For example, when a text classification model for classifying Chinese text is required, Chinese may be selected for the training samples; when a text classification model for classifying English text is required, English may be selected for the training samples.
Step 202: Determine a semantic representation of the training sample through the text classification model, and determine a predicted classification result of the training sample based on the semantic representation.
In the embodiments of the present application, a semantic representation refers to a carrier used to express semantics; the carrier may be a symbol, a figure, or a number. In the embodiments of the present application, the carrier is a word embedding that can be used for machine learning; for example, the word embedding (1, 2, 3) is used to represent the Chinese character "我", i.e., the semantic representation of "我" is the word embedding (1, 2, 3). A word embedding is a numerical vector representation used to replace a word in text; a word embedding is also called a word vector.
In an exemplary embodiment, referring to FIG. 3, which takes the encoder being a BERT (Bidirectional Encoder Representations from Transformers) model as an example, the input text "Tok 1, Tok 2, ..., Tok N" is acquired (Tok N denotes the N-th token of the input text, e.g., of the training sample). The text classification model adds a [CLS] token in front of the training sample, obtains the word vectors "E_(CLS), E_1, E_2, ..., E_N" of the tokens in the training sample through the word embedding extraction unit, then obtains the corresponding semantic vectors "C, T_1, T_2, ..., T_N" through the encoder, takes the hidden-state vector C of [CLS] in the last layer of the encoder as the semantic vector of the whole sample, i.e., the semantic representation of the above training sample, and then inputs it into a classifier composed of a fully connected layer and a Softmax function.
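Exemplarily, and purely as an illustrative sketch of the pipeline described above (the checkpoint name bert-base-chinese, the two-class setup, and the example sentence are assumptions, not part of the disclosed method), the [CLS] hidden state of a BERT encoder can be fed into a fully connected layer followed by Softmax as follows:

```python
# Sketch of FIG. 3: [CLS] hidden state -> fully connected layer -> Softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
encoder = AutoModel.from_pretrained("bert-base-chinese")
classifier = nn.Linear(encoder.config.hidden_size, 2)            # fully connected layer, 2 classes assumed

inputs = tokenizer("今天天气真好", return_tensors="pt")           # tokenizer prepends [CLS] automatically
hidden_states = encoder(**inputs).last_hidden_state               # (1, seq_len, hidden_size)
cls_vector = hidden_states[:, 0]                                   # [CLS] hidden state = semantic representation
probs = F.softmax(classifier(cls_vector), dim=-1)                  # predicted class distribution
```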
The predicted classification result refers to the prediction, output by the text classification model, of the category to which the training sample belongs. The predicted classification result may be the same as or different from the true classification result of the training sample. The true classification result of the training sample is known and indicates the correct classification result of the training sample. For example, if the true classification result of the training sample is A and the predicted classification result output by the text classification model for this training sample is A, the predicted classification result of the text classification model is the same as the true classification result.
In one example, the text classification model includes a word embedding extraction unit, an encoder, a classifier, and a contrastive loss computation unit. The word embedding extraction unit is configured to extract the word embedding of the training sample; the encoder is configured to generate the semantic representation of the training sample based on the word embedding of the training sample, and may also be called a feature extractor, configured to extract semantic feature information based on the word embedding; the classifier is configured to determine the category to which the training sample belongs based on the semantic representation of the training sample; and the contrastive loss computation unit is configured to compute the contrastive loss of the text classification model. In this case, the training sample serves as the input text of the text classification model.
In one possible implementation, the word embedding extraction unit may be an encoding matrix, for example a one-hot encoding matrix. The encoder may be a multi-layer neural network. The classifier may be a binary classifier, a three-class classifier, etc., and its classification categories can be designed according to actual needs.
Referring to FIG. 4, the text classification model includes a word embedding extraction unit 41, an encoder 42, a classifier 43, a contrastive loss computation unit 44, and a storage unit 45. The text classification model acquires a training sample, first extracts the word embedding Eo of the training sample through the word embedding extraction unit 41, then generates the semantic representation Ro of the training sample through the encoder 42 based on the word embedding Eo, and finally determines the predicted classification result of the training sample through the classifier 43 according to the semantic representation Ro. For the contrastive loss computation unit 44 and the storage unit 45, see the description below.
Step 203: Generate an adversarial sample corresponding to the training sample according to the training sample and acquired perturbation information.
In the embodiments of the present application, an adversarial sample refers to a new sample generated by adding perturbation information to the training sample. The added perturbation information is premised on not changing the semantics of the training sample, i.e., the adversarial sample remains semantically consistent with the training sample.
In one example, the process of generating the adversarial sample includes acquiring the word embedding of the training sample, and then adding perturbation information to the word embedding of the training sample to obtain a processed word embedding; the processed word embedding is the adversarial sample corresponding to the training sample.
For example, referring to FIG. 4, the text classification model acquires a training sample, first extracts the word embedding Eo of the training sample through the word embedding extraction unit 41, then obtains perturbation information P based on the classification loss Lc, and adds the perturbation information P to the word embedding Eo to obtain the word embedding Ea, where Ea is the word embedding of the adversarial sample and Ra is the semantic representation of the adversarial sample. In the present application, obtaining the perturbation information P may include: computing the gradient of the loss function of the text classification model with respect to the training sample, and perturbing in the positive-gradient direction of that gradient; this perturbation is the above perturbation information P.
Exemplarily, the perturbation information is added to the word embedding matrix of the training sample; for example, the word embedding of the training sample is "(1, 2, 3) (4, 5, 6) (7, 8, 9)", and the processed word embedding obtained after adding the perturbation information is "(1, 2, 4) (4, 5, 7) (7, 8, 9)".
In another example, the process of generating the adversarial sample includes adding perturbation information to the text content of the training sample to obtain processed text information; the processed text information is the adversarial sample corresponding to the training sample. For example, after the text classification model acquires a training sample, the perturbation information is added directly to the text content of the training sample to obtain the adversarial sample. Exemplarily, small modifications such as changes in character order or typos are made to the training sample; for example, if the training sample is "我爱中国" and "我" is replaced with the typo "俄", the text information obtained after adding the perturbation information is "俄爱中国".
The above provides adding perturbation information at the word embedding level and at the text level to generate adversarial samples. Compared with text-level perturbation information, word-embedding-level perturbation information can be more fine-grained. If the adversarial sample corresponding to the training sample is in word embedding form, the adversarial sample may be obtained by the FreeLB algorithm or by the FreeAT algorithm, which is not limited in the present application.
In an exemplary embodiment, the method of generating and using adversarial samples includes the following:
The method may be based on a white-box attack. A white-box attack means that the attacker knows all information about the attacked text classification model, including the model structure, the loss function, and so on. The attack method is to add perturbation information to the word embedding of the training sample to obtain the adversarial sample; this perturbation information is computed from the gradient of the loss function and can make the text classification model more likely to make mistakes. The text classification model then optimizes the classification error on the adversarial sample. The whole optimization process is expressed by the following formula:
$$\min_{\theta} \; \mathbb{E}_{(Z,y)\sim D}\left[\max_{\delta} \, L\big(f_{\theta}(X+\delta),\, y\big)\right]$$
where f is the forward function of the text classification model, θ denotes the parameters of the text classification model, L is the loss function of the text classification model, δ is the perturbation information, X is the word embedding of the training sample, y is the true category to which the training sample belongs, D is the data distribution, and Z is the word sequence of the training sample.
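Exemplarily, the gradient-based perturbation step described above might be sketched as follows; this is an illustrative sketch only, in which the helper classify_from_embeddings and the norm bound epsilon are assumptions not named in the original text:

```python
import torch
import torch.nn.functional as F

def perturb_embeddings(model, embeddings, labels, epsilon=1.0):
    """Sketch: add a perturbation along the positive-gradient direction of the
    classification loss with respect to the word embeddings, yielding Ea = Eo + P."""
    embeddings = embeddings.detach().requires_grad_(True)
    logits = model.classify_from_embeddings(embeddings)   # hypothetical helper: embeddings -> logits
    loss = F.cross_entropy(logits, labels)                # classification loss L
    (grad,) = torch.autograd.grad(loss, embeddings)       # gradient of L w.r.t. the embeddings
    delta = epsilon * grad / (grad.norm() + 1e-12)        # perturbation information P (normalized, bound assumed)
    return (embeddings + delta).detach()                  # word embedding of the adversarial sample
```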
Step 204: Determine, through the text classification model, the semantic representation of the adversarial sample corresponding to the training sample.
Referring to FIG. 4, the text classification model adds the perturbation information P to the word embedding Eo to obtain the word embedding Ea, and then generates the semantic representation Ra of the adversarial sample through the encoder 42 based on the word embedding Ea of the adversarial sample.
In the embodiments of the present application, the semantic representation of the training sample is a mathematical representation used to characterize the semantic features of the training sample; for example, it is mapped into the representation space of the text classification model in the form of a feature vector. The semantic representation of the adversarial sample is a mathematical representation used to characterize the semantic features of the adversarial sample; for example, it is mapped into the representation space of the text classification model in the form of a feature vector.
Step 205: Determine the classification loss of the text classification model based on the predicted classification result of the training sample.
The classification loss function is a function used to measure the difference between the predicted classification result and the true classification result. In general, the smaller the value of this function, the closer the predicted classification result is to the true classification result and the higher the accuracy of the model; conversely, the larger the value of this function, the further the predicted classification result is from the true classification result and the lower the accuracy of the model.
Referring to FIG. 4, the predicted classification result of the training sample is obtained through the classifier 43, and the corresponding classification loss Lc is then computed through the classification loss function based on the predicted classification result and the true classification result of the training sample.
In one example, the classification loss of the text classification model is determined based on the predicted classification results of the training sample and of the adversarial sample. That is, when computing the classification loss, in addition to the classification loss of the training sample, the classification loss of the adversarial sample is also considered. For the adversarial sample corresponding to a certain training sample, the true classification result of that training sample is the same as the true classification result of that adversarial sample. In addition, the predicted classification result of the adversarial sample is output by the classifier based on the semantic representation of the adversarial sample.
Step 206: Determine the contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample.
In the embodiments of the present application, the contrastive loss is used to indicate the degree of difference between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample; the degree of difference can be expressed by a distance or a similarity. In general, the smaller the value of the contrastive loss, the smaller the distance between the semantic representation of the training sample and the semantic representation of its corresponding adversarial sample; conversely, the larger the value of the contrastive loss, the larger that distance. By minimizing the contrastive loss, the text classification model can shorten the distance between the semantic representation of the training sample and the semantic representation of its corresponding adversarial sample, thereby enhancing the robustness of the encoder and outputting high-quality adversarial samples, which improves the classification performance of the classifier and in turn the classification performance and robustness of the text classification model.
As shown in FIG. 5, which presents a schematic diagram of contrastive learning provided by one embodiment of the present application, the ellipse 501 represents the representation space of the text classification model, the two circles 502 refer to the semantic representations of two training samples belonging to the same category, and the triangle 503 refers to the semantic representation of a group of samples belonging to a different category from the circles 502. By minimizing the contrastive loss, the distance between the two circles 502 is reduced while the distance between the two circles 502 and the triangle 503 is increased, so that the model learns better semantic representations and avoids being disturbed by interfering information.
In an exemplary embodiment, referring to FIG. 4, the text classification model computes the contrastive loss L_D through the contrastive loss computation unit 44 based on the semantic representation Ro of the training sample and the semantic representation Ra of the adversarial sample corresponding to the training sample.
In one example, determining the contrastive loss of the text classification model includes the following steps:
1. Determine a first contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and the semantic representation of a different-class sample; a different-class sample refers to a sample belonging to a different category from the training sample.
In the embodiments of the present application, the first contrastive loss refers to the contrastive loss based on the semantic representation of the adversarial sample corresponding to the training sample.
In one possible implementation, the process of determining the first contrastive loss includes: computing a first similarity, which refers to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample; computing a second similarity, which refers to the similarity between the semantic representation of the adversarial sample corresponding to the training sample and the semantic representation of the different-class sample; and computing the first contrastive loss according to the first similarity and the second similarity.
2. Determine a second contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and the semantic representations of the adversarial samples corresponding to the different-class samples.
In the embodiments of the present application, the text classification model generates the adversarial sample corresponding to a different-class sample by adding perturbation information to the word embedding of the different-class sample. For example, the text classification model acquires a randomly sampled different-class sample, first extracts its word embedding through the word embedding extraction unit, then obtains perturbation information by processing the classification loss and adds the perturbation information to the word embedding of the different-class sample, thereby obtaining the word embedding of the adversarial sample corresponding to the different-class sample; based on that word embedding, the text classification model obtains, through the encoder, the semantic representation of the adversarial sample corresponding to the different-class sample. The text classification model may also obtain the adversarial sample corresponding to the different-class sample by adding perturbation information directly to the text of the different-class sample, which is not limited in the present application.
The above second contrastive loss refers to the contrastive loss based on the semantic representation of the training sample.
In one possible implementation, the process of determining the second contrastive loss includes: computing a third similarity, which refers to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample; computing a fourth similarity, which refers to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the different-class sample; and computing the second contrastive loss according to the third similarity and the fourth similarity.
It should be noted that, in this embodiment, the different-class samples may be set in advance, or may be obtained, when computing the contrastive loss, from samples belonging to different categories from the training sample; for example, m samples are obtained by random sampling as the different-class samples, where m is a positive integer.
3. Determine the contrastive loss of the text classification model according to the first contrastive loss and the second contrastive loss.
In some cases, the contrastive loss of the text classification model is the sum of the above first contrastive loss and the above second contrastive loss.
In an exemplary embodiment, the computation of the contrastive loss of the text classification model can be expressed by the following formulas:
$$L_D^{a} = -\log \frac{h_\theta\big(\{R_i^{adv}, R_i\}\big)}{h_\theta\big(\{R_i^{adv}, R_i\}\big) + \sum_{j=1}^{m} h_\theta\big(\{R_i^{adv}, R_{i,j}\}\big)}$$
$$L_D^{o} = -\log \frac{h_\theta\big(\{R_i, R_i^{adv}\}\big)}{h_\theta\big(\{R_i, R_i^{adv}\}\big) + \sum_{j=1}^{m} h_\theta\big(\{R_i, R_{i,j}^{adv}\}\big)}$$
$$L_D = L_D^{o} + L_D^{a}$$
where i refers to the i-th training sample, R_i is the semantic representation of the training sample, R_i^adv is the semantic representation of the adversarial sample corresponding to the training sample, R_{i,j} refers to the semantic representation of the j-th different-class sample of the i-th training sample, R_{i,j}^adv refers to the semantic representation of the adversarial sample corresponding to the j-th different-class sample of the i-th training sample, and m refers to the number of randomly sampled different-class samples. m is a positive integer; an upper threshold may be set for m, so as to limit the number of different-class samples and reduce the range of different-class samples, and j is a positive integer less than or equal to m. L_D^a is the contrastive loss based on the semantic representation of the adversarial sample corresponding to the training sample (i.e., the above first contrastive loss); it is used to pull R_i^adv and R_i closer together while increasing the distance of R_i^adv and R_i from R_{i,j}, and this loss is computed on a set S_adv = {R_i^adv, R_i, R_{i,1}, ..., R_{i,m}}. Similarly, L_D^o is the contrastive loss based on the semantic representation of the training sample (i.e., the above second contrastive loss); it is used to pull R_i and R_i^adv closer together while increasing the distance of R_i and R_i^adv from R_{i,j}^adv, and this loss is computed on a set S_adv = {R_i^adv, R_i, R_{i,1}^adv, ..., R_{i,m}^adv}. The final contrastive loss L_D is the sum of L_D^a and L_D^o.
The function h_θ is a decision function that computes the similarity of the semantic representations of two texts. In general, the higher the similarity of the two texts, the larger the output of h_θ; conversely, the lower the similarity of the two texts, the smaller the output of h_θ. h_θ({R_i^adv, R_i}) is used to determine the similarity between the semantic representation of the adversarial sample corresponding to the training sample and the semantic representation of the training sample, i.e., the first similarity mentioned above; h_θ({R_i^adv, R_{i,j}}) is used to determine the similarity between the semantic representation of the adversarial sample corresponding to the training sample and the semantic representation of the different-class sample, i.e., the second similarity mentioned above; h_θ({R_i, R_i^adv}) is used to determine the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample, i.e., the third similarity mentioned above; h_θ({R_i, R_{i,j}^adv}) is used to determine the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the different-class sample, i.e., the fourth similarity mentioned above.
For the function h_θ, the dot product of the vectors is used as the similarity score, and a hyperparameter τ is then used to adjust the dynamic range of that score; its formula is as follows:
$$h_\theta\big(\{x_1, x_2\}\big) = \exp\!\left(\frac{x_1 \cdot x_2}{\tau}\right)$$
where x_1 and x_2 respectively denote the two vectors used for the similarity computation.
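The formulas above can be read, for a single training sample, as the following illustrative sketch; the temperature τ, the number m of different-class samples, and the tensor shapes are assumptions made only for the example:

```python
import torch

def h(x1, x2, tau=0.1):
    # h_theta: temperature-scaled dot product, exponentiated
    return torch.exp((x1 * x2).sum(dim=-1) / tau)

def contrastive_loss(r, r_adv, neg, neg_adv, tau=0.1):
    """Sketch of L_D = L_D^o + L_D^a for one training sample.
    r, r_adv: (d,) representations of the training sample and its adversarial sample.
    neg, neg_adv: (m, d) representations of m different-class samples and their adversarial samples."""
    pos = h(r_adv, r, tau)  # first / third similarity (the dot product is symmetric)
    l_a = -torch.log(pos / (pos + h(r_adv.expand_as(neg), neg, tau).sum()))       # second similarities as negatives
    l_o = -torch.log(pos / (pos + h(r.expand_as(neg_adv), neg_adv, tau).sum()))   # fourth similarities as negatives
    return l_o + l_a
```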
In some cases, the present application can reduce the consumption of computing resources by using the NCE (Noise-Contrastive Estimation) method.
Since the number of semantic representations of training samples and of their corresponding adversarial samples that the text classification model needs to obtain is large (it can typically reach 10,000 to 20,000), in the present application the text classification model includes at least two dynamic buffers, for example a first dynamic buffer and a second dynamic buffer. The first dynamic buffer is used to store the semantic representations of the training samples, the second dynamic buffer is used to store the semantic representations of the adversarial samples corresponding to the training samples, and the data stored in the first dynamic buffer and the data stored in the second dynamic buffer are dynamically updated. The update formulas of the dynamic buffers are as follows:
$$B_{orig}[i] = M \cdot B_{orig}[i] + (1-M) \cdot R_i$$
$$B_{adv}[i] = M \cdot B_{adv}[i] + (1-M) \cdot R_i^{adv}$$
where M is the momentum, a hyperparameter, B_orig[i] is the set of unit vectors storing the semantic representations of the training samples, and B_adv[i] is the set of unit vectors storing the semantic representations of the adversarial samples corresponding to the training samples. Each time R_i and R_i^adv are computed, they are dynamically updated into the corresponding positions of B_orig[i] and B_adv[i].
When the semantic representation of a training sample and the semantic representation of its corresponding adversarial sample need to be used, it is only necessary to input the sequence number under which they were stored or updated, and the corresponding content can be extracted directly from the corresponding dynamic buffer. For example, the content stored in B_orig[i] is the semantic representation of the i-th training sample, and the semantic representation of the i-th training sample can be obtained simply by extracting the content stored at B_orig[i].
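A minimal sketch of the two dynamic buffers and their momentum update follows; the buffer size, vector dimensionality, and momentum value are illustrative assumptions:

```python
import torch

NUM_SAMPLES, DIM, M = 20000, 768, 0.99        # illustrative buffer size, vector size, momentum

B_orig = torch.zeros(NUM_SAMPLES, DIM)         # semantic representations of training samples
B_adv = torch.zeros(NUM_SAMPLES, DIM)          # representations of their adversarial samples

def update_buffers(i, r_i, r_i_adv):
    # B[i] = M * B[i] + (1 - M) * R_i, as in the update formulas above
    B_orig[i] = M * B_orig[i] + (1 - M) * r_i.detach()
    B_adv[i] = M * B_adv[i] + (1 - M) * r_i_adv.detach()
```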
Step 207: Train the text classification model according to the classification loss and the contrastive loss.
In one example, training the text classification model includes determining the total loss of the text classification model according to the classification loss and the contrastive loss, and adjusting the parameters of the text classification model to minimize the total loss, so as to obtain the trained text classification model.
The above training process of the text classification model can be expressed by the following formula:
$$\min_{\theta} \; \mathbb{E}_{(v,y)\sim D}\Big[L_C\big(f_{\theta}(E+\delta^{*}),\, y\big) + L_D\Big], \qquad \delta^{*} = \arg\max_{\delta} \, L_C\big(f_{\theta}(E+\delta),\, y\big)$$
where f is the forward function of the text classification model, θ denotes the parameters of the text classification model, L_C and L_D are respectively the classification loss and the contrastive loss, v is a training sample, y is the true label of the training sample (i.e., the true classification result), D is the data distribution, δ is the perturbation information, and E is the word embedding of the training sample. First, the perturbation information δ is obtained by maximizing the classification loss; it is then added to the word embedding of the training sample to obtain the corresponding adversarial sample in word embedding form; and the parameters of the text classification model are then adjusted by minimizing the sum of the classification loss L_C on that adversarial sample and the contrastive loss L_D, so as to obtain the trained text classification model. That is, during the training of the text classification model, the adversarial sample corresponding to the training sample must not only be classified correctly, but the semantic representation of the adversarial sample corresponding to the training sample must also be as similar as possible to the semantic representation of the training sample.
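Tying the pieces together, one optimization step might look like the following sketch. It reuses the helpers sketched earlier (perturb_embeddings, contrastive_loss, update_buffers, B_orig, B_adv); the encoder/classifier interfaces, the batch-of-one shapes, and the sampling of the different-class indices neg_idx are assumptions made only for the example:

```python
import torch
import torch.nn.functional as F

def train_step(model, classifier, optimizer, embeddings, labels, idx, neg_idx):
    """Sketch of step 207: build the perturbation from the classification loss, then minimize L_C + L_D."""
    # 1. adversarial word embeddings via the classification-loss gradient
    adv_embeddings = perturb_embeddings(model, embeddings, labels)

    # 2. semantic representations of the sample and of its adversarial sample
    r = model.encode(embeddings)          # hypothetical: word embeddings -> semantic representation, shape (1, d)
    r_adv = model.encode(adv_embeddings)

    # 3. classification loss on the adversarial sample plus contrastive loss from the buffers
    loss_c = F.cross_entropy(classifier(r_adv), labels)
    loss_d = contrastive_loss(r[0], r_adv[0], B_orig[neg_idx], B_adv[neg_idx])
    loss = loss_c + loss_d                # total loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_buffers(idx, r[0], r_adv[0])   # keep the dynamic buffers fresh
    return loss.item()
```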
In summary, in the technical solution provided by the embodiments of the present application, in the process of training the text classification model, in addition to computing the classification loss of the text classification model, an adversarial sample corresponding to the training sample is generated from the training sample and the acquired perturbation information, and the contrastive loss of the model is then computed based on the semantic representations of the training sample and its adversarial sample; the model parameters are optimized by combining the above classification loss and contrastive loss, so as to train the text classification model. In this way, during the training of the text classification model, the model must not only correctly classify the training sample and its adversarial sample, but also, through the computed contrastive loss, make the semantic representations that the model outputs for the training sample and its adversarial sample as close as possible, preventing the model's encoder from being disturbed by the perturbation information. This method not only improves the accuracy and robustness of the classifier, but also improves the robustness of the encoder, thereby achieving an overall improvement in the classification performance and robustness of the text classification model.
In addition, the present application also enhances the robustness of the encoder by minimizing the contrastive loss to shorten the representation distance between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample, and outputs high-quality adversarial samples, thereby improving the classification performance of the classifier and in turn the classification performance and robustness of the text classification model.
The method for training the text classification model has been introduced above; the following describes how the text classification model determines the category to which a text to be classified belongs:
In one example, the category to which the text to be classified belongs is determined through the text classification model; referring to FIG. 6, this specifically includes the following steps (601 to 603):
Step 601: Acquire the text to be classified.
The category to which the text to be classified belongs is unknown, and there may be one or more texts to be classified.
Step 602: Extract the semantic representation of the text to be classified through the text classification model.
The above text classification model includes a word embedding extraction unit, an encoder, and a classifier. The word embedding extraction unit is configured to extract the word embedding of the text to be classified; the encoder is configured to generate the semantic representation of the text to be classified based on the word embedding of the text to be classified; and the classifier is configured to determine the category to which the text to be classified belongs based on the semantic representation of the text to be classified. In this case, the text to be classified serves as the input text of the text classification model.
For example, based on the word embedding of the text to be classified, the text classification model generates the semantic representation of the text to be classified through the encoder.
Step 603: Determine the category to which the text to be classified belongs based on the semantic representation of the text to be classified.
Based on the semantic representation of the text to be classified, the text classification model determines, through the classifier, the category to which the text to be classified belongs. The category output by the text classification model for the text to be classified may be the one with the highest weight among all the categories in the model.
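As an illustrative sketch of steps 601 to 603 (the label names and the component interfaces are assumptions for the example, reusing the tokenizer/encoder/classifier components sketched earlier), inference with a trained model reduces to encoding the text and taking the highest-probability category:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def classify(text, tokenizer, encoder, classifier, labels=("unhappy", "happy")):
    """Sketch: extract the semantic representation of the text to be classified,
    then return the category with the highest probability."""
    inputs = tokenizer(text, return_tensors="pt")
    cls_vector = encoder(**inputs).last_hidden_state[:, 0]   # semantic representation
    probs = F.softmax(classifier(cls_vector), dim=-1)
    return labels[int(probs.argmax(dim=-1))]
```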
In summary, the present application classifies the text to be classified through the trained text classification model, so as to determine the category to which the text to be classified belongs.
The following are apparatus embodiments of the present application, which can be used to execute the method embodiments of the present application. For details not disclosed in the apparatus embodiments of the present application, please refer to the method embodiments of the present application.
Referring to FIG. 7, which shows a block diagram of an apparatus for training a text classification model provided by one embodiment of the present application, the apparatus has the function of implementing the above method examples, and the function may be implemented by hardware or by hardware executing corresponding software. The apparatus may be the computer device described above, or may be disposed in a computer device. As shown in FIG. 7, the apparatus 700 includes: a training sample acquisition module 701, a classification result prediction module 702, an adversarial sample generation module 703, a semantic representation generation module 704, a classification loss generation module 705, a contrastive loss generation module 706, and a classification model training module 707.
The training sample acquisition module 701 is configured to acquire a training sample of the text classification model, the training sample being text.
The classification result prediction module 702 is configured to determine a semantic representation of the training sample through the text classification model, and determine a predicted classification result of the training sample based on the semantic representation.
The adversarial sample generation module 703 is configured to generate an adversarial sample corresponding to the training sample according to the training sample and acquired perturbation information.
The semantic representation generation module 704 is configured to determine a semantic representation of the adversarial sample corresponding to the training sample through the text classification model.
The classification loss generation module 705 is configured to determine a classification loss of the text classification model based on the predicted classification result of the training sample.
The contrastive loss generation module 706 is configured to determine a contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample.
The classification model training module 707 is configured to train the text classification model according to the classification loss and the contrastive loss.
In an exemplary embodiment, as shown in FIG. 8, the contrastive loss generation module 706 includes: a first loss determination unit 706a, a second loss determination unit 706b, and a contrastive loss determination unit 706c.
The first loss determination unit 706a is configured to determine a first contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and the semantic representation of a different-class sample, where the different-class sample refers to a sample belonging to a different category from the training sample.
The second loss determination unit 706b is configured to determine a second contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and the semantic representation of the adversarial sample corresponding to the different-class sample.
The contrastive loss determination unit 706c is configured to determine the contrastive loss of the text classification model according to the first contrastive loss and the second contrastive loss.
In an exemplary embodiment, the first loss determination unit 706a is configured to:
compute a first similarity, the first similarity referring to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
compute a second similarity, the second similarity referring to the similarity between the semantic representation of the adversarial sample corresponding to the training sample and the semantic representation of the different-class sample;
compute the first contrastive loss according to the first similarity and the second similarity.
In an exemplary embodiment, the second loss determination unit 706b is configured to:
compute a third similarity, the third similarity referring to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
compute a fourth similarity, the fourth similarity referring to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the different-class sample;
compute the second contrastive loss according to the third similarity and the fourth similarity.
In an exemplary embodiment, the first loss determination unit 706a or the second loss determination unit 706b is further configured to:
randomly sample m samples from the samples belonging to different categories from the training sample to obtain the different-class samples, where m is a positive integer.
In an exemplary embodiment, the classification model training module 707 is configured to:
determine the total loss of the text classification model according to the classification loss and the contrastive loss;
adjust the parameters of the text classification model to minimize the total loss, so as to obtain the trained text classification model.
In an exemplary embodiment, the adversarial sample generation module 703 is configured to:
acquire the word embedding of the training sample;
add perturbation information to the word embedding of the training sample to obtain a processed word embedding;
where the processed word embedding serves as the adversarial sample corresponding to the training sample.
In an exemplary embodiment, the adversarial sample generation module 703 is configured to:
add perturbation information to the text content of the training sample to obtain processed text information;
where the processed text information serves as the adversarial sample corresponding to the training sample.
In an exemplary embodiment, as shown in FIG. 8, the apparatus 700 further includes: a buffer creation module 708 and a buffer update module 709.
The buffer creation module 708 is configured to create a first buffer and a second buffer, where the first buffer is used to store the semantic representation of the training sample, and the second buffer is used to store the semantic representation of the adversarial sample corresponding to the training sample.
The buffer update module 709 is configured to dynamically update the data stored in the first buffer.
The buffer update module 709 is further configured to dynamically update the data stored in the second buffer.
In an exemplary embodiment, the text classification model includes a word embedding extraction unit, an encoder, a classifier, and a contrastive loss computation unit.
The word embedding extraction unit is configured to extract the word embedding of the input text.
The encoder is configured to generate the semantic representation of the input text based on the word embedding of the input text.
The classifier is configured to determine the category to which the input text belongs based on the semantic representation of the input text.
The contrastive loss computation unit is configured to compute the contrastive loss of the text classification model.
In summary, in the technical solution provided by the embodiments of the present application, in the process of training the text classification model, in addition to computing the classification loss of the text classification model, an adversarial sample corresponding to the training sample is generated from the training sample and the acquired perturbation information, and the contrastive loss of the model is then computed based on the semantic representations of the training sample and its adversarial sample; the model parameters are optimized by combining the above classification loss and contrastive loss, so as to train the text classification model. In this way, during the training of the text classification model, the model must not only correctly classify the training sample and its adversarial sample, but also, through the computed contrastive loss, make the semantic representations that the model outputs for the training sample and its adversarial sample as close as possible, preventing the model's encoder from being disturbed by the perturbation information. This method not only improves the accuracy and robustness of the classifier, but also improves the robustness of the encoder, thereby achieving an overall improvement in the classification performance and robustness of the text classification model.
It should be noted that, when the apparatus provided by the above embodiments implements its functions, the division of the above functional modules is only used as an example for description. In practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus provided by the above embodiments and the method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
Referring to FIG. 9, which shows a structural block diagram of a computer device provided by one embodiment of the present application, the computer device can be used to implement the method for training a text classification model provided in the above embodiments. Specifically:
The computer device 900 includes a processing unit 901 (such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array)), a system memory 904 including a RAM (Random-Access Memory) 902 and a ROM (Read-Only Memory) 903, and a system bus 905 connecting the system memory 904 and the central processing unit 901. The computer device 900 further includes a basic input/output system (I/O system) 906 that helps transmit information between the devices in the server, and a mass storage device 907 for storing an operating system 913, application programs 914, and other program modules 915.
The basic input/output system 906 includes a display 908 for displaying information and an input device 909, such as a mouse or a keyboard, for the user to input information. The display 908 and the input device 909 are both connected to the central processing unit 901 through an input/output controller 910 connected to the system bus 905. The basic input/output system 906 may further include the input/output controller 910 for receiving and processing input from a plurality of other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 910 also provides output to a display screen, a printer, or other types of output devices.
The mass storage device 907 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905. The mass storage device 907 and its associated computer-readable media provide non-volatile storage for the computer device 900. That is, the mass storage device 907 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state storage technologies, CD-ROM, DVD (Digital Video Disc) or other optical storage, cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will know that the computer storage media are not limited to the above. The system memory 904 and the mass storage device 907 described above may be collectively referred to as memory.
According to the embodiments of the present application, the computer device 900 may also be connected, through a network such as the Internet, to a remote computer on the network for operation. That is, the computer device 900 can be connected to the network 912 through the network interface unit 911 connected to the system bus 905, or the network interface unit 911 can be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes a computer program, which is stored in the memory and configured to be executed by one or more processors to implement the above method for training a text classification model.
In an exemplary embodiment, a computer-readable storage medium is also provided. The storage medium stores at least one instruction, at least one segment of a program, a code set or an instruction set, which, when executed by a processor, implements the above method for training a text classification model.
Optionally, the computer-readable storage medium may include: a ROM (Read-Only Memory), a RAM (Random-Access Memory), an SSD (Solid State Drive), an optical disc, or the like. The random access memory may include a ReRAM (Resistive Random Access Memory) and a DRAM (Dynamic Random Access Memory).
In an exemplary embodiment, a computer program product or computer program is also provided. The computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the above method for training a text classification model.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes the association relationship between associated objects and indicates that three kinds of relationships may exist; for example, A and/or B can mean that A exists alone, A and B exist at the same time, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship. In addition, the step numbers described herein only exemplarily show one possible execution order of the steps; in some other embodiments, the above steps may also be executed out of numerical order, for example two differently numbered steps may be performed at the same time, or two differently numbered steps may be performed in an order opposite to that shown in the figures, which is not limited in the embodiments of the present application.
The above are only exemplary embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the protection scope of the present application.

Claims (14)

  1. A method for training a text classification model, the method being executed by a computer device and comprising:
    acquiring a training sample of the text classification model, the training sample being text;
    determining a semantic representation of the training sample through the text classification model, and determining a predicted classification result of the training sample based on the semantic representation;
    generating an adversarial sample corresponding to the training sample according to the training sample and acquired perturbation information;
    determining a semantic representation of the adversarial sample corresponding to the training sample through the text classification model;
    determining a classification loss of the text classification model based on the predicted classification result of the training sample;
    determining a contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
    training the text classification model according to the classification loss and the contrastive loss.
  2. The method according to claim 1, wherein the determining a contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample comprises:
    determining a first contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and a semantic representation of a different-class sample, wherein the different-class sample refers to a sample belonging to a different category from the training sample;
    determining a second contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and a semantic representation of an adversarial sample corresponding to the different-class sample;
    determining the contrastive loss of the text classification model according to the first contrastive loss and the second contrastive loss.
  3. The method according to claim 2, wherein the determining a first contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and a semantic representation of a different-class sample comprises:
    computing a first similarity, the first similarity referring to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
    computing a second similarity, the second similarity referring to the similarity between the semantic representation of the adversarial sample corresponding to the training sample and the semantic representation of the different-class sample;
    computing the first contrastive loss according to the first similarity and the second similarity.
  4. The method according to claim 2, wherein the determining a second contrastive loss based on the semantic representation of the training sample, the semantic representation of the adversarial sample corresponding to the training sample, and a semantic representation of an adversarial sample corresponding to the different-class sample comprises:
    computing a third similarity, the third similarity referring to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
    computing a fourth similarity, the fourth similarity referring to the similarity between the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the different-class sample;
    computing the second contrastive loss according to the third similarity and the fourth similarity.
  5. The method according to claim 3 or 4, further comprising:
    randomly sampling m samples from samples belonging to different categories from the training sample to obtain the different-class samples, m being a positive integer.
  6. The method according to claim 1, wherein the training the text classification model according to the classification loss and the contrastive loss comprises:
    determining a total loss of the text classification model according to the classification loss and the contrastive loss;
    adjusting parameters of the text classification model to minimize the total loss, so as to obtain the trained text classification model.
  7. The method according to claim 1, wherein the generating an adversarial sample corresponding to the training sample according to the training sample and acquired perturbation information comprises:
    acquiring a word embedding of the training sample;
    adding perturbation information to the word embedding of the training sample to obtain a processed word embedding;
    wherein the processed word embedding serves as the adversarial sample corresponding to the training sample.
  8. The method according to claim 1, wherein the generating an adversarial sample corresponding to the training sample according to the training sample and acquired perturbation information comprises:
    adding perturbation information to the text content of the training sample to obtain processed text information;
    wherein the processed text information serves as the adversarial sample corresponding to the training sample.
  9. The method according to any one of claims 1 to 4, further comprising:
    creating a first buffer and a second buffer, wherein the first buffer is used to store the semantic representation of the training sample, and the second buffer is used to store the semantic representation of the adversarial sample corresponding to the training sample;
    dynamically updating data stored in the first buffer;
    dynamically updating data stored in the second buffer.
  10. The method according to any one of claims 1 to 4, wherein the text classification model comprises a word embedding extraction unit, an encoder, a classifier, and a contrastive loss computation unit, wherein
    the word embedding extraction unit is configured to extract a word embedding of an input text;
    the encoder is configured to generate a semantic representation of the input text based on the word embedding of the input text;
    the classifier is configured to determine a category to which the input text belongs based on the semantic representation of the input text;
    the contrastive loss computation unit is configured to compute the contrastive loss of the text classification model.
  11. An apparatus for training a text classification model, the apparatus being deployed on a computer device and comprising:
    a training sample acquisition module, configured to acquire a training sample of the text classification model, the training sample being text;
    a classification result prediction module, configured to determine a semantic representation of the training sample through the text classification model, and determine a predicted classification result of the training sample based on the semantic representation;
    an adversarial sample generation module, configured to generate an adversarial sample corresponding to the training sample according to the training sample and acquired perturbation information;
    a semantic representation generation module, configured to determine a semantic representation of the adversarial sample corresponding to the training sample through the text classification model;
    a classification loss generation module, configured to determine a classification loss of the text classification model based on the predicted classification result of the training sample;
    a contrastive loss generation module, configured to determine a contrastive loss of the text classification model based on the semantic representation of the training sample and the semantic representation of the adversarial sample corresponding to the training sample;
    a classification model training module, configured to train the text classification model according to the classification loss and the contrastive loss.
  12. A computer device, comprising a processor and a memory, the memory storing at least one instruction, at least one segment of a program, a code set or an instruction set, which is loaded and executed by the processor to implement the method for training a text classification model according to any one of claims 1 to 10.
  13. A computer-readable storage medium, storing at least one instruction, at least one segment of a program, a code set or an instruction set, which is loaded and executed by a processor to implement the method for training a text classification model according to any one of claims 1 to 10.
  14. A computer program product, which, when executed, is configured to implement the method for training a text classification model according to any one of claims 1 to 10.
PCT/CN2021/101372 2020-07-30 2021-06-22 文本分类模型的训练方法、装置、设备及存储介质 WO2022022163A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/948,348 US20230016365A1 (en) 2020-07-30 2022-09-20 Method and apparatus for training text classification model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010753159.6 2020-07-30
CN202010753159.6A CN111767405B (zh) 2020-07-30 2020-07-30 文本分类模型的训练方法、装置、设备及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/948,348 Continuation US20230016365A1 (en) 2020-07-30 2022-09-20 Method and apparatus for training text classification model

Publications (1)

Publication Number Publication Date
WO2022022163A1 true WO2022022163A1 (zh) 2022-02-03

Family

ID=72727935

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/101372 WO2022022163A1 (zh) 2020-07-30 2021-06-22 文本分类模型的训练方法、装置、设备及存储介质

Country Status (3)

Country Link
US (1) US20230016365A1 (zh)
CN (1) CN111767405B (zh)
WO (1) WO2022022163A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756678A (zh) * 2022-03-25 2022-07-15 鼎富智能科技有限公司 一种未知意图文本的识别方法及装置
US20230409832A1 (en) * 2022-06-16 2023-12-21 International Business Machines Corporation System and method for generating contrastive explanations for text guided by attributes

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767405B (zh) * 2020-07-30 2023-12-08 腾讯科技(深圳)有限公司 文本分类模型的训练方法、装置、设备及存储介质
CN112487826A (zh) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 信息抽取方法、抽取模型训练方法、装置以及电子设备
CN112507735B (zh) * 2020-12-18 2024-07-02 北京百度网讯科技有限公司 机器翻译模型的训练方法、装置和电子设备
CN112685539B (zh) * 2020-12-31 2022-12-23 成都网安科技发展有限公司 基于多任务融合的文本分类模型训练方法和装置
CN112765319B (zh) * 2021-01-20 2021-09-03 中国电子信息产业集团有限公司第六研究所 一种文本的处理方法、装置、电子设备及存储介质
CN112800227B (zh) * 2021-01-29 2023-01-17 科大讯飞股份有限公司 文本分类模型的训练方法及其设备、存储介质
CN112966112B (zh) * 2021-03-25 2023-08-08 支付宝(杭州)信息技术有限公司 基于对抗学习的文本分类模型训练和文本分类方法及装置
CN113053516A (zh) * 2021-03-26 2021-06-29 安徽科大讯飞医疗信息技术有限公司 一种对抗样本生成方法、装置、设备及存储介质
CN113110592B (zh) * 2021-04-23 2022-09-23 南京大学 一种无人机避障与路径规划方法
CN113220553B (zh) * 2021-05-13 2022-06-17 支付宝(杭州)信息技术有限公司 一种文本预测模型性能的评估方法和装置
CN113673201A (zh) * 2021-07-15 2021-11-19 北京三快在线科技有限公司 一种文本表示向量生成方法、装置、存储介质及电子设备
CN113836297B (zh) * 2021-07-23 2023-04-14 北京三快在线科技有限公司 文本情感分析模型的训练方法及装置
CN113590761B (zh) * 2021-08-13 2022-03-25 网易有道信息技术(北京)有限公司 文本处理模型的训练方法、文本处理方法及相关设备
CN113723070B (zh) * 2021-08-20 2024-01-23 上海浦东发展银行股份有限公司 文本相似度模型训练方法、文本相似度检测方法及装置
CN113705244B (zh) * 2021-08-31 2023-08-22 平安科技(深圳)有限公司 对抗文本样本生成方法、装置与存储介质
CN113837370B (zh) * 2021-10-20 2023-12-05 贝壳找房(北京)科技有限公司 用于训练基于对比学习的模型的方法和装置
CN114298122B (zh) * 2021-10-22 2024-06-18 腾讯科技(深圳)有限公司 数据分类方法、装置、设备、存储介质及计算机程序产品
CN114330312B (zh) * 2021-11-03 2024-06-14 腾讯科技(深圳)有限公司 标题文本处理方法、装置、存储介质和程序
CN114049634B (zh) * 2022-01-12 2022-05-13 深圳思谋信息科技有限公司 一种图像识别方法、装置、计算机设备和存储介质
CN114676255A (zh) * 2022-03-29 2022-06-28 腾讯科技(深圳)有限公司 文本处理方法、装置、设备、存储介质及计算机程序产品
CN114841137A (zh) * 2022-04-18 2022-08-02 北京百度网讯科技有限公司 模型获取方法、装置、电子设备及存储介质
CN114648032B (zh) * 2022-05-23 2022-08-19 腾讯科技(深圳)有限公司 语义理解模型的训练方法、装置和计算机设备
CN116010595A (zh) * 2022-11-15 2023-04-25 东北林业大学 基于同构性和异质性动态信息交互的多模态情感分类方法
CN116861302B (zh) * 2023-09-05 2024-01-23 吉奥时空信息技术股份有限公司 一种案件自动分类分拨方法
CN116932767B (zh) * 2023-09-18 2023-12-12 江西农业大学 基于知识图谱的文本分类方法、***、存储介质及计算机
CN117093715B (zh) * 2023-10-18 2023-12-29 湖南财信数字科技有限公司 词库扩充方法、***、计算机设备及存储介质
CN118277575A (zh) * 2024-06-04 2024-07-02 湖南工商大学 一种用于文本情感分析的集成对比方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190095432A1 (en) * 2017-09-26 2019-03-28 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for building text classification model, and text classification method and apparatus
WO2019210695A1 (zh) * 2018-05-02 2019-11-07 北京三快在线科技有限公司 模型训练和业务推荐
CN110457701A (zh) * 2019-08-08 2019-11-15 南京邮电大学 基于可解释性对抗文本的对抗训练方法
CN110502976A (zh) * 2019-07-10 2019-11-26 深圳追一科技有限公司 文本识别模型的训练方法及相关产品
CN111767405A (zh) * 2020-07-30 2020-10-13 腾讯科技(深圳)有限公司 文本分类模型的训练方法、装置、设备及存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704619B (zh) * 2019-09-24 2022-06-10 支付宝(杭州)信息技术有限公司 文本分类方法、装置及电子设备
CN111241287A (zh) * 2020-01-16 2020-06-05 支付宝(杭州)信息技术有限公司 用于生成对抗文本的生成模型的训练方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190095432A1 (en) * 2017-09-26 2019-03-28 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for building text classification model, and text classification method and apparatus
WO2019210695A1 (zh) * 2018-05-02 2019-11-07 北京三快在线科技有限公司 模型训练和业务推荐
CN110502976A (zh) * 2019-07-10 2019-11-26 深圳追一科技有限公司 文本识别模型的训练方法及相关产品
CN110457701A (zh) * 2019-08-08 2019-11-15 南京邮电大学 基于可解释性对抗文本的对抗训练方法
CN111767405A (zh) * 2020-07-30 2020-10-13 腾讯科技(深圳)有限公司 文本分类模型的训练方法、装置、设备及存储介质

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CAI GUOYONG, QIANG LIN, KAIQI REN: "Cross-domain text sentiment classification based on domain-adversarial network and BERT", JOURNAL OF SHANDONG UNIVERSITY ENGINEERING SCIENCE, vol. 50, no. 1, 29 February 2020 (2020-02-29), XP055892083, ISSN: 1672-3961, DOI: 10.6040/j.issn.1672-3961.0.2019.293 *
CHEN HUIMIN: "Research on Text Sentiment Analysis Based on Adversarial Training", MASTER THESIS, TIANJIN POLYTECHNIC UNIVERSITY, CN, no. 1, 15 January 2020 (2020-01-15), CN , XP055892090, ISSN: 1674-0246 *
ZHANG XIAOHUI, YU SHUANG-YUAN;WANG QUAN-XIN;XU BAO-MIN: "Text Representation and Classification Algorithm Based on Adversarial Training", COMPUTER SCIENCE, KEXUE JISHU WENXIAN CHUBANSHE CHONGQING FENSHE, CN, vol. 47, no. 6, 15 June 2020 (2020-06-15), CN , pages 12 - 16, XP055892081, ISSN: 1002-137X, DOI: 10.11896/jsjkx.200200076 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756678A (zh) * 2022-03-25 2022-07-15 鼎富智能科技有限公司 一种未知意图文本的识别方法及装置
CN114756678B (zh) * 2022-03-25 2024-05-14 鼎富智能科技有限公司 一种未知意图文本的识别方法及装置
US20230409832A1 (en) * 2022-06-16 2023-12-21 International Business Machines Corporation System and method for generating contrastive explanations for text guided by attributes

Also Published As

Publication number Publication date
US20230016365A1 (en) 2023-01-19
CN111767405A (zh) 2020-10-13
CN111767405B (zh) 2023-12-08

Similar Documents

Publication Publication Date Title
WO2022022163A1 (zh) 文本分类模型的训练方法、装置、设备及存储介质
CN108984526B (zh) 一种基于深度学习的文档主题向量抽取方法
CN111753081B (zh) 基于深度skip-gram网络的文本分类的***和方法
CN112084327B (zh) 在保留语义的同时对稀疏标注的文本文档的分类
CN111930942B (zh) 文本分类方法、语言模型训练方法、装置及设备
WO2021121198A1 (zh) 基于语义相似度的实体关系抽取方法、装置、设备及介质
Shuang et al. AELA-DLSTMs: attention-enabled and location-aware double LSTMs for aspect-level sentiment classification
CN112164391A (zh) 语句处理方法、装置、电子设备及存储介质
WO2023137911A1 (zh) 基于小样本语料的意图分类方法、装置及计算机设备
WO2023134083A1 (zh) 基于文本的情感分类方法和装置、计算机设备、存储介质
Zhang et al. Sentiment classification for Chinese text based on interactive multitask learning
CN112052424B (zh) 一种内容审核方法及装置
CN113761190A (zh) 文本识别方法、装置、计算机可读介质及电子设备
CN114897060B (zh) 样本分类模型的训练方法和装置、样本分类方法和装置
CN114492661B (zh) 文本数据分类方法和装置、计算机设备、存储介质
CN110569355A (zh) 一种基于词块的观点目标抽取和目标情感分类联合方法及***
WO2023116572A1 (zh) 一种词句生成方法及相关设备
CN116975292A (zh) 信息识别方法、装置、电子设备、存储介质及程序产品
CN114970557B (zh) 基于知识增强的跨语言结构化情感分析方法
WO2023159759A1 (zh) 模型的训练方法、情感消息生成方法和装置、设备、介质
CN116263786A (zh) 舆情文本情感分析方法、装置、计算机设备及介质
CN113988085B (zh) 文本语义相似度匹配方法、装置、电子设备及存储介质
Ling Coronavirus public sentiment analysis with BERT deep learning
Yang [Retracted] Application of English Vocabulary Presentation Based on Clustering in College English Teaching
CN111723301B (zh) 基于层次化主题偏好语义矩阵的关注关系识别及标注方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21850686

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 270623)

122 Ep: pct application non-entry in european phase

Ref document number: 21850686

Country of ref document: EP

Kind code of ref document: A1