CN115269844A - Model processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115269844A
Authority
CN
China
Prior art keywords
entity
model
sample set
expanded
training
Prior art date
Legal status
Granted
Application number
CN202210917417.9A
Other languages
Chinese (zh)
Other versions
CN115269844B (en)
Inventor
周青宇
李映辉
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210917417.9A
Publication of CN115269844A
Application granted
Publication of CN115269844B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods


Abstract

The embodiment of the application discloses a model processing method and device, an electronic device and a storage medium. The method can obtain a previous model, a training sample set and an expanded sample set; train the previous model with the training sample set to obtain a current model; input the entities in the expanded sample set into the current model to obtain prediction results corresponding to the entities; obtain prediction confidences corresponding to the entities based on the prediction results; classify the entities according to their prediction confidences to obtain the types of the entities; and obtain the contrast loss of the current model according to entity pairs. When the contrast loss is not greater than a preset threshold value, the method returns to and executes the step of obtaining a previous model, a training sample set and an expanded sample set, the previous model comprising an initial neural network model; when the contrast loss is greater than the preset threshold value, the training ends. In this way, the accuracy of the model to be used during use can be improved.

Description

Model processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a model processing method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of computer technology, natural language processing technology has gradually matured. Information extraction is an important natural language processing technology that aims to extract fact information of specified types, such as entities, relations and events, from natural language text and output it in a structured data format.
However, when an existing pre-trained language model extracts information from natural language text according to a given entity, the accuracy of the extracted result is not high.
Disclosure of Invention
The embodiment of the application provides a model processing method and device, electronic equipment and a storage medium, and can improve the accuracy of a model to be used in use.
The embodiment of the application provides a model processing method, which comprises the following steps:
acquiring a last model, a training sample set and an expansion sample set, wherein the last model comprises an initial neural network model;
training the previous model by adopting the training sample set to obtain a current model;
inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity;
obtaining a prediction confidence corresponding to the entity based on the prediction result corresponding to the entity;
classifying the entities according to the prediction confidence degrees corresponding to the entities to obtain the types of the entities;
obtaining the contrast loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity;
when the contrast loss is not greater than a preset threshold value, returning and executing the steps to obtain a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
and when the contrast loss is greater than a preset threshold value, ending the training.
An embodiment of the present application further provides a processing apparatus for a model, including:
the device comprises an acquisition unit, configured to acquire a previous model, a training sample set and an expanded sample set, wherein the previous model comprises an initial neural network model;
a training unit, configured to train the previous model by adopting the training sample set to obtain a current model;
a first obtaining unit, configured to input an entity in the extended sample set to the current model, and obtain a prediction result corresponding to the entity;
a second obtaining unit, configured to obtain, based on a prediction result corresponding to the entity, a prediction confidence corresponding to the entity;
the classification unit is used for classifying the entity according to the prediction confidence corresponding to the entity to obtain the type of the entity;
a third obtaining unit, configured to obtain a contrast loss of the current model according to an entity pair, where the entity pair includes a positive entity and a negative entity, the positive entity is a positive-type entity, and the negative entity is a negative-type entity;
a returning unit, configured to, when the contrast loss is not greater than a preset threshold, return to and execute the steps to obtain a previous model, a training sample set, and an extended sample set, where the previous model includes an initial neural network model;
and the training ending unit is used for ending the training when the contrast loss is greater than a preset threshold value.
The embodiment of the application also provides an electronic device, which comprises a processor and a memory, wherein the memory stores a plurality of instructions; the processor loads the instructions from the memory to perform the steps of any of the model processing methods provided by the embodiments of the present application.
The embodiment of the present application further provides a computer-readable storage medium, where a plurality of instructions are stored in the computer-readable storage medium, and the instructions are suitable for being loaded by a processor to perform the steps in the processing method of any one of the models provided in the embodiment of the present application.
The method comprises the steps of obtaining a previous model, a training sample set and an expanded sample set, wherein the previous model comprises an initial neural network model; training the previous model by adopting the training sample set to obtain a current model; inputting the entity in the expanded sample set to the current model to obtain a prediction result corresponding to the entity; obtaining a prediction confidence corresponding to the entity based on the prediction result corresponding to the entity; classifying the entities according to the prediction confidences corresponding to the entities to obtain the types of the entities; obtaining the contrast loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive-type entity, and the negative entity is a negative-type entity; when the contrast loss is not greater than a preset threshold value, returning to and executing the step of obtaining a previous model, a training sample set and an expanded sample set, wherein the previous model comprises an initial neural network model; and when the contrast loss is greater than the preset threshold value, finishing the training.
In the application, an initial neural network model can be trained with a training sample set to obtain a trained current model. An expanded sample set is then input into the current model to obtain prediction results, the entity types in the expanded sample set can be obtained according to the prediction results, a contrast loss can be obtained according to the entity types, and the degree to which the current model has been trained can be determined according to the contrast loss. Meanwhile, because the training sample set is used whenever the previous model is trained, the training effect can be guaranteed, and the following problem is avoided: when a new entity is introduced during training, an entity that does not belong to the category, or that carries strong semantic information irrelevant to the category, easily causes the entities subsequently added to the set to deviate further from the category and form a vicious circle. The precision of the model to be used after training is therefore improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1a is a schematic view of a scenario of a model processing method provided in an embodiment of the present application;
FIG. 1b is a schematic flow chart illustrating a method for processing a model provided in an embodiment of the present application;
FIG. 2a is a schematic diagram of a server scenario in which a model processing method provided in an embodiment of the present application is applied;
FIG. 2b is a schematic flow chart of another model processing method provided in the embodiments of the present application;
FIG. 2c is a schematic flow chart illustrating another model processing method provided in the embodiments of the present application;
FIG. 2d is a schematic flow chart illustrating another exemplary model processing method according to an embodiment of the present disclosure;
FIG. 3 is a first schematic structural diagram of a model processing device provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a model processing method and device, electronic equipment and a storage medium.
The processing device of the model may be specifically integrated in an electronic device, and the electronic device may be a terminal, a server, or the like. The terminal can be a mobile phone, a tablet Computer, an intelligent bluetooth device, a notebook Computer, or a Personal Computer (PC), and the like; the server may be a single server or a server cluster composed of a plurality of servers.
In some embodiments, the processing apparatus of the model may also be integrated in multiple electronic devices, for example, the processing apparatus of the model may be integrated in multiple servers, and the processing method of the model of the present application is implemented by the multiple servers.
In some embodiments, the server may also be implemented in the form of a terminal.
For example, referring to fig. 1a, the electronic device may be integrated in a server, and may obtain a last model, a training sample set, and an extended sample set, the last model including an initial neural network model; training the previous model by adopting the training sample set to obtain a current model; inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity; obtaining a prediction confidence corresponding to the entity based on the prediction result corresponding to the entity; classifying the entities according to the prediction confidence degrees corresponding to the entities to obtain the types of the entities; obtaining the contrast loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity; when the contrast loss is not greater than a preset threshold value, returning and executing the steps to obtain a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model; and when the contrast loss is not greater than a preset threshold value, finishing the training.
The following are detailed descriptions. The numbers in the following examples are not intended to limit the order of preference of the examples.
Artificial Intelligence (AI) is a technology that uses a digital computer to simulate how humans perceive the environment, acquire knowledge and use knowledge, so that a machine can perform functions similar to human perception, reasoning and decision making. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, automatic driving, intelligent transportation and the like.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics; research in this field therefore involves natural language, i.e. the language people use every day, and is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs and the like.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
In this embodiment, a processing method of a model based on machine learning is provided, and as shown in fig. 1b, a specific flow of the processing method of the model may be as follows:
110. obtaining a last model, a training sample set and an expansion sample set, wherein the last model comprises an initial neural network model.
For example, in some embodiments, the previous model is the initial neural network model during the first iterative training, and the previous model is the model after the initial neural network model is trained during the second iterative training.
An iteration may be a process of repeating feedback while training the previous model, where each repetition of the process is referred to as an "iteration".
The previous model may be a language model, an audio model or an image model. In some embodiments, the language model may be a natural language processing model for open information extraction. The open information extraction may be entity set expansion: based on acquired seed entities, entities of the same type as the seed entities are extracted from natural language text or from the network, and the entity information is expanded into the entity set. When the information is extracted, the natural language processing model is required to automatically judge the category information of the obtained seed entities, or category word expansion is carried out according to the categories of the seed entities. The language model may also be a natural language processing model for other domains.
For example, in some embodiments, the previous model may be BERT (Bidirectional Encoder Representations from Transformers), and may also be a neural network model such as a CNN (Convolutional Neural Network), a DNN (Deep Neural Network), an LSTM (Long Short-Term Memory network), and the like, which is not limited herein.
When the previous model is BERT, BERT adopts an encoder-decoder architecture, where the encoder is a multi-layer bidirectional Transformer (neural network architecture) network conforming to the BERT setting, and the decoder is a classification head comprising two fully-connected layers and a softmax layer. In use, an input character sequence is first decomposed into individual words using a tokenizer (word segmenter). The word embedding and position embedding of each word are then summed to yield its input embedding vector. The sequence of embedding vectors is passed through 12 layers of bidirectional Transformers, where the hidden embedding dimension of each Transformer is H = 768 and the number of self-attention heads is A = 12. Finally, the hidden embedding vector corresponding to the masked entity is input into the classification head, which decodes it and outputs the predicted probability distribution.
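As an illustration only, and not the reference implementation of this application, a minimal sketch of such a classification head, assuming PyTorch, the H = 768 setting described above, and a hypothetical entity vocabulary size, might look as follows:

import torch
import torch.nn as nn

class EntityClassificationHead(nn.Module):
    # Decoder sketch: two fully-connected layers followed by softmax over an
    # assumed entity vocabulary, applied to the hidden vector of the masked position.
    def __init__(self, hidden_size: int = 768, entity_vocab_size: int = 10000):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, entity_vocab_size)

    def forward(self, masked_hidden: torch.Tensor) -> torch.Tensor:
        # masked_hidden: (batch, hidden_size), the final hidden vector at the masked entity position
        h = torch.relu(self.fc1(masked_hidden))
        return torch.softmax(self.fc2(h), dim=-1)  # predicted entity probability distribution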
The training sample set may be a sample text for training the previous model, where the training sample set may be the same sample text or different sample texts during each iteration of training.
In some embodiments, the training sample set may include an entity in the entity vocabulary and a sample sentence containing the entity, where the sample sentence may be a text sentence.
The extended sample set may include a training sample set, and may also include other training sample sets.
Entities can be included in the training sample set and the expanded sample set, where an entity can be an objectively existing, distinguishable and independent thing, such as a person name, a place name, a commodity name, and the like.
The training sample set and the expanded sample set may be entity sets, and an entity set may be a set of entities having the same type and the same attributes. For example, an entity set may be a set of person names, whereas person names and place names together would be entities of different types and different attributes.
In this embodiment of the present application, the method for obtaining an extended sample set may further include:
obtaining an expanded sample set, the expanded sample set comprising a training sample set;
and carrying out entity expansion processing on the expanded sample set to obtain an expanded sample set.
The extended sample set may be an extended sample set of a last iterative training, and the extended sample set may include a training sample set and a set obtained by performing entity extension processing according to the training sample set.
The entity expansion processing may be entity set expansion: according to the acquired seed entities, entities of the same type as the seed entities are extracted from natural language text or from the network, and the entity information is expanded into the entity set. In some embodiments, performing entity expansion processing on the expanded sample set may mean obtaining entities of the same entity type as those in the entity set and taking these entities as the expanded sample set.
In this embodiment, the method for performing entity expansion processing on the expanded sample set to obtain an expanded sample set may include:
selecting an expanded entity in the expanded sample set to obtain a selected entity;
carrying out entity expansion processing on the selected entity to obtain a selected expanded entity;
adding the selected expansion entity to the expanded sample set to obtain the expanded sample set.
The selected entities are entities screened from the expanded sample set as having higher quality. The higher-quality entities may include the entities expanded in earlier iterations of training and, when the first iteration is performed, the entities in the training sample set. In some embodiments, the quality of an entity may be determined according to the true confidence and the prediction confidence of the entity.
In this embodiment of the present application, the method for selecting an expanded entity in the expanded sample set to obtain a selected entity may include:
inputting the extended entity in the extended sample set to the current model to obtain an extended prediction result corresponding to the extended entity;
obtaining an expanded prediction confidence corresponding to the expanded entity based on an expanded prediction result corresponding to the expanded entity;
and selecting a selected entity with the expanded prediction confidence larger than a preset expansion threshold from the expanded sample set according to the expanded prediction confidence.
Wherein the extended prediction confidence may be a prediction confidence of the extended entity.
The method of obtaining the selected entity according to the expanded prediction confidence may be to compare the prediction confidence with a confidence threshold; when the expanded prediction confidence is greater than the confidence threshold, the expanded entity is characterized as having higher quality and may be used for expansion.
In this embodiment of the present application, according to the extended prediction confidence, the method for selecting a selected entity from the extended sample set, where the extended prediction confidence is greater than a preset extension threshold, may include:
acquiring the true confidence of more than one expanded entity in the expanded sample set;
carrying out weighted averaging processing on the true confidences of the more than one expanded entity to obtain a preset expansion threshold value based on the true confidences;
and selecting the expanded entities according to the preset expansion threshold value to obtain the selected entities whose expanded prediction confidence is greater than the preset expansion threshold value.
The method of obtaining the selected entity may be to compare the prediction confidence of each expanded entity with the true confidence of one expanded entity in the expanded sample set, or with the average true confidence of two or more expanded entities, and to select the entities whose prediction confidence is greater than that true confidence. By selecting entities whose prediction confidence is higher than the true confidence, the entities generated in earlier iterations and the original entities from the first iteration can be selected from the expanded sample set as the expanded set, thereby improving the quality of the expanded set.
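For illustration, a minimal sketch of this selection rule, with hypothetical function and variable names and a plain (unweighted) average standing in for the weighted averaging described above, could be:

def select_expanded_entities(expanded_entities, predicted_conf, true_conf):
    # Threshold: average of the true confidences of the expanded entities
    # (the application also allows a weighted average here).
    if not expanded_entities:
        return []
    threshold = sum(true_conf[e] for e in expanded_entities) / len(expanded_entities)
    # Keep entities whose predicted confidence exceeds the threshold.
    return [e for e in expanded_entities if predicted_conf[e] > threshold]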
120. And training the previous model by adopting the training sample set to obtain the current model.
The training process may be a process of adjusting parameters of a previous model to obtain a current model. In some embodiments, the training process may include inputting a part of or all samples in a preset training sample set to a previous model, comparing an output result with a label corresponding to an entity in the training sample set to obtain a comparison result, and performing parameter adjustment according to the comparison result.
For example, in some embodiments, when the current iteration ends and the next iteration is performed, the current model of the current iteration is used as the previous model of the next iteration for training; therefore, as training proceeds, the model is continuously updated in each iteration.
In this embodiment of the present application, the method for obtaining the current model by training the previous model with the training sample set may include:
acquiring a preset training sample set, wherein the training sample set comprises text sentences, and the text sentences comprise entities and labels corresponding to the entities;
performing mask processing on the entity to obtain a masked text statement;
and training the previous model by adopting the training sample set to obtain the current model.
The masking processing on the entity can be to mask the entity in the text statement, and obtain the semantic category and the expanded entity of the entity through context during training.
For example, in some embodiments, a mask word may be used to replace an entity in the text statement, thereby obtaining a sample statement with the mask word.
In order to reduce the negative influence caused by the imbalance of entity occurrence frequencies (the purpose that over-sampling or under-sampling normally serves), each sample statement carries only one entity, and the number of sample statements corresponding to each entity is set to the total number of sample statements divided by the total number of entities in the training sample set, i.e. the average. When a sample statement includes a plurality of entities, the sample statement may be divided into a plurality of sub-sample statements each containing only one entity. In addition, a label-smoothed cross-entropy loss may be selected so that entities with semantic information similar to that of the entity being trained or predicted are not overly suppressed in the output predicted entity probability distribution.
For example, in some embodiments, when each entity corresponds to 10 text sentences and there are 10 entities in the training sample set, the training sample set contains 100 sample sentences.
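A simplified sketch of the mask processing and per-entity balancing described above is given below; the "[MASK]" token, the use of sampling with replacement, and all names are illustrative assumptions rather than the application's reference implementation:

import random

def build_masked_samples(sentences_by_entity, mask_token="[MASK]"):
    # Balance: each entity gets (total sentences / number of entities) sample statements.
    total = sum(len(s) for s in sentences_by_entity.values())
    per_entity = max(total // max(len(sentences_by_entity), 1), 1)
    samples = []
    for entity, sentences in sentences_by_entity.items():
        for sentence in random.choices(sentences, k=per_entity):
            # Mask the (single) entity carried by the sample statement.
            samples.append((sentence.replace(entity, mask_token), entity))
    return samples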
130. And inputting the entity in the extended sample set into the current model to obtain a prediction result corresponding to the entity.
The prediction result may be entity extension information obtained by inputting the entity into the current model and performing natural language processing of open information extraction.
For example, in some embodiments, when the entity set is a place-name entity set, the entity may be the place name "XXX"; after the "XXX" place-name entity is input into the current model for training, the obtained prediction result may be the name of a prefecture-level city "XXX", the name of a province "YYY", or other names, such as a person name or a brand name.
140. And obtaining the prediction confidence corresponding to the entity based on the prediction result corresponding to the entity.
The confidence level may be understood as follows: in statistics, the confidence interval of a probability sample is an interval estimate of some population parameter of the sample. The confidence interval exhibits the extent to which the true value of this parameter has a certain probability of falling around the measurement, and thus indicates the degree of plausibility of the measured value, i.e. the "certain probability" mentioned above. This probability is called the confidence level. If a candidate has 55% support in one large poll and the confidence interval at a confidence level of 0.95 is (50%, 60%), then the probability that the true support falls between fifty and sixty percent is ninety-five percent, and thus the probability that the true support is less than half is less than 2.5 percent (assuming the distribution is symmetric). The confidence level can therefore also be understood as the probability that the estimated value and the population parameter fall within a certain allowable error range; this probability is referred to as the confidence level.
The prediction confidence may be characterized by the predicted probability distribution of the output result: the larger the probability, the higher the prediction confidence, and the better the entity meets the preset semantic recognition requirement. The prediction probability distribution can therefore be characterized by the prediction confidence, and the entities can be sorted according to the size of their prediction confidences to obtain an entity sequence set.
150. And classifying the entities according to the prediction confidence degrees corresponding to the entities to obtain the types of the entities.
The types of the entities include positive entities and negative entities. A positive entity may be an entity whose output result belongs to the semantic category of the entity, i.e. the prediction confidence of the positive entity is high; the positive entity set is a set of positive entities, and there may be more than one positive entity in the positive entity set. A negative entity may be an entity whose output result does not belong to the semantic category of the entity, i.e. the prediction confidence of the negative entity is low; the negative entity set is a set of negative entities, and there may be more than one negative entity in the negative entity set.
Comparing the prediction confidence with the preset prediction confidence can characterize the degree of correlation between the actual semantic category and the predicted semantic category of the entity.
For example, in some embodiments, when the result obtained by semantic extraction of the entity by the current model is a place name and the semantic category corresponding to the entity is a person name, the output result does not belong to the semantic category, and the prediction confidence is low; when the result obtained by semantic extraction of the entity of the current model is the place name and the semantic category corresponding to the entity is also the place name, the output result belongs to the semantic category and the prediction confidence is high.
When judging whether the prediction confidence is high or low, the prediction confidence can be compared with a preset prediction confidence: when the prediction confidence is higher than the preset prediction confidence, the output result can be considered to belong to the semantic category of the entity; when the prediction confidence is lower than the preset prediction confidence, the output result can be considered not to belong to the semantic category of the entity. The preset prediction confidence can be adjusted manually according to the needs of the user.
In some embodiments, since the output may include a plurality of results, for example a part of the output results belong to the semantic category corresponding to the entity and another part do not, the entity may be determined to be a positive entity or a negative entity according to the ratio of the results belonging to the semantic category corresponding to the entity to all the results when performing the semantic division. The ratio may be the preset prediction confidence, and the value of the preset prediction confidence may be set manually.
In this embodiment of the present application, the method for classifying the entity according to the prediction confidence corresponding to the entity to obtain the type of the entity may include:
according to the prediction confidence, carrying out entity sorting processing on the extended sample set to obtain a sorted entity sequence set;
classifying the entity sequence set to obtain the type of the entity, wherein the type of the entity comprises a positive type and a negative type, the positive type represents that the sequence position of the entity in the entity sequence set is higher than a preset sequence position, and the negative type represents that the sequence position of the entity in the entity sequence set is not higher than the preset sequence position.
The entity sequence set may refer to a set in which the entities are sorted according to the prediction confidence. In some embodiments, the higher the ranking position, the greater the prediction confidence it characterizes.
The preset sequence position can be a manually set hyper-parameter, and the entity meeting the confidence requirement and the entity not meeting the confidence requirement can be screened through the preset sequence position.
For example, in some embodiments, the entity sequence set may sequentially include entity e1, entity e2, entity e3 and entity e4, where 1, 2, 3 and 4 characterize the ordering of the entity sequence set according to the prediction confidences; when the preset sequence position is set to 2.5, the positive entity set includes entity e1 and entity e2, and the negative entity set includes entity e3 and entity e4.
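A small sketch of this ranking-and-splitting step (all names are hypothetical; the preset sequence position 2.5 reproduces the e1 to e4 example above):

def split_by_rank(entities, predicted_conf, preset_position=2.5):
    # Sort by prediction confidence, highest first (rank 1 is the most confident).
    ranked = sorted(entities, key=lambda e: predicted_conf[e], reverse=True)
    positives, negatives = [], []
    for rank, entity in enumerate(ranked, start=1):
        (positives if rank < preset_position else negatives).append(entity)
    return positives, negatives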
160. And obtaining the contrast loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity.
The contrast loss function may be a loss function adopted in contrastive learning; contrastive learning is a form of self-supervised learning used to pull positive samples closer together and push negative samples farther apart.
The computer device may calculate a comparison loss corresponding to each entity pair based on the semantic representation corresponding to the positive entity in each positive entity set and the semantic representation corresponding to the negative entity in each negative entity set, and finally construct a comparison loss function according to the comparison losses corresponding to all the entity pairs.
The contrast loss may characterize the separation between a positive entity and a negative entity and may be used to characterize whether the current model has converged.
In this embodiment of the present application, the method for obtaining the contrast loss of the current model according to the entity pair may include:
constructing a contrast loss function model of the current model;
determining the positive entity and the negative entity according to the entity pair;
and inputting the positive entity and the negative entity into the contrast loss function model to obtain the contrast loss.
The contrast loss function loss_cl is defined in terms of the following quantities: pos_i may be a positive entity, neg_i may be a negative entity, and z is the semantic representation of an entity; N may represent that all data are divided into batches and processed in sequence, the contrast loss being calculated once for each batch and the parameters updated before the next batch is processed; J(i) may represent the union of the positive entity set and the negative entity set for (x_i, x_j); τ+ is the prior class probability, a manually set hyper-parameter; β is a hyper-parameter controlling the concentration on difficult (hard) negative entities; and t is a temperature factor.
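As a point of reference only, one published contrastive objective built from exactly these ingredients (temperature t, prior class probability τ+, hard-negative concentration β, batch size N) is the debiased hard-negative contrastive loss; under the assumption that loss_cl follows this family, it can be sketched in LaTeX as:

\mathrm{loss}_{cl} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp(z_i^{\top} z_{pos_i}/t)}{\exp(z_i^{\top} z_{pos_i}/t) + N\, g_i}

g_i = \max\left( \frac{1}{1-\tau^{+}} \left( \hat{E}_i^{-} - \tau^{+} \exp(z_i^{\top} z_{pos_i}/t) \right),\ \exp(-1/t) \right)

\hat{E}_i^{-} = \frac{\sum_{j \in J(i)} \exp(\beta\, z_i^{\top} z_{neg_j}) \exp(z_i^{\top} z_{neg_j}/t)}{\sum_{j \in J(i)} \exp(\beta\, z_i^{\top} z_{neg_j})}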
In the embodiment of the present application, when there are at least two entity pairs; the method for obtaining the contrast loss of the current model according to the entity pair may include:
obtaining the contrast loss of each entity pair;
and weighting and averaging the contrast loss of each entity pair to obtain the contrast loss of the current model.
The contrast loss of each entity pair is calculated with the loss function loss_cl; the contrast losses of all entity pairs are then summed and divided by the number of entity pairs to obtain the contrast loss of the current model.
170. And when the contrast loss is not greater than a preset threshold value, returning and executing the steps to obtain a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model.
After the contrast loss is obtained, a contrast loss that is not greater than the preset threshold indicates that the current model has not converged and its training is not yet complete, so the process can return to and execute step 110 for another iteration.
Returning to and executing the step of obtaining the previous model, the training sample set and the expanded sample set allows a current model to be obtained by iteratively training the previous model in the current iteration.
180. And when the contrast loss is greater than a preset threshold value, finishing the training.
After the contrast loss is obtained, a contrast loss that is greater than the preset threshold indicates that the current model has converged and its training is complete.
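Purely as an illustration of the control flow of steps 110 to 180, with every step supplied as a caller-provided function (none of these helpers are defined by this application), the iteration can be sketched as:

def iterate_training(previous_model, training_set, expanded_set, threshold,
                     train, predict, confidence, classify, contrast_loss, expand):
    while True:
        current_model = train(previous_model, training_set)                 # step 120
        predictions = {e: predict(current_model, e) for e in expanded_set}  # step 130
        confidences = {e: confidence(p) for e, p in predictions.items()}    # step 140
        positives, negatives = classify(confidences)                        # step 150
        loss = contrast_loss(current_model, positives, negatives)           # step 160
        if loss > threshold:        # step 180: convergence criterion as described above
            return current_model
        # step 170: not converged yet, so expand the sample set and iterate again
        expanded_set = expand(expanded_set, current_model)
        previous_model = current_model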
In the embodiment of the present application, when there are N previous models, where N is an integer greater than or equal to 2; the method for performing initial training processing on the previous model to obtain the current model of the current iteration may include:
carrying out initial training processing on the N previous models to obtain N models to be selected;
and screening and integrating the N models to be selected to obtain an integrated model, wherein the integrated model is the current model.
The initial training process may be a process of training the previous model respectively.
For example, in some embodiments, the initial training process may be performed on N previous models in a parallel processing manner, so as to obtain N candidate models.
The screening integration process may be a process of screening and integrating N candidate models according to performance expectations of the candidate models.
The integrated model may be a new model obtained by integrating the screened candidate models, where the output of the integrated model may be the average of the outputs of the screened candidate models.
In this embodiment of the present application, the screening and integrating process performed on the N candidate models to obtain the integrated language model may include: inputting the training sample set to the model to be selected to obtain a prediction output result of the model to be selected, wherein the training sample set comprises a text statement, and the text statement comprises an entity and a label corresponding to the entity; acquiring a cross entropy loss function corresponding to the sample label; calculating the cross entropy loss of each model to be selected according to the cross entropy loss function; screening N screening models from the N models to be selected according to cross entropy loss, wherein N is a positive integer less than or equal to N; and carrying out integration processing on the n screening models to obtain an integrated model.
In this embodiment of the present application, the method for performing integration processing on n screening models to obtain an integrated model may include:
acquiring input parameter sets of n screening models;
acquiring an output average value according to the input parameter set, wherein the output average value is the average value of the output results of the n screening models;
and constructing an integrated model according to the input parameter set and the output average value.
The input parameters may be the variables input into the screening models; the input parameter set of the n screening models is the set of parameters input into the n screening models respectively.
The output average value is obtained by inputting each input parameter in the parameter set into the n screening models and averaging their outputs.
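A minimal sketch of this screening-and-averaging idea, assuming the candidate models are callables returning probability tensors and that their cross entropy losses have already been computed (the names and the use of PyTorch are assumptions):

import torch

def ensemble_predict(candidate_models, losses, inputs, keep_n):
    # Keep the keep_n candidate models with the lowest cross entropy loss.
    order = sorted(range(len(candidate_models)), key=lambda i: losses[i])
    kept = [candidate_models[i] for i in order[:keep_n]]
    # The integrated model's output is the average of the kept models' outputs.
    outputs = [model(inputs) for model in kept]
    return torch.stack(outputs, dim=0).mean(dim=0)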
Obtaining a previous model and an entity set, wherein the previous model is the language model obtained in the previous iteration; inputting a preset training sample set into the previous model for training to obtain the current model of the current iteration, the previous iteration being the iteration before the current one; inputting the entities in the entity set into the current model to obtain a first prediction confidence, wherein the first prediction confidence represents the prediction probability distribution obtained by comparing the output result produced for an entity with the label corresponding to that entity; obtaining a positive entity set and a negative entity set according to the first prediction confidence, wherein the positive entity set contains the entities whose first prediction confidence is greater than the preset prediction confidence and the negative entity set contains the entities whose first prediction confidence is less than the preset prediction confidence; and obtaining a first contrast loss of the current model according to the positive entity set and the negative entity set, wherein the first contrast loss is obtained by training the current model with a contrast loss function constructed from the positive entity set and the negative entity set.

In the application, an initial neural network model can be trained with a training sample set to obtain a trained current model; an expanded sample set is then input into the current model to obtain prediction results, the entity types in the expanded sample set can be obtained according to the prediction results, a contrast loss can be obtained according to the entity types, and the degree to which the current model has been trained can be determined according to the contrast loss. When the training of the current model is not complete, the current model can be used as the previous model of the next iteration and trained iteratively with the expanded sample set; when the training of the current model is complete, the current model is taken as the trained initial neural network model. Because the entities in the expanded sample set are classified by a current model that has already been trained on the training sample set, the model keeps being trained and its prediction effect improves; when the entities in the expanded sample set are classified, the classification effect is better and the entity vectors of positive entities and negative entities are more clearly separated, so that the error is reduced when the loss is calculated from the positive and negative entities, and the training effect of the initial neural network model is guaranteed. Meanwhile, because the training sample set is used whenever the previous model is trained, the training effect can be guaranteed, and the following problem is avoided: when a new entity is introduced during training, an entity that does not belong to the category, or that carries strong semantic information irrelevant to the category, easily causes the entities subsequently added to the set to deviate further from the category and form a vicious circle. The precision of the model to be used after training is therefore improved.
The method described in the above embodiments is further described in detail below.
In this embodiment, a server is taken as an example, and the method in this embodiment will be described in detail.
As shown in fig. 2a, a specific flow of a model processing method is as follows:
201. and training the plurality of models by adopting a preset entity set to obtain a plurality of trained models.
The preset entity set includes each entity in the entity vocabulary and the sample statements containing the entity, where the entity in each sample statement is replaced by a mask word so that the entity is masked.
In order to reduce the negative influence caused by entity imbalance (for which over-sampling or under-sampling is usually used) and to introduce a certain sample randomness for the subsequent ensemble learning, each sample statement in the preset entity set corresponds to one entity, and the number of sample statements for each entity is controlled to be the total number of sample statements divided by the number of entities. In this way, with the label-smoothed cross-entropy loss, it can be ensured that entities having semantic information similar to that of the entity are not excessively suppressed in the output predicted entity probability distribution.
The cross entropy loss is obtained with the label-smoothed cross entropy loss function loss_pred, in which the mini-batch size at training time and the entity vocabulary size v_e appear, η is a smoothing coefficient (the larger η is, the higher the degree of label smoothing), and y_i is the index of the entity corresponding to sample i.
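A standard label-smoothed cross entropy consistent with the variables described above (an assumed form, with n the mini-batch size and p_{i,j} the predicted probability of entity j for sample i) would read:

\mathrm{loss}_{pred} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{v_e} \left[ (1-\eta)\, \mathbb{1}[j = y_i] + \frac{\eta}{v_e} \right] \log p_{i,j}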
Through the cross entropy loss function, cross entropy loss of each trained model can be calculated, and the initial model is screened according to the cross entropy loss.
202. And screening the plurality of trained models to obtain k initial models meeting the effect requirement.
The screening processing refers to screening according to the performance of the multiple trained models on the entity set expansion task. During screening, the trained models can be evaluated through evaluation functions defined over the seed entities of each class, where the seed entities of class cls are the entities in the entity set, S_e is the set of all samples of entity e, r(e) is the probabilistic representation of entity e, and KL_Div may be the KL (Kullback-Leibler) divergence.
After the trained models have been evaluated with the evaluation functions, they can be scored with a scoring model, and the trained models whose scores meet the requirements are obtained as the initial models.
203. And performing integration processing on the k initial models to obtain an integrated model.
After k initial models are screened out, a parameter set of each initial model is obtained, the parameter set is input into each initial model to obtain k model outputs, and finally the k outputs are averaged to obtain the output of the integrated model.
In the integrated model, Θ_top is the parameter set of the k screened initial models, and the output of the integrated model is the average of the outputs of the k initial models.
204. Acquiring the predicted expected distribution of each entity in the entity set and the average real expected distribution of all entities;
205. and screening the expanded entity set with the predicted expected distribution larger than the average real expected distribution in the entity set by adopting a window algorithm.
The window algorithm may rank all entities in the extended set and the entity set according to the predicted expected distribution and the actual expected distribution of all entities to obtain a ranked entity set, where the entities in the extended entity set may be the top-ranked entities in the ranked entity set.
206. And carrying out entity set expansion processing on the expanded entity set to obtain an expanded set.
The number of new entities added to the expanded set may be preset, and the expansion is stopped after the preset number is reached.
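A compact sketch of this expansion loop (all names are illustrative; entity_set is assumed to be a Python set, predicted maps each candidate to its predicted expectation, and true_avg is the average real expectation from step 204):

def expand_entity_set(entity_set, candidates, predicted, true_avg, max_new):
    # Rank candidates by predicted expectation, highest first (the window step).
    ranked = sorted(candidates, key=lambda e: predicted[e], reverse=True)
    added = []
    for entity in ranked:
        if len(added) >= max_new:          # stop once the preset number is reached
            break
        if predicted[entity] > true_avg and entity not in entity_set:
            entity_set.add(entity)
            added.append(entity)
    return entity_set, added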
207. And merging the expanded entity set and the first entity set to obtain a second entity set.
Fig. 2c is a process diagram of another model processing method. The entity set is the "current entity set" shown in fig. 2c. The "current entity set" is predicted by the "entity-level pre-trained language model" obtained by the training in step 203, yielding confidences that form the "probabilistic representation of the entity set". The "current entity set" is ranked according to these confidences to obtain a candidate entity queue; higher-ranked entities are then selected from the candidate entity queue according to the "candidate entity score", entity expansion processing is performed according to these entities to obtain target entities, and the target entities are stored into the "current entity set". The expansion stops once the number of expanded entities satisfies the preset number requirement.
208. And re-ordering the entities in the second entity set to obtain a re-ordered entity set.
The re-ordering of the entities in the second entity set may be determined by scoring each entity with a ranking formula in which i is the order in which entity e_i was added to the expanded set and rank(e_i) is the ranking of entity e_i in the ranked entity set.
As shown in fig. 2d, which is a schematic flow diagram of another model processing method, the second entity set may be the "expanded entity set"; by scoring the second entities in the "expanded entity set", an "expanded entity ordered queue" that orders the second entities according to their scores is obtained, from which the positive entity set and the negative entity set are determined.
209. Determining a positive entity set and a negative entity set in the second entity set according to a preset ranking threshold;
210. and constructing a contrast loss function according to the positive entity set and the negative entity set.
The constructed contrast loss function loss_cl is as described above: pos_i may be a positive entity, neg_i may be a negative entity, and z is the semantic representation of an entity; N may represent that all data are divided into batches and processed in sequence, the contrast loss being calculated once for each batch and the parameters updated before the next batch is processed; J(i) may represent the union of the positive entity set and the negative entity set for (x_i, x_j); τ+ is the prior class probability, a manually set hyper-parameter; β is a hyper-parameter controlling the concentration on difficult (hard) negative entities; and t is a temperature factor.
211. Training the integrated model according to the comparison loss function to obtain the comparison loss of the integrated model;
212. when the contrast loss of the integrated model represents that the current model is not converged, returning to and executing the step 201;
213. and when the contrast loss of the integrated model represents the convergence of the current model, taking the integrated model as the model to be used.
The model to be used in the embodiment of the present application may also be used in the Class-Guided Entity Selection module of CGExpan. In the candidate entity score of this module, i is the index of the candidate entity in the entity vocabulary, one term represents the association of the candidate entity with the class-guide name, and the other term represents the embedding similarity between the candidate entity and the current entity set.
In the embedding similarity, E_s is a number of entities randomly selected from the current entity set, V_e is the average BERT context word embedding of entity e, and cos(·) refers to cosine similarity.
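Assuming the usual CGExpan-style definition (this is a hedged reconstruction, not a formula quoted from this application), the embedding similarity over the randomly selected entities E_s could be written as:

\mathrm{sim}(e, E_s) = \frac{1}{|E_s|} \sum_{e' \in E_s} \cos(V_e, V_{e'})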
In the embodiment of the present application, a quantity produced by the model to be used can be substituted into the candidate entity score defined above, thereby fusing the model to be used with the Class-Guided Entity Selection module of CGExpan; the result of the fusion is a new scoring framework.
by the above, in the application, when the model is trained, the positive entity and the negative entity can be separated more accurately, the separated boundary can be better determined, and the training effect of the model is improved. Therefore, the accuracy of the model to be used in use is improved.
In order to better implement the above method, an embodiment of the present application further provides a processing apparatus of a model, where the processing apparatus of the model may be specifically integrated in an electronic device, and the electronic device may be a terminal, a server, or other devices. The terminal can be a mobile phone, a tablet computer, an intelligent Bluetooth device, a notebook computer, a personal computer and other devices; the server may be a single server or a server cluster composed of a plurality of servers.
For example, in this embodiment, the method of the embodiment of the present application will be described in detail by taking an example in which a processing device of a model is specifically integrated in a terminal.
As shown in fig. 2b, the flow diagram of another model processing method is as follows:
"Masked entity prediction task learning": a pre-training model is trained through a preset entity set, thereby adjusting the parameters of the pre-training model.
"Model selection and model integration": the pre-training models trained by masked entity prediction task learning are screened and integrated according to how well they perform, to obtain an integrated model.
"Entity set expansion framework": used for ordering, expanding, and reordering the expanded set, so as to obtain an expanded and ordered set.
"Masked entity prediction task learning combined with contrastive learning": the entities in the expanded set are classified to obtain positive entities and negative entities, and the contrast loss is obtained from the positive and negative entities. When the contrast loss indicates that the integrated model has not converged, the integrated model is returned, with the "initial model parameters", to "masked entity prediction task learning" for another round of iterative training; when the contrast loss indicates that the integrated model has converged, model selection and model integration may be performed on the integrated model to obtain an "entity-level pre-training model".
For example, as shown in fig. 3, the processing device of the model may include an obtaining unit 301, an employing unit 302, a first obtaining unit 303, a second obtaining unit 304, a classifying unit 305, a third obtaining unit 306, a returning unit 307, and an ending training unit 308, as follows:
(I) The obtaining unit 301:
The obtaining unit 301 may obtain a previous model, a training sample set, and an extended sample set, where the previous model includes an initial neural network model.
In some embodiments, the obtaining unit 301 is further specifically configured to:
obtaining an expanded sample set, the expanded sample set comprising a training sample set;
and carrying out entity expansion processing on the expanded sample set to obtain an expanded sample set.
In some embodiments, the obtaining unit 301 is further specifically configured to:
selecting an expanded entity in the expanded sample set to obtain a selected entity;
carrying out entity expansion processing on the selected entity to obtain a selected expanded entity;
adding the selected expansion entity to the expanded sample set to obtain the expanded sample set.
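As an illustrative sketch of the entity expansion processing just described, the selection and expansion operations are passed in as callables supplied by the caller; the de-duplication of the result is an assumption.

```python
def expand_sample_set(expanded_set, select_entities, expand_entity):
    """Entity expansion processing: select entities from the set to be expanded,
    expand each selected entity, and add the selected expansion entities back."""
    selected = select_entities(expanded_set)                          # selected entities
    new_entities = [x for e in selected for x in expand_entity(e)]    # selected expansion entities
    return list(dict.fromkeys(list(expanded_set) + new_entities))     # expanded sample set (de-duplicated)
```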
In some embodiments, the obtaining unit 301 is further specifically configured to:
inputting the extended entity in the extended sample set to the current model to obtain an extended prediction result corresponding to the extended entity;
obtaining an expanded prediction confidence corresponding to the expanded entity based on an expanded prediction result corresponding to the expanded entity;
and according to the expanded prediction confidence, selecting a selected entity with the expanded prediction confidence larger than a preset expansion threshold from the expanded sample set.
In some embodiments, the obtaining unit 301 is further specifically configured to:
acquiring true confidence of more than one expanded entity in the expanded sample set;
carrying out weighted averaging processing on the real confidence degrees of more than one expanded entity to obtain a preset expansion threshold value of the real confidence degrees;
and selecting the expanded entity according to the preset expansion threshold value to obtain the selected entity of which the expanded prediction confidence coefficient is greater than the preset expansion threshold value.
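A minimal sketch of the selection described above: the preset expansion threshold is obtained as a weighted average of the true confidences of the expanded entities, and only entities whose expanded prediction confidence exceeds that threshold are kept. The uniform default weights and all names are assumptions.

```python
def select_expanded_entities(expanded, true_confidence, predicted_confidence, weights=None):
    """expanded: list of expanded entities; true_confidence / predicted_confidence:
    dicts mapping each entity to a confidence value."""
    if weights is None:
        weights = {e: 1.0 for e in expanded}  # uniform weighting is an assumption
    total = sum(weights[e] for e in expanded)
    # preset expansion threshold = weighted average of the true confidences
    threshold = sum(weights[e] * true_confidence[e] for e in expanded) / total
    # keep entities whose expanded prediction confidence exceeds the threshold
    return [e for e in expanded if predicted_confidence[e] > threshold]
```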
In some embodiments, there are N previous models, where N is an integer greater than or equal to 2, and the obtaining unit 301 is further specifically configured to:
carrying out initial training processing on the N previous models to obtain N models to be selected;
and screening and integrating the N models to be selected to obtain an integrated model, wherein the integrated model is used as the current model.
In some embodiments, the obtaining unit 301 is further specifically configured to:
inputting the training sample set to the model to be selected to obtain a prediction output result of the model to be selected, wherein the training sample set comprises a text statement, and the text statement comprises an entity and a label corresponding to the entity;
acquiring a cross entropy loss function corresponding to the label;
calculating the cross entropy loss of each model to be selected according to the cross entropy loss function;
screening N screening models from the N models to be selected according to cross entropy loss, wherein N is a positive integer less than or equal to N;
and carrying out integration processing on the n screening models to obtain an integrated model.
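A minimal sketch of the screening step, assuming each model to be selected exposes a forward pass that returns label logits and the training sample set is available as a list of (inputs, labels) batches; the interface and names are assumptions.

```python
import torch
import torch.nn.functional as F

def screen_models(candidates, batches, n):
    """Keep the n candidate models with the lowest cross entropy loss
    on the labeled training sample set."""
    losses = []
    for model in candidates:
        model.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for inputs, labels in batches:
                logits = model(inputs)
                total += F.cross_entropy(logits, labels, reduction='sum').item()
                count += labels.numel()
        losses.append(total / count)
    ranked = sorted(range(len(candidates)), key=lambda i: losses[i])
    return [candidates[i] for i in ranked[:n]]
```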
In some embodiments, the obtaining unit 301 is further specifically configured to:
acquiring input parameter sets of n screening models;
acquiring an output average value according to the input parameter set, wherein the output average value is the average value of the output results of the n screening models;
and constructing an integrated model according to the input parameter set and the output average value.
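A minimal sketch of the integration step, assuming the integrated model simply feeds the same input to the n screening models and returns the average of their outputs; the wrapper below is an illustration, not the patented construction.

```python
import torch

class IntegratedModel(torch.nn.Module):
    """Shares the input with the n screening models and outputs the
    average of their individual outputs (illustrative sketch)."""

    def __init__(self, screened_models):
        super().__init__()
        self.members = torch.nn.ModuleList(screened_models)

    def forward(self, inputs):
        outputs = [m(inputs) for m in self.members]
        return torch.stack(outputs, dim=0).mean(dim=0)  # output average value
```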
(II) The employing unit 302:
The employing unit 302 may train the previous model by using the training sample set to obtain a current model.
In some embodiments, the training sample set includes a text sentence, where the text sentence includes an entity and a label corresponding to the entity, and the employing unit 302 is specifically configured to:
performing mask processing on the entity to obtain a masked text statement;
and training the previous model by adopting the training sample set to obtain the current model.
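A minimal sketch of the masking step, assuming a BERT-style "[MASK]" token and masking at the level of the whole entity string; the sentence, entity, and names below are hypothetical.

```python
def mask_entity(text_statement, entity, mask_token="[MASK]"):
    """Replace the labeled entity in the text statement with the mask token,
    producing the masked text statement used for masked entity prediction."""
    return text_statement.replace(entity, mask_token)

# usage sketch with a hypothetical sentence and entity
masked = mask_entity("Beijing is the capital of China.", "Beijing")
# -> "[MASK] is the capital of China."
```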
(III) The first obtaining unit 303:
the first obtaining unit 303 may input the entity in the extended sample set to the current model, and obtain a prediction result corresponding to the entity.
(IV) The second obtaining unit 304:
the second obtaining unit 304 may obtain the prediction confidence corresponding to the entity based on the prediction result corresponding to the entity.
(V) The classification unit 305:
the classification unit 305 may classify the entity according to the prediction confidence corresponding to the entity, so as to obtain the type of the entity.
In some embodiments, the classification unit 305 is further specifically configured to:
according to the prediction confidence, carrying out entity sorting processing on the extended sample set to obtain a sorted entity sequence set;
classifying the entity sequence set to obtain the type of the entity, wherein the type of the entity comprises a positive type and a negative type, the positive type represents that the sequence position of the entity in the entity sequence set is higher than a preset sequence position, and the negative type represents that the sequence position of the entity in the entity sequence set is not higher than the preset sequence position.
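A minimal sketch of this ranking-based classification: entities are sorted by prediction confidence, those ranked above the preset sequence position are treated as positive, and the rest as negative; the names are assumptions.

```python
def split_by_rank(entities, confidence, preset_rank):
    """Sort entities by prediction confidence (descending); entities ranked above
    the preset sequence position are positive, the remaining ones are negative."""
    ordered = sorted(entities, key=lambda e: confidence[e], reverse=True)
    return ordered[:preset_rank], ordered[preset_rank:]
```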
(VI) The third obtaining unit 306:
the third obtaining unit 306 may obtain the contrast loss of the current model according to an entity pair, where the entity pair includes a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity.
In some embodiments, the third obtaining unit 306 is further specifically configured to:
constructing a contrast loss function model of the current model; determining the positive entity and the negative entity according to the entity pair;
and inputting the positive entity and the negative entity into the contrast loss function model to obtain the contrast loss.
In some embodiments, the third obtaining unit 306 is further specifically configured to:
obtaining the contrast loss of each entity pair;
and weighting and averaging the contrast loss of each entity pair to obtain the contrast loss of the current model.
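A minimal sketch of the weighted averaging just described; the uniform default weights and names are assumptions.

```python
def aggregate_contrast_loss(pair_losses, pair_weights=None):
    """Weighted average of the contrast losses of the entity pairs,
    giving the contrast loss of the current model."""
    if pair_weights is None:
        pair_weights = [1.0] * len(pair_losses)  # uniform weighting is an assumption
    return sum(w * l for w, l in zip(pair_weights, pair_losses)) / sum(pair_weights)
```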
(VII) The returning unit 307:
The returning unit 307 may return to and perform the step of obtaining a previous model, a training sample set, and an extended sample set when the contrast loss is greater than the preset threshold, where the previous model includes an initial neural network model.
(VIII) The training ending unit 308:
The training ending unit 308 may end the training when the contrast loss is not greater than the preset threshold.
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, the processing apparatus of the model of this embodiment includes an obtaining unit, configured to obtain a previous model, a training sample set, and an extended sample set, where the previous model includes an initial neural network model; an employing unit, configured to train the previous model by using the training sample set to obtain a current model; a first obtaining unit, configured to input an entity in the extended sample set to the current model and obtain a prediction result corresponding to the entity; a second obtaining unit, configured to obtain, based on the prediction result corresponding to the entity, a prediction confidence corresponding to the entity; a classification unit, configured to classify the entity according to the prediction confidence corresponding to the entity to obtain the type of the entity; a third obtaining unit, configured to obtain a contrast loss of the current model according to an entity pair, where the entity pair includes a positive entity and a negative entity, the positive entity is a positive-type entity, and the negative entity is a negative-type entity; a returning unit, configured to, when the contrast loss is greater than a preset threshold, return to and execute the step of obtaining a previous model, a training sample set, and an extended sample set, where the previous model includes an initial neural network model; and a training ending unit, configured to end the training when the contrast loss is not greater than the preset threshold. Therefore, the accuracy of the model to be used is improved.
The embodiment of the application also provides the electronic equipment which can be equipment such as a terminal and a server. The terminal can be a mobile phone, a tablet computer, an intelligent Bluetooth device, a notebook computer, a personal computer and the like; the server may be a single server, a server cluster composed of a plurality of servers, or the like.
In some embodiments, the processing apparatus of the model may also be integrated in a plurality of electronic devices, for example, the processing apparatus of the model may be integrated in a plurality of servers, and the processing method of the model of the present application is implemented by the plurality of servers.
In this embodiment, a detailed description will be given by taking the electronic device of this embodiment as an example of a terminal, for example, as shown in fig. 4, which shows a schematic structural diagram of the terminal according to the embodiment of the present application, specifically:
the terminal may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, an input module 404, and a communication module 405. Those skilled in the art will appreciate that the terminal configuration shown in fig. 4 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the terminal, connects various parts of the entire terminal using various interfaces and lines, performs various functions of the terminal and processes data by operating or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall detection of the terminal. In some embodiments, processor 401 may include one or more processing cores; in some embodiments, processor 401 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the terminal, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The terminal also includes a power supply 403 for powering the various components, and in some embodiments, the power supply 403 may be logically coupled to the processor 401 via a power management system, such that the power management system may perform functions of managing charging, discharging, and power consumption. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The terminal may also include an input module 404, the input module 404 being operable to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
The terminal may also include a communication module 405, and in some embodiments the communication module 405 may include a wireless module, through which the terminal may wirelessly transmit over short distances, thereby providing wireless broadband internet access to the user. For example, the communication module 405 may be used to assist a user in sending and receiving e-mails, browsing web pages, accessing streaming media, and the like.
Although not shown, the terminal may further include a display unit and the like, which will not be described in detail herein. Specifically, in this embodiment, the processor 401 in the terminal loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application programs stored in the memory 402, thereby implementing various functions as follows:
obtaining a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
training the previous model by adopting the training sample set to obtain a current model;
inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity;
obtaining a prediction confidence corresponding to the entity based on the prediction result corresponding to the entity;
classifying the entities according to the prediction confidence degrees corresponding to the entities to obtain the types of the entities;
obtaining the contrast loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity;
when the contrast loss is greater than a preset threshold value, returning to and executing the step of obtaining a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
and when the contrast loss is not greater than a preset threshold value, finishing the training.
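Putting the listed functions together, the overall flow can be sketched as the loop below. The step functions are supplied by the caller and correspond to the steps listed above; the threshold semantics (keep iterating while the contrast loss exceeds the preset threshold, end otherwise) and all names are assumptions.

```python
def train_until_converged(previous_model, training_set, expanded_set,
                          train_step, score_entities, split_entities, contrast_loss,
                          threshold, max_rounds=50):
    """Iterate: train -> score expanded entities -> split into positive/negative ->
    compute the contrast loss; stop once the loss no longer exceeds the threshold."""
    current_model = previous_model
    for _ in range(max_rounds):
        current_model = train_step(previous_model, training_set)    # train the previous model
        confidences = score_entities(current_model, expanded_set)   # prediction confidences
        positives, negatives = split_entities(confidences)          # classify entities
        loss = contrast_loss(current_model, positives, negatives)   # contrast loss
        if loss <= threshold:                                        # converged: end training
            break
        previous_model = current_model                               # not converged: repeat
    return current_model
```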
In some embodiments, a computer program product is also proposed, which comprises a computer program or instructions that, when executed by a processor, implement the steps in the processing method of any of the models described above.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a computer-readable storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in the processing method of any one of the models provided in the present application. For example, the instructions may perform the steps of:
obtaining a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
training the previous model by adopting the training sample set to obtain a current model;
inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity;
obtaining a prediction confidence corresponding to the entity based on the prediction result corresponding to the entity;
classifying the entities according to the prediction confidence degrees corresponding to the entities to obtain the types of the entities;
obtaining the contrast loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity;
when the contrast loss is greater than a preset threshold value, returning to and executing the step of obtaining a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
and when the contrast loss is not greater than a preset threshold value, finishing the training.
Wherein, the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Since the instructions stored in the storage medium can execute the steps in the processing method of any model provided in the embodiment of the present application, the beneficial effects that can be achieved by the processing method of any model provided in the embodiment of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described again here.
The foregoing describes in detail a model processing method, apparatus, electronic device, and storage medium provided in an embodiment of the present application, and a specific example is applied in the present application to explain the principle and implementation of the present application, and the description of the foregoing embodiment is only used to help understand the method and core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, the specific implementation manner and the application scope may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (16)

1. A method of model processing, comprising:
obtaining a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
training the previous model by adopting the training sample set to obtain a current model;
inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity;
obtaining a prediction confidence corresponding to the entity based on the prediction result corresponding to the entity;
classifying the entities according to the prediction confidence degrees corresponding to the entities to obtain the types of the entities;
obtaining the contrast loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity;
when the contrast loss is greater than a preset threshold value, returning to and executing the step of obtaining a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
and when the contrast loss is not greater than a preset threshold value, finishing the training.
2. The method of claim 1, wherein the set of training samples comprises a textual statement comprising an entity and a label corresponding to the entity;
the training the previous model by using the training sample set to obtain a current model includes:
performing mask processing on the entity to obtain a masked text statement;
and training the previous model by adopting the training sample set to obtain the current model.
3. The method of claim 1, wherein obtaining the extended sample set comprises:
obtaining an expanded sample set, the expanded sample set comprising a training sample set;
and carrying out entity expansion processing on the expanded sample set to obtain an expanded sample set.
4. The method of claim 3, wherein the performing entity expansion processing on the expanded sample set to obtain an expanded sample set comprises:
selecting an expanded entity in the expanded sample set to obtain a selected entity;
carrying out entity expansion processing on the selected entity to obtain a selected expanded entity;
adding the selected expansion entity to the expanded sample set to obtain the expanded sample set.
5. The method of claim 4, wherein the selecting the expanded entity in the expanded sample set to obtain a selected entity comprises:
inputting the extended entity in the extended sample set to the current model to obtain an extended prediction result corresponding to the extended entity;
obtaining an expanded prediction confidence corresponding to the expanded entity based on an expanded prediction result corresponding to the expanded entity;
and according to the expanded prediction confidence, selecting a selected entity with the expanded prediction confidence larger than a preset expansion threshold from the expanded sample set.
6. The method of claim 5, wherein the selecting, according to the extended prediction confidence, the selected entity from the extended sample set whose extended prediction confidence is greater than a preset extension threshold comprises:
acquiring true confidence of more than one expanded entity in the expanded sample set;
carrying out weighted averaging processing on the real confidence degrees of more than one expanded entity to obtain a preset expansion threshold value of the real confidence degrees;
and selecting the expanded entity according to the preset expansion threshold value to obtain the selected entity of which the expanded prediction confidence coefficient is greater than the preset expansion threshold value.
7. The method of claim 1, wherein the classifying the entity according to the prediction confidence corresponding to the entity to obtain the type of the entity comprises:
according to the prediction confidence, entity sorting processing is carried out on the extended sample set to obtain a sorted entity sequence set;
classifying the entity sequence set to obtain the type of the entity, wherein the type of the entity comprises a positive type and a negative type, the positive type represents that the sequence position of the entity in the entity sequence set is higher than a preset sequence position, and the negative type represents that the sequence position of the entity in the entity sequence set is not higher than the preset sequence position.
8. The method of claim 1, wherein the deriving the contrast loss of the current model from the entity pair comprises:
constructing a contrast loss function model of the current model; determining the positive entity and the negative entity according to the entity pair;
and inputting the positive entity and the negative entity into the contrast loss function model to obtain the contrast loss.
9. The method of claim 8, wherein there are at least two of the entity pairs;
the obtaining of the contrast loss of the current model according to the entity pair includes:
obtaining the contrast loss of each entity pair;
and weighting and averaging the contrast loss of each entity pair to obtain the contrast loss of the current model.
10. The method of claim 1, wherein there are N of said previous models, said N being an integer greater than or equal to 2;
the training the previous model by using the training sample set to obtain a current model includes:
carrying out initial training processing on the N previous models to obtain N models to be selected;
and screening and integrating the N models to be selected to obtain an integrated model, wherein the integrated model is used as the current model.
11. The method according to claim 10, wherein the screening and integrating the N models to be selected to obtain an integrated model comprises:
inputting the training sample set to the model to be selected to obtain a prediction output result of the model to be selected, wherein the training sample set comprises a text statement, and the text statement comprises an entity and a label corresponding to the entity;
acquiring a cross entropy loss function corresponding to the label;
calculating the cross entropy loss of each model to be selected according to the cross entropy loss function;
screening N screening models from the N models to be selected according to cross entropy loss, wherein N is a positive integer less than or equal to N;
and carrying out integration processing on the n screening models to obtain an integrated model.
12. The method of claim 10, wherein the integrating the n screening models to obtain an integrated model comprises:
acquiring input parameter sets of the n screening models;
acquiring an output average value according to the input parameter sets, wherein the output average value is the average value of the output results of the n screening models;
and constructing an integrated model according to the input parameter set and the output average value.
13. An apparatus for processing a model, comprising:
the device comprises an acquisition unit, a comparison unit and a processing unit, wherein the acquisition unit is used for acquiring a previous model, a training sample set and an expansion sample set, and the previous model comprises an initial neural network model;
the adoption unit is used for adopting the training sample set to train the previous model to obtain a current model;
a first obtaining unit, configured to input an entity in the extended sample set to the current model, and obtain a prediction result corresponding to the entity;
a second obtaining unit, configured to obtain, based on a prediction result corresponding to the entity, a prediction confidence corresponding to the entity;
the classification unit is used for classifying the entity according to the prediction confidence corresponding to the entity to obtain the type of the entity;
a third obtaining unit, configured to obtain a contrast loss of the current model according to an entity pair, where the entity pair includes a positive entity and a negative entity, the positive entity is a positive-type entity, and the negative entity is a negative-type entity;
a returning unit, configured to, when the contrast loss is greater than a preset threshold, return to and execute the step of obtaining a previous model, a training sample set, and an extended sample set, where the previous model includes an initial neural network model;
and the training ending unit is used for ending the training when the contrast loss is not greater than a preset threshold value.
14. An electronic device comprising a processor and a memory, the memory storing a plurality of instructions; the processor loads instructions from the memory to perform the steps in the method of processing a model according to any one of claims 1 to 12.
15. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the method of processing a model according to any one of claims 1 to 12.
16. A computer program product comprising a computer program or instructions, characterized in that said computer program or instructions, when executed by a processor, implement the steps in the method of processing a model according to any one of claims 1 to 12.
CN202210917417.9A 2022-08-01 2022-08-01 Model processing method, device, electronic equipment and storage medium Active CN115269844B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210917417.9A CN115269844B (en) 2022-08-01 2022-08-01 Model processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210917417.9A CN115269844B (en) 2022-08-01 2022-08-01 Model processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115269844A true CN115269844A (en) 2022-11-01
CN115269844B CN115269844B (en) 2024-03-29

Family

ID=83748096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210917417.9A Active CN115269844B (en) 2022-08-01 2022-08-01 Model processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115269844B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193962A (en) * 2017-05-24 2017-09-22 百度在线网络技术(北京)有限公司 A kind of intelligent figure method and device of internet promotion message
CN110162770A (en) * 2018-10-22 2019-08-23 腾讯科技(深圳)有限公司 A kind of word extended method, device, equipment and medium
CN111950269A (en) * 2020-08-21 2020-11-17 清华大学 Text statement processing method and device, computer equipment and storage medium
WO2022037256A1 (en) * 2020-08-21 2022-02-24 腾讯科技(深圳)有限公司 Text sentence processing method and device, computer device and storage medium
US20220067511A1 (en) * 2020-08-27 2022-03-03 GM Global Technology Operations LLC Methods, systems, and apparatuses for user-understandable explainable learning models
CN112307182A (en) * 2020-10-29 2021-02-02 上海交通大学 Question-answering system-based pseudo-correlation feedback extended query method
CN112966712A (en) * 2021-02-01 2021-06-15 北京三快在线科技有限公司 Language model training method and device, electronic equipment and computer readable medium
CN113191152A (en) * 2021-06-30 2021-07-30 杭州费尔斯通科技有限公司 Entity identification method and system based on entity extension
CN113779267A (en) * 2021-09-13 2021-12-10 中国人民解放军国防科技大学 On-satellite intelligent task decision method based on intention
CN114155598A (en) * 2021-11-12 2022-03-08 浙江大华技术股份有限公司 Training method and device of image processing model and electronic equipment
CN114091594A (en) * 2021-11-15 2022-02-25 北京市商汤科技开发有限公司 Model training method and device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MARCEL LÜTHI; THOMAS GERIG; CHRISTOPH JUD; THOMAS VETTER: "Gaussian Process Morphable Models", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, pages 1860 *
TONG JIAMING: "Research on Iris Recognition Based on Dual-Channel Network Feature Learning", China Master's Theses Full-text Database, pages 138 - 432 *
ZHANG WEILI; HUANG TINGLEI; LIANG XIAO: "Entity Alignment of Encyclopedia Knowledge Bases Based on Semi-Supervised Co-Training", Computer and Modernization, no. 12, pages 92 - 97 *

Also Published As

Publication number Publication date
CN115269844B (en) 2024-03-29


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40075369

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant