CN118051768A - Model training method, label generating method, device, equipment and medium

Info

Publication number
CN118051768A
Authority
CN
China
Prior art keywords
text, training, tag, vector, label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211429424.0A
Other languages
Chinese (zh)
Inventor
张乐中
汪力
方俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202211429424.0A
Publication of CN118051768A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the disclosure disclose a model training method, a label generating method, a device, equipment and a medium. One embodiment of the method comprises: determining a positive sample group corresponding to each training text; for each training text in the training text set, performing a first predicted value generating step: performing text division on the training text to generate a sub-text set; inputting the sub-text set into an initial text coding model to obtain a sub-text vector set, and inputting the text label group set corresponding to the remaining training text set into an initial label coding model to obtain a label vector group set; for each remaining training text, determining a first predicted value; and performing initial training on the initial text coding model and the initial label coding model according to the plurality of first predicted value group sets, the positive sample group set and the negative sample labels. This embodiment relates to artificial intelligence; more accurate text labels can be screened out using the initially trained text coding model and the initially trained label coding model.

Description

Model training method, label generating method, device, equipment and medium
Technical Field
The embodiments of the disclosure relate to the field of computer technology, and in particular to a model training method, a label generating method, a device, equipment and a medium.
Background
Currently, matching text with tags is an important development direction in the field of natural language processing. The purpose of text-to-tag matching is to compute similarity over text semantics so as to obtain tags that characterize the important features of the text. Matching tags are generally determined as follows: first, each tag in the tag set is input, together with the text to be matched, into a dual-encoder (Bi-Encoder) model to generate a similarity; then, the tags whose similarities rank within the top target number are selected from the tag set as the matched tag set of the text.
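For illustration only, the following Python sketch shows this conventional Bi-Encoder matching flow: a text vector is scored against every tag vector and the top target number of tags is kept. The functions encode_text and encode_tag are hypothetical stand-ins for the two encoder towers (a real system would use trained neural encoders), and cosine similarity is assumed as the similarity measure.

```python
import numpy as np

def encode_text(text: str) -> np.ndarray:
    # Hypothetical stand-in for the text tower of a Bi-Encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(8)

def encode_tag(tag: str) -> np.ndarray:
    # Hypothetical stand-in for the tag tower of a Bi-Encoder.
    rng = np.random.default_rng(abs(hash(tag)) % (2**32))
    return rng.standard_normal(8)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_tags(text: str, tag_set: list[str], target_number: int) -> list[str]:
    # Conventional flow: score every tag against the text, keep the top-ranked ones.
    text_vec = encode_text(text)
    scored = [(tag, cosine(text_vec, encode_tag(tag))) for tag in tag_set]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in scored[:target_number]]

print(match_tags("the bicycle brakes well", ["brakes good", "heavy", "cheap"], 2))
```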
However, the inventors have found that when the above manner is used to determine matching tags, there are often the following technical problems:
the Bi-Encoder model cannot effectively learn the interaction information between the text and the tags, so there are limitations in the semantic representations of the text and the tags, and the matched tag set that is screened out is therefore not accurate enough.
The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and, therefore, may contain information that does not form the prior art that is already known to those of ordinary skill in the art in this country.
Disclosure of Invention
This portion of the disclosure is intended to introduce concepts in a simplified form that are further described below in the detailed description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a model training method, a label generating method, a device, an apparatus, and a medium to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a model training method, comprising: determining a positive sample group corresponding to each training text in the training text set of the current batch, wherein a positive sample comprises: a training text and a text label; for each training text in the training text set, executing a first predicted value generation step: performing text division on the training text to generate a sub-text set; inputting the sub-text set into an initial text coding model to obtain a sub-text vector set, and inputting the text label group set corresponding to the remaining training text set into an initial label coding model to obtain a label vector group set, wherein the remaining training text set is the text set obtained by removing the training text from the training text set; for each remaining training text in the remaining training text set, determining a first predicted value of the first negative sample corresponding to each label vector according to the similarity set between the sub-text vector set and each label vector in the corresponding label vector group, wherein the first negative sample comprises: the training text and the text label corresponding to the label vector; and performing initial training on the initial text coding model and the initial tag coding model according to the obtained plurality of first predicted value group sets, the obtained positive sample group set and the negative sample tags corresponding to the obtained plurality of first negative sample group sets, to obtain an initially trained text coding model and an initially trained tag coding model.
Optionally, determining the first predicted value of each label vector corresponding to the first negative sample according to the similarity set between the set of sub-text vectors and each label vector in the corresponding label vector group includes: and screening out the similarity meeting the preset condition from the similarity set corresponding to the label vector, and taking the similarity as a first predicted value of a first negative sample corresponding to the label vector.
Optionally, the performing initial training on the initial text coding model and the initial tag coding model according to the obtained multiple first predicted value sets, the obtained positive sample set and the negative sample tags corresponding to the obtained multiple first negative sample sets to obtain a text coding model after initial training and a tag coding model after initial training includes: correspondingly inputting the plurality of first predicted value group sets and the negative sample label into a target loss function to obtain a plurality of first loss value group sets; generating a second loss value corresponding to each positive sample in the positive sample group set according to the initial text coding model and the initial label coding model to obtain a second loss value group set; and performing primary training on the initial text coding model and the initial tag coding model according to the first loss value group sets and the second loss value group sets to obtain a text coding model after primary training and a tag coding model after primary training.
Optionally, the method further comprises: inputting a text label group set corresponding to the training text set into the label coding model after the initial training to obtain a training label vector group set; updating a pre-stored tag vector queue according to the training tag vector group set to obtain an updated tag vector queue, wherein the tag vector queue comprises: a set of tag vector groups for a plurality of historical batches.
Optionally, the method further comprises: responding to the fact that the updated label vector in the updated label vector queue meets a preset convergence condition, and acquiring a training text set of the next batch as a target training text set; determining a positive sample group corresponding to each target training text in the target training text set as a target positive sample group to obtain a target positive sample group set; and taking the label coding model after initial training as an initial label coding model, taking the text coding model after initial training as an initial text coding model, and executing the first predicted value generation step aiming at the target training text set according to the target positive sample set to obtain a plurality of second predicted value set.
Optionally, the method further comprises: for each target training text in the target training text set, executing a second predicted value generating step: performing text division on the target training text to generate a target sub-text set; inputting the target sub-text set into an initial text coding model to obtain a target sub-text vector set; for each updated label vector in the updated label vector queue, determining the similarity between the target sub-text vector set and the updated label vector as target similarity, and obtaining a target similarity set; for each update tag vector in the update tag vector queue, determining a similarity in the set of object similarities corresponding to the update tag vector, which satisfies a predetermined condition, as a third predicted value of the update tag vector corresponding to a second negative sample, wherein the update tag vector corresponding to the second negative sample includes: the target training text corresponds to the text label with the updated label vector; and retraining the initial text coding model and the initial label coding model according to the obtained third predicted value group set, the target positive sample group set, the plurality of second predicted value group sets and the negative sample labels corresponding to the obtained second negative sample group set to obtain a retrained text coding model and a retrained label coding model.
Optionally, retraining the initial text coding model and the initial label coding model according to the obtained third predicted value set, the target positive sample set, the plurality of second predicted value sets, and the negative sample labels corresponding to the obtained second negative sample set to obtain a retrained text coding model and a retrained label coding model, including: generating a plurality of third loss value group sets according to the plurality of second prediction value group sets and the corresponding negative sample labels; removing a third predicted value with the predicted value smaller than or equal to a target threshold value from the third predicted value group set to obtain a removed third predicted value group set; generating a fourth loss value group set according to the removed third predicted value group set and the corresponding negative sample label; and retraining the initial text coding model and the initial tag coding model according to the third loss value group sets and the fourth loss value group sets to obtain a retrained text coding model and a retrained tag coding model.
Optionally, updating the pre-stored tag vector queue according to the training tag vector group set to obtain an updated tag vector queue, including: removing the tag vector groups with the addition time meeting the preset time condition from each tag vector group set included in the tag vector queue to obtain a removed tag vector queue; and adding the training tag vector group set serving as a tag vector group set into the removed tag vector queue to obtain an updated tag vector queue.
In a second aspect, some embodiments of the present disclosure provide a model training apparatus comprising: a first determining unit configured to determine a positive sample group corresponding to each training text in the training text set of the current batch, wherein the positive samples include: training text and text labels; an execution unit configured to execute, for each training text in the training text set, a first predicted value generation step of: performing text division on the training text to generate a sub-text set; inputting the sub-text set into an initial text coding model to obtain a sub-text vector set, and inputting a text label set corresponding to the rest of training text sets into the initial label coding model to obtain a label vector set, wherein the rest of training text sets are text sets in which the training text is removed from the training text sets; for each of the remaining training texts in the remaining training text set, determining a first predicted value of each label vector corresponding to a first negative sample according to a similarity set between the set of sub-text vectors and each label vector in the corresponding label vector group, wherein the first negative sample comprises: the training sample corresponds to the text label with the label vector; the training unit is configured to perform primary training on the initial text coding model and the initial tag coding model according to the obtained multiple first predicted value group sets, the obtained positive sample group sets and the obtained negative sample tags corresponding to the multiple first negative sample group sets, so as to obtain a text coding model after primary training and a tag coding model after primary training.
Alternatively, the execution unit may be configured to: and screening out the similarity meeting the preset condition from the similarity set corresponding to the label vector, and taking the similarity as a first predicted value of a first negative sample corresponding to the label vector.
Alternatively, the training unit may be configured to: correspondingly inputting the plurality of first predicted value group sets and the negative sample label into a target loss function to obtain a plurality of first loss value group sets; generating a second loss value corresponding to each positive sample in the positive sample group set according to the initial text coding model and the initial label coding model to obtain a second loss value group set; and performing primary training on the initial text coding model and the initial tag coding model according to the first loss value group sets and the second loss value group sets to obtain a text coding model after primary training and a tag coding model after primary training.
Optionally, the apparatus further includes: inputting a text label group set corresponding to the training text set into the label coding model after the initial training to obtain a training label vector group set; updating a pre-stored tag vector queue according to the training tag vector group set to obtain an updated tag vector queue, wherein the tag vector queue comprises: a set of tag vector groups for a plurality of historical batches.
Optionally, the apparatus further includes: responding to the fact that the updated label vector in the updated label vector queue meets a preset convergence condition, and acquiring a training text set of the next batch as a target training text set; determining a positive sample group corresponding to each target training text in the target training text set as a target positive sample group to obtain a target positive sample group set; and taking the label coding model after initial training as an initial label coding model, taking the text coding model after initial training as an initial text coding model, and executing the first predicted value generation step aiming at the target training text set according to the target positive sample set to obtain a plurality of second predicted value set.
Optionally, the apparatus further includes: for each target training text in the target training text set, executing a second predicted value generating step: performing text division on the target training text to generate a target sub-text set; inputting the target sub-text set into an initial text coding model to obtain a target sub-text vector set; for each updated label vector in the updated label vector queue, determining the similarity between the target sub-text vector set and the updated label vector as target similarity, and obtaining a target similarity set; for each update tag vector in the update tag vector queue, determining a similarity in the set of object similarities corresponding to the update tag vector, which satisfies a predetermined condition, as a third predicted value of the update tag vector corresponding to a second negative sample, wherein the update tag vector corresponding to the second negative sample includes: the target training text corresponds to the text label with the updated label vector; and retraining the initial text coding model and the initial label coding model according to the obtained third predicted value group set, the target positive sample group set, the plurality of second predicted value group sets and the negative sample labels corresponding to the obtained second negative sample group set to obtain a retrained text coding model and a retrained label coding model.
Optionally, the apparatus further includes: generating a plurality of third loss value group sets according to the plurality of second prediction value group sets and the corresponding negative sample labels; removing a third predicted value with the predicted value smaller than or equal to a target threshold value from the third predicted value group set to obtain a removed third predicted value group set; generating a fourth loss value group set according to the removed third predicted value group set and the corresponding negative sample label; and retraining the initial text coding model and the initial tag coding model according to the third loss value group sets and the fourth loss value group sets to obtain a retrained text coding model and a retrained tag coding model.
Optionally, the apparatus further includes: removing the tag vector groups with the addition time meeting the preset time condition from each tag vector group set included in the tag vector queue to obtain a removed tag vector queue; and adding the training tag vector group set serving as a tag vector group set into the removed tag vector queue to obtain an updated tag vector queue.
In a third aspect, some embodiments of the present disclosure provide a tag generation method, including: acquiring a target text; inputting the target text into a pre-trained text encoding model to generate a target text vector, wherein the text encoding model is generated based on the method of the first aspect; searching a tag vector database for a target number of target tag vectors, wherein a target tag vector is a tag vector that satisfies a preset association relationship with the target text vector, each tag vector in the tag vector database is generated based on a pre-trained tag coding model, and the tag coding model is generated based on the method of the first aspect; and determining the tag set corresponding to the target number of target tag vectors as the text tag set of the target text.
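A minimal sketch of this tag generation flow, assuming the pre-trained text coding model is available as a callable and the tag vector database as an in-memory matrix (the names text_encoder, tag_matrix and tag_names are illustrative, not from the disclosure); the preset association relationship is interpreted here as highest cosine similarity.

```python
import numpy as np

def generate_text_tags(target_text: str,
                       text_encoder,            # pre-trained text coding model (callable)
                       tag_matrix: np.ndarray,  # one row per tag vector in the database
                       tag_names: list[str],
                       target_number: int) -> list[str]:
    # Encode the target text into a target text vector.
    target_vec = np.asarray(text_encoder(target_text), dtype=np.float32)
    # Cosine similarity against every tag vector in the database.
    norms = np.linalg.norm(tag_matrix, axis=1) * (np.linalg.norm(target_vec) + 1e-12)
    sims = tag_matrix @ target_vec / (norms + 1e-12)
    # Keep the target number of most similar tag vectors and return their tags.
    top = np.argsort(-sims)[:target_number]
    return [tag_names[i] for i in top]

# Toy usage with a random "database" and a dummy encoder.
rng = np.random.default_rng(0)
dummy_encoder = lambda text: rng.standard_normal(8)
db = rng.standard_normal((5, 8))
print(generate_text_tags("sample text", dummy_encoder, db, [f"tag{i}" for i in range(5)], 2))
```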
In a fourth aspect, some embodiments of the present disclosure provide a tag generating apparatus including: an acquisition unit configured to acquire a target text; an input unit configured to input the target text to a pre-trained text encoding model to generate a target text vector, wherein the text encoding model is generated based on the method of the first aspect; the searching unit is configured to search a target number of target tag vectors from a tag vector database, wherein the target tag vectors are vectors which meet a preset association relation with the target text vectors in the tag vectors, each tag vector in the tag vector database is generated based on a pre-trained tag coding model, and the tag coding model is generated based on the method of the first aspect; and a second determining unit configured to determine a tag set corresponding to the target number of target tag vectors as a text tag set of the target text.
In a fifth aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first and third aspects.
In a sixth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as described in any of the implementations of the first and third aspects.
In a seventh aspect, some embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the method described in any one of the implementations of the first and third aspects above.
The above embodiments of the present disclosure have the following advantageous effects: by the model training method of some embodiments of the present disclosure, more accurate text labels can be screened out by using the text coding model after initial training and the label coding model after initial training. Specifically, the reason for the inability to accurately screen more accurate text labels is: the Bi-Encoder model cannot learn the interactive information between the text and the tag effectively, so that there is a limitation in the semantic expression of the text and the tag. Thus, the set of matching tags screened out is not accurate enough. Based on this, the model training method of some embodiments of the present disclosure first obtains a positive sample group corresponding to each training text in the training text set that determines the current batch. Wherein the positive samples include: training text and text labels. The obtained training text set is used for training a subsequent text coding model and a label coding model. In addition, the training text set and the corresponding text label set of the training text set are used for generating each negative sample to realize the training of the subsequent text coding model and the label coding model. Then, for each training text in the training text set, performing a first predicted value generation step: first, the training texts are subjected to text division to generate a sub-text set. Here, the training text is divided into texts for the subsequent generation of negative samples corresponding to the training text for each sub-text. And secondly, inputting the sub-text set into an initial text coding model to obtain a sub-text vector set, and inputting the text label set corresponding to the rest of training text sets into the initial label coding model to obtain a label vector set. The rest training text sets are text sets in which the training texts are removed from the training text sets. Here, each text in the sub-text set and each tag in the text tag set are encoded for subsequent determination of the association between the sub-text and the text tag. And thirdly, for each of the rest training texts in the rest training text sets, determining a first predicted value of each label vector corresponding to the first negative sample according to a similarity set between the sub-text vector set and each label vector in the corresponding label vector group. Wherein the first negative sample comprises: the training samples and the label vectors correspond to text labels. Here, the association relationship between each sub-text and the text label corresponding to the label vector can be effectively determined through the similarity set between each sub-text vector and each label vector. Therefore, the association relation between the training text and the text label is determined based on the association relation between each sub-text and the corresponding text label, and the interactive relation information between the text and the rest text labels is added from the aspect of the local characteristics (namely the sub-text characteristics) of the training text. The expression capability of the text coding model and the label coding model is greatly enriched. 
And finally, performing primary training on the initial text coding model and the initial tag coding model according to the obtained multiple first predicted value group sets, the obtained positive sample group sets and the obtained negative sample tags corresponding to the multiple first negative sample group sets to obtain a text coding model with more accurate text coding after primary training and a tag coding model with more accurate tag coding after primary training.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
1-2 Are schematic diagrams of one application scenario of a model training method according to some embodiments of the present disclosure;
FIG. 3 is a flow chart of some embodiments of a model training method according to the present disclosure;
FIG. 4 is a flow chart of further embodiments of a model training method according to the present disclosure;
FIG. 5 is a flow chart of some embodiments of a tag generation method according to the present disclosure;
FIG. 6 is a schematic structural view of some embodiments of a model training apparatus according to the present disclosure;
FIG. 7 is a schematic structural view of some embodiments of a label generating apparatus according to the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one", "a plurality" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that "one or more" is intended to be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1-2 are schematic diagrams of one application scenario of a model training method according to some embodiments of the present disclosure.
In the application scenario of fig. 1-2, the electronic device 101 may determine a positive sample group corresponding to each training text in the training text set 102 of the current batch. Wherein a positive sample includes: a training text and a text label. In the present application scenario, the training text set 102 includes: training text 1021 and training text 1022. The training text 1021 may be "the bicycle's shock absorption is good, the brakes are good". The training text 1022 may be "the bike is somewhat heavy, not very handy, and the cost performance is low". Training text 1021 corresponds to a positive sample group comprising: positive sample 1031 and positive sample 1032. Training text 1022 corresponds to a positive sample group comprising: positive sample 1033 and positive sample 1034. The positive sample 1031 may be the pair ("the bicycle's shock absorption is good, the brakes are good", "shock absorption good"). The positive sample 1032 may be the pair ("the bicycle's shock absorption is good, the brakes are good", "brakes good"). The positive sample 1033 may be the pair ("the bike is somewhat heavy, not very handy, and the cost performance is low", "weight on the heavy side"). The positive sample 1034 may be the pair ("the bike is somewhat heavy, not very handy, and the cost performance is low", "low cost performance"). Then, for each training text in the training text set, a first predicted value generation step is performed: first, the training text is text-divided to generate a sub-text set. Second, the sub-text set is input into the initial text coding model to obtain a sub-text vector set, and the text label group set corresponding to the remaining training text set is input into the initial label coding model to obtain a label vector group set. The remaining training text set is the text set obtained by removing the training text from the training text set. Third, for each remaining training text in the remaining training text set, a first predicted value of the first negative sample corresponding to each label vector is determined according to the similarity set between the sub-text vector set and each label vector in the corresponding label vector group. Wherein the first negative sample includes: the training text and the text label corresponding to the label vector. In the present application scenario, taking training text 1021 as the training text, in the first step the training text is text-divided to generate the sub-text set 114. The sub-text set 114 may include: a sub-text 1141 and a sub-text 1142. The sub-text 1141 may be "the bicycle's shock absorption is good". The sub-text 1142 may be "the brakes are good". In the second step, the above-mentioned sub-text set 114 is input into the initial text coding model 110 to obtain a sub-text vector set 115, and the text label group set corresponding to the remaining training text set is input into the initial label coding model 111 to obtain a label vector group set. The sub-text vector set 115 includes: a sub-text vector 1151 corresponding to the sub-text 1141, and a sub-text vector 1152 corresponding to the sub-text 1142. The remaining training text set is training text 1022. The text label group 116 corresponding to the training text 1022 includes: text label 1161 and text label 1162. Text label 1161 may be "weight on the heavy side". Text label 1162 may be "low cost performance". The label vector group 117 corresponding to the text label group 116 includes: a label vector 1171 corresponding to text label 1161, and a label vector 1172 corresponding to text label 1162.
Third, according to the similarity set between the sub-text vector set 115 and each of the label vectors in the corresponding label vector set 117, a first predicted value of each label vector corresponding to the first negative sample is determined. Wherein the first negative sample comprises: the training samples and the label vectors correspond to text labels. For a tag vector of 1171, the set of similarities 118 includes: similarity 1181 between sub-text vector 1151 and label vector 1171, and similarity 1182 between sub-text vector 1152 and label vector 1171. The similarity 1181 may be "0.6". The similarity 1182 may be "0.3". For a label vector of 1172, the set of similarities 119 includes: similarity 1191 between sub-text vector 1151 and label vector 1172, and similarity 1192 between sub-text vector 1152 and label vector 1172. Similarity 1191 may be "0.4". Similarity 1192 may be "0.7". The first predictor corresponding to tag vector 1171 may be first predictor 1051. The first predictor corresponding to tag vector 1172 may be first predictor 1052. Finally, the electronic device 101 may perform initial training on the initial text encoding model 110 and the initial tag encoding model 111 according to the obtained plurality of first predicted value sets 104, the obtained positive sample set 103, and the obtained negative sample tags 109 corresponding to the plurality of first negative sample sets, so as to obtain an initial trained text encoding model 112 and an initial trained tag encoding model 113. In the present application scenario, the plurality of first predictor group sets 104 includes: a first set of predictors 105 for training text 1021, a first set of predictors 106 for training text 1021, a first set of predictors 107 for training text 1022, and a first set of predictors 108 for training text 1022. The first predictor set 105 may include: first predicted value 1051 and first predicted value 1052. The first predicted value 1051 may be "0.6". The first predicted value 1052 may be "0.7". The first set of predictors 106 may include: a first predicted value 1061 and a first predicted value 1062. The first predicted value 1061 may be "0.3". The first predicted value 1062 may be "0.4". The first predictor set 107 may include: a first predicted value 1071 and a first predicted value 1072. The first predicted value 1071 may be "0.5". The first predicted value 1072 may be "0.45". The first set of predictors 108 may include: a first predicted value 1081 and a first predicted value 1082. The first prediction value 1081 may be "0.7". The first prediction value 1082 may be "0.9". Negative sample label 109 may be "0".
The electronic device 101 may be hardware or software. When the electronic device is hardware, the electronic device may be implemented as a distributed cluster formed by a plurality of servers or terminal devices, or may be implemented as a single server or a single terminal device. When the electronic device is embodied as software, it may be installed in the above-listed hardware device. It may be implemented as a plurality of software or software modules, for example, for providing distributed services, or as a single software or software module. The present invention is not particularly limited herein.
It should be understood that the number of electronic devices in fig. 1-2 is merely illustrative. There may be any number of electronic devices as desired for an implementation.
With continued reference to fig. 3, a flow 300 of some embodiments of a model training method according to the present disclosure is shown. The model training method comprises the following steps:
step 301, determining a positive sample group corresponding to each training text in the training text set of the current batch.
In some embodiments, the execution body of the model training method described above (e.g., the electronic device 101 shown in fig. 1) may determine a positive sample group for each training text in the training text set of the current batch. Wherein a positive sample includes: a training text and a text label. The training text set of the current batch may be the sample set on which the tag coding model and the text coding model are currently to be trained; there may be multiple batches of training text sets used to train the tag coding model and the text coding model. The tag coding model and the text coding model adjust their model parameters once for each batch of training samples. For example, training of the tag coding model and the text coding model may run for 10 epochs, each epoch divided into 10 batches, and the current batch may be the 3rd batch of the 3rd epoch. The training text may be a long text from various fields. For example, the training text may be a long text associated with a cell phone. A positive sample may be a pair of a training text and a text label. The text label in a positive sample may be a label that matches the training text. When the training text corresponds to at least one text label, the execution body may combine each text label with the training text to obtain a positive sample group.
It should be noted that, the text label may be a label that represents a feature related to an object in the text. For example, text is "height is high". The text label may be "height high". The object features involved in the text labels are height features.
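To make the sample construction concrete, the sketch below builds, for a hypothetical current batch, the positive sample group of each training text (the text paired with each of its own labels) and the candidate negative labels drawn from the remaining training texts of the batch; the texts and labels are illustrative only.

```python
# Hypothetical current batch: each training text with its matching text labels.
batch = {
    "the bicycle's shock absorption is good, the brakes are good":
        ["shock absorption good", "brakes good"],
    "the bike is somewhat heavy, not very handy, and the cost performance is low":
        ["weight on the heavy side", "low cost performance"],
}

# Positive sample group per training text: (training text, text label) pairs.
positive_groups = {
    text: [(text, label) for label in labels]
    for text, labels in batch.items()
}

# Negative candidates for a training text are the labels of the *remaining* texts in the batch.
negative_label_groups = {
    text: [label for other, labels in batch.items() if other != text for label in labels]
    for text in batch
}

for text in batch:
    print(text)
    print("  positives:", positive_groups[text])
    print("  in-batch negative labels:", negative_label_groups[text])
```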
Step 302, for each training text in the training text set, performing a first predicted value generating step:
In step 3021, the training text is text-divided to generate a sub-text set.
In some embodiments, the execution body may perform text division on the training text to generate a sub-text set.
As an example, the execution body may perform sentence breaking processing on the training text to obtain each sentence, and use each sentence as a sub-text to obtain a sub-text set.
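One simple way to realize this sentence-breaking step (a sketch, not the only possible implementation) is a regular-expression split on common sentence-ending punctuation, covering both Chinese and English delimiters:

```python
import re

def split_into_subtexts(training_text: str) -> list[str]:
    # Break the training text at sentence-ending punctuation (Chinese and English),
    # then drop empty fragments; each remaining sentence becomes one sub-text.
    pieces = re.split(r"[。！？!?.;；,，]+", training_text)
    return [piece.strip() for piece in pieces if piece.strip()]

print(split_into_subtexts("The bicycle's shock absorption is good, the brakes are good."))
# -> ["The bicycle's shock absorption is good", 'the brakes are good']
```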
Step 3022, inputting the above-mentioned sub-text set into the initial text coding model to obtain a sub-text vector set, and inputting the text label set corresponding to the rest of the training text set into the initial label coding model to obtain a label vector set.
In some embodiments, the execution body may input the sub-text set into an initial text encoding model to obtain a sub-text vector set, and input the text label group set corresponding to the remaining training text set into the initial label encoding model to obtain a label vector group set. The remaining training text set is the text set obtained by removing the training text from the training text set. Since each training text has a corresponding positive sample group and each positive sample has a corresponding text label, each training text has a corresponding text label group, and the remaining training text set has a corresponding text label group set. The initial text encoding model may be a text encoding model whose training has not yet finished. The text encoding model may be a model that encodes text into vectors. For example, the text encoding model may be one of: a BERT coding model, a Transformer coding model. The initial tag coding model may be a tag coding model whose training has not yet finished. The tag coding model may be a model that encodes tags into vectors. For example, the tag encoding model may be one of: a BERT coding model, a Transformer coding model. In practice, the tag coding model and the text coding model may share no parameters. A sub-text vector may characterize the feature information of the corresponding sub-text. A tag vector may characterize the feature information of the corresponding tag.
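A sketch of such a pair of encoders using the Hugging Face transformers library; the checkpoint name, the mean pooling, and the use of two separate (parameter-unshared) instances are assumptions consistent with, but not mandated by, the description above.

```python
import torch
from transformers import AutoModel, AutoTokenizer

class Encoder(torch.nn.Module):
    """BERT-style encoder that maps a list of strings to one vector per string."""

    def __init__(self, checkpoint: str = "bert-base-chinese"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        self.backbone = AutoModel.from_pretrained(checkpoint)

    def forward(self, texts: list[str]) -> torch.Tensor:
        enc = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        out = self.backbone(**enc).last_hidden_state       # (batch, seq, hidden)
        mask = enc["attention_mask"].unsqueeze(-1).float()  # (batch, seq, 1)
        # Mean pooling over non-padding tokens gives one vector per input string.
        return (out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Separate instances, so no parameters are shared between the two towers.
text_encoder = Encoder()   # initial text coding model
label_encoder = Encoder()  # initial label coding model
sub_text_vectors = text_encoder(["the brakes are good", "the shock absorption is good"])
label_vectors = label_encoder(["weight on the heavy side", "low cost performance"])
print(sub_text_vectors.shape, label_vectors.shape)
```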
Step 3023, for each of the remaining training texts in the remaining training text set, determining a first predicted value of each label vector corresponding to the first negative sample according to the similarity set between the sub-text vector set and each label vector in the corresponding label vector set.
In some embodiments, the execution body may determine, for each remaining training text in the remaining training text set, a first predicted value of the first negative sample corresponding to each label vector according to the similarity set between the sub-text vector set and each label vector in the corresponding label vector group. Wherein the first negative sample comprises: the training text and the text label corresponding to the label vector.
In practice, the execution subject may determine the cosine distance between the sub-text vector and the tag vector as the similarity.
As an example, for each tag vector in the tag vector group described above, the following predicted value generation step is performed:
First, a similarity set corresponding to the tag vector is determined.
Second, the similarities whose values are smaller than a preset value are removed from the similarity set to obtain a removed similarity set.
Third, an average value of the similarities included in the removed similarity set is determined as the first predicted value for the tag vector.
In some optional implementations of some embodiments, a similarity satisfying a preset condition is selected from the similarity set corresponding to the tag vector, and the selected similarity is used as the first predicted value of the first negative sample corresponding to the tag vector.
The preset condition may be that the similarity is the maximum value in the similarity set.
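A sketch of this optional implementation: for one label vector, the cosine similarity to every sub-text vector is computed and the maximum is taken as the first predicted value of the corresponding first negative sample. PyTorch is assumed here only because the encoders sketched above produce tensors.

```python
import torch
import torch.nn.functional as F

def first_predicted_value(sub_text_vectors: torch.Tensor,  # (num_sub_texts, dim)
                          label_vector: torch.Tensor        # (dim,)
                          ) -> torch.Tensor:
    # Cosine similarity between every sub-text vector and this label vector.
    sims = F.cosine_similarity(sub_text_vectors, label_vector.unsqueeze(0), dim=-1)
    # Preset condition used here: keep the maximum similarity as the first predicted value.
    return sims.max()

sub_texts = torch.randn(2, 8)
label = torch.randn(8)
print(first_predicted_value(sub_texts, label))
```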
And 303, performing primary training on the initial text coding model and the initial label coding model according to the obtained multiple first predicted value group sets, the obtained positive sample group sets and the obtained negative sample labels corresponding to the multiple first negative sample group sets, so as to obtain a text coding model after primary training and a label coding model after primary training.
In some embodiments, the executing body may perform initial training on the initial text encoding model and the initial tag encoding model according to the obtained multiple first predicted value sets, the obtained positive sample set, and the negative sample tags corresponding to the obtained multiple first negative sample set, so as to obtain the text encoding model after initial training and the tag encoding model after initial training. The text coding model after initial training is a model with updated model parameters in the initial text coding model. The label coding model after the initial training is a model with updated model parameters in the initial label coding model.
As an example, first, the execution body may screen out, from the plurality of first predicted value group sets, the first predicted values whose values are greater than a predetermined threshold as first target predicted values, obtaining a first target predicted value set. The predetermined threshold may be a preset value; for example, the predetermined threshold may be 0.6. Then, the first target predicted value set and the negative sample label are correspondingly input into a target loss function to obtain a fifth loss value set. Then, the fifth loss values included in the fifth loss value set are summed to obtain a summed loss value. Then, a second loss value corresponding to each positive sample in the positive sample group set is generated according to the initial text coding model and the initial label coding model, obtaining a second loss value group set. Further, the second loss values in the second loss value group set are subjected to weighted summation to obtain a weighted summation loss value. Further, the summed loss value and the weighted summation loss value are added to obtain an added loss value. Finally, according to the added loss value, the model parameters in the initial text coding model and in the initial tag coding model are updated by back propagation, obtaining the initially trained text coding model and the initially trained tag coding model.
In some optional implementations of some embodiments, the performing initial training on the initial text coding model and the initial tag coding model according to the obtained plurality of first predicted value sets, the obtained positive sample set, and the negative sample tags corresponding to the obtained plurality of first negative sample sets to obtain an initial trained text coding model and an initial trained tag coding model may include the following steps:
The first step is to input the first predicted value group sets and the negative sample labels to the target loss function correspondingly to obtain the first loss value group sets.
For example, the target loss function may be a circle loss function.
And a second step of generating a second loss value corresponding to each positive sample in the positive sample group set according to the initial text coding model and the initial label coding model to obtain a second loss value group set.
For example, for each positive sample in the positive sample group set, first, the execution body may input training text included in the positive sample into an initial text encoding model to obtain a text vector. Then, the execution body may input the text label included in the positive sample to the initial label coding model to obtain a text label vector. The execution body may then determine a vector similarity between the text vector and the text label vector. And finally, inputting the vector similarity and the sample true value corresponding to the positive sample into a target loss function to obtain a second loss value.
And thirdly, performing primary training on the initial text coding model and the initial tag coding model according to the first loss value group sets and the second loss value group sets to obtain a text coding model after primary training and a tag coding model after primary training.
As an example, first, the first loss values included in the plurality of first loss value group sets are averaged and summed to obtain a first average-sum loss value. Then, the second loss values included in the second loss value group set are averaged and summed to obtain a second average-sum loss value. Then, the first average-sum loss value and the second average-sum loss value are added to obtain a first added loss value. Finally, according to the first added loss value, the model parameters in the initial text coding model and in the initial tag coding model are updated by back propagation, obtaining the initially trained text coding model and the initially trained tag coding model.
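The disclosure gives the circle loss as one possible target loss function; the sketch below substitutes a plain binary cross-entropy on the predicted similarities purely for illustration, and combines an averaged negative-pair loss (first predicted values against the negative sample label 0) with an averaged positive-pair loss before a single backward pass.

```python
import torch
import torch.nn.functional as F

def initial_training_step(first_predicted_values: torch.Tensor,  # predictions for first negative samples
                          positive_similarities: torch.Tensor,   # text/label similarities of positive samples
                          optimizer: torch.optim.Optimizer) -> float:
    # Negative sample label is 0, positive sample ground truth is 1.
    neg_targets = torch.zeros_like(first_predicted_values)
    pos_targets = torch.ones_like(positive_similarities)

    # Stand-in for the target loss function (the disclosure mentions a circle loss).
    first_loss = F.binary_cross_entropy(first_predicted_values.clamp(0, 1), neg_targets)
    second_loss = F.binary_cross_entropy(positive_similarities.clamp(0, 1), pos_targets)

    loss = first_loss + second_loss  # added loss value
    optimizer.zero_grad()
    loss.backward()                  # update both encoders' parameters by back propagation
    optimizer.step()
    return float(loss)

# Toy usage: a single learnable vector stands in for the two coding models.
param = torch.nn.Parameter(torch.randn(4))
opt = torch.optim.SGD([param], lr=0.1)
preds_neg = torch.sigmoid(param[:2])   # pretend first predicted values
preds_pos = torch.sigmoid(param[2:])   # pretend positive-pair similarities
print(initial_training_step(preds_neg, preds_pos, opt))
```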
The above embodiments of the present disclosure have the following advantageous effects: by the model training method of some embodiments of the present disclosure, more accurate text labels can be screened out by using the text coding model after initial training and the label coding model after initial training. Specifically, the reason for the inability to accurately screen more accurate text labels is: the Bi-Encoder model cannot learn the interactive information between the text and the tag effectively, so that there is a limitation in the semantic expression of the text and the tag. Thus, the set of matching tags screened out is not accurate enough. Based on this, the model training method of some embodiments of the present disclosure first obtains a positive sample group corresponding to each training text in the training text set that determines the current batch. Wherein the positive samples include: training text and text labels. The obtained training text set is used for training a subsequent text coding model and a label coding model. In addition, the training text set and the corresponding text label set of the training text set are used for generating each negative sample to realize the training of the subsequent text coding model and the label coding model. Then, for each training text in the training text set, performing a first predicted value generation step: first, the training texts are subjected to text division to generate a sub-text set. Here, the training text is divided into texts for the subsequent generation of negative samples corresponding to the training text for each sub-text. And secondly, inputting the sub-text set into an initial text coding model to obtain a sub-text vector set, and inputting the text label set corresponding to the rest of training text sets into the initial label coding model to obtain a label vector set. The rest training text sets are text sets in which the training texts are removed from the training text sets. Here, each text in the sub-text set and each tag in the text tag set are encoded for subsequent determination of the association between the sub-text and the text tag. And thirdly, for each of the rest training texts in the rest training text sets, determining a first predicted value of each label vector corresponding to the first negative sample according to a similarity set between the sub-text vector set and each label vector in the corresponding label vector group. Wherein the first negative sample comprises: the training samples and the label vectors correspond to text labels. Here, the association relationship between each sub-text and the text label corresponding to the label vector can be effectively determined through the similarity set between each sub-text vector and each label vector. Therefore, the association relation between the training text and the text label is determined based on the association relation between each sub-text and the corresponding text label, and the interactive relation information between the text and the rest text labels is added from the aspect of the local characteristics (namely the sub-text characteristics) of the training text. The expression capability of the text coding model and the label coding model is greatly enriched. 
And finally, performing primary training on the initial text coding model and the initial tag coding model according to the obtained multiple first predicted value group sets, the obtained positive sample group sets and the obtained negative sample tags corresponding to the multiple first negative sample group sets to obtain a text coding model with more accurate text coding after primary training and a tag coding model with more accurate tag coding after primary training.
With further reference to fig. 4, a flow 400 of further embodiments of a model training method according to the present disclosure is shown. The model training method comprises the following steps:
Step 401, determining a positive sample group corresponding to each training text in the training text set of the current batch, where the positive samples include: training text and text labels.
Step 402, for each training text in the training text set, performing a first predicted value generating step:
in step 4021, the training text is text-divided to generate a set of sub-texts.
Step 4022, inputting the above-mentioned sub-text set into the initial text coding model to obtain a sub-text vector set, and inputting the text label set corresponding to the rest of the training text set into the initial label coding model to obtain a label vector set.
Step 4023, for each of the remaining training texts in the remaining training text set, determining a first predicted value of each label vector corresponding to the first negative sample according to a similarity set between the set of sub-text vectors and each label vector in the corresponding label vector set.
Step 403, performing primary training on the initial text coding model and the initial label coding model according to the obtained multiple first predicted value sets, the obtained positive sample set and the obtained negative sample labels corresponding to the multiple first negative sample sets, so as to obtain a text coding model after primary training and a label coding model after primary training.
In some embodiments, the specific implementation of steps 401-403 and the technical effects thereof may refer to steps 301-303 in the corresponding embodiment of fig. 3, which are not described herein.
And step 404, inputting the text label group set corresponding to the training text set into the label coding model after the initial training to obtain a training label vector group set.
In some embodiments, the executing body (e.g., the electronic device 101 shown in fig. 1) may input the text label set corresponding to the training text set into the label coding model after the initial training, to obtain a training label vector set.
Step 405, updating the pre-stored tag vector queue according to the training tag vector group set, to obtain an updated tag vector queue.
In some embodiments, the execution body may update a pre-stored tag vector queue according to the training tag vector group set to obtain an updated tag vector queue. Wherein, the tag vector queue includes: a set of tag vector groups for a plurality of historical batches. The tag vector queue may be a vector queue stored in a memory. Each set of tag vector sets has corresponding historical lot information.
As an example, the execution body may add the training tag vector group set to the tag vector queue to obtain an added queue as an updated tag vector queue.
In some optional implementations of some embodiments, updating the tag vector queue stored in the memory according to the training tag vector group set to obtain an updated tag vector queue may include the following steps:
And a first step of removing the tag vector group with the joining time meeting the preset time condition from each tag vector group set included in the tag vector queue, and obtaining a removed tag vector queue.
The predetermined time condition may be that the tag vector group belongs to the tag vector group set that was added to the queue earliest.
And secondly, taking the training tag vector group set as a tag vector group set, and adding the training tag vector group set into the removed tag vector queue to obtain an updated tag vector queue.
For example, the tag vector queue is { the set of tag vector groups corresponding to the first batch, the set of tag vector groups corresponding to the second batch, the set of tag vector groups corresponding to the third batch, the set of tag vector groups corresponding to the fourth batch }. If the current batch is the fifth batch, the updated tag vector queue may be { the set of tag vector groups corresponding to the second batch, the set of tag vector groups corresponding to the third batch, the set of tag vector groups corresponding to the fourth batch, the set of training tag vector groups }.
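A sketch of this queue maintenance, assuming the queue holds one tag vector group set per batch and has a fixed capacity (here 4, matching the example above); when full, the group set that was added earliest is removed before the current batch's training tag vector group set is appended.

```python
from collections import deque

QUEUE_CAPACITY = 4  # number of historical batches kept (illustrative value)

def update_tag_vector_queue(tag_vector_queue: deque, training_tag_vector_group_set) -> deque:
    # Remove the earliest-added tag vector group set once the queue is full...
    if len(tag_vector_queue) >= QUEUE_CAPACITY:
        tag_vector_queue.popleft()
    # ...then append the current batch's training tag vector group set.
    tag_vector_queue.append(training_tag_vector_group_set)
    return tag_vector_queue

queue = deque([f"group set of batch {i}" for i in range(1, 5)])  # batches 1-4
print(update_tag_vector_queue(queue, "training tag vector group set (batch 5)"))
# deque(['group set of batch 2', ..., 'training tag vector group set (batch 5)'])
```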
In step 406, in response to determining that the updated tag vector in the updated tag vector queue meets a preset convergence condition, a training text set of a next batch is obtained as a target training text set.
In some embodiments, in response to determining that the updated tag vector in the updated tag vector queue meets a preset convergence condition, the executing entity may acquire a training text set of a next batch in a wired manner or a wireless manner as the target training text set. The preset convergence condition may be that each vector element of at least one updated label vector for the same text label in the updated label vector queue converges gradually.
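One way to instantiate this preset convergence condition (an assumption, since the disclosure only states that the updated label vectors for the same text label converge gradually) is to require that the vectors stored for a given text label in successive batches of the queue differ by at most a small tolerance:

```python
import numpy as np

def label_vectors_converged(vectors_per_batch: list[np.ndarray], tol: float = 1e-3) -> bool:
    # vectors_per_batch: the updated label vectors stored for one text label,
    # ordered from the oldest batch in the queue to the newest.
    diffs = [np.linalg.norm(b - a) for a, b in zip(vectors_per_batch, vectors_per_batch[1:])]
    return len(diffs) > 0 and max(diffs) <= tol

v = np.ones(4)
print(label_vectors_converged([v, v + 1e-4, v + 2e-4]))  # True: successive changes are tiny
```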
Step 407, determining a positive sample group corresponding to each target training text in the target training text set as a target positive sample group, and obtaining a target positive sample group set.
In some embodiments, the execution body may determine a positive sample group corresponding to each target training text in the target training text set as a target positive sample group, thereby obtaining a target positive sample group set. Wherein a target positive sample includes: a target training text and a text label.
Step 408, taking the label coding model after the initial training as an initial label coding model, taking the text coding model after the initial training as an initial text coding model, and executing the first predicted value generating step aiming at the target training text set according to the target positive sample set to obtain a plurality of second predicted value set.
In some embodiments, the executing body may use the tag coding model after the initial training as an initial tag coding model, and the text coding model after the initial training as an initial text coding model, and execute the first predicted value generating step for the target training text set according to the target positive sample set, so as to obtain a plurality of second predicted value sets. The specific implementation of the second predicted value may refer to the first predicted value generating step, which is not described herein.
Step 409, for each target training text in the target training text set, performing a second predicted value generating step:
Step 4091, performing text division on the target training text to generate a target sub-text set.
In some embodiments, the execution body may perform text division on the target training text to generate a target sub-text set.
As an example, the execution body may perform sentence breaking processing on the target training text to obtain each sentence, and use each sentence as the target sub-text to obtain the target sub-text set.
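The sentence-breaking step can be as simple as splitting on sentence-ending punctuation. The following sketch assumes that reading; the punctuation set is an assumption, and real implementations may instead use a tokenizer or length-based chunking.

```python
import re

def split_into_sub_texts(training_text: str) -> list:
    """Sentence-breaking sketch: split at sentence-ending punctuation.
    The punctuation set (Chinese and Western marks) is an assumption."""
    sentences = re.split(r"(?<=[。！？.!?])\s*", training_text)
    return [s for s in sentences if s.strip()]

print(split_into_sub_texts("商品做工精细。穿着很舒适！推荐购买。"))
# ['商品做工精细。', '穿着很舒适！', '推荐购买。']
```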
Step 4092, inputting the target sub-text set into the initial text coding model to obtain a target sub-text vector set.
In some embodiments, the execution body may input the target sub-text set into an initial text encoding model to obtain a target sub-text vector set.
Step 4093, for each updated label vector in the updated label vector queue, determining the similarity between the target sub-text vector set and the updated label vector as the target similarity, and obtaining a target similarity set.
In some embodiments, the execution body may determine, for each update tag vector in the update tag vector queue, a similarity between the target sub-text vector set and the update tag vector as a target similarity, to obtain a target similarity set.
As an example, the execution body may determine, for each update tag vector in the update tag vector queue, a cosine distance between the target sub-text vector set and the update tag vector as a target similarity, to obtain a target similarity set.
Step 4094, for each updated label vector in the updated label vector queue, determining the similarity that satisfies a predetermined condition in the target similarity set corresponding to the updated label vector, as a third predicted value of the second negative sample corresponding to the updated label vector.
In some embodiments, the execution body may determine, for each updated tag vector in the updated tag vector queue, the similarity that satisfies the predetermined condition in the target similarity set corresponding to the updated tag vector, as the third predicted value of the second negative sample corresponding to the updated tag vector. Wherein the second negative sample corresponding to the updated tag vector includes: the target training text and the text label corresponding to the updated tag vector. The specific implementation is not described here in detail.
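Steps 4093 and 4094 amount to scoring every queued label vector against the target sub-text vectors and keeping one score per label vector. A sketch under two assumptions, cosine similarity as the measure (as in the example of step 4093) and "maximum similarity" as the predetermined condition, follows; the random stand-in vectors are illustrative only.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def third_predicted_values(target_sub_text_vectors, update_label_vectors) -> np.ndarray:
    """For each updated label vector, compute the target similarity set against
    all target sub-text vectors and keep the maximum; taking the maximum is an
    assumed reading of the predetermined condition."""
    preds = []
    for label_vec in update_label_vectors:            # one vector per queued label
        sims = [cosine_similarity(sub_vec, label_vec)
                for sub_vec in target_sub_text_vectors]
        preds.append(max(sims))
    return np.array(preds)

# Usage with random 128-dimensional stand-in vectors:
rng = np.random.default_rng(0)
print(third_predicted_values(rng.normal(size=(3, 128)), rng.normal(size=(5, 128))))
```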
Step 410, retraining the initial text coding model and the initial label coding model according to the obtained third predicted value group set, the target positive sample group set, the plurality of second predicted value group sets and the negative sample labels corresponding to the obtained second negative sample group set, to obtain a retrained text coding model and a retrained label coding model.
In some embodiments, the executing body may retrain the initial text encoding model and the initial tag encoding model according to the obtained third predicted value set, the target positive sample set, the plurality of second predicted value sets, and the negative sample tags corresponding to the obtained second negative sample set, to obtain a retrained text encoding model and a retrained tag encoding model.
As an example, first, predicted value fusion is performed on the third predicted value group set and the plurality of second predicted value group sets to obtain a fused predicted value set. Then, predicted values whose values rank in the first preset number are screened out from the fused predicted value set as target predicted values, so as to obtain a target predicted value set. Then, the target predicted value set and the negative sample label are correspondingly input into the target loss function, and a loss value set for the target predicted value set is obtained as a sixth loss value set. Further, a seventh loss value corresponding to each target positive sample in the target positive sample group set is generated based on the initial text encoding model and the initial label encoding model, so as to obtain a seventh loss value group set. Further, the seventh loss value group set and the sixth loss value set are subjected to weighted summation processing to obtain an eighth loss value. Finally, the initial text coding model and the initial tag coding model are retrained according to the eighth loss value, so as to obtain a retrained text coding model and a retrained tag coding model.
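The example flow above (fusion, top-ranked screening, negative-sample loss, weighted summation with the positive-sample losses) can be sketched as follows. The binary cross-entropy loss, the top_k value and the weights are assumptions; the disclosure only speaks of a target loss function and a preset number.

```python
import numpy as np

def bce_loss(pred: np.ndarray, label: float) -> np.ndarray:
    # The target loss function is assumed to be binary cross-entropy here.
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return -(label * np.log(pred) + (1 - label) * np.log(1 - pred))

def eighth_loss(third_preds, second_preds, seventh_losses,
                top_k=128, neg_label=0.0, w_neg=1.0, w_pos=1.0):
    """Fuse the two prediction sets, keep the predicted values ranked in the
    first top_k, score them against the negative sample label (sixth loss
    values), and weight-sum with the positive-sample (seventh) losses."""
    fused = np.concatenate([np.ravel(third_preds), np.ravel(second_preds)])
    target_preds = np.sort(fused)[::-1][:top_k]
    sixth_losses = bce_loss(target_preds, neg_label)
    return w_neg * sixth_losses.mean() + w_pos * np.mean(seventh_losses)

rng = np.random.default_rng(1)
print(eighth_loss(rng.random((4, 10)), rng.random((4, 10)), rng.random(8)))
```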
In some optional implementations of some embodiments, retraining the initial text encoding model and the initial tag encoding model according to the obtained third set of predicted values, the obtained target set of positive samples, the obtained plurality of second sets of predicted values, and the negative sample tags corresponding to the obtained second set of negative samples, to obtain a retrained text encoding model and a retrained tag encoding model may include the following steps:
And a first step of generating a plurality of third loss value group sets according to the plurality of second predicted value group sets and the corresponding negative sample labels.
As an example, the plurality of second predicted value group sets and the negative sample label are correspondingly input to a target loss function, so as to obtain a plurality of third loss value group sets.
And a second step of removing the third predicted value with the predicted value smaller than or equal to the target threshold value from the third predicted value group set to obtain a removed third predicted value group set.
The target threshold may be a preset threshold. For example, the target threshold may be 0.6. Each second negative sample in the second negative sample group corresponding to the third predicted value group set after removal may be a hard negative sample (i.e., a difficult-case negative sample).
And thirdly, generating a fourth loss value group set according to the removed third predicted value group set and the corresponding negative sample label.
As an example, the execution body may correspondingly input the removed third predicted value group set and the negative sample label into the target loss function, so as to obtain a fourth loss value group set.
And step four, retraining the initial text coding model and the initial tag coding model according to the third loss value group sets and the fourth loss value group sets to obtain a retrained text coding model and a retrained tag coding model.
As an example, first, the third loss values included in the plurality of third loss value group sets are subjected to averaging processing to obtain a third average loss value. Then, the fourth loss values included in the fourth loss value group set are subjected to averaging processing to obtain a fourth average loss value. Then, the third average loss value and the fourth average loss value are added to obtain a second additive loss value. Finally, the model parameters in the initial text coding model and the model parameters in the initial tag coding model are updated by back propagation according to the second additive loss value, so as to obtain a retrained text coding model and a retrained tag coding model.
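Put together, the optional implementation reads: keep only the third predicted values above the target threshold (the hard negatives), compute losses for both prediction sets, average each set, and add the two averages before back-propagating. A sketch under the same binary cross-entropy assumption as before; the 0.6 threshold comes from the example in the text.

```python
import numpy as np

def bce_loss(pred: np.ndarray, label: float) -> np.ndarray:
    # Assumed target loss function; the disclosure does not fix its form.
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return -(label * np.log(pred) + (1 - label) * np.log(1 - pred))

def second_additive_loss(second_preds, third_preds, target_threshold=0.6, neg_label=0.0):
    # First step: third loss values from the second predicted value group sets.
    third_losses = bce_loss(np.ravel(second_preds), neg_label)
    # Second step: remove third predicted values <= threshold, keeping hard negatives.
    kept = np.ravel(third_preds)[np.ravel(third_preds) > target_threshold]
    # Third step: fourth loss values from the retained (hard-negative) predictions.
    fourth_losses = bce_loss(kept, neg_label) if kept.size else np.zeros(1)
    # Fourth step: average each loss set and add the averages.
    return third_losses.mean() + fourth_losses.mean()

rng = np.random.default_rng(2)
print(second_additive_loss(rng.random((4, 10)), rng.random((4, 10))))
```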
As can be seen from fig. 4, compared with the description of some embodiments corresponding to fig. 3, in the process 400 of the model training method in some embodiments corresponding to fig. 4, when the updated label vectors in the updated label vector queue satisfy the preset convergence condition, the training text set of the next batch may be obtained. For the training text set of the next batch (i.e., the target training text set), not only is the corresponding second predicted value group set generated by means of the first predicted value generating step, but more predicted values (i.e., third predicted values) for the second negative samples of the target training texts are also generated by considering the correspondence between each target training text and the label vectors in the label vector queue. In addition, since the number of third predicted values in the third predicted value group set corresponding to the second negative sample group set is large, and the second negative samples corresponding to third predicted values larger than the target threshold are often hard negative samples, the third predicted values smaller than or equal to the target threshold are removed from the third predicted value group set corresponding to the target training text set. Subsequently training the text coding model and the label coding model with the removed third predicted value group set can improve the training speed, reduce the model learning pressure caused by an excessive number of negative samples, and enable the model to learn the feature information of more hard negative samples, thereby improving the accuracy of the trained model. Therefore, training the text coding model and the label coding model with the second predicted value group sets and the removed third predicted value group set enables the model to learn the feature information of more hard negative samples, so that the retrained text coding model and the retrained label coding model are more accurate.
With continued reference to fig. 5, a flow 500 of some embodiments of a tag generation method according to the present disclosure is shown. The label generating method comprises the following steps:
In step 501, a target text is acquired.
In some embodiments, the execution subject (e.g., electronic device) of the tag generation method may acquire the target text through a wired manner or a wireless manner.
Step 502, inputting the target text into a pre-trained text coding model to generate a target text vector.
In some embodiments, the execution body may input the target text into a pre-trained text encoding model to generate a target text vector. Wherein the text encoding model is generated by a model training method of some embodiments of the present disclosure. The pre-trained text coding model can be a text coding model after initial training, a text coding model after retraining, or a text coding model after training in all batches.
In step 503, a target number of target tag vectors is found from the tag vector database.
In some embodiments, the execution body may find a target number of target tag vectors from a tag vector database. The target tag vectors are the tag vectors that satisfy a preset association relationship with the target text vector. Each tag vector in the tag vector database is generated based on a pre-trained tag coding model generated by the model training method of some embodiments of the present disclosure. The target number may be a preset number. The target tag vectors may be the tag vectors, among the tag vectors in the tag vector database, whose similarity with the target text vector ranks in the first target number. The pre-trained tag coding model can be a tag coding model after initial training, a tag coding model after retraining, or a tag coding model after training on all batches.
It should be noted that, each text label collected in advance is input to a label coding model trained in advance, so as to obtain each label vector in the label vector database.
And step 504, determining the tag set corresponding to the target number of target tag vectors as the text tag set of the target text.
In some embodiments, the executing entity may determine a tag set corresponding to the target number of target tag vectors as a text tag set of the target text.
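Steps 501 to 504 thus reduce to a nearest-neighbour lookup over the pre-computed label vectors. The following sketch assumes cosine similarity as the preset association relationship and uses random stand-in vectors; in practice the target text vector comes from the pre-trained text coding model, the label vectors and labels come from the tag vector database, and the lookup may be served by an approximate nearest-neighbour index. target_number = 5 is an assumption.

```python
import numpy as np

def generate_text_labels(target_text_vector: np.ndarray,
                         label_vectors: np.ndarray,
                         labels: list,
                         target_number: int = 5) -> list:
    """Rank the database label vectors by cosine similarity to the target text
    vector and return the labels of the first target_number vectors."""
    norms = np.linalg.norm(label_vectors, axis=1) * np.linalg.norm(target_text_vector) + 1e-12
    sims = label_vectors @ target_text_vector / norms
    top_indices = np.argsort(sims)[::-1][:target_number]
    return [labels[i] for i in top_indices]

rng = np.random.default_rng(3)
print(generate_text_labels(rng.normal(size=64), rng.normal(size=(100, 64)),
                           [f"label_{i}" for i in range(100)]))
```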
The above embodiments of the present disclosure have the following advantageous effects: in the label generation method, the text coding model makes it possible to generate an accurate vector representation of the target text, and since the label vectors in the database are likewise accurate vector representations, the text labels corresponding to the target text can be screened out accurately.
With further reference to fig. 6, as an implementation of the method illustrated in the above figures, the present disclosure provides some embodiments of a model training apparatus, which correspond to those method embodiments illustrated in fig. 3, which may find particular application in a variety of electronic devices.
As shown in fig. 6, a model training apparatus 600 includes: a first determination unit 601, an execution unit 602, and a training unit 603. Wherein, the first determining unit 601 is configured to determine a positive sample group corresponding to each training text in the training text set of the current batch, where the positive samples include: training text and text labels; an execution unit 602 configured to execute, for each training text in the training text set, a first predicted value generation step: performing text division on the training text to generate a sub-text set; inputting the sub-text set into an initial text coding model to obtain a sub-text vector set, and inputting a text label set corresponding to the rest of training text sets into the initial label coding model to obtain a label vector set, wherein the rest of training text sets are text sets in which the training text is removed from the training text sets; for each of the remaining training texts in the remaining training text set, determining a first predicted value of each label vector corresponding to a first negative sample according to a similarity set between the set of sub-text vectors and each label vector in the corresponding label vector group, wherein the first negative sample comprises: the training sample corresponds to the text label with the label vector; the training unit 603 is configured to perform initial training on the initial text encoding model and the initial tag encoding model according to the obtained multiple first predicted value sets, the obtained positive sample set and the obtained negative sample tags corresponding to the multiple first negative sample sets, so as to obtain a text encoding model after initial training and a tag encoding model after initial training.
In some optional implementations of some embodiments, the execution unit 602 in the model training apparatus 600 may be further configured to: and screening out the similarity meeting the preset condition from the similarity set corresponding to the label vector, and taking the similarity as a first predicted value of a first negative sample corresponding to the label vector.
In some optional implementations of some embodiments, the training unit 603 in the model training apparatus 600 may be further configured to: correspondingly inputting the plurality of first predicted value group sets and the negative sample label into a target loss function to obtain a plurality of first loss value group sets; generating a second loss value corresponding to each positive sample in the positive sample group set according to the initial text coding model and the initial label coding model to obtain a second loss value group set; and performing primary training on the initial text coding model and the initial tag coding model according to the first loss value group sets and the second loss value group sets to obtain a text coding model after primary training and a tag coding model after primary training.
In some optional implementations of some embodiments, the model training apparatus 600 further includes: an input unit (not shown) and an update unit (not shown). Wherein the input unit may be configured to: and inputting the text label group set corresponding to the training text set into the label coding model after the initial training to obtain a training label vector group set. The updating unit may be configured to: updating a pre-stored tag vector queue according to the training tag vector group set to obtain an updated tag vector queue, wherein the tag vector queue comprises: a set of tag vector groups for a plurality of historical batches.
In some optional implementations of some embodiments, the model training apparatus 600 further includes: a training text acquisition unit, a third determination unit and a first step execution unit (not shown in the figure). Wherein the training text acquisition unit may be configured to: and responding to the fact that the updated label vector in the updated label vector queue meets a preset convergence condition, and acquiring a training text set of the next batch to serve as a target training text set. The third determination unit may be configured to: and determining a positive sample group corresponding to each target training text in the target training text set as a target positive sample group to obtain a target positive sample group set. The first step execution unit may be configured to: and taking the label coding model after initial training as an initial label coding model, taking the text coding model after initial training as an initial text coding model, and executing the first predicted value generation step aiming at the target training text set according to the target positive sample set to obtain a plurality of second predicted value set.
In some optional implementations of some embodiments, the model training apparatus 600 further includes: a second step execution unit (not shown) and a retraining unit (not shown). Wherein the second step execution unit may be configured to: for each target training text in the target training text set, executing a second predicted value generating step: performing text division on the target training text to generate a target sub-text set; inputting the target sub-text set into an initial text coding model to obtain a target sub-text vector set; for each updated label vector in the updated label vector queue, determining the similarity between the target sub-text vector set and the updated label vector as target similarity, and obtaining a target similarity set; for each update tag vector in the update tag vector queue, determining a similarity in the update tag vector corresponding target similarity set that satisfies the predetermined condition as a third predicted value of the update tag vector corresponding to a second negative sample, wherein the update tag vector corresponding to the second negative sample includes: the target training text corresponds to the text label with the updated label vector. The retraining unit may be configured to: and retraining the initial text coding model and the initial label coding model according to the obtained third predicted value group set, the target positive sample group set, the plurality of second predicted value group sets and the negative sample labels corresponding to the obtained second negative sample group set to obtain a retrained text coding model and a retrained label coding model.
In some optional implementations of some embodiments, the retraining unit may be further configured to: generating a plurality of third loss value group sets according to the plurality of second prediction value group sets and the corresponding negative sample labels; removing a third predicted value with the predicted value smaller than or equal to a target threshold value from the third predicted value group set to obtain a removed third predicted value group set; generating a fourth loss value group set according to the removed third predicted value group set and the corresponding negative sample label; and retraining the initial text coding model and the initial tag coding model according to the third loss value group sets and the fourth loss value group sets to obtain a retrained text coding model and a retrained tag coding model.
In some optional implementations of some embodiments, the update unit may be further configured to: removing the tag vector groups with the addition time meeting the preset time condition from each tag vector group set included in the tag vector queue to obtain a removed tag vector queue; and adding the training tag vector group set serving as a tag vector group set into the removed tag vector queue to obtain an updated tag vector queue.
It will be appreciated that the elements described in the model training apparatus 600 correspond to the various steps in the method described with reference to fig. 3. Thus, the operations, features, and advantages described above with respect to the method are equally applicable to the model training apparatus 600 and the units contained therein, and are not described herein.
With further reference to fig. 7, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of a tag generation apparatus, which correspond to those method embodiments shown in fig. 5, and which are particularly applicable in various electronic devices.
As shown in fig. 7, a label generating apparatus 700 includes: an acquisition unit 701, an input unit 702, a search unit 703, and a second determination unit 704. Wherein, the acquiring unit 701 is configured to acquire a target text; an input unit 702 configured to input the target text to a pre-trained text encoding model to generate a target text vector, wherein the text encoding model is generated by a model training method of some embodiments of the present disclosure; a searching unit 703 configured to search out a target number of target tag vectors from the tag vector database, where the target tag vectors are vectors satisfying a preset association relationship with the target text vector in each tag vector, where each tag vector in the tag vector database is generated based on a pre-trained tag coding model, and the tag coding model is generated by a model training method according to some embodiments of the present disclosure; the second determining unit 704 is configured to determine a tag set corresponding to the target number of target tag vectors as a text tag set of the target text.
It will be appreciated that the elements described in the label producing apparatus 700 correspond to the various steps in the method described with reference to fig. 5. Thus, the operations, features, and advantages described above with respect to the method are equally applicable to the tag generating apparatus 700 and the units contained therein, and are not described herein.
Referring now to fig. 8, a schematic diagram of an electronic device 800 (e.g., electronic device 101 of fig. 1) suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 8 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 may include a processing means (e.g., a central processor, a graphics processor, etc.) 801, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 are also stored. The processing device 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
In general, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, etc.; storage 808 including, for example, magnetic tape, hard disk, etc.; communication means 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 shows an electronic device 800 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 8 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communication device 809, or from storage device 808, or from ROM 802. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing device 801.
It should be noted that, in some embodiments of the present disclosure, the computer readable medium may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determining a positive sample group corresponding to each training text in the training text set of the current batch, wherein the positive samples comprise: training text and text labels; for each training text in the training text set, executing a first predicted value generation step: performing text division on the training text to generate a sub-text set; inputting the sub-text set into an initial text coding model to obtain a sub-text vector set, and inputting a text label set corresponding to the rest of training text sets into the initial label coding model to obtain a label vector set, wherein the rest of training text sets are text sets in which the training text is removed from the training text sets; for each of the remaining training texts in the remaining training text set, determining a first predicted value of each label vector corresponding to a first negative sample according to a similarity set between the set of sub-text vectors and each label vector in the corresponding label vector group, wherein the first negative sample comprises: the training sample corresponds to the text label with the label vector; and performing primary training on the initial text coding model and the initial tag coding model according to the obtained multiple first predicted value group sets, the obtained positive sample group sets and the obtained negative sample tags corresponding to the multiple first negative sample group sets to obtain a text coding model after primary training and a tag coding model after primary training. Acquiring a target text; inputting the target text into a pre-trained text coding model to generate a target text vector, wherein the text coding model is generated by a model training method according to some embodiments of the present disclosure; searching target number of target tag vectors from the tag vector database, wherein the target tag vectors are vectors which meet a preset association relation between each tag vector and the target text vector, each tag vector in the tag vector database is generated based on a pre-trained tag coding model, and the tag coding model is generated through a model training method of some embodiments of the disclosure; and determining the tag set corresponding to the target number of target tag vectors as a text tag set of the target text.
Computer program code for carrying out operations for some embodiments of the present disclosure may be written in one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: a processor includes a first determination unit, an execution unit, and a training unit. The names of these units do not constitute a limitation on the unit itself in some cases, and for example, the first determining unit may also be described as "a unit that determines a positive sample group corresponding to each training text in the training text set of the current batch".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
Some embodiments of the present disclosure also provide a computer program product comprising a computer program which, when executed by a processor, implements any of the model training methods and tag generation methods described above.
The foregoing description is only of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example, technical solutions in which the above features are replaced with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (14)

1. A model training method, comprising:
Determining a positive sample group corresponding to each training text in the training text set of the current batch, wherein the positive samples comprise: training text and text labels;
For each training text in the training text set, performing a first predictor generating step:
Performing text division on the training text to generate a sub-text set;
Inputting the sub-text set into an initial text coding model to obtain a sub-text vector set, and inputting a text label set corresponding to the rest of training text sets into the initial label coding model to obtain a label vector set, wherein the rest of training text sets are text sets in which the training text is removed in the training text sets;
For each of the remaining training texts in the remaining training text set, determining a first predicted value of the first negative sample corresponding to each label vector according to a similarity set between the sub-text vector set and each label vector in the corresponding label vector set, wherein the first negative sample comprises: the training text and the text label corresponding to the label vector;
and performing primary training on the initial text coding model and the initial tag coding model according to the obtained multiple first predicted value group sets, the obtained positive sample group sets and the obtained negative sample tags corresponding to the multiple first negative sample group sets to obtain a text coding model after primary training and a tag coding model after primary training.
2. The method of claim 1, wherein the determining a first predictor for each label vector corresponding to a first negative sample from a set of similarities between the set of sub-text vectors and each label vector in the set of corresponding label vectors comprises:
And screening out the similarity meeting the preset condition from the similarity set corresponding to the label vector, and taking the similarity as a first predicted value of a first negative sample corresponding to the label vector.
3. The method of claim 1, wherein the performing initial training on the initial text coding model and the initial tag coding model according to the obtained plurality of first predicted value sets, the obtained positive sample set, and the negative sample tags corresponding to the obtained plurality of first negative sample set, to obtain an initial trained text coding model and an initial trained tag coding model includes:
Correspondingly inputting the plurality of first predicted value group sets and the negative sample label to a target loss function to obtain a plurality of first loss value group sets;
generating a second loss value corresponding to each positive sample in the positive sample group set according to the initial text coding model and the initial label coding model to obtain a second loss value group set;
And performing primary training on the initial text coding model and the initial tag coding model according to the first loss value group sets and the second loss value group sets to obtain a text coding model after primary training and a tag coding model after primary training.
4. The method of claim 1, wherein the method further comprises:
Inputting the text label group set corresponding to the training text set into the label coding model after the initial training to obtain a training label vector group set;
Updating a pre-stored tag vector queue according to the training tag vector group set to obtain an updated tag vector queue, wherein the tag vector queue comprises: a set of tag vector groups for a plurality of historical batches.
5. The method of claim 4, wherein the method further comprises:
Responding to the fact that the updated label vector in the updated label vector queue meets a preset convergence condition, and acquiring a training text set of the next batch as a target training text set;
Determining a positive sample group corresponding to each target training text in the target training text set as a target positive sample group to obtain a target positive sample group set;
and taking the label coding model after initial training as an initial label coding model, taking the text coding model after initial training as an initial text coding model, and executing the first predicted value generation step for the target training text set according to the target positive sample group set, so as to obtain a plurality of second predicted value group sets.
6. The method of claim 5, wherein the method further comprises:
For each target training text in the target training text set, performing a second predicted value generation step:
Performing text division on the target training text to generate a target sub-text set;
inputting the target sub-text set into an initial text coding model to obtain a target sub-text vector set;
for each updated label vector in the updated label vector queue, determining the similarity between the target sub-text vector set and the updated label vector as target similarity, and obtaining a target similarity set;
For each updated tag vector in the updated tag vector queue, determining the similarity that satisfies a predetermined condition in the target similarity set corresponding to the updated tag vector, as a third predicted value of the second negative sample corresponding to the updated tag vector, wherein the second negative sample corresponding to the updated tag vector comprises: the target training text and the text label corresponding to the updated tag vector;
And retraining the initial text coding model and the initial label coding model according to the obtained third predicted value group set, the target positive sample group set, the plurality of second predicted value group sets and the negative sample labels corresponding to the obtained second negative sample group set to obtain a retrained text coding model and a retrained label coding model.
7. The method of claim 6, wherein retraining the initial text encoding model and the initial tag encoding model according to the obtained third set of predictors, the target set of positive samples, the plurality of sets of second predictors, and negative sample tags corresponding to the obtained second set of negative samples, to obtain a retrained text encoding model and a retrained tag encoding model, comprises:
Generating a plurality of third loss value group sets according to the plurality of second prediction value group sets and the corresponding negative sample labels;
Removing a third predicted value with the predicted value smaller than or equal to a target threshold value from the third predicted value group set to obtain a removed third predicted value group set;
generating a fourth loss value group set according to the removed third predicted value group set and the corresponding negative sample label;
And retraining the initial text coding model and the initial tag coding model according to the third loss value group sets and the fourth loss value group sets to obtain a retrained text coding model and a retrained tag coding model.
8. The method of claim 4, wherein updating the pre-stored tag vector queue according to the training tag vector group set to obtain an updated tag vector queue comprises:
Removing the tag vector groups with the addition time meeting the preset time condition from each tag vector group set included in the tag vector queue to obtain a removed tag vector queue;
And taking the training tag vector group set as a tag vector group set, adding the training tag vector group set into the removed tag vector queue, and obtaining an updated tag vector queue.
9. A label generation method, comprising:
Acquiring a target text;
inputting the target text into a pre-trained text encoding model to generate a target text vector, wherein the text encoding model is generated based on the method of one of claims 1-8;
searching a target number of target tag vectors from a tag vector database, wherein the target tag vectors are vectors meeting a preset association relation between each tag vector and the target text vector, each tag vector in the tag vector database is generated based on a pre-trained tag coding model, and the tag coding model is generated based on the method according to one of claims 1-8;
And determining the tag sets corresponding to the target number of target tag vectors as text tag sets corresponding to the target text.
10. A model training apparatus comprising:
A first determining unit configured to determine a positive sample group corresponding to each training text in the training text set of the current batch, wherein the positive samples include: training text and text labels;
An execution unit configured to execute, for each training text in the training text set, a first predicted value generation step of: performing text division on the training text to generate a sub-text set; inputting the sub-text set into an initial text coding model to obtain a sub-text vector set, and inputting a text label set corresponding to the rest of training text sets into the initial label coding model to obtain a label vector set, wherein the rest of training text sets are text sets in which the training text is removed in the training text sets; for each of the remaining training texts in the remaining training text set, determining a first predicted value of each label vector corresponding to a first negative sample according to a similarity set between the sub-text vector set and each label vector in the corresponding label vector set, wherein the first negative sample comprises: the training samples and the label vectors correspond to text labels;
The training unit is configured to perform primary training on the initial text coding model and the initial tag coding model according to the obtained multiple first predicted value group sets, the obtained positive sample group sets and the obtained negative sample tags corresponding to the multiple first negative sample group sets, so as to obtain a text coding model after primary training and a tag coding model after primary training.
11. A label generating apparatus comprising:
An acquisition unit configured to acquire a target text;
An input unit configured to input the target text to a pre-trained text encoding model to generate a target text vector, wherein the text encoding model is generated based on the method of one of claims 1-8;
A searching unit configured to search a target number of target tag vectors from a tag vector database, wherein the target tag vectors are vectors satisfying a preset association relationship with the target text vector in each tag vector, wherein each tag vector in the tag vector database is generated based on a pre-trained tag coding model, and the tag coding model is generated based on the method according to one of claims 1 to 8;
and the second determining unit is configured to determine the tag set corresponding to the target number of target tag vectors as a text tag set of the target text.
12. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
When executed by the one or more processors, causes the one or more processors to implement the method of any of claims 1-9.
13. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-9.
14. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-9.
CN202211429424.0A 2022-11-15 2022-11-15 Model training method, label generating method, device, equipment and medium Pending CN118051768A (en)



Legal Events

PB01: Publication