CN117034926A - Model training method and device for multi-field text classification model


Info

Publication number
CN117034926A
Authority
CN
China
Prior art keywords
classification
layer
sub
text
text data
Prior art date
Legal status
Pending
Application number
CN202311038789.5A
Other languages
Chinese (zh)
Inventor
杨仁杰
王智君
魏一雄
王聪
曹靖楠
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311038789.5A
Publication of CN117034926A

Classifications

    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This specification discloses a model training method and apparatus for a multi-domain text classification model. During training, the text classification model can learn the latent associations among the classification domains, so it can give more accurate classification results; and because the model contains a plurality of sub-classification layers, a single trained model can classify text data in every classification domain, which greatly reduces training cost.

Description

Model training method and device for multi-field text classification model
Technical Field
The present disclosure relates to the field of computer technology and the field of text processing, and in particular, to a method and apparatus for model training for a multi-domain text classification model.
Background
With the continuous development of computer technology, artificial intelligence has been widely applied in various business scenarios: business data related to a scenario is input into a pre-trained model, and the business is executed based on the model's output, which greatly improves execution efficiency across scenarios.
For example, in a text classification scenario, text data typically needs to be input into a pre-trained text classification model, and the model's output may then be used for purposes such as information recommendation and security early warning.
However, a text classification model is usually trained only on sample text data from a single classification scenario, so the trained model can typically give the input text data a classification result for only that one scenario. If classification results in multiple scenarios are desired, a separate text classification model must be built and trained for each scenario, which greatly increases training cost. Moreover, one piece of text data may contain content from several classification domains, and a single-scenario model pays no attention to the latent links between those domains, so the accuracy of its classification results is low.
Disclosure of Invention
The present disclosure provides a method and apparatus for model training for a multi-domain text classification model, so as to partially solve the above-mentioned problems in the prior art.
The technical solution adopted in this specification is as follows:
the present specification provides a method of model training for a multi-domain text classification model, comprising:
acquiring sample text data;
inputting the sample text data into a coding layer in a text classification model to be trained, so as to code the sample text data through the coding layer, and obtaining a first coding vector corresponding to the sample text data;
inputting the first coding vector into each expert network layer contained in the text classification model respectively to determine, for each expert network layer, a second coding vector output by the expert network layer for the first coding vector;
inputting the second coding vector output by each expert network layer into each sub-classification layer contained in the text classification model, and processing, for each sub-classification layer, the second coding vectors input into that sub-classification layer through the sub-classification layer to obtain a classification result under the classification field corresponding to the sub-classification layer as the classification result corresponding to the sub-classification layer;
and training the text classification model by taking minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification field corresponding to each sub-classification layer as an optimization target.
Optionally, for each sub-classification layer, the text classification model further includes a weight distribution layer for the sub-classification layer;
inputting the second coding vector output by each expert network layer into each sub-classification layer contained in the text classification model, and processing, for each sub-classification layer, the second coding vectors input into that sub-classification layer through the sub-classification layer to obtain a classification result under the classification field corresponding to the sub-classification layer, specifically comprises:
inputting the second coding vector output by each expert network layer into each sub-classification layer contained in the text classification model; for each sub-classification layer, weighting the second coding vectors output by the expert network layers through the weight distribution layer set for that sub-classification layer to obtain a weighted coding vector for each expert network layer; and inputting the weighted coding vectors of the expert network layers into the sub-classification layer, so that the sub-classification layer outputs the classification result under its corresponding classification field as the classification result corresponding to the sub-classification layer.
Optionally, acquiring sample text data specifically includes:
acquiring initial sample text data;
identifying invalid words contained in the initial sample text data, and removing the invalid words from the initial sample text data to obtain transition sample text data;
and truncating the transition sample text data according to the specified text length, to obtain the sample text data input into the text classification model.
Optionally, training the text classification model with an optimization objective that minimizes a deviation between a classification result corresponding to each sub-classification layer and an actual classification result corresponding to the sample text data in a classification field corresponding to each sub-classification layer, specifically including:
for each round of training, determining a partial set of network layers from the text classification model, and fixing the network parameters of those network layers during the round of training;
and taking minimizing the deviation between the classification result corresponding to each sub-classification layer in the round of training and the actual classification result of the sample text data in the classification field corresponding to each sub-classification layer as an optimization target, adjusting the network parameters of the network layers other than the fixed partial network layers, so as to execute the round of training of the text classification model.
The present specification provides an apparatus for model training for a multi-domain text classification model, comprising:
the acquisition module is used for acquiring sample text data;
the first coding module is used for inputting the sample text data into a coding layer in a text classification model to be trained so as to code the sample text data through the coding layer to obtain a first coding vector corresponding to the sample text data;
the second coding module is used for respectively inputting the first coding vector into each expert network layer contained in the text classification model so as to determine a second coding vector output by the expert network layer for the first coding vector for each expert network layer;
the output module is used for inputting the second coding vector output by each expert network layer into each sub-classification layer contained in the text classification model so as to process the second coding vector output by each expert network layer input into the sub-classification layer through the sub-classification layer for each sub-classification layer, so as to obtain a classification result in the classification field corresponding to the sub-classification layer, and the classification result is used as a classification result corresponding to the sub-classification layer;
and the training module is used for training the text classification model by taking minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification field corresponding to each sub-classification layer as an optimization target.
Optionally, for each sub-classification layer, the text classification model further includes a weight distribution layer for the sub-classification layer;
the output module is specifically configured to input the second coding vectors output by the expert network layers into each sub-classification layer included in the text classification model, so as to, for each sub-classification layer, weight the second coding vectors output by the expert network layers by means of the weight distribution layer set for that sub-classification layer, obtain the weighted coding vector of each expert network layer, and input the weighted coding vectors into the sub-classification layer, so that the sub-classification layer outputs the classification result under its corresponding classification field as the classification result corresponding to the sub-classification layer.
Optionally, the acquiring module is specifically configured to acquire initial sample text data; identify invalid words contained in the initial sample text data, and remove the invalid words from the initial sample text data to obtain transition sample text data; and truncate the transition sample text data according to the specified text length to obtain the sample text data input into the text classification model.
Optionally, the training module is specifically configured to determine, for each round of training, a partial set of network layers from the text classification model and fix the network parameters of those network layers during the round of training; and to adjust, with minimizing the deviation between the classification result corresponding to each sub-classification layer in the round of training and the actual classification result of the sample text data in the classification field corresponding to each sub-classification layer as an optimization target, the network parameters of the network layers other than the fixed partial network layers, so as to execute the round of training of the text classification model.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the method of model training for a multi-domain text classification model described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above-described method of model training for a multi-domain text classification model when executing the program.
At least one of the technical solutions adopted in this specification can achieve the following beneficial effects:
In the model training method for a multi-domain text classification model provided in this specification, the text classification model includes a coding layer, a plurality of expert network layers, and a plurality of sub-classification layers, where each sub-classification layer gives the classification result in one classification domain. After sample text data is obtained, it can be input into the coding layer of the text classification model to obtain the first coding vector corresponding to the sample text data, and the first coding vector can then be input into each expert network layer included in the text classification model, with each expert network layer outputting its own second coding vector. Further, the second coding vectors output by the expert network layers are input into each sub-classification layer included in the text classification model, and each sub-classification layer processes the second coding vectors input into it to obtain the classification result in the classification domain corresponding to that sub-classification layer, as the classification result corresponding to the sub-classification layer. Finally, the text classification model can be trained by taking minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the corresponding classification domain as the optimization target.
In this method, the expert network layers each give a second coding vector, the second coding vectors given by the expert network layers are input into each sub-classification layer, and the model is trained toward reducing the deviation between each sub-classification layer's classification result and the classification label corresponding to that sub-classification layer. The text classification model can therefore learn the latent associations among the classification domains during training and give more accurate classification results; and because the model contains a plurality of sub-classification layers, a single trained model can classify text data in every classification domain, which greatly reduces training cost.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate and explain the exemplary embodiments of the present specification and their description, are not intended to limit the specification unduly. In the drawings:
FIG. 1 is a flow diagram of a method for model training for a multi-domain text classification model provided in an embodiment of the present disclosure;
FIG. 2 is a network schematic diagram of a text classification model according to an embodiment of the present disclosure;
FIG. 3 is a network schematic diagram of a text classification model provided with a weight distribution layer according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an apparatus for model training for a multi-domain text classification model according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for model training for a multi-domain text classification model according to an embodiment of the present disclosure, which specifically includes the following steps:
s101: sample text data is obtained.
In the embodiment of the present specification, the execution subject of the model training method provided in the present specification may be a server, or may be a terminal device such as a notebook computer or a desktop computer, and for convenience of description, the model training method provided in the present specification will be described below with only the terminal device as the execution subject.
In order to train a model that can classify text in multiple classification domains, the terminal device in this specification needs to obtain sample text data, which may be taken from a pre-built training sample set. The obtained sample text data may come from various practical application scenarios. For example, in the information recommendation field, a user's text data can be used to determine the user's preference classification results in business scenarios such as travel, food, and entertainment, where a preference classification result represents the degree to which the user prefers each business object contained in a business scenario. For a food scenario, for instance, the preference classification result may indicate which cuisine the user is most interested in. In the subsequent process, information can be recommended to the user in each business scenario according to the determined preference classification results. In this scenario, a user's text data whose actual preferences are explicitly known can serve as the sample text data required for training.
For another example, in the information early warning field, a user may submit a question text about a problem encountered, and the terminal device may classify the question text through the text classification model to determine its classification results under different classification standards, where the classification result under a given standard indicates which question category under that standard the question text belongs to. In the subsequent process, the question text can be forwarded, according to the determined classification result, to a worker who handles that category of question, and the worker can answer the question for the user. In this scenario, a user's question text whose question category is explicitly known can serve as the sample text data required for training.
In practice, text classification is involved in many more scenarios; the above merely lists two specific application fields as examples, and other fields are not enumerated here.
Further, the sample text data obtained by the terminal device may contain grammatical errors or useless words. To ensure the training efficiency and training effect of the model, the obtained sample text data can therefore be cleaned, and the cleaned sample text data used as the training samples for the text classification model.
Specifically, in this specification, the terminal device may acquire initial sample text data, then identify the invalid words contained in it and remove them from the initial sample text data, so as to obtain transition sample text data.
The terminal device can identify and remove the invalid words contained in the initial sample text data by means such as regular-expression processing and stop-word filtering.
Further, since the text classification model to be trained usually constrains the length of its input, the terminal device also needs to truncate the transition sample text data to the specified text length, obtaining the sample text data that is actually input into the text classification model.
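For illustration only, the cleaning flow described above might be sketched as follows. The specification does not fix an implementation, so the regular expressions, the placeholder stop-word list, and the assumed maximum length of 512 characters below are hypothetical choices, not part of the disclosed method.

```python
import re

# Illustrative assumptions: the specification fixes neither a stop-word
# list, nor the cleaning rules, nor the specified text length.
STOP_WORDS = {"的", "了", "是", "the", "a", "an"}  # hypothetical stop-word list
MAX_LEN = 512  # assumed "specified text length" of the model input

def clean_sample(initial_text: str) -> str:
    # Regular-expression processing: drop URLs and non-text symbols.
    text = re.sub(r"https?://\S+", " ", initial_text)
    text = re.sub(r"[^\w\u4e00-\u9fff\s]", " ", text)
    # Stop-word processing: remove invalid words, yielding the transition text.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    transition_text = " ".join(tokens)
    # Truncate to the specified text length expected by the model.
    return transition_text[:MAX_LEN]
```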
S102: and inputting the sample text data into a coding layer in a text classification model to be trained, so as to code the sample text data through the coding layer, and obtaining a first coding vector corresponding to the sample text data.
S103: the first coding vector is respectively input into each expert network layer contained in the text classification model, so that a second coding vector output by the expert network layer for the first coding vector is determined for each expert network layer.
S104: and inputting the second coding vector output by each expert network layer into each sub-classification layer contained in the text classification model, and processing the second coding vector output by each expert network layer input into the sub-classification layer by the sub-classification layer for each sub-classification layer to obtain a classification result under the classification field corresponding to the sub-classification layer as a classification result corresponding to the sub-classification layer.
In this specification, the text classification model includes a coding layer, which is mainly used to code input sample text data to obtain coding vectors that can be processed by other network layers in the text classification model. In addition, the text classification model comprises a plurality of sub-classification layers, and each sub-classification layer corresponds to one classification field, so that the text classification model provided by the specification can actually classify text data in multiple classification fields.
Further, the text classification model in this specification is also provided with a plurality of expert network layers. Each expert network layer can be understood as giving its own classification suggestion for the text data input into the text classification model; these suggestions are aggregated into each sub-classification layer, and each sub-classification layer refers to the suggestions given by the expert network layers, in combination with the classification characteristics of its own classification domain, to output the corresponding classification result.
The so-called classification suggestions given by the expert network layers are in fact the results of further processing the coding vector output by the coding layer; the processed data implicitly carries the classification knowledge that the expert network layers learned during model training.
Based on this, in the present specification, after the terminal device inputs the obtained sample text data to the coding layer in the text classification model to be trained, the coding layer may code the sample text data to obtain the first coding vector corresponding to the sample text data.
The coding layer then inputs the first coding vector into each expert network layer included in the text classification model. For each expert network layer, the expert network layer further processes the first coding vector in combination with the knowledge it learned during training (this knowledge is embodied in the adjustment of the expert network layer's parameters during training), so as to determine the second coding vector that the expert network layer outputs for the first coding vector.
Each expert network layer may further input the respective output second encoding vector to each sub-classification layer, i.e., each sub-classification layer may receive the respective second encoding vectors output by all expert network layers. And for each sub-classification layer, processing the second coding vector output by each expert network layer input into the sub-classification layer through the sub-classification layer, so as to obtain a classification result under the classification field corresponding to the sub-classification layer, and taking the classification result as the classification result corresponding to the sub-classification layer.
As can be seen from the above, each sub-classification layer gives its classification result based on the second coding vectors output by all the expert network layers, so compared with a structure with a single expert network layer, this model structure helps make the classification result output by each sub-classification layer as accurate as possible.
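To make the data flow of steps S102 to S104 concrete, the following is a minimal PyTorch sketch of the structure in fig. 2, under stated assumptions: the specification prescribes no framework or layer types, so the bag-of-embeddings coding layer, the feed-forward expert network layers, the layer sizes, and the plain mean over expert outputs are all illustrative (the weight distribution layers described later replace the plain mean).

```python
import torch
import torch.nn as nn

class MultiDomainTextClassifier(nn.Module):
    """Sketch of fig. 2: coding layer -> expert network layers -> sub-classification layers."""

    def __init__(self, vocab_size=30000, hidden=256, n_experts=3, domain_classes=(4, 5, 3)):
        super().__init__()
        # Coding layer: turns token ids into the first coding vector
        # (a simple bag-of-embeddings here, purely for illustration).
        self.embedding = nn.EmbeddingBag(vocab_size, hidden)
        # Expert network layers: each further processes the first coding
        # vector into its own second coding vector.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()) for _ in range(n_experts)
        )
        # Sub-classification layers: one per classification domain.
        self.sub_classifiers = nn.ModuleList(nn.Linear(hidden, c) for c in domain_classes)

    def forward(self, token_ids, offsets):
        first_vec = self.embedding(token_ids, offsets)                 # S102
        second_vecs = [expert(first_vec) for expert in self.experts]   # S103
        combined = torch.stack(second_vecs).mean(dim=0)                # plain mean; see gating sketch below
        return [head(combined) for head in self.sub_classifiers]       # S104: one result per domain
```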
S105: and training the text classification model by taking the deviation between the classification result corresponding to each sub-classification layer and the actual classification result corresponding to the sample text data in the classification field corresponding to each sub-classification layer as an optimization target.
After each sub-classification layer outputs the classification result, the terminal device can determine the loss value corresponding to each sub-classification layer according to the deviation between the classification result corresponding to each sub-classification layer and the actual classification result corresponding to the sample text data in the classification field corresponding to each sub-classification layer, and further sum the loss values to obtain the total loss. Finally, the terminal device may train the text classification model in a manner that minimizes this total loss, as shown in fig. 2.
Fig. 2 is a network schematic diagram of a text classification model according to an embodiment of the present disclosure.
In the text classification model shown in fig. 2, there are one coding layer, three expert network layers and three sub-classification layers, which correspond to three different classification fields. It should be noted that there is no strict correspondence between the number of expert network layers and the number of sub-classification layers.
After the terminal device obtains the first coding vector of the sample text data through the coding layer, the coding layer inputs the first coding vector into each of the three expert network layers. Each expert network layer then inputs its output second coding vector into each of the three sub-classification layers; for example, the second coding vectors output by expert network layer 1, expert network layer 2, and expert network layer 3 are all aggregated into sub-classification layer A.
Each sub-classification layer gives a classification result in the classification domain to which it belongs, and the terminal device determines the loss value corresponding to that sub-classification layer according to the classification result it gives and the predetermined actual classification result (the classification label under that domain) of the sample text data; for example, loss value a can be determined based on the classification result output by sub-classification layer A.
Finally, the terminal device sums loss value a, loss value b, and loss value c to obtain the total loss, and trains the text classification model by taking minimizing this total loss as the optimization target.
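A single training step matching fig. 2 might then look like the sketch below. The cross-entropy loss, the Adam optimizer, and the unweighted sum a + b + c are assumptions for illustration, since the specification only requires that the per-layer deviations be combined into a total loss that is minimized; it reuses the MultiDomainTextClassifier sketch above.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
model = MultiDomainTextClassifier()          # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(token_ids, offsets, domain_labels):
    """domain_labels: one label tensor per classification domain (A, B, C ...)."""
    logits_per_domain = model(token_ids, offsets)
    # Loss values a, b, c ...: deviation of each sub-classification layer's
    # result from the actual classification result in its domain.
    losses = [criterion(logits, labels)
              for logits, labels in zip(logits_per_domain, domain_labels)]
    total_loss = torch.stack(losses).sum()   # total loss = a + b + c
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```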
Two points can be seen from the above. First, for any sub-classification layer, its input is given by the outputs of a plurality of expert network layers, which helps ensure the accuracy of the sub-classification layer's output. Second, because the loss values corresponding to all sub-classification layers are summed and the text classification model is trained on that summed loss, the model can learn the latent associations between the classification domains during training. As a result, none of the coding layer, the expert network layers, or the sub-classification layers produces its output from a single classification domain alone; every network layer in the text classification model can comprehensively consider both the characteristics of each classification domain and the latent relations between the domains, and thus give a more accurate and reasonable result.
Further, to ensure the training effect of the text classification model, so that the trained model gives more accurate and reasonable classification results, the text classification model in this specification may additionally be provided with a number of weight distribution layers, one for each sub-classification layer; that is, the number of weight distribution layers equals the number of sub-classification layers.
The main function of a weight distribution layer is to give the proportion (i.e., the weight) with which its sub-classification layer should refer to the output of each expert network layer in the classification domain to which that sub-classification layer belongs, so that the sub-classification layer can give a classification result better suited to its own domain.
Specifically, after obtaining each second coding vector output by each expert network layer, the second coding vector output by each expert network layer may be input to each sub-classification layer included in the text classification model, so that, for each sub-classification layer, the second coding vector output by each expert network layer is weighted by a weight distribution layer set for the sub-classification layer, to obtain a weighted coding vector of each expert network layer, and the weighted coding vector of each expert network layer is input to the sub-classification layer, so that a classification result under the classification field corresponding to the sub-classification layer is output by the sub-classification layer, as a classification result corresponding to the sub-classification layer, as shown in fig. 3.
Fig. 3 is a network schematic diagram of a text classification model provided with a weight distribution layer according to an embodiment of the present disclosure.
As can be seen from fig. 3, a weight distribution layer A for sub-classification layer A is set in the text classification model. The first coding vector output by the coding layer can be input into weight distribution layer A, which, according to the first coding vector, gives the weight of each expert network layer's second coding vector under the classification domain corresponding to sub-classification layer A. The determined weights are then applied to the second coding vectors output by the expert network layers to obtain the weighted coding vector of each expert network layer, so that sub-classification layer A can give the corresponding classification result according to these weighted coding vectors.
It should be noted that fig. 3 shows only the weight distribution layer corresponding to sub-classification layer A; in fact, the text classification model also contains a weight distribution layer corresponding to sub-classification layer B and one corresponding to sub-classification layer C, which are not shown in fig. 3. Their function is the same as that of weight distribution layer A, and they are not described in detail here.
Further, for the weight distribution layer in any classification domain, its ability to output the weights of the second coding vectors of the expert network layers under that domain is obtained by training the text classification model; that is, the training described above in practice also continuously adjusts the network parameters of the weight distribution layers in the text classification model, so that the weight distribution layers in the trained model can give accurate and reasonable weights, ensuring the accuracy of the classification results output by each sub-classification layer.
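The weight distribution layers described above behave like the gates of a multi-gate mixture-of-experts: one gate per sub-classification layer, fed with the first coding vector and producing one weight per expert network layer. The sketch below illustrates this under the same assumptions as the earlier model sketch; the softmax normalization of the weights is an additional assumption, as the specification does not state how the weights are normalized.

```python
import torch
import torch.nn as nn

class GatedMultiDomainClassifier(nn.Module):
    """Sketch of fig. 3: one weight distribution (gate) layer per sub-classification layer."""

    def __init__(self, vocab_size=30000, hidden=256, n_experts=3, domain_classes=(4, 5, 3)):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, hidden)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()) for _ in range(n_experts)
        )
        # One weight distribution layer per sub-classification layer; each takes
        # the first coding vector and outputs one weight per expert network layer.
        self.gates = nn.ModuleList(nn.Linear(hidden, n_experts) for _ in domain_classes)
        self.sub_classifiers = nn.ModuleList(nn.Linear(hidden, c) for c in domain_classes)

    def forward(self, token_ids, offsets):
        first_vec = self.embedding(token_ids, offsets)
        second_vecs = torch.stack([e(first_vec) for e in self.experts], dim=1)  # (B, E, H)
        results = []
        for gate, head in zip(self.gates, self.sub_classifiers):
            weights = torch.softmax(gate(first_vec), dim=-1)                    # (B, E)
            # Weighted coding vectors of the expert network layers, combined.
            weighted = (weights.unsqueeze(-1) * second_vecs).sum(dim=1)         # (B, H)
            results.append(head(weighted))
        return results
```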
It should be noted that, because the text classification model contains sub-classification layers for multiple classification domains, it may be difficult for the network parameters to converge quickly if a conventional training manner is adopted. Therefore, in this specification, the network parameters of part of the network layers may be fixed while the network parameters of the other network layers are adjusted, to improve the training efficiency of the text classification model.
Specifically, in general, the terminal device performs a model training task in a multi-round iterative training manner, and for each round of training, a part of network layers may be determined from the text classification model, and network parameters of the part of network layers may be fixed in the round of training.
And then, the terminal equipment can minimize the deviation between the classification result corresponding to each sub-classification layer in the round training and the actual classification result corresponding to the sample text data in the classification field corresponding to each sub-classification layer as an optimization target, and adjust the network parameters of other network layers except the partial network layers contained in the text classification model so as to execute the round training of the text classification model.
It should be noted that the terminal device may fix the network parameters of more network layers in earlier training rounds; as the number of training rounds increases, the number of network layers whose parameters are fixed in each round can be gradually reduced.
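A possible reading of this round-by-round freezing strategy is sketched below. Which layers are frozen in each round, and the schedule on which the frozen set shrinks, are not fixed by the specification, so freezing the expert network layers first and releasing one per round is purely a hypothetical schedule.

```python
def set_frozen(module, frozen: bool):
    # Fixing network parameters = excluding them from gradient updates this round.
    for p in module.parameters():
        p.requires_grad = not frozen

def freeze_schedule(model, round_idx: int):
    """Hypothetical schedule: freeze all expert layers in round 0, then one fewer each round."""
    n_frozen = max(len(model.experts) - round_idx, 0)
    for i, expert in enumerate(model.experts):
        set_frozen(expert, i < n_frozen)

# Per round: apply the schedule, then rebuild the optimizer over trainable parameters only.
# for r in range(num_rounds):
#     freeze_schedule(model, r)
#     optimizer = torch.optim.Adam(
#         (p for p in model.parameters() if p.requires_grad), lr=1e-3)
#     ... run the round's training steps ...
```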
As can be seen from the above method, the expert network layers each give a second coding vector, the second coding vectors given by the expert network layers are input into each sub-classification layer, and the text classification model is trained by taking minimizing the deviation between the classification result given by each sub-classification layer and the classification label corresponding to that sub-classification layer as the target. The text classification model can thus learn the latent associations among the classification domains during model training and give more accurate classification results; and because the text classification model contains a plurality of sub-classification layers, a single trained model can classify text data in every classification domain, which greatly reduces training cost.
Based on the same idea as the method for model training for a multi-domain text classification model provided above, one or more embodiments of this specification further provide a corresponding apparatus for model training for a multi-domain text classification model, as shown in fig. 4.
Fig. 4 is a schematic structural diagram of a device for model training for a multi-domain text classification model according to an embodiment of the present disclosure, which specifically includes:
an obtaining module 401, configured to obtain sample text data;
a first encoding module 402, configured to input the sample text data to an encoding layer in a text classification model to be trained, so as to encode the sample text data by the encoding layer, and obtain a first encoding vector corresponding to the sample text data;
a second encoding module 403, configured to input the first encoding vector into each expert network layer included in the text classification model, so as to determine, for each expert network layer, a second encoding vector output by the expert network layer for the first encoding vector;
the output module 404 is configured to input the second encoded vector output by each expert network layer to each sub-classification layer included in the text classification model, so as to process, for each sub-classification layer, the second encoded vector output by each expert network layer input into the sub-classification layer through the sub-classification layer, to obtain a classification result under the classification domain corresponding to the sub-classification layer, and use the classification result as a classification result corresponding to the sub-classification layer;
The training module 405 is configured to train the text classification model by taking minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification field corresponding to each sub-classification layer as an optimization target.
Optionally, for each sub-classification layer, the text classification model further includes a weight distribution layer for the sub-classification layer;
The output module 404 is specifically configured to input the second coding vectors output by the expert network layers into each sub-classification layer included in the text classification model, so as to, for each sub-classification layer, weight the second coding vectors output by the expert network layers by means of the weight distribution layer set for that sub-classification layer, obtain the weighted coding vector of each expert network layer, and input the weighted coding vectors into the sub-classification layer, so that the sub-classification layer outputs the classification result under its corresponding classification field as the classification result corresponding to the sub-classification layer.
Optionally, the obtaining module 401 is specifically configured to acquire initial sample text data; identify invalid words contained in the initial sample text data, and remove the invalid words from the initial sample text data to obtain transition sample text data; and truncate the transition sample text data according to the specified text length to obtain the sample text data input into the text classification model.
Optionally, the training module 405 is specifically configured to determine, for each round of training, a partial set of network layers from the text classification model and fix the network parameters of those network layers during the round of training; and to adjust, with minimizing the deviation between the classification result corresponding to each sub-classification layer in the round of training and the actual classification result of the sample text data in the classification field corresponding to each sub-classification layer as an optimization target, the network parameters of the network layers other than the fixed partial network layers, so as to execute the round of training of the text classification model.
The present specification also provides a computer readable storage medium storing a computer program operable to perform the method of model training for a multi-domain text classification model provided in fig. 1 above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 5. At the hardware level, as illustrated in fig. 5, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it, so as to implement the method of model training for the multi-domain text classification model provided in fig. 1 above.
Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
In the 1990s, it was still easy to distinguish whether an improvement to a technology was an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code to be compiled must be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can easily be obtained merely by slightly logic-programming the method flow into an integrated circuit with one of the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Or, the means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as being divided into various units by function. Of course, when this specification is implemented, the functions of the units may be realized in one and the same piece, or multiple pieces, of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (10)

1. A method of model training for a multi-domain text classification model, comprising:
acquiring sample text data;
inputting the sample text data into a coding layer in a text classification model to be trained, so as to code the sample text data through the coding layer, and obtaining a first coding vector corresponding to the sample text data;
inputting the first coding vector into each expert network layer contained in the text classification model respectively to determine, for each expert network layer, a second coding vector output by the expert network layer for the first coding vector;
inputting the second coding vector output by each expert network layer into each sub-classification layer contained in the text classification model, and processing, for each sub-classification layer, the second coding vectors input into that sub-classification layer through the sub-classification layer to obtain a classification result under the classification field corresponding to the sub-classification layer as the classification result corresponding to the sub-classification layer;
and training the text classification model by taking minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification field corresponding to each sub-classification layer as an optimization target.
2. The method of claim 1, wherein for each sub-classification layer, the text classification model further comprises a weight distribution layer for the sub-classification layer;
inputting the second coding vector output by each expert network layer into each sub-classification layer contained in the text classification model, and processing the second coding vector output by each expert network layer input into the sub-classification layer by the sub-classification layer to obtain a classification result under the classification field corresponding to the sub-classification layer, wherein the classification result specifically comprises:
inputting the second coding vectors output by the expert network layers into each sub-classification layer contained in the text classification model; weighting the second coding vectors through the weight distribution layer set for each sub-classification layer, to obtain a weighted coding vector for each expert network layer; and inputting the weighted coding vectors into the sub-classification layer, so that the sub-classification layer outputs a classification result under its corresponding classification field as the classification result corresponding to that sub-classification layer.
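The following is a minimal sketch of the weight distribution layer of claim 2, assuming an MMoE-style softmax gate conditioned on the first coding vector and a weighted sum as the aggregation; the claim only requires that a weight distribution layer set for each sub-classification layer weights the second coding vectors, so both the gate input and the summation are assumptions, and GatedHead is a hypothetical name. Such a head could replace the concatenating heads in the claim 1 sketch above.

```python
import torch
import torch.nn as nn

class GatedHead(nn.Module):
    """A sub-classification layer paired with its own weight distribution layer."""

    def __init__(self, hidden, n_experts, n_classes):
        super().__init__()
        self.gate = nn.Linear(hidden, n_experts)        # weight distribution layer
        self.classifier = nn.Linear(hidden, n_classes)  # sub-classification layer

    def forward(self, first, seconds):
        # first: (batch, hidden); seconds: list of n_experts tensors of shape (batch, hidden)
        weights = torch.softmax(self.gate(first), dim=-1)         # one weight per expert
        stacked = torch.stack(seconds, dim=1)                     # (batch, n_experts, hidden)
        weighted = (weights.unsqueeze(-1) * stacked).sum(dim=1)   # combine weighted coding vectors
        return self.classifier(weighted)                          # classification result for this field
```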
3. The method of claim 1, wherein obtaining sample text data comprises:
acquiring initial sample text data;
identifying invalid words contained in the initial sample text data, and removing the invalid words from the initial sample text data to obtain transition sample text data;
and truncating the transition sample text data to a specified text length, to obtain the sample text data input into the text classification model.
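A minimal sketch of the preprocessing in claim 3, assuming whitespace tokenization, a hypothetical invalid-word (stop-word) list, and a hypothetical specified length of 128 tokens; the claim fixes none of these choices.

```python
STOP_WORDS = {"the", "a", "an", "of", "and"}  # hypothetical invalid-word list
MAX_LEN = 128                                 # hypothetical specified text length, in tokens

def preprocess(initial_text: str) -> str:
    """Initial sample text -> transition sample text (invalid words removed) -> truncated text."""
    tokens = initial_text.split()
    transition = [t for t in tokens if t.lower() not in STOP_WORDS]  # remove invalid words
    return " ".join(transition[:MAX_LEN])                            # truncate to the specified length
```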
4. The method of claim 1, wherein training the text classification model with minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification field corresponding to that sub-classification layer as the optimization target specifically comprises:
for each round of training, determining a subset of the network layers of the text classification model and fixing the network parameters of that subset during the round of training;
and adjusting, with minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the corresponding classification field as the optimization target for the round of training, the network parameters of the network layers of the text classification model other than the fixed subset, so as to execute the round of training of the text classification model.
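A minimal sketch of the round-wise partial freezing of claim 4, reusing the model and training_step from the claim 1 sketch; the claim does not specify how the fixed subset of network layers is chosen, so a random half is assumed here.

```python
import random

def train_one_round(model, batches, optimizer, freeze_fraction=0.5):
    """Fix a random subset of layers for this round; adjust only the remaining layers."""
    layers = [model.encoder, *model.experts, *model.heads]
    frozen = set(random.sample(range(len(layers)), int(len(layers) * freeze_fraction)))
    for i, layer in enumerate(layers):
        for p in layer.parameters():
            p.requires_grad_(i not in frozen)  # fix the selected subset, train the rest
    # Assumes optimizer.zero_grad() leaves frozen parameters' grads as None
    # (set_to_none=True, the default in PyTorch >= 2.0), so step() skips them.
    for token_ids, labels_per_domain in batches:
        training_step(model, token_ids, labels_per_domain, optimizer)
```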
5. An apparatus for model training for a multi-domain text classification model, comprising:
the acquisition module is used for acquiring sample text data;
the first coding module is used for inputting the sample text data into a coding layer in a text classification model to be trained so as to code the sample text data through the coding layer to obtain a first coding vector corresponding to the sample text data;
the second coding module is used for respectively inputting the first coding vector into each expert network layer contained in the text classification model so as to determine a second coding vector output by the expert network layer for the first coding vector for each expert network layer;
the output module is used for inputting the second coding vectors output by the expert network layers into each sub-classification layer contained in the text classification model, so that each sub-classification layer processes the second coding vectors input into it and obtains a classification result under the classification field corresponding to that sub-classification layer, as the classification result corresponding to the sub-classification layer;
and the training module is used for training the text classification model by taking the deviation between the classification result corresponding to each sub-classification layer and the actual classification result corresponding to the sample text data in the classification field corresponding to each sub-classification layer as an optimization target.
6. The apparatus of claim 5, wherein for each sub-classification layer, the text classification model further comprises a weight distribution layer for the sub-classification layer;
the output module is specifically configured to input the second coding vectors output by the expert network layers to each sub-classification layer included in the text classification model, weight the second coding vectors through the weight distribution layer set for each sub-classification layer to obtain a weighted coding vector for each expert network layer, and input the weighted coding vectors to the sub-classification layer, so that the sub-classification layer outputs a classification result under its corresponding classification field as the classification result corresponding to that sub-classification layer.
7. The apparatus of claim 5, wherein the acquisition module is specifically configured to: acquire initial sample text data; identify invalid words contained in the initial sample text data and remove them from the initial sample text data to obtain transition sample text data; and truncate the transition sample text data to a specified text length to obtain the sample text data input into the text classification model.
8. The apparatus of claim 5, wherein the training module is specifically configured to: determine, for each round of training, a subset of the network layers of the text classification model and fix the network parameters of that subset during the round of training; and adjust, with minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the corresponding classification field as the optimization target for the round of training, the network parameters of the network layers other than the fixed subset, so as to execute the round of training of the text classification model.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-4.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-4 when executing the program.
Application CN202311038789.5A, filed 2023-08-16 (priority date 2023-08-16): Model training method and device for multi-field text classification model. Publication CN117034926A, status Pending.

Priority Applications (1)

Application Number: CN202311038789.5A
Priority Date: 2023-08-16
Filing Date: 2023-08-16
Title: Model training method and device for multi-field text classification model

Publications (1)

Publication Number: CN117034926A
Publication Date: 2023-11-10

Family

ID: 88644595

Family Applications (1)

Application Number: CN202311038789.5A (Pending, publication CN117034926A)
Title: Model training method and device for multi-field text classification model

Country Status (1)

Country: CN; Publication: CN117034926A

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination