CN113918720A - Training method, apparatus, device and storage medium for a text classification model

Info

Publication number
CN113918720A
Authority
CN
China
Prior art keywords
text classification
training
model
classification model
candidate text
Prior art date
Legal status
Pending
Application number
CN202111275208.0A
Other languages
Chinese (zh)
Inventor
魏万顺
Current Assignee
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN202111275208.0A
Publication of CN113918720A
Legal status: Pending

Classifications

    • G06F 16/35: Information retrieval of unstructured textual data; Clustering; Classification
    • G06F 16/3329: Information retrieval of unstructured textual data; Querying; Natural language query formulation or dialogue systems
    • G06F 40/30: Handling natural language data; Semantic analysis
    • G06N 3/08: Computing arrangements based on biological models; Neural networks; Learning methods
    • G06Q 40/03: Finance; Credit; Loans; Processing thereof

Abstract

The application relates to artificial intelligence, and in particular to the technical field of natural language processing, and provides a training method, apparatus, device and storage medium for a text classification model. The method includes: acquiring a first training data set and a second training data set; inputting the first training data set into a plurality of candidate text classification models for model training to obtain model evaluation information corresponding to each candidate text classification model, where the model evaluation information characterizes the generalization ability of the candidate text classification model; determining, according to the model evaluation information, the candidate text classification model with the strongest generalization ability among the plurality of candidate text classification models as a target text classification model; and inputting the second training data set into the target text classification model and performing model training on the target text classification model to obtain a trained target text classification model, thereby improving the accuracy of text classification. The application also relates to blockchain technology, and the first training data set and the second training data set may be stored in blockchain nodes.

Description

Training method, apparatus, device and storage medium for a text classification model
Technical Field
The present application relates to the field of natural language processing technology in the field of artificial intelligence, and in particular, to a method, an apparatus, a device, and a storage medium for training a text classification model.
Background
At present, texts can be classified manually, or they can be classified by a pre-trained model, which saves manual work. For example, collection texts (texts from debt-collection scenarios) can be classified by a pre-trained model. However, current pre-trained models cannot sufficiently extract the features of collection texts, their representation ability is limited, and the accuracy of text classification still needs to be improved.
Therefore, how to improve the accuracy of text classification becomes an urgent problem to be solved.
Disclosure of Invention
The application provides a training method, a device, equipment and a storage medium of a text classification model, aiming at improving the accuracy of text classification.
In order to achieve the above object, the present application provides a method for training a text classification model, where the method for training the text classification model includes:
acquiring a first training data set and a second training data set, wherein the first training data set comprises label data, and the second training data set comprises label-free data;
respectively inputting the first training data set into a plurality of candidate text classification models, and performing model training on the candidate text classification models to obtain model evaluation information corresponding to each candidate text classification model; the model evaluation information represents the generalization capability of the candidate text classification model, and comprises any one of an ACC value, an F1 value and a loss value;
determining a candidate text classification model with the strongest generalization capability in the candidate text classification models as a target text classification model according to the model evaluation information corresponding to each candidate text classification model;
and inputting the second training data set into the target text classification model, and performing model training on the target text classification model to obtain a trained target text classification model.
In addition, to achieve the above object, the present application further provides a training apparatus for a text classification model, including:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a first training data set and a second training data set, the first training data set comprises label data, and the second training data set comprises label-free data;
the first training module is used for respectively inputting the first training data set into a plurality of candidate text classification models, performing model training on the candidate text classification models and obtaining model evaluation information corresponding to each candidate text classification model; the model evaluation information represents the generalization capability of the candidate text classification model, and comprises any one of an ACC value, an F1 value and a loss value;
the model determining module is used for determining a candidate text classification model with the strongest generalization capability in the candidate text classification models as a target text classification model according to the model evaluation information corresponding to each candidate text classification model;
and the second training module is used for inputting the second training data set into the target text classification model, and performing model training on the target text classification model to obtain a trained target text classification model.
In addition, to achieve the above object, the present application also provides a computer device comprising a memory and a processor;
the memory for storing a computer program;
the processor is configured to execute the computer program and implement the training method of the text classification model as described above when the computer program is executed.
In addition, to achieve the above object, the present application further provides a computer readable storage medium storing a computer program, which when executed by a processor, implements the steps of the training method of the text classification model described above.
The application discloses a training method, apparatus, device and storage medium for a text classification model. A first training data set and a second training data set are acquired, where the first training data set includes label data and the second training data set includes label-free data. The first training data set is respectively input into a plurality of candidate text classification models, and model training is performed on the candidate text classification models to obtain model evaluation information corresponding to each candidate text classification model, where the model evaluation information characterizes the generalization ability of the candidate text classification models. According to the model evaluation information corresponding to each candidate text classification model, the candidate text classification model with the strongest generalization ability among the candidate text classification models is determined and used as a target text classification model. The second training data set is then input into the target text classification model, and model training is performed on the target text classification model to obtain a trained target text classification model. The target text classification model has the strongest generalization ability and thus the best text classification ability, and performing model training on it with label-free data enables the trained target text classification model to fully extract the features of the text; therefore, performing text classification with the trained target text classification model improves the accuracy of text classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart diagram illustrating steps of a method for training a text classification model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of exemplary text data of a collection recording text before label labeling;
FIG. 3 is a schematic diagram of the collection recording text shown in FIG. 2 after line-by-line label labeling;
FIG. 4 is a schematic flow chart diagram of a step of acquiring a second training data set according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of steps for performing data preprocessing on the plurality of text data to generate a plurality of label-free data according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an obtained unlabeled data provided by an embodiment of the present application;
FIG. 7 is a flowchart illustrating steps of another method for training a text classification model according to an embodiment of the present application;
FIG. 8 is a schematic flowchart illustrating model training of candidate text classification models by a training engine according to an embodiment of the present application;
FIG. 9 is a schematic block diagram of an apparatus for training a text classification model according to an embodiment of the present application;
fig. 10 is a schematic block diagram of a structure of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
The embodiment of the application provides a training method, a training device, equipment and a storage medium of a text classification model, which are used for improving the accuracy of text classification.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a method for training a text classification model according to an embodiment of the present application. The method can be applied to computer equipment, and the application scene of the method is not limited in the application. The following describes the training method of the text classification model in detail by taking the example that the training method of the text classification model is applied to computer equipment.
As shown in fig. 1, the training method of the text classification model specifically includes steps S101 to S104.
S101, a first training data set is obtained, and a second training data set is obtained, wherein the first training data set comprises label data, and the second training data set comprises label-free data.
Illustratively, both the label data and the label-free data are text data. The text data includes, but is not limited to, collection recording text, that is, the transcript of a debt-collection call recording. It should be noted that the collection recording text includes agent text and customer text.
For example, the non-label data is obtained by querying from a preset database, then the label labeling operation is performed on the obtained non-label data to obtain corresponding label data, and a first training data set is formed by a plurality of label data.
Illustratively, label labeling is performed on the non-label data in a supervised line-by-line labeling mode to obtain label data.
For example, taking a collection recording text as an example, as shown in fig. 2 and fig. 3, fig. 2 shows text data of the collection recording text without label labeling, and fig. 3 shows the same text data after labels have been added line by line, where the bold words in each line are the labels corresponding to the collection recording text.
In some embodiments, as shown in fig. 4, step S101 may include sub-step S1011 and sub-step S1012.
S1011, acquiring a plurality of unprocessed text data;
s1012, performing data preprocessing on the text data to generate a plurality of label-free data, and forming a second training data set by the label-free data; wherein the data preprocessing comprises at least one of data splicing and data filtering.
For example, a plurality of unprocessed text data are obtained from an open-domain corpus, and then data preprocessing such as data concatenation and data filtering is performed on the plurality of unprocessed text data.
In some embodiments, as shown in fig. 5, step S1012 may include sub-steps S10121 and S10122.
S10121, performing semantic analysis on the text data, determining the text data corresponding to preset semantics in the text data as non-key text data, and filtering the non-key text data;
s10122, performing data splicing on the plurality of text data with the non-key text data filtered out to obtain the non-label data.
For example, at least one preset semantic is preset, for example, a semantic of a category such as a greeting semantic, a thank you semantic, and the like is set as the preset semantic. Semantic analysis is carried out on each text data, the text data corresponding to preset semantics in the text data are determined to be non-key text data, and the determined non-key text data are filtered. Non-critical text data includes, but is not limited to, greetings, thanks, etc. By means of data filtering, interference of irrelevant information in the generated second training data set on model training is reduced. And then performing data splicing on the filtered text data to obtain corresponding label-free data.
For example, taking the collection recording text shown in fig. 2 as an example, the greeting sentences in the collection recording text are filtered out, and the filtered collection recording text is subjected to data splicing to obtain the corresponding final label-free data, for example, as shown in fig. 6.
In some embodiments, performing data splicing on the plurality of filtered text data may include:
performing data splicing on the plurality of filtered text data according to a preset data length threshold to generate the label-free data, where the data length corresponding to the label-free data does not exceed the data length threshold.
Illustratively, the data length threshold is preset, for example, the data length threshold is set to 512 bytes. It should be noted that the preset data length threshold may be flexibly set according to actual situations, and is not particularly limited herein.
For example, taking the collection recording text as an example, a large number (for example, 3 million lines) of unlabeled collection recording texts are obtained, non-key sentences such as greetings and thanks are filtered out of the collection recording texts, and the remaining collection recording texts are spliced according to a preset data length threshold, for example, a length of 512 bytes, to generate label-free data whose data length does not exceed the data length threshold.
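A minimal sketch of this filtering and splicing step is given below. It is only illustrative: keyword matching stands in for the semantic analysis described above, the keyword list and function names are assumptions, and the 512-byte threshold is measured on the UTF-8 encoding of each spliced chunk.

    # Sketch of data filtering (non-key sentences) and data splicing (<= 512 bytes per chunk).
    GREETING_KEYWORDS = ("你好", "您好", "谢谢", "感谢", "再见")  # hypothetical preset semantics
    MAX_BYTES = 512  # preset data length threshold

    def is_non_key(line: str) -> bool:
        """Treat greetings, thanks and similar lines as non-key text data."""
        return any(keyword in line for keyword in GREETING_KEYWORDS)

    def build_unlabeled_data(lines):
        """Filter out non-key lines, then splice the rest into chunks of at most MAX_BYTES."""
        chunks, current = [], ""
        for line in lines:
            line = line.strip()
            if not line or is_non_key(line):
                continue
            # Truncate a single overly long line so no chunk can exceed the threshold.
            line = line.encode("utf-8")[:MAX_BYTES].decode("utf-8", "ignore")
            candidate = current + line
            if current and len(candidate.encode("utf-8")) > MAX_BYTES:
                chunks.append(current)
                current = line
            else:
                current = candidate
        if current:
            chunks.append(current)
        return chunks  # each chunk is one piece of label-free data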
S102, respectively inputting the first training data set into a plurality of candidate text classification models, and performing model training on the candidate text classification models to obtain model evaluation information corresponding to each candidate text classification model; the model evaluation information represents the generalization capability of the candidate text classification model, and comprises any one of an ACC value, an F1 value and a loss value.
Illustratively, the plurality of candidate text classification models includes an open-source Chinese pre-training model and/or a pre-set text classification model. That is, the candidate text classification models may all be the chinese pre-training models, or may include both the chinese pre-training models and the preset text classification models.
For example, the plurality of candidate text classification models includes n1 chinese pre-trained models and n2 preset text classification models. The n1 and n2 can be flexibly set according to actual conditions, and are not particularly limited herein. For example, n1 is preset to be 110, n2 is 2, that is, 110 open-source chinese pre-training models and 2 preset text classification models are used as candidate text classification models.
Illustratively, a Chinese pre-training model is a deep model built on attention and residual structures; such models are currently numerous and are pre-trained on large amounts of data. A preset text classification model is a shallower model built on RNN or CNN architectures. For example, the preset text classification models include, but are not limited to, a Text-CNN model, a BiLSTM model, and the like. The Text-CNN model and the BiLSTM model are not pre-trained; instead, word2vec vectors are trained on the label-free data and serve as the input representation of the Text-CNN model and the BiLSTM model. A word2vec vector is a way of converting text into a numerical vector and is trained on the label-free data in a self-supervised manner.
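As an illustration of the word2vec input representation mentioned above, the following sketch trains character-level word2vec vectors on the label-free data using gensim. The corpus file name, the character-level tokenization and the hyperparameters are assumptions for illustration and are not fixed by this application.

    from gensim.models import Word2Vec

    # Hypothetical corpus file holding the spliced label-free collection recording texts.
    with open("unlabeled_collection_texts.txt", encoding="utf-8") as f:
        # Character-level tokenization is assumed; a Chinese word segmenter could be used instead.
        tokenized = [list(line.strip()) for line in f if line.strip()]

    # Self-supervised word2vec training on the label-free data; hyperparameters are illustrative.
    w2v = Word2Vec(sentences=tokenized, vector_size=128, window=5, min_count=2, epochs=5)
    w2v.save("word2vec_unlabeled.model")
    # The resulting vectors serve as the input embeddings of the Text-CNN and BiLSTM models.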
The obtained first training data set is respectively input into each candidate text classification model, and model training is performed on each candidate text classification model to obtain model evaluation information corresponding to each candidate text classification model. The model evaluation information includes, but is not limited to, any one of an ACC value (accuracy), an F1 value (the harmonic mean of precision and recall), a loss value (the value of the loss function), and the like. Model evaluation information such as the ACC value, the F1 value and the loss value characterizes the generalization ability of the candidate text classification model.
S103, determining a candidate text classification model with the strongest generalization capability in the candidate text classification models as a target text classification model according to the model evaluation information corresponding to each candidate text classification model.
And selecting the candidate text classification model with the strongest generalization capability from the candidate text classification models according to the model evaluation information corresponding to each candidate text classification model, and determining the candidate text classification model as the target text classification model.
In some embodiments, taking the model evaluation information as an ACC value as an example, obtaining an ACC value corresponding to each candidate text classification model, and determining the candidate text classification model corresponding to the largest ACC value among the ACC values as the target text classification model.
In other embodiments, for example, the loss value is taken as the model evaluation information, the loss value corresponding to each candidate text classification model is obtained, and the candidate text classification model corresponding to the minimum loss value is determined as the target text classification model.
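A minimal sketch of this selection step is shown below, assuming the model evaluation information has already been collected into a dictionary keyed by model name; the example model names and values are hypothetical.

    def select_target_model(evaluation: dict, metric: str = "acc") -> str:
        """Return the name of the candidate model with the strongest generalization ability.

        evaluation maps each candidate model name to its evaluation value.
        For ACC or F1 the largest value wins; for loss the smallest value wins.
        """
        if metric in ("acc", "f1"):
            return max(evaluation, key=evaluation.get)
        if metric == "loss":
            return min(evaluation, key=evaluation.get)
        raise ValueError(f"unsupported metric: {metric}")

    # Hypothetical ACC values for three candidate text classification models.
    results = {"chinese-pretrained-model": 0.91, "text-cnn": 0.86, "bilstm": 0.84}
    target_model_name = select_target_model(results, metric="acc")  # -> "chinese-pretrained-model"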
In some embodiments, as shown in fig. 7, step S102 may be preceded by step S105, and step S102 may include sub-step S1021 and sub-step S1022.
S105, storing model related information corresponding to the candidate text classification models in a database; the model related information comprises a model name, an execution instance ID and an execution result;
s1021, starting a plurality of training machines, and determining candidate text classification models to be trained, which correspond to the training machines at present, based on the model related information stored in the database;
and S1022, inputting the first training data set into the candidate text classification model corresponding to each training machine at present, and performing model training on the candidate text classification model corresponding to each training machine at present.
Illustratively, the model-related information includes a model name, an execution instance ID, an execution result, and the like. And the ID of the execution instance, the execution result and the model name are related and stored in a database.
Illustratively, the model name is stored in the database in a text-type format, the execution instance ID is stored in the database in a text-type format, and the execution result is stored in the database in a floating-point format.
For example, before model training of the candidate text classification model, the execution instance ID is default to null, and the execution result is a preset default value, for example, the default value is-1.0.
For example, the structure of the database may be as follows:
(The database structure table is provided as an image in the original filing and is not reproduced here; it lists, for each candidate text classification model, the model name, the execution instance ID and the execution result.)
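Based on the field descriptions above (a text-type model name, a text-type execution instance ID that is initially null, and a floating-point execution result with a preset default of -1.0), a possible schema is sketched below with sqlite3. The database file, table and column names, and the model names, are assumptions made for illustration.

    import sqlite3

    conn = sqlite3.connect("training_jobs.db")  # hypothetical database file
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS candidate_models (
            model_name       TEXT PRIMARY KEY,   -- text-type model name
            exec_instance_id TEXT DEFAULT NULL,  -- execution instance ID, null before a machine claims the row
            exec_result      REAL DEFAULT -1.0   -- execution result, preset default value before training
        )
        """
    )
    # One row per candidate text classification model, e.g. 110 open-source Chinese
    # pre-training models plus 2 preset text classification models; names are illustrative.
    conn.executemany(
        "INSERT OR IGNORE INTO candidate_models (model_name) VALUES (?)",
        [("chinese-pretrained-model",), ("text-cnn",), ("bilstm",)],
    )
    conn.commit()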
the N training machines are started by setting the N training machines, the N training machines determine candidate text classification models to be subjected to model training by inquiring the relevant information of the models stored in the database, and the candidate text classification models are subjected to model training according to the first training data set.
The training machine includes, but is not limited to, a GPU (Graphics Processing Unit) or a TPU (temporal Processing Unit) server.
Illustratively, part of the data is selected from the first training data set as a training set, and another part of the data is selected as a verification set, for example, according to the following 4: 1, determining a training set and a verification set according to the data quantity proportion, and respectively carrying out model training on each candidate text classification model according to the training set and the verification set.
Illustratively, the training set and validation set for each training machine remain consistent.
And the N training machines work simultaneously, and model training is carried out on corresponding candidate text classification models in the database based on the same training set and verification set to obtain model evaluation information corresponding to each candidate text classification model.
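A sketch of the 4:1 split mentioned above follows; the fixed random seed is an assumption used here to keep the training set and validation set identical on every training machine.

    import random

    def split_train_validation(labeled_samples, ratio=4, seed=42):
        """Split the first training data set into a training set and a validation set (ratio:1).

        The fixed seed makes the split deterministic, so every training machine
        works with the same training set and the same validation set.
        """
        samples = list(labeled_samples)
        random.Random(seed).shuffle(samples)
        cut = len(samples) * ratio // (ratio + 1)
        return samples[:cut], samples[cut:]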
In some embodiments, determining the candidate text classification model to be trained that each of the training machines currently corresponds to based on the model-related information stored in the database may include:
searching a target execution instance ID which is currently null value through a first training machine traversing the database, updating the null value of the target execution instance ID into ID information corresponding to the first training machine, and returning model name information corresponding to the target execution instance ID; the first training machine is any one of a plurality of training machines;
if the model name information is a null value, returning to execute the operation of traversing the database through the first training machine to inquire the current null value of the target execution instance ID;
and if the model name information is a non-null value, determining a first candidate text classification model corresponding to the model name information as a candidate text classification model to be trained, which is currently corresponding to the first training machine.
Illustratively, the ID information corresponding to the training machine includes, but is not limited to, mac address information of the training machine. Taking any one of the multiple training machines as an example, the training machine traverses the database to query a target execution instance ID which is currently a null value, updates the null value of the target execution instance ID into mac address information of the training machine, and returns model name information corresponding to the target execution instance ID. And if the model name information is a null value, continuously and circularly executing the operation of querying the target execution instance ID which is currently the null value through traversing the database by the training machine, and if the returned model name information is a non-null value, determining a first candidate text classification model corresponding to the returned model name information as a candidate text classification model to be trained which is currently corresponding to the training machine.
In some embodiments, if the model name information is a non-null value, determining the first candidate text classification model corresponding to the model name information as the candidate text classification model to be trained corresponding to the first training machine currently may include:
inputting the first training data set into the first candidate text classification model, and performing model training on the first candidate text classification model to obtain first model evaluation information corresponding to the first candidate text classification model;
updating the execution result corresponding to the first candidate text classification model in the database into the first model evaluation information; before model training is performed on the first candidate text classification model, an execution result corresponding to the first candidate text classification model is a preset default value.
Before model training is performed on the first candidate text classification model, the execution result corresponding to the first candidate text classification model in the database is a preset default value, such as -1.0. It should be noted that the default value may be a value other than -1.0.
And after the model training of the first candidate text classification model is completed through the training machine, updating the execution result corresponding to the first candidate text classification model in the database into first model evaluation information corresponding to the first candidate text classification model.
After model training is completed on a plurality of candidate text classification models through a plurality of training machines, model evaluation information corresponding to each candidate text classification model can be obtained through querying a database, and then the candidate text classification model with the strongest generalization capability is selected according to the model evaluation information corresponding to each candidate text classification model to be determined as the target text classification model.
The target text classification model is screened out from the candidate text classification models through the parallel work of the plurality of training machines, which improves the efficiency of determining the target text classification model, and when training of a candidate text classification model fails, the failed training job can be handled in a fault-tolerant manner. Meanwhile, the number of training machines can be scaled dynamically to achieve load balancing among the plurality of training machines.
For example, as shown in fig. 8, the process of performing model training on a candidate text classification model by a training machine is as follows (a sketch of this worker loop, under stated assumptions, is given after the steps):
step a, starting a training machine;
step b, determining the instance ID as the local mac address of the training machine;
step c, querying the database for the first data row whose execution instance ID is empty, updating the execution instance ID to the local mac address, updating the execution result to -1, and returning the model name;
d, determining whether the name of the returned model is empty, if so, returning to execute the step c; if not, executing step e;
step e, performing model training on the candidate text classification model corresponding to the model name;
and f, updating the execution result corresponding to the candidate text classification model.
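Steps a to f can be expressed as the worker loop sketched below. It assumes the sqlite3 table sketched earlier, uses uuid.getnode() as a stand-in for the local mac address, and leaves the actual model training behind a placeholder train_and_evaluate function; none of these details are fixed by this application.

    import sqlite3
    import uuid

    def local_instance_id() -> str:
        """Step b: a stand-in for the training machine's local mac address."""
        return hex(uuid.getnode())

    def claim_next_model(conn, instance_id):
        """Steps c and d: claim the first row whose execution instance ID is still null."""
        row = conn.execute(
            "SELECT model_name FROM candidate_models WHERE exec_instance_id IS NULL LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # returned model name is empty: no candidate model left to train
        cur = conn.execute(
            "UPDATE candidate_models SET exec_instance_id = ?, exec_result = -1.0 "
            "WHERE model_name = ? AND exec_instance_id IS NULL",
            (instance_id, row[0]),
        )
        conn.commit()
        if cur.rowcount == 0:
            return claim_next_model(conn, instance_id)  # another machine claimed it first; try again
        return row[0]

    def worker(conn, train_and_evaluate):
        """Steps a to f: keep claiming and training candidate models until none remain."""
        instance_id = local_instance_id()
        while True:
            model_name = claim_next_model(conn, instance_id)
            if model_name is None:
                break
            result = train_and_evaluate(model_name)  # step e: model training on the claimed candidate
            conn.execute(
                "UPDATE candidate_models SET exec_result = ? WHERE model_name = ?",
                (result, model_name),  # step f: update the execution result (e.g. validation ACC)
            )
            conn.commit()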
Illustratively, when the ACC value is used as the evaluation index of model training, the execution result of each candidate text classification model is updated to the ACC value on the validation set. That is, the validation set does not participate in training directly and is isolated from the weight updates of the candidate text classification model, and the ACC value on the validation set is used as the evaluation index of the candidate text classification model.
And S104, inputting the second training data set into the target text classification model, and performing model training on the target text classification model to obtain a trained target text classification model.
After a target text classification model is determined from the candidate text classification models, model training is carried out on the target text classification model according to a second training data set generated after data splicing, data filtering and the like.
Illustratively, the target text classification model is iteratively trained according to a preset epoch value. For example, if the preset epoch value is 3, three epochs of iterative training are performed on the target text classification model based on the second training data set, thereby obtaining the trained target text classification model. The data length of each piece of label-free data in the second training data set does not exceed the data length threshold, for example, 512 bytes.
Illustratively, the determined target text classification model is subjected to model training in a tesla-v100 single-card training environment. Wherein, tesla-v100 is a GPU with Volta architecture.
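The application does not fix the self-supervised objective used when training the target text classification model on the label-free data. Assuming the selected target model is a BERT-style Chinese pre-training model and masked language modelling is used as that objective, a sketch with the Hugging Face transformers library could look as follows; the model name, file names and batch size are illustrative, and the 512-token truncation stands in for the 512-byte data length threshold.

    from datasets import Dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    model_name = "bert-base-chinese"  # hypothetical target text classification model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # Hypothetical file of spliced label-free data (the second training data set).
    with open("unlabeled_collection_texts.txt", encoding="utf-8") as f:
        chunks = [line.strip() for line in f if line.strip()]

    dataset = Dataset.from_dict({"text": chunks})
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True, remove_columns=["text"],
    )

    args = TrainingArguments(
        output_dir="target-model-pretraining",
        num_train_epochs=3,               # preset epoch value of 3
        per_device_train_batch_size=16,   # illustrative batch size
    )
    trainer = Trainer(
        model=model, args=args, train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
    )
    trainer.train()
    trainer.save_model("trained-target-model")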
After the trained target text classification model is obtained, downstream end-to-end tasks can be executed with it, and texts (such as collection recording texts) can be classified automatically. For example, agent intention classification and customer intention classification are performed through the trained target text classification model.
In this embodiment, a first training data set and a second training data set are obtained, where the first training data set includes label data and the second training data set includes label-free data. The first training data set is input into a plurality of candidate text classification models, model training is performed on the candidate text classification models, and model evaluation information corresponding to each candidate text classification model is obtained, where the model evaluation information characterizes the generalization ability of the candidate text classification models. According to the model evaluation information corresponding to each candidate text classification model, the candidate text classification model with the strongest generalization ability among the candidate text classification models is determined and used as the target text classification model. The second training data set is then input into the target text classification model and model training is performed on it, so that a trained target text classification model is obtained. The target text classification model has the strongest generalization ability and thus the best text classification ability, and performing model training on it with label-free data enables the trained target text classification model to fully extract the features of the text; therefore, performing text classification with the trained target text classification model improves the accuracy of text classification.
Referring to fig. 9, fig. 9 is a schematic block diagram of a training apparatus for a text classification model according to an embodiment of the present application, which may be configured in a computer device for executing the aforementioned method for training a text classification model.
As shown in fig. 9, the training apparatus 1000 for text classification model includes: an acquisition module 1001, a first training module 1002, a model determination module 1003, and a second training module 1004.
An obtaining module 1001, configured to obtain a first training data set and a second training data set, where the first training data set includes tag data and the second training data set includes non-tag data;
a first training module 1002, configured to input the first training data set into multiple candidate text classification models respectively, perform model training on the multiple candidate text classification models, and obtain model evaluation information corresponding to each candidate text classification model; the model evaluation information represents the generalization capability of the candidate text classification model, and comprises any one of an ACC value, an F1 value and a loss value;
the model determining module 1003 is configured to determine, according to the model evaluation information corresponding to each candidate text classification model, a candidate text classification model with the strongest generalization capability among the plurality of candidate text classification models as a target text classification model;
and a second training module 1004, configured to input the second training data set into the target text classification model, and perform model training on the target text classification model to obtain a trained target text classification model.
In one embodiment, the model determination module 1003 is further configured to:
determining the candidate text classification model corresponding to the maximum ACC value as the target text classification model according to the ACC value corresponding to each candidate text classification model; the larger the ACC value corresponding to the candidate text classification model is, the stronger the generalization capability of the candidate text classification model is.
In one embodiment, the training apparatus 1000 for the text classification model further includes:
the storage module is used for storing the model related information corresponding to the candidate text classification models in a database; the model related information comprises a model name, an execution instance ID and an execution result;
the first training module 1002 is further configured to:
starting a plurality of training machines, and determining candidate text classification models to be trained, which correspond to the training machines at present, based on the model-related information stored in the database;
inputting the first training data set into a candidate text classification model corresponding to each training machine at present, and performing model training on the candidate text classification model corresponding to each training machine at present.
In one embodiment, the first training module 1002 is further configured to:
searching a target execution instance ID which is currently null value through a first training machine traversing the database, updating the null value of the target execution instance ID into ID information corresponding to the first training machine, and returning model name information corresponding to the target execution instance ID; the first training machine is any one of a plurality of training machines;
if the model name information is a null value, returning to execute the operation of traversing the database through the first training machine to inquire the current null value of the target execution instance ID;
and if the model name information is a non-null value, determining a first candidate text classification model corresponding to the model name information as a candidate text classification model to be trained, which is currently corresponding to the first training machine.
In one embodiment, the first training module 1002 is further configured to:
inputting the first training data set into the first candidate text classification model, and performing model training on the first candidate text classification model to obtain first model evaluation information corresponding to the first candidate text classification model;
the storage module is further configured to:
updating the execution result corresponding to the first candidate text classification model in the database into the first model evaluation information; before model training is performed on the first candidate text classification model, an execution result corresponding to the first candidate text classification model is a preset default value.
In one embodiment, the obtaining module 1001 is further configured to:
acquiring a plurality of unprocessed text data;
performing data preprocessing on the plurality of text data to generate a plurality of label-free data, and forming a second training data set by the plurality of label-free data; wherein the data preprocessing comprises at least one of data splicing and data filtering.
In one embodiment, the obtaining module 1001 is further configured to:
performing semantic analysis on the text data, determining the text data corresponding to preset semantics in the text data as non-key text data, and filtering the non-key text data;
and performing data splicing on the plurality of text data with the non-key text data filtered out to obtain the non-label data.
Each module in the training apparatus for the text classification model corresponds to each step in the embodiment of the training method for the text classification model, and the functions and implementation processes thereof are not described in detail herein.
The methods, apparatus, and devices of the present application may be deployed in numerous general-purpose or special-purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
For example, the method and apparatus described above may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 10.
Referring to fig. 10, fig. 10 is a schematic block diagram of a structure of a computer device according to an embodiment of the present application.
Referring to fig. 10, the computer device includes a processor and a memory connected by a system bus, wherein the memory may include a nonvolatile storage medium and an internal memory.
The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.
The internal memory provides an environment for the execution of a computer program on a non-volatile storage medium, which when executed by the processor causes the processor to perform any of the methods for training a text classification model.
It should be understood that the Processor may be a Central Processing Unit (CPU), and the Processor may be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Wherein, in one embodiment, the processor is configured to execute a computer program stored in the memory to implement the steps of:
acquiring a first training data set and a second training data set, wherein the first training data set comprises label data, and the second training data set comprises label-free data;
respectively inputting the first training data set into a plurality of candidate text classification models, and performing model training on the candidate text classification models to obtain model evaluation information corresponding to each candidate text classification model; the model evaluation information represents the generalization capability of the candidate text classification model, and comprises any one of an ACC value, an F1 value and a loss value;
determining a candidate text classification model with the strongest generalization capability in the candidate text classification models as a target text classification model according to the model evaluation information corresponding to each candidate text classification model;
and inputting the second training data set into the target text classification model, and performing model training on the target text classification model to obtain a trained target text classification model.
In an embodiment, when the processor determines, according to the model evaluation information corresponding to each candidate text classification model, a candidate text classification model with the strongest generalization capability among the plurality of candidate text classification models as a target text classification model, the processor is configured to:
determining the candidate text classification model corresponding to the maximum ACC value as the target text classification model according to the ACC value corresponding to each candidate text classification model; the larger the ACC value corresponding to the candidate text classification model is, the stronger the generalization capability of the candidate text classification model is.
In one embodiment, the processor, prior to implementing the model training of the plurality of candidate text classification models, is configured to implement:
storing the model related information corresponding to the candidate text classification models in a database; the model related information comprises a model name, an execution instance ID and an execution result;
the processor is configured to, when implementing the model training on the candidate text classification models by inputting the first training data set into the candidate text classification models respectively, implement:
starting a plurality of training machines, and determining candidate text classification models to be trained, which correspond to the training machines at present, based on the model-related information stored in the database;
inputting the first training data set into a candidate text classification model corresponding to each training machine at present, and performing model training on the candidate text classification model corresponding to each training machine at present.
In one embodiment, the processor, when implementing the determining of the candidate text classification model to be trained currently corresponding to each of the training machines based on the model-related information stored in the database, is configured to implement:
searching a target execution instance ID which is currently null value through a first training machine traversing the database, updating the null value of the target execution instance ID into ID information corresponding to the first training machine, and returning model name information corresponding to the target execution instance ID; the first training machine is any one of a plurality of training machines;
if the model name information is a null value, returning to execute the operation of traversing the database through the first training machine to inquire the current null value of the target execution instance ID;
and if the model name information is a non-null value, determining a first candidate text classification model corresponding to the model name information as a candidate text classification model to be trained, which is currently corresponding to the first training machine.
In an embodiment, after the processor determines, if the model name information is a non-null value, the first candidate text classification model corresponding to the model name information as the candidate text classification model to be trained currently corresponding to the first training machine, the processor is configured to implement:
inputting the first training data set into the first candidate text classification model, and performing model training on the first candidate text classification model to obtain first model evaluation information corresponding to the first candidate text classification model;
updating the execution result corresponding to the first candidate text classification model in the database into the first model evaluation information; before model training is performed on the first candidate text classification model, an execution result corresponding to the first candidate text classification model is a preset default value.
In one embodiment, the processor, when effecting acquiring the second training data set, is operable to effect:
acquiring a plurality of unprocessed text data;
performing data preprocessing on the plurality of text data to generate a plurality of label-free data, and forming a second training data set by the plurality of label-free data; wherein the data preprocessing comprises at least one of data splicing and data filtering.
In one embodiment, when the processor performs the data preprocessing on the plurality of text data to generate a plurality of unlabeled data, the processor is configured to perform:
performing semantic analysis on the text data, determining the text data corresponding to preset semantics in the text data as non-key text data, and filtering the non-key text data;
and performing data splicing on the plurality of text data with the non-key text data filtered out to obtain the non-label data.
The embodiment of the application also provides a computer readable storage medium.
The computer-readable storage medium of the present application has stored thereon a computer program which, when being executed by a processor, carries out the steps of the training method of a text classification model as described above.
The computer-readable storage medium may be an internal storage unit of the training apparatus for the text classification model or of the computer device described in the foregoing embodiments, for example, a hard disk or a memory of the training apparatus or the computer device. The computer-readable storage medium may also be an external storage device of the training apparatus or the computer device, such as a plug-in hard disk equipped on the training apparatus or the computer device, a Smart Media Card (SMC), a Secure Digital Card (SD Card), a Flash memory Card (Flash Card), and the like.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention.

Claims (10)

1. A training method of a text classification model is characterized in that the training method of the text classification model comprises the following steps:
acquiring a first training data set and a second training data set, wherein the first training data set comprises label data, and the second training data set comprises label-free data;
respectively inputting the first training data set into a plurality of candidate text classification models, and performing model training on the candidate text classification models to obtain model evaluation information corresponding to each candidate text classification model; the model evaluation information represents the generalization capability of the candidate text classification model, and comprises any one of an ACC value, an F1 value and a loss value;
determining a candidate text classification model with the strongest generalization capability in the candidate text classification models as a target text classification model according to the model evaluation information corresponding to each candidate text classification model;
and inputting the second training data set into the target text classification model, and performing model training on the target text classification model to obtain a trained target text classification model.
2. The method for training the text classification model according to claim 1, wherein the determining, according to the model evaluation information corresponding to each candidate text classification model, a candidate text classification model with the strongest generalization capability from among the plurality of candidate text classification models as the target text classification model comprises:
determining the candidate text classification model corresponding to the maximum ACC value as the target text classification model according to the ACC value corresponding to each candidate text classification model; the larger the ACC value corresponding to the candidate text classification model is, the stronger the generalization capability of the candidate text classification model is.
3. The training method for a text classification model according to claim 1, wherein before performing model training on the candidate text classification models, the method comprises:
storing model-related information corresponding to each of the candidate text classification models in a database, wherein the model-related information comprises a model name, an execution instance ID, and an execution result;
and the respectively inputting the first training data set into a plurality of candidate text classification models and performing model training on the candidate text classification models comprises:
starting a plurality of training machines, and determining, based on the model-related information stored in the database, the candidate text classification model currently to be trained by each training machine;
inputting the first training data set into the candidate text classification model currently corresponding to each training machine, and performing model training on that candidate text classification model.
4. The training method for a text classification model according to claim 3, wherein the determining, based on the model-related information stored in the database, the candidate text classification model currently to be trained by each training machine comprises:
traversing the database by a first training machine to search for a target execution instance ID whose value is currently null, updating the null value of the target execution instance ID to ID information corresponding to the first training machine, and returning model name information corresponding to the target execution instance ID; wherein the first training machine is any one of the plurality of training machines;
if the model name information is null, returning to the operation of traversing the database by the first training machine to search for a target execution instance ID whose value is currently null; and
if the model name information is not null, determining the first candidate text classification model corresponding to the model name information as the candidate text classification model currently to be trained by the first training machine.
5. The training method for a text classification model according to claim 4, wherein after the determining the first candidate text classification model corresponding to the model name information as the candidate text classification model currently to be trained by the first training machine, the method further comprises:
inputting the first training data set into the first candidate text classification model, and performing model training on the first candidate text classification model to obtain first model evaluation information corresponding to the first candidate text classification model;
updating the execution result corresponding to the first candidate text classification model in the database to the first model evaluation information; wherein, before model training is performed on the first candidate text classification model, the execution result corresponding to the first candidate text classification model is a preset default value.
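
(Illustration, not part of the claims.) A minimal sketch of the database-coordinated training described in claims 3 to 5, using SQLite purely as a stand-in for the database; the table and column names (model_tasks, model_name, exec_instance_id, exec_result) are assumptions. Each training machine atomically claims a row whose execution instance ID is still null, trains the named model, and writes its evaluation result back.

# Illustrative sketch only; schema and names are assumptions.
import sqlite3

def setup(db_path, model_names):
    # Model-related information: model name, execution instance ID (null until
    # claimed by a training machine), and execution result (preset default).
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS model_tasks (
                       model_name TEXT PRIMARY KEY,
                       exec_instance_id TEXT,
                       exec_result TEXT DEFAULT 'pending')""")
    con.executemany("INSERT OR IGNORE INTO model_tasks (model_name) VALUES (?)",
                    [(n,) for n in model_names])
    con.commit()
    con.close()

def claim_next_model(db_path, machine_id):
    # A training machine looks for a row whose execution instance ID is still
    # null, claims it with its own ID, and returns the model name; returns None
    # when every candidate model has already been claimed.
    con = sqlite3.connect(db_path, isolation_level=None)
    try:
        con.execute("BEGIN IMMEDIATE")  # write lock: makes the claim atomic
        row = con.execute("SELECT model_name FROM model_tasks "
                          "WHERE exec_instance_id IS NULL LIMIT 1").fetchone()
        if row is None:
            con.execute("COMMIT")
            return None
        con.execute("UPDATE model_tasks SET exec_instance_id = ? WHERE model_name = ?",
                    (machine_id, row[0]))
        con.execute("COMMIT")
        return row[0]
    finally:
        con.close()

def report_result(db_path, model_name, acc_value):
    # Write the model evaluation information back as the execution result.
    con = sqlite3.connect(db_path)
    con.execute("UPDATE model_tasks SET exec_result = ? WHERE model_name = ?",
                (str(acc_value), model_name))
    con.commit()
    con.close()
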
6. The training method for a text classification model according to any one of claims 1 to 5, wherein the acquiring a second training data set comprises:
acquiring a plurality of pieces of unprocessed text data;
performing data preprocessing on the plurality of pieces of text data to generate a plurality of pieces of unlabeled data, and forming the second training data set from the plurality of pieces of unlabeled data; wherein the data preprocessing comprises at least one of data splicing and data filtering.
7. The training method for a text classification model according to claim 6, wherein the performing data preprocessing on the plurality of pieces of text data to generate a plurality of pieces of unlabeled data comprises:
performing semantic analysis on the text data, determining text data matching preset semantics as non-key text data, and filtering out the non-key text data; and
performing data splicing on the plurality of pieces of text data from which the non-key text data has been filtered out, so as to obtain the unlabeled data.
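
(Illustration, not part of the claims.) A minimal sketch of the preprocessing in claims 6 and 7, where a simple rule-based filter stands in for the semantic analysis that identifies non-key text matching preset semantics; the patterns and function names are hypothetical.

# Illustrative sketch only; the rule-based filter is a stand-in assumption.
import re

# Hypothetical preset semantics: greetings and empty turns count as non-key text.
NON_KEY_PATTERNS = [r"^(hello|hi|thanks|thank you)\b", r"^\s*$"]

def is_non_key(sentence):
    # Stand-in for the semantic analysis step: match against preset patterns.
    return any(re.search(p, sentence.strip(), flags=re.IGNORECASE)
               for p in NON_KEY_PATTERNS)

def preprocess(dialog_turns):
    # Filter out non-key text, then splice the remaining turns into one sample.
    kept = [t for t in dialog_turns if not is_non_key(t)]
    return " ".join(kept)

def build_unlabeled_set(raw_dialogs):
    # Each raw item becomes one piece of unlabeled data for the second data set.
    samples = [preprocess(d) for d in raw_dialogs]
    return [s for s in samples if s]
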
8. An apparatus for training a text classification model, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a first training data set and a second training data set, wherein the first training data set comprises labeled data and the second training data set comprises unlabeled data;
a first training module, configured to respectively input the first training data set into a plurality of candidate text classification models and perform model training on the candidate text classification models to obtain model evaluation information corresponding to each candidate text classification model; wherein the model evaluation information represents the generalization capability of the candidate text classification model and comprises any one of an ACC value, an F1 value, and a loss value;
a model determining module, configured to determine, according to the model evaluation information corresponding to each candidate text classification model, the candidate text classification model with the strongest generalization capability among the plurality of candidate text classification models as a target text classification model; and
a second training module, configured to input the second training data set into the target text classification model and perform model training on the target text classification model to obtain a trained target text classification model.
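
(Illustration, not part of the claims.) One hypothetical way to arrange the four modules of claim 8 as a class; it simply wires together the steps sketched after claim 1 (including the assumed pseudo-label step) and is not the apparatus's actual implementation.

# Illustrative sketch only; class and method names are hypothetical.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

class TextClassificationTrainingApparatus:
    """Hypothetical grouping of the four modules recited in claim 8."""

    def acquisition_module(self, labeled_texts, labels, unlabeled_texts):
        # First (labeled) and second (unlabeled) training data sets.
        self.labeled_texts, self.labels = list(labeled_texts), list(labels)
        self.unlabeled_texts = list(unlabeled_texts)

    def first_training_module(self, candidates):
        # Train every candidate on the labeled data and record its ACC value.
        x_tr, x_val, y_tr, y_val = train_test_split(
            self.labeled_texts, self.labels, test_size=0.2, random_state=0)
        self.candidates, self.evaluation = candidates, {}
        for name, model in candidates.items():
            model.fit(x_tr, y_tr)
            self.evaluation[name] = accuracy_score(y_val, model.predict(x_val))

    def model_determining_module(self):
        # Strongest generalization capability is read off as the largest ACC value.
        best = max(self.evaluation, key=self.evaluation.get)
        self.target = self.candidates[best]

    def second_training_module(self):
        # Continue training the target model with the unlabeled data,
        # again via pseudo-labels as one possible realisation.
        pseudo = self.target.predict(self.unlabeled_texts)
        self.target.fit(self.labeled_texts + self.unlabeled_texts,
                        self.labels + list(pseudo))
        return self.target
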
9. A computer device, characterized in that the computer device comprises a memory and a processor;
the memory is configured to store a computer program; and
the processor is configured to execute the computer program and, when executing the computer program, implement the training method for a text classification model according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, carries out the steps of the training method for a text classification model according to any one of claims 1 to 7.
CN202111275208.0A 2021-10-29 2021-10-29 Training method, device and equipment of text classification model and storage medium Pending CN113918720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111275208.0A CN113918720A (en) 2021-10-29 2021-10-29 Training method, device and equipment of text classification model and storage medium


Publications (1)

Publication Number Publication Date
CN113918720A true CN113918720A (en) 2022-01-11

Family

ID=79243913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111275208.0A Pending CN113918720A (en) 2021-10-29 2021-10-29 Training method, device and equipment of text classification model and storage medium

Country Status (1)

Country Link
CN (1) CN113918720A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098680A * 2022-06-29 2022-09-23 Tencent Technology (Shenzhen) Co Ltd Data processing method, data processing apparatus, electronic device, medium, and program product
CN117371428A * 2023-09-25 2024-01-09 Baidu International Technology (Shenzhen) Co Ltd Text processing method and device based on large language model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490242A * 2019-08-12 2019-11-22 Tencent Healthcare (Shenzhen) Co Ltd Training method of image classification network, fundus image classification method, and related device
US20210192387A1 (en) * 2019-12-21 2021-06-24 Aptology, Inc. Machine and deep learning process modeling of performance and behavioral data
CN112528029A * 2020-12-29 2021-03-19 Ping An Puhui Enterprise Management Co Ltd Text classification model processing method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SEBASTIAN RASCHKA: "In Depth | Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning", pages 1 - 6, Retrieved from the Internet <URL:https://cloud.tencent.com/developer/article/1110144> *


Similar Documents

Publication Publication Date Title
CN111859960B (en) Semantic matching method, device, computer equipment and medium based on knowledge distillation
CN108197098B (en) Method, device and equipment for generating keyword combination strategy and expanding keywords
CN110782123B (en) Matching method and device of decision scheme, computer equipment and storage medium
CN113918720A (en) Training method, device and equipment of text classification model and storage medium
CN112732899A (en) Abstract statement extraction method, device, server and computer readable storage medium
CN112069795A (en) Corpus detection method, apparatus, device and medium based on mask language model
CN108959550B (en) User focus mining method, device, equipment and computer readable medium
Fazayeli et al. Towards auto-labelling issue reports for pull-based software development using text mining approach
CN113190675A (en) Text abstract generation method and device, computer equipment and storage medium
Leonandya et al. A semi-supervised algorithm for Indonesian named entity recognition
CN116402630A (en) Financial risk prediction method and system based on characterization learning
CN111709225A (en) Event cause and effect relationship judging method and device and computer readable storage medium
CN111400340A (en) Natural language processing method and device, computer equipment and storage medium
CN112988964A (en) Text prosody boundary prediction method, device, equipment and storage medium
CN114238740A (en) Method and device for determining agent brand of agent main body
CN111967253A (en) Entity disambiguation method and device, computer equipment and storage medium
CN116451688A (en) Chinese word segmentation method, device, server and storage medium
CN113919357A (en) Method, device and equipment for training address entity recognition model and storage medium
CN115859121A (en) Text processing model training method and device
CN114817523A (en) Abstract generation method and device, computer equipment and storage medium
CN113722446A (en) Power system operation data generation method and device and computer equipment
CN117688140B (en) Document query method, device, computer equipment and storage medium
CN112632981A (en) New word discovery method and device
CN112579774B (en) Model training method, model training device and terminal equipment
CN111309572B (en) Test analysis method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination