CN113010678A

CN113010678A - Training method of classification model, text classification method and device

Info

Publication number: CN113010678A
Application number: CN202110288663.8A
Authority: CN
Inventors: 刘晨晖; 钟辉强; 黄强; 徐思琪; 周厚谦
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2021-06-22

Abstract

The present disclosure provides a training method for a classification model, and relates to the field of natural language processing, in particular to the field of machine learning. The specific implementation is as follows: defining a label sequence as text information, wherein the label sequence is composed of predefined category labels; acquiring a first data set, and defining a sample in the first data set as question information; and training a preset model by using the text information and the question information as the text input and the question input of the preset model, respectively, so as to obtain a classification model.

Description

Training method of classification model, text classification method and device

Technical Field

The present disclosure relates to the field of natural language processing, and more particularly, to the field of machine learning. In particular, it relates to a training method of a classification model, a text classification method, a training apparatus of a classification model, a text classification apparatus, an electronic device, a non-transitory computer-readable storage medium having stored thereon computer instructions, and a computer program product.

Background

An abstract class may be understood as the result of aggregating samples with similar characteristics, while a real class may be understood as a specific, unambiguous class definition. Suppose a batch of short phrases needs to be classified, and the category labels include fruit, vegetables, meat, and so on. By definition, the real category "fruit" includes watermelon, cantaloupe and papaya. We want the classification model to assign watermelon, cantaloupe and papaya to the real category "fruit" at prediction time. In practice, after training on a large amount of data, the classification model can indeed aggregate watermelon, cantaloupe and papaya into the same category. The class resulting from this aggregation is referred to as the "abstract class". The abstract class is merely the product of the classification model aggregating samples with similar features, and has no explicit class definition. The "abstract class" obtained by the classification model may therefore correspond to "fruit", to "melon", or to some other logically valid class definition.

Disclosure of Invention

The present disclosure provides a training method of a classification model, a text classification method, an apparatus, a device, a storage medium and a computer program product.

According to an aspect of the present disclosure, a training method of a classification model is provided. The method comprises the following steps: defining a label sequence as text information, wherein the label sequence is composed of predefined category labels; acquiring a first data set, and defining a sample in the first data set as question information; and training a preset model by using the text information and the question information as the text input and the question input of the preset model, respectively, so as to obtain the classification model.

According to another aspect of the present disclosure, a text classification method is provided. The method comprises the following steps: defining a label sequence as text information, wherein the label sequence is composed of predefined category labels; acquiring a text to be classified, and defining the text to be classified as question information; and inputting the text information and the question information into a classification model as the text input and the question input, respectively, so as to obtain a classification result for the text to be classified.

According to another aspect of the present disclosure, a training apparatus for a classification model is provided. The apparatus includes: a first preprocessing module configured to define a label sequence as text information, wherein the label sequence is composed of predefined category labels; a second preprocessing module configured to acquire a first data set and define samples in the first data set as question information; and a training module configured to train a preset model by using the text information and the question information as the text input and the question input of the preset model, respectively, so as to obtain the classification model.

According to another aspect of the present disclosure, a text classification apparatus is provided. The apparatus includes: a third preprocessing module configured to define a label sequence as text information, wherein the label sequence is composed of predefined category labels; a fourth preprocessing module configured to acquire a text to be classified and define the text to be classified as question information; and a classification module configured to input the text information and the question information into a classification model as the text input and the question input, respectively, so as to obtain a classification result for the text to be classified.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the embodiments of the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method according to embodiments of the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1A illustrates a training method and apparatus system architecture for a classification model suitable for embodiments of the present disclosure;

FIG. 1B illustrates a scene diagram of a text classification method and apparatus in which embodiments of the present disclosure may be implemented;

FIG. 2 illustrates a flow chart of a method of training a classification model according to an embodiment of the present disclosure;

FIG. 3 illustrates a schematic diagram of training a BERT model according to an embodiment of the present disclosure;

FIG. 4 illustrates a flow diagram of a text classification method according to an embodiment of the present disclosure;

FIG. 5 illustrates a schematic diagram of an extractive text classification method based on a machine reading comprehension framework according to an embodiment of the present disclosure;

FIG. 6 illustrates a block diagram of a training apparatus for a classification model according to an embodiment of the present disclosure;

FIG. 7 illustrates a block diagram of a text classification apparatus according to an embodiment of the present disclosure; and

FIG. 8 illustrates a block diagram of an electronic device for implementing a training method and/or a classification method of a classification model according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be understood that the "abstract class" predicted by the classification model is not necessarily equal to the "real class", and can be infinitely approximated to the "real class" only if the sample space of the training data is sufficient. However, in a real production environment, the sample space of the training data may not be sufficient, so that a deviation between the "abstract class" and the "real class" is certain to exist.

Currently, the text classification methods provided in the related art mainly include rule matching methods, traditional machine learning methods (such as SVM and XGBoost), and deep learning methods (such as TextCNN and RNN).

It should be appreciated that the rule matching method described above requires rules to be summarized from expert experience, and is therefore time-consuming and labor-intensive. In practical applications, the generalization of a classification method based on rule matching cannot be guaranteed. For example, a matching rule built on positive or negative words often fails on text that contains negation or metaphor. This shows that classification based on rule matching generalizes poorly, and that the accuracy and coverage of the resulting classification model are very limited.

It should also be understood that the traditional machine learning method is a shallow semantic feature modeling method, and its modeling process can be summarized as two stages: manual feature engineering followed by a shallow classification model. The quality of the manual feature engineering usually has a direct influence on the final classification performance of the model. Meanwhile, the shallow classification model lacks the ability to mine deep semantic features. Therefore, the text classification performance of traditional machine learning methods quickly reaches a ceiling.

It should also be appreciated that deep learning has been the most effective text classification approach in recent years. A deep learning method automatically mines deep semantic features of a text by stacking various types of neural networks layer by layer, and can provide more accurate classification predictions than traditional machine learning methods. However, because of the way conventional text classification is defined and modeled, current text classification methods do not effectively feed information about the category labels into model training as prior knowledge. Therefore, deep-learning-based text classification methods in the related art still cannot adequately resolve the deviation between the abstract class and the real class.

For example, Machine Reading Comprehension (MRC) is a modeling approach based on deep learning that gives machines the ability to perform reading comprehension tasks. An MRC task has three parts: context, query and answer. A context is a piece of text, which may be a sentence or a paragraph. A query is a question about the context, and an answer is the answer corresponding to the query. A machine reading comprehension model takes the context and the query as model input and produces the corresponding answer as model output.

In the past two years, some scholars have applied the machine reading comprehension framework to text classification problems. Specifically, under this framework the text to be classified is defined as the context input of the model, a category label is defined as the query input of the model, and the model output (the answer) is the corresponding binary classification result, namely whether the text to be classified belongs to the current category label. Under this framework the category label is input into the model as prior knowledge, which helps the model alleviate, to a certain extent, the deviation between the abstract class and the real class when the sample space of the training data is insufficient. However, such a classification method based on the machine reading comprehension framework can only complete a binary text classification task, i.e., predicting whether the text to be classified belongs to the current category label. For the text multi-label classification task, no effective solution is currently available. This is because the conventional binary classification task defines a single category label as the query for model training, and this formulation cannot be adapted to the text multi-label classification task.

The present disclosure provides an improved classification method based on a machine reading comprehension framework, which can adapt not only to binary text classification tasks but also to text multi-label classification tasks (including a text multi-label single classification task and a text multi-label multi-classification task).

The present disclosure will be described in detail below with reference to specific examples.

The system architecture of the training method and apparatus for the classification model of the embodiments of the present disclosure is introduced as follows.

FIG. 1A illustrates a training method and apparatus system architecture for a classification model suitable for embodiments of the present disclosure. It should be noted that fig. 1A is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be used in other environments or scenarios.

As shown in fig. 1A, the system architecture 100 may include a client 101, a server 102, and a database 103.

The user may predefine one or more category labels at the client 101 and send predefined label information to the server 102.

The server 102 may combine the one or more predefined category labels into a label sequence based on the received label information, and define the label sequence as text information (context). Meanwhile, the server 102 may obtain training data (e.g., text samples) from a sample space of the database 103, and define the training data as question information (query). Further, the server 102 may input the label sequence as the context of the machine reading comprehension model and the training data as the query of the machine reading comprehension model, and train the machine reading comprehension model to obtain the classification model of the embodiment of the present disclosure.

It should be understood that the number of clients, servers, and databases in FIG. 1A is merely illustrative. There may be any number of clients, servers, and databases, as desired for implementation.

Application scenarios of the training method and apparatus for classification models suitable for embodiments of the present disclosure are described below.

Fig. 1B illustrates a scene diagram of a training method and apparatus for a classification model, which can implement embodiments of the present disclosure.

As shown in fig. 1B, suppose a batch of short phrases needs to be classified, and the category labels include fruit, vegetables, meat, and so on. By definition, the real category "fruit" includes watermelon, cantaloupe and papaya. We want the classification model to assign watermelon, cantaloupe and papaya to the real category "fruit" at prediction time. In practice, after training on a large amount of data, the classification model can indeed aggregate watermelon, cantaloupe and papaya into the same category. The class resulting from this aggregation is referred to as the "abstract class". The abstract class is merely the product of the classification model aggregating samples with similar features, and has no explicit class definition. The "abstract class" obtained by the classification model may be "fruit", or "melon", or some other logically valid class definition (e.g., "fruit with a larger volume of melon" as shown in fig. 1B). That is, the "abstract class" predicted by the model is not necessarily equal to the "real class". In the prior art, the abstract class obtained by model training can approach the real class arbitrarily closely only when the sample space of the training data is sufficient. In this scenario, by using the classification model training method provided by the embodiment of the present disclosure, the deviation between the "abstract class" and the "real class" can be alleviated as much as possible even in a real production environment where the sample space of the training data is insufficient.

It should be understood that the classification model obtained by the classification model training method provided by the embodiment of the present disclosure can be used in text multi-label classification business scenarios (e.g., government affair news classification, news event classification, etc.) in public opinion analysis projects, and can also be easily extended to other similar text multi-label classification application scenarios.

Experiments show that applying the text classification method provided by the embodiments of the present disclosure to the classification of news and government affairs in public opinion analysis projects noticeably improves classification performance.

When used for public opinion analysis, the method enables comprehensive analysis of internet public opinion, providing data and analysis capabilities such as real-time public opinion, semantic analysis, search indexes and event timelines, and helping users keep track of current affairs. Public opinion analysis often involves classifying news events, which can be abstracted as a text multi-label classification task. With the embodiments of the present disclosure, the deviation between the abstract class and the real class can be handled well when classifying news events, so that classification is more accurate and users can follow public opinion more easily.

According to an embodiment of the present disclosure, a method for training a classification model is provided.

Fig. 2 illustrates a flow chart of a training method of a classification model according to an embodiment of the present disclosure.

As shown in fig. 2, the training method 200 of the classification model may include: operation S210 to operation S230.

In operation S210, a label sequence is defined as text information. The label sequence is composed of predefined category labels.

In operation S220, a first data set is acquired, and a sample in the first data set is defined as question information.

In operation S230, the text information obtained in operation S210 and the question information obtained in operation S220 are used as a text input (i.e., a context input) and a question input (i.e., a query input) of a preset model, respectively, to train the preset model, so as to obtain a classification model.
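Operations S210 to S230 can be illustrated with a minimal sketch of how the two model inputs might be assembled. The names `ModelInput` and `build_training_inputs`, the "#" spacer and the sample strings are illustrative assumptions rather than part of the disclosure, and the training call itself is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelInput:
    context: str  # text input: the label sequence (operation S210)
    query: str    # question input: one sample of the first data set (operation S220)


def build_training_inputs(labels: List[str], samples: List[str],
                          spacer: str = "#") -> List[ModelInput]:
    """Pair every sample (query) with the same label sequence (context)."""
    context = spacer.join(labels)
    return [ModelInput(context=context, query=sample) for sample in samples]


# Hypothetical labels and samples, used only for illustration.
inputs = build_training_inputs(["fruit", "vegetable", "meat"],
                               ["watermelon", "cantaloupe", "papaya"])
for item in inputs:
    print(item)
```

Each resulting pair would then be fed to the preset model in operation S230; every sample shares one and the same context.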

It should be appreciated that existing text classification based on a machine reading comprehension framework defines the training data (text samples) as the context input of the machine reading comprehension model and the category label as the query input of the model; in effect, this still defines a text classification problem.

The embodiments of the present disclosure do the exact opposite: the training data (text samples) are defined as the query input of the machine reading comprehension model, and the one or more category labels are defined as the context input of the model. In effect, this converts the traditional text classification problem into an extractive machine reading comprehension problem.
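The difference between the two input formulations can be sketched as follows. This is only an illustration under assumed names (`existing_formulation`, `disclosed_formulation`) and an assumed "#" spacer; it shows how the inputs are arranged, not how either model is implemented.

```python
from typing import Dict, List


def existing_formulation(text: str, labels: List[str]) -> List[Dict[str, str]]:
    """Existing approach: the text is the context and each label is a query.

    One (context, query) pair is needed per label, and each pair yields
    a yes/no answer for that label.
    """
    return [{"context": text, "query": label} for label in labels]


def disclosed_formulation(text: str, labels: List[str],
                          spacer: str = "#") -> Dict[str, str]:
    """Disclosed approach: the label sequence is the context, the text is the query.

    A single (context, query) pair covers all labels, and the answer is
    extracted from the label sequence.
    """
    return {"context": spacer.join(labels), "query": text}


labels = ["fruit", "vegetable", "meat"]
print(existing_formulation("watermelon", labels))
print(disclosed_formulation("watermelon", labels))
```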

On the one hand, under the machine reading comprehension framework, the label sequence is explicitly input into the model as prior knowledge. This can optimize the model's multi-head attention mechanism so that the model focuses on key regions of the text, which effectively alleviates the deviation between the abstract class and the real class.

For example, an existing binary text classification task can only determine whether or not the current text belongs to the current category label, so the real class of the text may not be determined accurately. In the embodiments of the present disclosure, a label sequence formed by combining a plurality of category labels is defined as the context input of the model and the text to be classified is defined as the query input of the model, so that the category labels to which the text belongs can be extracted. The real class of the text can thus be determined accurately, and the deviation between the abstract class and the real class is effectively alleviated.

On the other hand, on the basis of a general machine reading comprehension framework, the embodiments of the present disclosure may define a single category label as a label sequence, or may define a plurality of category labels as one label sequence. When a plurality of category labels are defined in the label sequence, an extractive text classification task can be completed on the basis of the machine reading comprehension framework. Thus, embodiments of the present disclosure can address the text multi-label classification task. Further, the embodiments of the present disclosure may extract only one category label at a time, or may extract a plurality of category labels at a time. Thus, the embodiments of the present disclosure can handle both the text multi-label single classification task and the text multi-label multi-classification task.
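How extracting one or several labels at a time might look can be sketched as below. The sketch assumes, purely for illustration, that the model output is a list of character spans (start, end) over the label sequence; the disclosure does not specify the output format, and `spans_to_labels` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def spans_to_labels(context: str, spans: List[Span], spacer: str = "#") -> List[str]:
    """Map extracted spans back to category labels.

    Only spans that coincide exactly with one label in the label
    sequence are accepted; anything else is ignored.
    """
    boundaries: Dict[Span, str] = {}
    offset = 0
    for label in context.split(spacer):
        boundaries[(offset, offset + len(label))] = label
        offset += len(label) + len(spacer)

    result: List[str] = []
    for span in spans:
        label = boundaries.get(tuple(span))
        if label is not None and label not in result:
            result.append(label)
    return result


context = "fruit#vegetable#meat"
print(spans_to_labels(context, [(0, 5)]))            # one label extracted
print(spans_to_labels(context, [(0, 5), (16, 20)]))  # several labels extracted
```

With one span the result corresponds to a multi-label single classification; with several spans it corresponds to a multi-label multi-classification, using the same model inputs in both cases.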

For example, in embodiments of the present disclosure, the following category labels may be predefined: fruit, vegetables, meat, and so on. These category labels are combined into the label sequence "fruit#vegetable#meat#...". During model training, the label sequence "fruit#vegetable#meat#..." is input into the machine reading comprehension model as the context, and training data relating to watermelon, cantaloupe, papaya and the like is input into the model as the query, so that the model is trained to obtain the corresponding classification model. With this classification model, a text to be classified that relates to watermelon, cantaloupe or papaya can be clearly assigned to the "fruit" class, which effectively alleviates the deviation between the abstract class and the real class.
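In an extractive setting, the answer for a query is a span of the context. The following sketch locates a label inside the label sequence of the example above; representing the answer as character offsets is an assumption made for illustration, and `label_span` is a hypothetical helper.

```python
from typing import Tuple


def label_span(label_sequence: str, label: str, spacer: str = "#") -> Tuple[int, int]:
    """Return (start, end) character offsets of `label`, with end exclusive."""
    offset = 0
    for item in label_sequence.split(spacer):
        if item == label:
            return offset, offset + len(item)
        offset += len(item) + len(spacer)
    raise ValueError(f"label {label!r} is not in the label sequence")


context = "fruit#vegetable#meat"
start, end = label_span(context, "fruit")
print(start, end, context[start:end])  # 0 5 fruit
```

For a query such as "watermelon", the span of "fruit" in the context would be the extraction target under this assumed representation.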

With the embodiments of the present disclosure, the text classification problem is converted into a reading comprehension problem, and model training can be performed without labeling the training data (i.e., without labeling samples), so that the resulting classification model can perform extractive text classification and the deviation between the abstract class and the real class is alleviated more effectively.

In addition, the embodiments of the present disclosure provide an extractive text classification method for the text multi-label classification problem on the basis of a machine reading comprehension framework. Therefore, the embodiments of the present disclosure can effectively improve text classification performance while accommodating different text classification tasks. For example, the same classification model structure can switch flexibly among a binary text classification task, a text multi-label single classification task and a text multi-label multi-classification task.

In addition, for an existing text classification model based on the machine reading comprehension framework to achieve the same classification performance as the embodiments of the present disclosure on a text multi-label classification task, the number of model parameters has to grow as the number of category labels grows, and the classification efficiency of the model decreases accordingly. In contrast, since the number of parameters of the classification model trained by the embodiments of the present disclosure is constant, classifying text with this model can improve text classification efficiency.

As an alternative embodiment, the tag sequence may be obtained by the following operations.

A predefined at least one category label is obtained.

The at least one category label is arranged into an initial sequence.

Based on the initial sequence, a tag spacer is inserted between every two adjacent classification tags to obtain a tag sequence.

In one embodiment, the above operations may be performed at a client to obtain a sequence of tags.

Alternatively, in another embodiment, the above operations may also be performed at the server to obtain the tag sequence. Specifically, the user may predefine one or more category labels at the client and send predefined label information to the server. The server may combine the predefined one or more category labels into a label sequence based on the received label information. More specifically, all the predefined classification tags may be randomly or sequentially arranged into an initial sequence, and then a tag spacer may be inserted between every two adjacent classification tags in the initial sequence to obtain a final tag sequence.

It should be noted that the tag spacer may be any character or character string different from the character included in the predefined category tag in the tag sequence, and the embodiment of the disclosure is not limited herein.

Illustratively, n category labels (e.g., label 1, label 2, ..., label n) may be predefined and concatenated using "#" as the label spacer to form the label sequence "label 1#label 2#...#label n" required to train the model.

The label sequence obtained by the preprocessing can be used for conveniently labeling and identifying the extracted classification labels when a model is trained or a classification result is predicted by using the trained model.
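The preprocessing described above can be sketched as a small helper. The function name `build_label_sequence` and the check that the spacer does not occur inside any label are illustrative assumptions consistent with the description, not a definitive implementation.

```python
from typing import List


def build_label_sequence(labels: List[str], spacer: str = "#") -> str:
    """Join predefined category labels into one label sequence."""
    if not labels:
        raise ValueError("at least one category label is required")
    for label in labels:
        # The spacer must differ from the characters used in the labels;
        # otherwise the sequence could not be split back unambiguously.
        if spacer in label:
            raise ValueError(f"spacer {spacer!r} occurs in label {label!r}")
    # Joining inserts the spacer between every two adjacent labels.
    return spacer.join(labels)


label_sequence = build_label_sequence(["fruit", "vegetable", "meat"])
print(label_sequence)  # fruit#vegetable#meat
```

Because the spacer never appears inside a label, splitting the sequence on the spacer recovers the original labels, which is what makes it convenient to identify extracted labels later.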

As an alternative embodiment, the preset model may include: BERT model.

BERT (Bidirectional Encoder Representations from Transformers) is a text pre-training model and also a text representation model. The BERT model can dynamically compute context-dependent vector representations of words as the context of the input text changes. The structure of the BERT model contains multiple layers of bidirectional Transformer encoders. Taking the base version of BERT as an example, the model comprises a 12-layer bidirectional Transformer structure.

As shown in fig. 3, training the BERT model includes two processes: pre-training and fine-tuning. Pre-training adjusts the parameters of the BERT model (called pre-training parameters) through two unsupervised pre-training tasks. Fine-tuning takes the pre-training parameters as the initialization parameters of the BERT model and then adjusts the model parameters in a supervised manner according to the actual requirements of the downstream task. It should be understood that, in the embodiments of the present disclosure, when the BERT model is used as the preset model, performing the operations of the training method of the classification model described above fine-tunes the parameters of the BERT model to obtain the classification model required by the embodiments of the present disclosure.

It should be noted that the pre-training process of the BERT model may include two unsupervised pre-training tasks: the Masked Language Model (MLM) task and the Next Sentence Prediction (NSP) task.

For MLM, masking means that some words in the input corpus are randomly masked during training, and the masked words are then predicted from the context of the text. Like the word2vec model, this task is intended to let the model learn associations between words. In the BERT experiments, 15% of the words were randomly selected for masking. Of these selected words, 80% are replaced by the placeholder [MASK], 10% are replaced by another arbitrary word, and the remaining 10% are left unchanged. The reason is that if all of the selected words were replaced with [MASK], the model's performance during downstream fine-tuning would suffer, because [MASK] does not appear in downstream task input.
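The 15% selection and the 80%/10%/10% replacement rule can be sketched as follows. This is a simplified word-level illustration: the function name `mask_tokens`, the toy vocabulary and the explicit random generator are assumptions, and actual BERT pre-training operates on subword tokens.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(tokens: List[str], vocab: List[str], rng: random.Random,
                select_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """Return (masked tokens, prediction targets).

    targets[i] holds the original word at each selected position and
    None at positions that were not selected.
    """
    masked = list(tokens)
    targets: List[Optional[str]] = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # position not selected for prediction
        targets[i] = token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK               # 80%: replace with the placeholder
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with an arbitrary word
        # remaining 10%: keep the original word
    return masked, targets


rng = random.Random(0)
tokens = "the classification model aggregates samples with similar features".split()
masked, targets = mask_tokens(tokens, vocab=tokens, rng=rng)
print(masked)
print(targets)
```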

The purpose of the next sentence prediction (NSP) task is to enable the model to learn sentence-level associations. For example, this task may determine whether sentence B is the sentence that follows sentence A. If so, "IsNext" is output; otherwise, "NotNext" is output. The training data is generated by selecting two sentences A and B from the corpus: with 50% probability A and B satisfy the "IsNext" relationship, and with 50% probability they are drawn at random from the corpus. That is, a randomly drawn pair A and B may conform to either "IsNext" or "NotNext". After pre-training on the NSP task, the BERT model can effectively handle downstream tasks that involve sentence relations, such as Question Answering and Natural Language Inference.
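The generation of NSP training pairs can be sketched as follows. The function name `make_nsp_pair` and the toy corpus are assumptions; in this sketch a randomly drawn B is labeled by checking whether it actually follows A, which is one way to reflect the remark that a random pair may conform to either relationship.

```python
import random
from typing import List, Tuple


def make_nsp_pair(sentences: List[str], index: int,
                  rng: random.Random) -> Tuple[str, str, str]:
    """Build one (A, B, label) example; A is sentences[index].

    `sentences` is an ordered corpus and `index` must not be the last
    position, so that a true next sentence exists.
    """
    a = sentences[index]
    if rng.random() < 0.5:
        # 50%: B is the sentence that actually follows A.
        return a, sentences[index + 1], "IsNext"
    # 50%: B is drawn at random from the corpus.
    b_index = rng.randrange(len(sentences))
    label = "IsNext" if b_index == index + 1 else "NotNext"
    return a, sentences[b_index], label


corpus = ["I bought a watermelon.", "It was very sweet.",
          "The weather turned cold.", "We stayed indoors."]
rng = random.Random(0)
for _ in range(3):
    print(make_nsp_pair(corpus, 0, rng))
```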

Through the embodiment of the disclosure, parameter fine adjustment can be performed on the basis of the BERT model, and the classification model required by the embodiment of the disclosure can be obtained without pre-training.

According to an embodiment of the present disclosure, there is provided a text classification method.

Fig. 4 illustrates a flow chart of a text classification method according to an embodiment of the present disclosure.

As shown in fig. 4, the classification method 400 may include: operation S410 to operation S430.

In operation S410, a label sequence is defined as text information. The label sequence is composed of predefined category labels.

In operation S420, a text to be classified is acquired and defined as question information.

In operation S430, the text information and the question information are input as a text input and a question input, respectively, into the classification model to obtain a classification result for the text to be classified.

It should be noted that the classification model used in the embodiment of the present disclosure may be a model obtained by a training method of the classification model described in any one of the above embodiments of the present disclosure, and a specific training process is not described herein again.

The text classification method provided by the embodiment of the disclosure is an extraction-type text classification method based on a machine reading understanding framework (MRCE-TC for short). The MRCE-TC method converts the traditional text multi-label classification problem into an extraction-type machine reading understanding problem.

When a plurality of classification labels are predefined in the label sequence, the MRCE-TC text classification method provided by the embodiment of the disclosure can define the traditional text multi-label classification task as an extraction-type labeling task under a machine reading understanding framework. This effectively alleviates the deviation between the abstract category and the real category, and can also improve the processing efficiency of the classification model in the text multi-label classification task. The reason is that, in an existing text classification model based on the machine reading understanding framework, achieving the same classification effect as the embodiment of the disclosure in the text multi-label classification task requires a number of model parameters that grows with the number of classification labels, which in turn lowers the classification efficiency of the model. In contrast, the number of parameters of the classification model trained by the embodiment of the disclosure is constant, so performing text classification with this classification model can improve text classification efficiency.

Illustratively, data preprocessing may be performed before the classification prediction is made. The data preprocessing includes defining the query and the context of the model input. In the MRCE-TC algorithm, the character string corresponding to the text to be classified is defined as the query input of the classification model, and the label sequence formed by combining the predefined classification labels is defined as the context input of the classification model.

Here, n classification labels may be predefined, such as label 1, label 2, ..., label n. These classification labels can be concatenated using "#" as the label spacer, forming the label sequence "label 1#label 2#...#label n" required by the classification model.
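The concatenation described above can be sketched in a few lines. The function name `build_label_sequence` and the validation rules (no empty label list, no label containing the spacer) are assumptions added here for illustration; the disclosure only specifies that "#" is used as the label spacer.

```python
def build_label_sequence(labels, spacer="#"):
    """Join predefined classification labels into one label sequence."""
    if not labels:
        raise ValueError("at least one classification label is required")
    for label in labels:
        # Assumption: a label containing the spacer would make the sequence
        # ambiguous to split later, so it is rejected here.
        if not label or spacer in label:
            raise ValueError(f"invalid label: {label!r}")
    return spacer.join(labels)
```

Rejecting labels that contain the spacer keeps the mapping between characters of the sequence and labels unambiguous when the labeling sequence is parsed later.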

Then, the preprocessed query (text to be classified) and context (label sequence) are input into a classification model (such as a BERT model obtained by training based on the foregoing embodiment), and a corresponding answer is obtained through automatic feature learning and predictive computation of the classification model. In the MRCE-TC algorithm, the answer output by the classification model refers to the labeling sequence output by the model for the tag sequence (context). Based on the labeling sequence, a classification result about the text to be classified can be determined.
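As a rough illustration of how a query and a context might be packed into one input for a BERT-style model, the sketch below uses the usual sentence-pair layout `[CLS] query [SEP] context [SEP]` with segment ids 0 and 1. The function name `pack_inputs`, the character-level tokenization, and the returned `context_start` offset are assumptions made for this example; the disclosure does not specify the tokenizer or the exact input layout.

```python
def pack_inputs(query, context):
    """Pack the text to be classified (query) and the label sequence (context).

    Assumption: one token per character, which keeps a one-to-one mapping
    between context characters and positions in the labeling sequence.
    """
    tokens = ["[CLS]"] + list(query) + ["[SEP]"] + list(context) + ["[SEP]"]
    # Index of the first context token inside `tokens`.
    context_start = len(query) + 2
    # Segment 0 covers [CLS], the query and the first [SEP];
    # segment 1 covers the context and the final [SEP].
    segment_ids = [0] * context_start + [1] * (len(context) + 1)
    return tokens, segment_ids, context_start
```

With such a layout, the per-token predictions at positions `context_start` to `context_start + len(context) - 1` would form the labeling sequence for the label sequence.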

By the embodiment of the disclosure, the text classification problem is converted into the reading understanding problem, and the extraction type text classification can be realized, so that the deviation problem between the abstract classification and the real classification can be effectively relieved.

In addition, in the existing text classification model based on the machine reading understanding framework, in order to achieve the same classification effect as that of the embodiment of the present disclosure in the text multi-label classification task, the number of parameters of the model is increased along with the increase of the number of classification labels, and therefore, the classification efficiency of the model is also reduced. In contrast, since the number of parameters of the employed classification model is constant (the training method employed to obtain the classification model determines that the number of parameters thereof is constant), the text classification efficiency can be improved by the embodiments of the present disclosure.

It should be understood that in the MRCE-TC algorithm, the label sequence also serves as prior knowledge and is input into the classification model in the form of context. Combined with the multi-head attention mechanism of a classification model such as the BERT model, this allows the classification model to focus attention on the text to be classified and the related labels, so that the deviation between the "abstract class" and the "real class" predicted by the model can be effectively alleviated.

It should also be understood that text multi-label classification is a key direction of natural language processing research and an important technology for deploying natural language products such as public opinion analysis in practice. The MRCE-TC algorithm provided by the embodiment of the disclosure draws effectively on the state of the art and is simple to apply.

As an alternative embodiment, inputting the text information defined by the label sequence and the question information defined by the text to be classified as the text input and question input of the classification model respectively to obtain the classification result for the text to be classified may include the following operations.

The text information defined by the label sequence and the question information defined by the text to be classified are input into the classification model as the text input and the question input, respectively, to obtain a labeling sequence for the label sequence.

A classification result for the text to be classified is determined based on the labeling sequence.

In the embodiment of the present disclosure, in the labeling sequence, the first character of the extracted category label may be labeled with one character (first character), the non-first character of the extracted category label may be labeled with another character (second character), and the character string and the label spacer corresponding to the category label that is not extracted may be labeled with yet another character (third character).

For example, the label sequence may be labeled with three characters, B (begin), I (inside) and O (other), to obtain the corresponding labeling sequence. Here, B indicates the starting character of the target segment (the first character of an extracted classification label), I indicates a non-starting character of the target segment (a non-first character of an extracted classification label), and O indicates a non-target segment (the character strings corresponding to classification labels that are not extracted, and the label spacers). The target segment comprises the character strings corresponding to all the extracted classification labels in the label sequence.
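The B/I/O scheme above can be illustrated by a small sketch that produces the labeling sequence for a given set of extracted labels, for example when preparing training targets. The function name `encode_bio` and the assumptions that labels are non-empty and that the spacer is a single character are introduced here for illustration only.

```python
def encode_bio(labels, selected):
    """Return the B/I/O labeling sequence for the label sequence built from `labels`.

    Assumptions: every label is a non-empty string and the label spacer
    occupies exactly one character.
    """
    selected = set(selected)
    parts = []
    for index, label in enumerate(labels):
        if index > 0:
            parts.append("O")  # the label spacer is always a non-target character
        if label in selected:
            # First character gets B, remaining characters get I.
            parts.append("B" + "I" * (len(label) - 1))
        else:
            parts.append("O" * len(label))
    return "".join(parts)
```

The returned string has the same length as the label sequence, so position k of the labeling sequence describes character k of the label sequence.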

In the embodiment of the disclosure, the classification result predicted by the classification model can be obtained automatically from the labeling sequence. As shown in fig. 5, in the multi-label classification task, for the input character string "news text to be classified" and the label sequence "label 1#label 2#...#label n", if the classification labels predicted by the classification model are "label 1" and "label 2", the resulting labeling sequence in this case should be "BIIOBIIO...OOO". Parsing this labeling sequence therefore shows that the predicted categories for the "news text to be classified" are the categories corresponding to "label 1" and "label 2".

By the embodiment of the disclosure, the multi-label text classification task problem is converted into the extraction type reading understanding problem (extraction type text classification problem), and the labeling sequence can be generated aiming at the extraction type target segment, so that the problem of deviation between abstract classification and real classification can be relieved, and the classification efficiency can be improved.

Further, as an optional embodiment, determining a classification result for the text to be classified based on the annotation sequence may include the following operations.

Based on the labeling sequence, at least one extracted classification label in the label sequence is determined.

Based on the at least one classification tag, a classification result of the text to be classified is determined.

Illustratively, with continued reference to the above example, if the label sequence input to the model is "label 1#label 2#...#label n" and the labeling sequence output after the model prediction is "BIIOBIIO...OOO", then parsing the labeling sequence shows that the predicted categories for the "news text to be classified" are the categories corresponding to "label 1" and "label 2".
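The two operations above (determining the extracted labels from the labeling sequence, then taking them as the classification result) might be sketched as below. The function name `decode_bio` and the strict matching rule, under which a label counts as extracted only if its whole span is tagged B followed by I characters, are assumptions of this sketch; the disclosure does not state how partially tagged labels are handled.

```python
def decode_bio(label_sequence, tags, spacer="#"):
    """Recover the extracted classification labels from a B/I/O labeling sequence."""
    if len(label_sequence) != len(tags):
        raise ValueError("labeling sequence must align with the label sequence")
    extracted = []
    start = 0
    for label in label_sequence.split(spacer):
        span = tags[start:start + len(label)]
        # Assumption: only a fully tagged label (B then I...) counts as extracted.
        if span == "B" + "I" * (len(label) - 1):
            extracted.append(label)
        start += len(label) + 1  # skip the label and the following spacer
    return extracted
```

For instance, calling `decode_bio("ab1#cd2#ef3", "BIIOBIIOOOO")` walks the three labels in order and keeps the first two, whose spans are fully tagged.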

Alternatively, as an optional embodiment, the method further includes: in the labeling sequence, labeling the characters of the extracted classification labels with a first character; and labeling the characters of the non-extracted classification labels and the label spacers with a second character.

Illustratively, the first character may include X and the second character may include O. With continued reference to the above example, in the multi-label classification task, for the input character string "news text to be classified" and the label sequence "label 1#label 2#...#label n", if the classification labels predicted by the classification model are "label 1" and "label 2", the resulting labeling sequence in this case should be "XXXOXXXO...OOO". Parsing this labeling sequence therefore shows that the predicted categories for the "news text to be classified" are the categories corresponding to "label 1" and "label 2".
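The two-character variant can be sketched in the same way. The names `encode_xo` and `decode_xo` are hypothetical, and the sketch assumes non-empty labels and a single-character spacer. Because the spacer is always labeled with the second character, adjacent extracted labels remain separable even without a dedicated begin marker.

```python
def encode_xo(labels, selected, hit="X", miss="O"):
    """Label every character of an extracted label with `hit`, all else with `miss`."""
    selected = set(selected)
    parts = [(hit if label in selected else miss) * len(label) for label in labels]
    return miss.join(parts)  # each single-character spacer is labeled `miss`


def decode_xo(labels, tags, hit="X"):
    """Recover the extracted labels from a two-character labeling sequence."""
    extracted = []
    start = 0
    for label in labels:
        if tags[start:start + len(label)] == hit * len(label):
            extracted.append(label)
        start += len(label) + 1  # skip the label and the following spacer
    return extracted
```

Encoding and then decoding the same selection returns the selected labels in label-sequence order.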

According to the embodiment of the disclosure, the disclosure further provides a training device of the classification model.

Fig. 6 illustrates a block diagram of a training apparatus for a classification model according to an embodiment of the present disclosure.

As shown in fig. 6, the training apparatus 600 for classification model includes: a first pre-processing module 610, a second pre-processing module 620, and a training module 630.

A first preprocessing module 610 for defining a sequence of tags as text information, wherein the sequence of tags is composed of predefined category tags.

The second preprocessing module 620 is configured to obtain a first data set, and define a sample in the first data set as question information.

A training module 630, configured to train the preset model by using the text information and the question information as a text input and a question input of the preset model, respectively, so as to obtain the classification model.

As an alternative embodiment, the apparatus further comprises: an obtaining module, configured to obtain the tag sequence by: acquiring at least one predefined classification label; arranging the at least one classification label into an initial sequence; and inserting a tag spacer between every two adjacent classification labels based on the initial sequence to obtain the tag sequence.

As an alternative embodiment, the preset model includes: BERT model.

According to an embodiment of the present disclosure, the present disclosure also provides a text classification apparatus.

Fig. 7 illustrates a block diagram of a classification apparatus according to an embodiment of the present disclosure.

As shown in fig. 7, the classification apparatus 700 includes: a third pre-processing module 710, a fourth pre-processing module 720, and a classification module 730.

A third preprocessing module 710 for defining a sequence of tags as text information, wherein the sequence of tags is composed of predefined class tags.

The fourth preprocessing module 720 is configured to obtain a text to be classified, and define the text to be classified as question information.

The classification module 730 is configured to input the text information and the question information as a text input and a question input, respectively, into a classification model, so as to obtain a classification result for the text to be classified.

As an alternative embodiment, the classification module comprises: a classification unit, configured to input the text information and the question information into the classification model as the text input and the question input, respectively, to obtain a labeling sequence for the label sequence; and a determining unit, configured to determine a classification result for the text to be classified based on the labeling sequence.

As an alternative embodiment, the determining unit includes: a first determining subunit, configured to determine, based on the labeling sequence, at least one extracted classification label in the label sequence; and a second determining subunit, configured to determine a classification result of the text to be classified based on the at least one classification label.

As an alternative embodiment, the classification module uses the classification model to, in the labeling sequence, label the characters of the extracted classification labels with a first character, and label the characters of the non-extracted classification labels and the label spacers with a second character.

It should be understood that, in the embodiment of the present disclosure, the embodiment of the apparatus portion is the same as or similar to the embodiment of the method portion, and the achieved technical effects and functions are also the same as or similar to each other, which are not described herein again.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 801 may be any of a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the training method of the classification model and the text classification method. For example, in some embodiments, both the training method of the classification model and the text classification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the training method of the classification model or the text classification method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the training method of the classification model and/or the text classification method in any other suitable way (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A training method of a classification model comprises the following steps:

defining a sequence of tags as textual information, wherein the sequence of tags is composed of predefined category tags;

acquiring a first data set, and defining a sample in the first data set as question information; and

training the preset model by respectively using the text information and the question information as the text input and the question input of the preset model so as to obtain the classification model.

2. The method of claim 1, wherein the tag sequence is obtained by:

acquiring at least one predefined classification label;

arranging the at least one category label into an initial sequence; and

based on the initial sequence, inserting a tag spacer between every two adjacent classification tags to obtain the tag sequence.

3. The method of claim 1, wherein the preset model comprises: BERT model.

4. A method of text classification, comprising:

defining a tag sequence as text information, wherein the tag sequence is composed of predefined classification tags;

acquiring a text to be classified, and defining the text to be classified as question information; and

inputting the text information and the question information as text input and question input, respectively, into a classification model so as to obtain a classification result for the text to be classified.

5. The method of claim 4, wherein inputting the text information and the question information as a text input and a question input, respectively, into a classification model to obtain a classification result for the text to be classified comprises:

inputting the text information and the question information as text input and question input, respectively, into the classification model to obtain a labeling sequence for the tag sequence; and

determining a classification result for the text to be classified based on the labeling sequence.

6. The method of claim 5, wherein determining a classification result for the text to be classified based on the labeling sequence comprises:

determining at least one extracted classification tag in the tag sequence based on the labeling sequence; and

determining a classification result of the text to be classified based on the at least one classification tag.

7. The method of claim 4, further comprising: in the labeling sequence,

labeling the characters of the extracted classification labels with a first character; and

labeling the characters of the non-extracted classification labels and the label spacers with a second character.

8. A training apparatus for classification models, comprising:

a first preprocessing module, configured to define a label sequence as text information, wherein the label sequence is composed of predefined classification labels;

a second preprocessing module, configured to acquire a first data set and define samples in the first data set as question information; and

a training module, configured to train the preset model by respectively taking the text information and the question information as the text input and the question input of the preset model so as to obtain the classification model.

9. The apparatus of claim 8, further comprising: an obtaining module configured to obtain the tag sequence by:

acquiring at least one predefined classification label;

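arranging the at least one category label into an initial sequence; and

based on the initial sequence, inserting a tag spacer between every two adjacent classification tags to obtain the tag sequence.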

10. The apparatus of claim 8, wherein the preset model comprises: BERT model.

11. A text classification apparatus comprising:

a third preprocessing module, configured to define a tag sequence as text information, where the tag sequence is composed of predefined classification tags;

a fourth preprocessing module, configured to acquire a text to be classified and define the text to be classified as question information; and

a classification module, configured to input the text information and the question information as text input and question input, respectively, into a classification model so as to obtain a classification result for the text to be classified.

12. The apparatus of claim 11, wherein the classification module comprises:

a classification unit, configured to input the text information and the question information as text input and question input, respectively, into the classification model so as to obtain a labeling sequence for the tag sequence; and

a determining unit, configured to determine a classification result for the text to be classified based on the labeling sequence.

13. The apparatus of claim 12, wherein the determining unit comprises:

a first determining subunit, configured to determine, based on the labeling sequence, at least one extracted classification tag in the tag sequence; and

a second determining subunit, configured to determine a classification result of the text to be classified based on the at least one classification tag.

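14. The apparatus of claim 11, wherein the classification module utilizes the classification model to, in the labeling sequence,

label the characters of the extracted classification labels with a first character; and

label the characters of the non-extracted classification labels and the label spacers with a second character.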

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.

17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.