CN114117048A - Text classification method and device, computer equipment and storage medium


Info

Publication number
CN114117048A
Authority
CN
China
Prior art keywords
sample
text classification
model
classification model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111436398.XA
Other languages
Chinese (zh)
Inventor
龚静
苏志锋
詹乐
吕有才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN202111436398.XA
Publication of CN114117048A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a text classification method and device, computer equipment, and a storage medium, belonging to the technical field of artificial intelligence. The method comprises constructing a text classification model based on a UDA network framework, the loss function of which comprises a supervised loss term and an unsupervised loss term; obtaining training samples and labeling the unlabeled samples to obtain pseudo labels; importing the training samples into the text classification model for training; calculating prediction errors based on the supervised loss term and the unsupervised loss term; iteratively updating the initial text classification model based on the prediction errors; and, when text classification is needed, importing the text data to be classified into the trained text classification model and outputting the text classification result. In addition, the application also relates to blockchain technology: the text data to be classified can be stored in a blockchain. The method and device improve the training efficiency of the text classification model while also ensuring the accuracy of the model.

Description

Text classification method and device, computer equipment and storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a text classification method and device, computer equipment and a storage medium.
Background
At present, the existing text classification technology in the industry is mainly completed by supervised learning. The classification process comprises steps such as text preprocessing, feature extraction, classification modeling, and evaluation. Supervised classification algorithms are generally realized by naive Bayes (using a multivariate Bernoulli model or a multinomial model), support vector machines, or deep learning methods, where the deep learning methods include the multilayer perceptron (MLP), the convolutional neural network (CNN), and the recurrent neural network (RNN). Although supervised learning is efficient and accurate, it is difficult to apply in practice, because supervised learning needs a large amount of labeled data for training, and since most natural language processing tasks belong to cognitive-level tasks, the difficulty and uncertainty of data labeling are significantly higher than those of perception-level tasks such as image recognition and voice recognition.
Existing semi-supervised methods such as Pseudo-Label use the model being trained to predict the unlabeled data, take the category with the highest probability as the pseudo label of the unlabeled data, and then apply the idea of entropy regularization to convert the unlabeled data into a regularization term of the objective function. In practical application, unlabeled data carrying a pseudo label is treated as labeled data, and cross entropy is then used to evaluate the error. However, in the existing semi-supervised learning methods, all predicted pseudo labels are treated identically: even if the model assigns only a low probability to every prediction category, the category with the maximum probability is still used as the pseudo label, which in practice introduces an error signal into the model and affects the accuracy of the model.
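For illustration, a minimal sketch of the conventional pseudo-labeling step described above (PyTorch is assumed; the model and data loader names are placeholders, not the patent's code):

```python
import torch
import torch.nn.functional as F

def assign_pseudo_labels(model, unlabeled_loader):
    """Conventional Pseudo-Label step: the category with the highest
    predicted probability becomes the pseudo label, regardless of how
    confident the model actually is."""
    model.eval()
    pseudo_labels = []
    with torch.no_grad():
        for batch in unlabeled_loader:
            probs = F.softmax(model(batch), dim=-1)
            # Even a low maximum probability still yields a pseudo label,
            # which is exactly how an error signal can enter training.
            pseudo_labels.append(probs.argmax(dim=-1))
    return torch.cat(pseudo_labels)
```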
Disclosure of Invention
The embodiments of the application aim to provide a text classification method and device, computer equipment, and a storage medium, so as to solve the technical problem that existing semi-supervised text classification schemes treat all predicted pseudo labels identically, which lowers the accuracy of the text classification result.
In order to solve the above technical problem, an embodiment of the present application provides a text classification method, which adopts the following technical solutions:
a method of text classification, comprising:
receiving a classification model training instruction, and constructing an initial text classification model, wherein the loss function of the initial text classification model comprises a supervised loss term and an unsupervised loss term;
acquiring a first training sample from a preset database, wherein the first training sample comprises a labeled sample and an unlabeled sample;
labeling the unlabeled sample based on a preset open source labeling model to obtain a pseudo label of the unlabeled sample, and combining the labeled sample and the unlabeled sample carrying the pseudo label to obtain a second training sample;
importing the second training sample into the initial text classification model for model training to generate a sample classification result;
calculating an error between the sample classification result and a preset standard classification result based on the supervised loss term and the unsupervised loss term to obtain a prediction error;
based on the prediction error, carrying out iterative updating on the initial text classification model by using a back propagation algorithm to obtain a trained text classification model;
receiving a text classification instruction, acquiring text data to be classified, importing the text data to be classified into the trained text classification model, and outputting a text classification result.
Further, after the step of obtaining the first training sample from the preset database, the method further includes:
classifying the unlabeled samples;
and performing data augmentation on the classified unlabeled samples by a preset data augmentation method to obtain augmented samples.
Further, the step of performing data augmentation on the classified unlabeled samples by a preset data augmentation method to obtain augmented samples specifically includes:
performing word segmentation processing on the classified unlabeled samples to obtain sample tokens;
obtaining synonyms of the sample tokens, and randomly replacing sample tokens with the synonyms to obtain a first sample;
randomly inserting the synonyms among the sample tokens to obtain a second sample;
randomly exchanging the positions of sample tokens to obtain a third sample;
randomly deleting sample tokens to obtain a fourth sample;
and combining the first sample, the second sample, the third sample, and the fourth sample to obtain the augmented samples.
Further, the step of performing data augmentation on the classified unlabeled samples by a preset data augmentation method to obtain augmented samples specifically includes:
performing word segmentation processing on the classified unlabeled samples to obtain sample tokens;
and randomly deleting sample tokens, and importing the sample tokens after random deletion into a pre-trained text generation model to generate augmented samples.
Further, the initial text classification model includes an embedding layer, a convolution layer, and a fully connected layer, and the step of importing the second training sample into the initial text classification model for model training to generate a sample classification result specifically includes:
performing vector feature conversion processing on the second training sample through the embedding layer to obtain an initial vector;
performing a convolution operation on the initial vector using the convolution layer to obtain feature data corresponding to the initial vector;
and importing the feature data into the fully connected layer for similarity calculation, and outputting the recognition result with the maximum similarity as the sample classification result corresponding to the second training sample.
Further, the step of calculating an error between the sample classification result and a preset standard classification result based on the supervised loss term and the unsupervised loss term to obtain a prediction error specifically includes:
calculating an error between the labeled sample classification result and the first standard classification result based on the supervised loss term to obtain a first error;
and calculating the error between the unlabeled sample classification result and the second standard classification result based on the unsupervised loss term to obtain a second error.
Further, based on the prediction error, using a back propagation algorithm to iteratively update the initial text classification model to obtain a trained text classification model, specifically comprising:
propagating the prediction error through the network layers of the initial text classification model based on the back propagation algorithm;
acquiring the recognition error of each network layer in the initial text classification model, and comparing the recognition error with a preset error threshold;
if the recognition error is larger than the preset error threshold, iteratively updating the initial text classification model until the recognition error is smaller than or equal to the preset error threshold;
and outputting the initial text classification model whose recognition error is smaller than or equal to the preset error threshold to obtain the trained text classification model.
In order to solve the above technical problem, an embodiment of the present application further provides a text classification device, which adopts the following technical solutions:
an apparatus for text classification, comprising:
the model building module is used for receiving a classification model training instruction and building an initial text classification model, wherein the loss function of the initial text classification model comprises a supervised loss term and an unsupervised loss term;
the sample acquisition module is used for acquiring a first training sample from a preset database, wherein the first training sample comprises a labeled sample and an unlabeled sample;
the sample labeling module is used for labeling the unlabeled sample based on a preset open source labeling model to obtain a pseudo label of the unlabeled sample, and combining the labeled sample and the unlabeled sample carrying the pseudo label to obtain a second training sample;
the model training module is used for importing the second training sample into the initial text classification model for model training to generate a sample classification result;
the loss prediction module is used for calculating the error between the sample classification result and a preset standard classification result based on the supervised loss term and the unsupervised loss term to obtain a prediction error;
the model iteration module is used for carrying out iteration updating on the initial text classification model by using a back propagation algorithm based on the prediction error to obtain a trained text classification model;
and the text classification module is used for receiving a text classification instruction, acquiring text data to be classified, importing the text data to be classified into the trained text classification model, and outputting the text classification result.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprising a memory having computer readable instructions stored therein and a processor which, when executing the computer readable instructions, implements the steps of the method of text classification described above.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of a method of text classification as claimed in any one of the preceding claims.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
the application discloses a text classification method, a text classification device, computer equipment and a storage medium, and belongs to the technical field of artificial intelligence. The method comprises the steps of constructing a semi-supervised text classification model, carrying out semi-supervised training on the text classification model through labeled samples and non-labeled samples carrying pseudo labels, calculating errors of classification results corresponding to the labeled samples through a supervised loss item and a non-supervised loss item of a model loss function, calculating errors of classification results corresponding to the non-labeled samples through the supervised loss item, and independently processing the classification results corresponding to the labeled samples and the non-labeled samples. The text classification model with the strong generalization ability can be trained through a small amount of labeled data and a large amount of unlabeled data, more service scenes are met, the model training efficiency is improved, and meanwhile the accuracy of the model can be guaranteed.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 illustrates an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 illustrates a flow diagram for one embodiment of a method of text classification in accordance with the present application;
FIG. 3 illustrates a schematic structural diagram of one embodiment of an apparatus for text classification in accordance with the present application;
FIG. 4 shows a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103, and may be an independent server, or a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform.
It should be noted that the method for text classification provided in the embodiments of the present application is generally performed by a server, and accordingly, the apparatus for text classification is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow diagram of one embodiment of a method of text classification in accordance with the present application is shown. The embodiment of the application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) refers to theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like. The text classification method comprises the following steps:
s201, receiving a classification model training instruction, and constructing an initial text classification model, wherein a loss function of the initial text classification model comprises a supervision loss item and an unsupervised loss item.
In a specific embodiment of the present application, the text classification model is built based on a UDA (Unsupervised Data Augmentation) network framework. A model built with the UDA framework should be smooth in the space near the input data: if the semantics of the input data are unchanged and only their surface form changes, the output of the model should also remain essentially unchanged. This ensures consistency between the input and output of the model and improves the precision of text classification.
The text classification model takes the requirements of online deployment into account: based on the UDA framework, BERT_base and TextCNN are respectively adopted as sub-classification models. TextCNN is a lightweight model that is pre-trained in advance on Google's corpus of hundreds of millions of entries; in actual application, a corresponding model effect can be obtained simply by adjusting parameters such as the convolution kernel size according to the actual situation. In order to obtain stronger generalization ability, the BERT model undergoes secondary pre-training on more than 10 million open-source data items. Through the UDA framework, unlabeled data can be fully utilized, so that the existing model can be further improved over its original performance.
It should be noted that TextCNN is a pre-trained convolutional network and BERT is a pre-trained language model. The aim of pre-training is to train in advance the lower and middle layers of the model that are common to the downstream tasks, and then train the respective models with the sample data of each downstream task, which can greatly accelerate convergence.
Specifically, after receiving a classification model training instruction, the server constructs an initial text classification model based on the UDA network framework, wherein the initial text classification model comprises pre-trained BERT_base and TextCNN, and constructs the loss function of the initial text classification model, which comprises a supervised loss term and an unsupervised loss term. The supervised loss term (Supervised Cross-entropy Loss) is used to calculate the error of the labeled data, taking the common cross entropy as its target; the unsupervised loss term (Unsupervised Consistency Loss) is used to calculate the error of the unlabeled data.
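A minimal sketch of such a two-term loss, assuming PyTorch and that the consistency target for an unlabeled sample is the prediction on its original (non-augmented) form (all names are illustrative, not the patent's code):

```python
import torch
import torch.nn.functional as F

def uda_loss(logits_labeled, labels, logits_orig, logits_aug, lam=1.0):
    """Total loss = supervised cross entropy on labeled data
                  + unsupervised consistency (KL divergence) on unlabeled data."""
    supervised = F.cross_entropy(logits_labeled, labels)
    # Treat the prediction on the original unlabeled text as the target
    # distribution and penalize divergence of the augmented view from it.
    target = F.softmax(logits_orig.detach(), dim=-1)
    log_pred = F.log_softmax(logits_aug, dim=-1)
    unsupervised = F.kl_div(log_pred, target, reduction="batchmean")
    return supervised + lam * unsupervised
```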
In this embodiment, the electronic device (for example, the server shown in fig. 1) on which the text classification method operates may receive the classification model training instruction through a wired connection or a wireless connection. It should be noted that the wireless connection means may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (Ultra Wideband) connection, and other wireless connection means now known or developed in the future.
S202, obtaining a first training sample from a preset database, wherein the first training sample comprises a labeled sample and an unlabeled sample.
Specifically, after completing the construction of the initial text classification model, the server acquires training samples from a preset database, where the training samples are text data collected in advance, such as short texts in the financial field, and the training samples include labeled samples and unlabeled samples, the labeled samples are text data labeled in advance, and the unlabeled samples are text data that is not labeled.
S203, labeling the unlabeled sample based on a preset open source labeling model to obtain a pseudo label of the unlabeled sample, and combining the labeled sample and the unlabeled sample carrying the pseudo label to obtain a second training sample.
The preset open-source labeling model is a Pseudo-Label model. Pseudo-Label is a semi-supervised learning method for deep neural networks whose core idea is to predict the unlabeled data with the model being trained and to use the category with the highest probability as the pseudo label of the unlabeled data.
Specifically, after obtaining a first training sample, the server labels a non-label sample in the first training sample based on a preset open source labeling model to obtain a pseudo label of the non-label sample, and combines the labeled sample and the non-label sample carrying the pseudo label to obtain a second training sample.
And S204, importing the second training sample into the initial text classification model for model training to generate a sample classification result.
The initial text classification model is constructed based on the UDA network framework, with BERT_base and TextCNN respectively adopted as model cores. BERT_base is a pre-trained language model; by training in advance the lower and middle layers that are common to the downstream tasks, the convergence rate can be greatly accelerated. The TextCNN part includes an embedding layer, a convolution layer, and a fully connected layer for feature extraction and class prediction.
Specifically, the server introduces a second training sample into the initial text classification model, wherein the second training sample comprises a labeled sample and an unlabeled sample carrying a pseudo label, model training is respectively performed on the initial text classification model through the labeled sample and the unlabeled sample carrying the pseudo label, and a sample classification result is generated, wherein the sample classification result comprises a labeled sample classification result and an unlabeled sample classification result.
S205, calculating the error between the sample classification result and a preset standard classification result based on the supervised loss term and the unsupervised loss term to obtain a prediction error.
Specifically, in the model training process, an error between a sample classification result and a preset standard classification result is calculated through a loss function of an initial text classification model, wherein the loss function of the initial text classification model comprises a supervised loss item and an unsupervised loss item, the supervised loss item is used for calculating the error of labeled data, a common cross entropy is used as a target, and the unsupervised loss item is used for calculating the error of the unlabeled data.
It should be noted that the sample classification result includes a labeled sample classification result and an unlabeled sample classification result, and the standard classification result includes a first standard classification result and a second standard classification result, where the first standard classification result is generated based on the label of the labeled sample, and the second standard classification result is generated based on the pseudo label of the unlabeled sample.
And S206, based on the prediction error, carrying out iterative updating on the initial text classification model by using a back propagation algorithm to obtain a trained text classification model.
The back propagation algorithm (BP algorithm) is a learning algorithm suitable for multi-layer neuron networks; it is built on the gradient descent method and is used for error calculation in deep learning networks. The input-output relationship of a BP network is essentially a mapping: an n-input, m-output BP neural network performs a continuous mapping from n-dimensional Euclidean space to a finite field in m-dimensional Euclidean space, and this mapping is highly nonlinear. The learning process of the BP algorithm consists of a forward propagation process and a backward propagation process. In forward propagation, the input information passes from the input layer through the hidden layers, is processed layer by layer, and is transmitted to the output layer. In backward propagation, the partial derivatives of the objective function with respect to the weights of each neuron are calculated layer by layer, forming the gradient of the objective function with respect to the weight vector, which serves as the basis for modifying the weights.
Specifically, the server propagates the prediction error through the network layers of the initial text classification model using the back propagation algorithm, then obtains the recognition error of each network layer, and iteratively updates the initial text classification model according to the recognition errors and a preset error threshold to obtain the trained text classification model.
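A minimal sketch of this iterative update (PyTorch is assumed; the optimizer choice and threshold value are illustrative, not prescribed by the application):

```python
import torch

def train_until_threshold(model, loader, loss_fn, error_threshold=0.05,
                          max_epochs=50, lr=1e-4):
    """Propagate the prediction error backward through the network layers
    and update the weights until the mean error falls below the preset
    error threshold."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        epoch_error = 0.0
        for inputs, targets in loader:
            loss = loss_fn(model(inputs), targets)
            optimizer.zero_grad()
            loss.backward()   # back propagation: layer-by-layer gradients
            optimizer.step()  # modify weights along the gradient
            epoch_error += loss.item()
        if epoch_error / len(loader) <= error_threshold:
            break             # recognition error within the threshold
    return model
```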
And S207, receiving a text classification instruction, acquiring the text data to be classified, importing the text data to be classified into the trained text classification model, and outputting the text classification result.
Specifically, after completing model training, the server obtains a model capable of realizing text classification. When receiving a text classification instruction, the server acquires the text data to be classified, directly imports it into the trained text classification model, and outputs the text classification result.
In this embodiment, the electronic device (for example, the server shown in fig. 1) on which the text classification method operates may receive the text classification instruction through a wired connection or a wireless connection. It should be noted that the wireless connection means may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (Ultra Wideband) connection, and other wireless connection means now known or developed in the future.
In this embodiment, a text classification model with strong generalization ability can be trained with a small amount of labeled data and a large amount of unlabeled data, meeting more business scenarios, improving model training efficiency, and at the same time ensuring the accuracy of the model.
Further, after the step of obtaining the first training sample from the preset database, the method further includes:
classifying the unlabeled samples;
and performing data augmentation on the classified unlabeled samples by a preset data augmentation method to obtain augmented samples.
It should be noted that, in order to improve the speed and accuracy of model training, the training samples need to be classified in advance, and the various types of training samples are balanced through data augmentation. For example, short texts in the financial field are classified according to business type. The classification labels originally fall into 25 categories, which are simplified into 7 categories: enterprise accounts, payment and settlement, electronic government affairs, customer manager basic law, credit and financing, cross-border finance, and supply chain finance. About 10,000 data items are selected as the training set for each category, and 1,000 as the test set. There are about 1,000 labeled data items in the training set, and the rest are unlabeled. According to the difficulty of data collection and the popularity of the business, enterprise accounts have the most data and cross-border finance the least, with a ratio of about 3:1; in order to ensure the balance of the various training samples, data augmentation can be carried out on the training samples.
Specifically, after obtaining the first training sample, the server classifies the unlabeled samples in the first training sample, and performs data augmentation on the classified unlabeled samples by a preset data augmentation method to obtain augmented samples.
It should additionally be noted that, for the labeled samples in the first training sample, in order to ensure sample balance, the server may also perform sample augmentation on the classified labeled samples by a preset data augmentation method after classifying them, so that the various types of labeled samples are balanced, as in the sketch below.
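A minimal sketch of balancing the categories by augmentation (augment_fn stands in for any of the augmentation methods described below; all names are hypothetical):

```python
import random
from collections import defaultdict

def balance_by_augmentation(samples, augment_fn):
    """samples: list of (text, category) pairs. Augments the minority
    categories until each reaches the size of the largest category."""
    by_cat = defaultdict(list)
    for text, cat in samples:
        by_cat[cat].append(text)
    target = max(len(texts) for texts in by_cat.values())
    balanced = []
    for cat, texts in by_cat.items():
        originals = list(texts)
        while len(texts) < target:
            # Augment a randomly chosen original sample of this category.
            texts.append(augment_fn(random.choice(originals)))
        balanced.extend((text, cat) for text in texts)
    return balanced
```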
Further, the step of performing data augmentation on the classified unlabeled samples by a preset data augmentation method to obtain augmented samples specifically includes:
performing word segmentation processing on the classified unlabeled samples to obtain sample tokens;
obtaining synonyms of the sample tokens, and randomly replacing sample tokens with the synonyms to obtain a first sample;
randomly inserting the synonyms among the sample tokens to obtain a second sample;
randomly exchanging the positions of sample tokens to obtain a third sample;
randomly deleting sample tokens to obtain a fourth sample;
and combining the first sample, the second sample, the third sample, and the fourth sample to obtain the augmented samples.
Specifically, in a specific embodiment of the present application, synonym replacement, random insertion, random swap, random deletion, and random exchange of sentence positions can be performed through the EDA technique to implement data augmentation. In detail: Synonym Replacement (SR): randomly select non-stop words from the sentence and replace them with randomly selected synonyms. Random Insertion (RI): randomly find a word in the sentence that does not belong to the stop-word set, obtain a random synonym of it, and insert the synonym at a random position in the sentence, repeating n times. Random Swap (RS): randomly select two words in the sentence and exchange their positions, repeating n times. Random Deletion (RD): randomly delete each word in the sentence with probability p.
EDA (Easy Data Augmentation) is a simple data augmentation technique applied to text classification, composed of the 4 methods above: synonym replacement, random insertion, random swap, and random deletion. The original paper shows that using EDA can significantly improve model performance in small-sample learning, reaching with only 50% of the training set the accuracy achieved by the conventional method using the full training set.
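A minimal sketch of the four EDA operations (the jieba tokenizer is assumed for Chinese word segmentation, and the synonym table is a hypothetical stand-in for a real thesaurus):

```python
import random
import jieba  # assumed Chinese word segmentation library

SYNONYMS = {}  # hypothetical thesaurus: word -> list of synonyms

def get_synonyms(word):
    return SYNONYMS.get(word, [])

def eda(sentence, n=2, p=0.1):
    words = list(jieba.cut(sentence))
    if not words:
        return []

    # Synonym Replacement (SR): replace n random words with synonyms.
    sr = words[:]
    for idx in random.sample(range(len(sr)), min(n, len(sr))):
        if (syns := get_synonyms(sr[idx])):
            sr[idx] = random.choice(syns)

    # Random Insertion (RI): insert a random synonym at a random position, n times.
    ri = words[:]
    for _ in range(n):
        if (syns := get_synonyms(random.choice(ri))):
            ri.insert(random.randrange(len(ri) + 1), random.choice(syns))

    # Random Swap (RS): exchange the positions of two random words, n times.
    rs = words[:]
    for _ in range(n):
        i, j = random.randrange(len(rs)), random.randrange(len(rs))
        rs[i], rs[j] = rs[j], rs[i]

    # Random Deletion (RD): delete each word with probability p.
    rd = [w for w in words if random.random() > p] or words[:1]

    return ["".join(s) for s in (sr, ri, rs, rd)]
```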
Further, the step of performing data augmentation on the classified unlabeled samples by a preset data augmentation method to obtain augmented samples specifically includes:
performing word segmentation processing on the classified unlabeled samples to obtain sample tokens;
and randomly deleting sample tokens, and importing the sample tokens after random deletion into a pre-trained text generation model to generate augmented samples.
In another specific embodiment of the present application, text augmentation based on context information, a mainstream technique in recent years, is used. It first requires a trained language model (LM); for the original text to be augmented, a word or character in the text is randomly removed, the remaining text is input into the language model, and the top-k words predicted by the language model are selected to replace the removed word in the original text, forming k new texts, i.e., augmented samples.
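A minimal sketch of this context-based augmentation using a masked language model (the Hugging Face fill-mask pipeline is assumed, and the checkpoint name is only an example):

```python
import random
from transformers import pipeline

# Any Chinese masked language model would do; this checkpoint is an example.
fill_mask = pipeline("fill-mask", model="bert-base-chinese")

def lm_augment(text, k=3):
    """Remove one random character, let the language model predict the gap,
    and keep the top-k predictions as k new augmented texts."""
    if not text:
        return []
    pos = random.randrange(len(text))
    masked = text[:pos] + fill_mask.tokenizer.mask_token + text[pos + 1:]
    return [pred["sequence"] for pred in fill_mask(masked, top_k=k)]
```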
In these embodiments, the training samples are classified and the classified samples undergo sample augmentation based on a preset data augmentation method, ensuring sample balance; a text classification model trained with the augmented training samples can fully learn the features of the various types of training samples and obtain a better text classification effect.
Further, the initial text classification model includes an embedding layer, a convolution layer, and a fully connected layer, and the step of importing the second training sample into the initial text classification model for model training to generate a sample classification result specifically includes:
performing vector feature conversion processing on the second training sample through the embedding layer to obtain an initial vector;
performing a convolution operation on the initial vector using the convolution layer to obtain feature data corresponding to the initial vector;
and importing the feature data into the fully connected layer for similarity calculation, and outputting the recognition result with the maximum similarity as the sample classification result corresponding to the second training sample.
Specifically, the initial text classification model comprises an embedding layer, a convolution layer, and a fully connected layer. The embedding layer performs vector feature conversion on the training samples through a vector conversion port: the second training sample is directly imported into the vector conversion port in the embedding layer for vector feature conversion, yielding the initial vector corresponding to the second training sample. After the convolution layer receives the initial vector, it performs a convolution operation on the initial vector with preset convolution kernels to obtain the corresponding feature data. The fully connected layer contains a preset classifier; the feature data is imported into the fully connected layer, the classifier calculates the similarity of the feature data, and the recognition result with the maximum similarity is output as the sample classification result corresponding to the second training sample.
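A minimal sketch of the embedding / convolution / fully connected structure in PyTorch (all sizes are illustrative; the 7 classes follow the example above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, num_classes=7,
                 kernel_sizes=(2, 3, 4), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # embedding layer
        self.convs = nn.ModuleList(                           # convolution layer
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # (batch, seq_len) -> (batch, embed_dim, seq_len) for Conv1d.
        x = self.embedding(token_ids).transpose(1, 2)
        # Convolve with each kernel size, then max-pool over the sequence.
        feats = [F.relu(conv(x)).max(dim=-1).values for conv in self.convs]
        # The fully connected layer scores each class; argmax gives the result.
        return self.fc(torch.cat(feats, dim=-1))
```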
In the above embodiment, the initial text classification model is trained by inputting the training samples into the initial text classification model.
Further, the step of calculating an error between the sample classification result and a preset standard classification result based on the supervised loss term and the unsupervised loss term to obtain a prediction error specifically includes:
calculating an error between the labeled sample classification result and the first standard classification result based on the supervised loss term to obtain a first error;
and calculating the error between the unlabeled sample classification result and the second standard classification result based on the unsupervised loss term to obtain a second error.
Specifically, the server calculates the error between the labeled sample classification result and the first standard classification result based on the supervised loss term to obtain a first error, and calculates the error between the unlabeled sample classification result and the second standard classification result based on the unsupervised loss term to obtain a second error.
For the loss calculation of the labeled sample classification results, cross entropy is still taken as the target, and the supervised loss term $F_1$ is constructed as follows:
$$F_1 = -\sum_{i} y_i \log p(y_i \mid x_i)$$
For the loss calculation of the unlabeled sample classification results, the unsupervised loss term $F_2$ is constructed as follows:
$$F_2 = \sum_{i} D_{KL}\big(p(x_i)\,\|\,q(x_i)\big)$$
where $D_{KL}$, the KL divergence, is also called relative entropy: for the same random variable $x_i$ with two separate probability distributions $p(x_i)$ and $q(x_i)$, $D_{KL}$ can be used to measure the difference between the two distributions, and the smaller the $D_{KL}$ value, the closer the distributions $p(x_i)$ and $q(x_i)$. Accordingly, the $D_{KL}$ divergence is used to estimate the difference between the unlabeled sample classification result and the second standard classification result.
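A minimal sketch of computing $D_{KL}$ directly from its definition (PyTorch is assumed; p and q are probability distributions over the classes):

```python
import torch

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_i p(x_i) * log(p(x_i) / q(x_i)).
    The smaller the value, the closer the two distributions."""
    p = p.clamp_min(eps)  # avoid log(0) and division by zero
    q = q.clamp_min(eps)
    return (p * (p / q).log()).sum(dim=-1).mean()
```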
In the above embodiment, the classification results corresponding to labeled and unlabeled samples are processed separately: the error of the classification result corresponding to a labeled sample is calculated through the supervised loss term, and the error of the classification result corresponding to an unlabeled sample is calculated through the unsupervised loss term.
Further, based on the prediction error, using a back propagation algorithm to iteratively update the initial text classification model to obtain a trained text classification model, specifically comprising:
propagating the prediction error through the network layers of the initial text classification model based on the back propagation algorithm;
acquiring the recognition error of each network layer in the initial text classification model, and comparing the recognition error with a preset error threshold;
if the recognition error is larger than the preset error threshold, iteratively updating the initial text classification model until the recognition error is smaller than or equal to the preset error threshold;
and outputting the initial text classification model whose recognition error is smaller than or equal to the preset error threshold to obtain the trained text classification model.
Specifically, the prediction error is propagated through the network layers of the initial text classification model based on the back propagation algorithm; the recognition error of each network layer in the initial text classification model is acquired and compared with the preset error threshold; if the recognition error is larger than the preset error threshold, the initial text classification model is iteratively updated until the recognition error is smaller than or equal to the preset error threshold; and the initial text classification model whose recognition error is smaller than or equal to the preset error threshold is output, yielding the trained text classification model.
In this embodiment, the prediction error is propagated through the network layers of the initial text classification model with the back propagation algorithm, the recognition error of each network layer is then obtained, and the initial text classification model is iteratively updated according to the recognition errors and the preset error threshold to obtain a fitted text classification model.
In the above embodiments, the application discloses a text classification method belonging to the technical field of artificial intelligence. The method comprises constructing a semi-supervised text classification model through a UDA network framework, performing semi-supervised training of the text classification model with labeled samples and unlabeled samples carrying pseudo labels, calculating the errors of the classification results corresponding to the labeled samples through the supervised loss term of the model loss function and the errors of the classification results corresponding to the unlabeled samples through the unsupervised loss term, so that the classification results corresponding to labeled and unlabeled samples are processed separately. A text classification model with strong generalization ability can thus be trained with a small amount of labeled data and a large amount of unlabeled data, meeting more business scenarios, improving model training efficiency, and at the same time ensuring the accuracy of the model.
It should be emphasized that, in order to further ensure the privacy and security of the text data to be classified, the text data to be classified may also be stored in a node of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated using cryptographic methods, each data block containing the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by computer readable instructions instructing the relevant hardware; the instructions can be stored in a computer readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and not necessarily in sequence, but in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an apparatus for text classification, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 3, the apparatus for text classification according to this embodiment includes:
the model building module 301 is configured to receive a classification model training instruction and build an initial text classification model, where a loss function of the initial text classification model includes a supervised loss term and an unsupervised loss term;
a sample obtaining module 302, configured to obtain a first training sample from a preset database, where the first training sample includes a labeled sample and an unlabeled sample;
a sample labeling module 303, configured to label the unlabeled sample based on a preset open source labeling model to obtain a pseudo label of the unlabeled sample, and combine the labeled sample and the unlabeled sample carrying the pseudo label to obtain a second training sample;
the model training module 304 is configured to import the second training sample into the initial text classification model for model training, and generate a sample classification result;
a loss prediction module 305, configured to calculate an error between the sample classification result and a preset standard classification result based on the supervised loss term and the unsupervised loss term, so as to obtain a prediction error;
a model iteration module 306, configured to iteratively update the initial text classification model by using a back propagation algorithm based on the prediction error to obtain a trained text classification model;
the text classification module 307 is configured to receive a text classification instruction, acquire text data to be classified, import the text data to be classified into the trained text classification model, and output the text classification result.
Further, the text classification device further includes:
the sample classification module is used for classifying the unlabeled samples;
and the sample augmentation module is used for performing data augmentation on the classified unlabeled samples through a preset data augmentation method to obtain augmented samples.
Further, the sample augmentation module specifically includes:
the word segmentation processing unit is used for performing word segmentation processing on the classified unlabeled samples to obtain sample tokens;
the random replacement unit is used for acquiring synonyms of the sample tokens and randomly replacing sample tokens with the synonyms to obtain a first sample;
the random insertion unit is used for randomly inserting the synonyms among the sample tokens to obtain a second sample;
the random exchange unit is used for randomly exchanging the positions of sample tokens to obtain a third sample;
the random deletion unit is used for randomly deleting sample tokens to obtain a fourth sample;
and the sample combining unit is used for combining the first sample, the second sample, the third sample, and the fourth sample to obtain the augmented samples.
Further, the sample augmentation module specifically includes:
the word segmentation processing unit is used for performing word segmentation processing on the classified unlabeled samples to obtain sample tokens;
and the sample augmentation unit is used for randomly deleting sample tokens and importing the sample tokens after random deletion into a pre-trained text generation model to generate augmented samples.
Further, the initial text classification model includes an embedding layer, a convolution layer, and a fully connected layer, and the model training module 304 specifically includes:
the vector conversion unit is used for performing vector feature conversion processing on the second training sample through the embedding layer to obtain an initial vector;
the convolution operation unit is used for performing a convolution operation on the initial vector using the convolution layer to obtain feature data corresponding to the initial vector;
and the similarity calculation unit is used for importing the feature data into the fully connected layer for similarity calculation and outputting the recognition result with the maximum similarity as the sample classification result corresponding to the second training sample.
Further, the loss prediction module 305 specifically includes:
a first error calculation unit, configured to calculate an error between the labeled sample classification result and the first standard classification result based on the supervised loss term, so as to obtain a first error;
and the second error calculation unit is used for calculating the error between the unlabeled sample classification result and the second standard classification result based on the unsupervised loss term to obtain a second error.
Further, the model iteration module 306 specifically includes:
the error propagation unit is used for propagating the prediction error through the network layers of the initial text classification model based on the back propagation algorithm;
the error comparison unit is used for acquiring the recognition error of each network layer in the initial text classification model and comparing the recognition error with a preset error threshold;
the model iteration unit is used for iteratively updating the initial text classification model when the recognition error is larger than the preset error threshold, until the recognition error is smaller than or equal to the preset error threshold;
and the model output unit is used for outputting the initial text classification model whose recognition error is smaller than or equal to the preset error threshold to obtain the trained text classification model.
The application discloses a text classification device belonging to the technical field of artificial intelligence. The device constructs a semi-supervised text classification model through a UDA network framework, performs semi-supervised training of the text classification model with labeled samples and unlabeled samples carrying pseudo labels, and calculates the errors of the classification results corresponding to the labeled samples through the supervised loss term of the model loss function and the errors of the classification results corresponding to the unlabeled samples through the unsupervised loss term, so that the classification results corresponding to labeled and unlabeled samples are processed separately. A text classification model with strong generalization ability can thus be trained with a small amount of labeled data and a large amount of unlabeled data, meeting more business scenarios, improving model training efficiency, and at the same time ensuring the accuracy of the model.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it should be understood that not all of the shown components need to be implemented, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as computer readable instructions of a text classification method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer-readable instructions stored in the memory 41 or to process data, for example to execute the computer-readable instructions of the text classification method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.
The application discloses a computer device, which belongs to the technical field of artificial intelligence. A semi-supervised text classification model is constructed on the UDA network framework and trained with labeled samples and with unlabeled samples carrying pseudo labels; the supervised loss term of the model's loss function measures the error of the classification results corresponding to the labeled samples, the unsupervised loss term measures the error of the classification results corresponding to the unlabeled samples, and the two kinds of classification results are therefore processed independently. A text classification model with strong generalization ability can thus be trained from a small amount of labeled data and a large amount of unlabeled data, which covers more service scenarios and improves model training efficiency while still guaranteeing model accuracy.
The present application further provides an embodiment of a computer-readable storage medium storing computer-readable instructions that are executable by at least one processor to cause the at least one processor to perform the steps of the text classification method described above.
The application discloses a storage medium, which belongs to the technical field of artificial intelligence. A semi-supervised text classification model is constructed on the UDA network framework and trained with labeled samples and with unlabeled samples carrying pseudo labels; the supervised loss term of the model's loss function measures the error of the classification results corresponding to the labeled samples, the unsupervised loss term measures the error of the classification results corresponding to the unlabeled samples, and the two kinds of classification results are therefore processed independently. A text classification model with strong generalization ability can thus be trained from a small amount of labeled data and a large amount of unlabeled data, which covers more service scenarios and improves model training efficiency while still guaranteeing model accuracy.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware, although in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) that includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present application.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should be understood that the above-described embodiments are merely illustrative of some embodiments of the present application and are not restrictive, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thorough. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions recorded in the foregoing embodiments or substitute equivalents for some of their features. All equivalent structures made by using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A method of text classification, comprising:
receiving a classification model training instruction, and establishing an initial text classification model, wherein a loss function of the initial text classification model comprises a supervised loss term and an unsupervised loss term;
acquiring a first training sample from a preset database, wherein the first training sample comprises a labeled sample and an unlabeled sample;
labeling the unlabeled sample based on a preset open source labeling model to obtain a pseudo label of the unlabeled sample, and combining the labeled sample and the unlabeled sample carrying the pseudo label to obtain a second training sample;
importing the second training sample into the initial text classification model for model training to generate a sample classification result;
calculating an error between the sample classification result and a preset standard classification result based on the supervised loss term and the unsupervised loss term to obtain a prediction error;
iteratively updating the initial text classification model by using a back propagation algorithm based on the prediction error to obtain a trained text classification model;
receiving a text classification instruction, acquiring text data to be classified, importing the text data to be classified into the trained text classification model, and outputting a text classification result.
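For illustration only (not part of the claims), the following is a minimal sketch of the training flow of claim 1, assuming PyTorch; `classifier`, `label_model`, and the two datasets are hypothetical placeholders, and plain cross-entropy against the pseudo labels stands in for the unsupervised loss term, whose exact form the claim does not fix.

```python
# Minimal sketch of the claim-1 training flow, assuming PyTorch.
# `classifier`, `label_model`, and the datasets are hypothetical placeholders.
import torch
import torch.nn.functional as F

def train(classifier, label_model, labeled_ds, unlabeled_ds, epochs=10, lr=1e-3):
    # Pseudo-label the unlabeled samples with a preset labeling model.
    pseudo_ds = [(x, label_model(x).argmax(dim=-1)) for x in unlabeled_ds]
    optimizer = torch.optim.Adam(classifier.parameters(), lr=lr)
    for _ in range(epochs):
        for (x_l, y_l), (x_u, y_u) in zip(labeled_ds, pseudo_ds):
            sup_loss = F.cross_entropy(classifier(x_l), y_l)    # supervised loss term
            unsup_loss = F.cross_entropy(classifier(x_u), y_u)  # unsupervised loss term
            prediction_error = sup_loss + unsup_loss
            optimizer.zero_grad()
            prediction_error.backward()   # iterative update via back propagation
            optimizer.step()
    return classifier
```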
2. The text classification method according to claim 1, wherein after the step of acquiring the first training sample from the preset database, the method further comprises:
classifying the unlabeled samples;
and performing data augmentation on the classified unlabeled samples by a preset data augmentation method to obtain augmented samples.
3. The text classification method according to claim 2, wherein the step of performing data augmentation on the classified unlabeled samples by a preset data augmentation method to obtain augmented samples specifically comprises:
performing word segmentation processing on the classified unlabeled sample to obtain sample word segments;
obtaining synonyms of the sample word segments, and randomly replacing sample word segments with the synonyms to obtain a first sample;
randomly inserting the synonyms among the sample word segments to obtain a second sample;
randomly swapping sample word segments to obtain a third sample;
randomly deleting sample word segments to obtain a fourth sample;
and combining the first sample, the second sample, the third sample, and the fourth sample to obtain the augmented samples.
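The four operations of claim 3 match the familiar EDA (Easy Data Augmentation) recipe. A minimal sketch follows, assuming a `synonyms` dictionary (token to synonym list) is available; none of the names are taken from the application itself.

```python
import random

def augment(tokens, synonyms, n=1, delete_p=0.1):
    """EDA-style sketch of claim 3's four operations over a token list.
    `synonyms` is an assumed dict mapping a token to a list of synonyms."""
    if not tokens:
        return []

    def synonym_replace():
        out = tokens[:]
        for i in random.sample(range(len(out)), k=min(n, len(out))):
            out[i] = random.choice(synonyms.get(out[i], [out[i]]))
        return out

    def synonym_insert():
        out = tokens[:]
        for _ in range(n):
            candidates = synonyms.get(random.choice(tokens), [])
            if candidates:
                out.insert(random.randrange(len(out) + 1), random.choice(candidates))
        return out

    def random_swap():
        out = tokens[:]
        if len(out) > 1:
            i, j = random.sample(range(len(out)), 2)
            out[i], out[j] = out[j], out[i]
        return out

    def random_delete():
        kept = [w for w in tokens if random.random() > delete_p]
        return kept or [random.choice(tokens)]  # never return an empty sample

    # First, second, third, and fourth samples, combined into the augmented set.
    return [synonym_replace(), synonym_insert(), random_swap(), random_delete()]
```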
4. The text classification method according to claim 2, wherein the step of performing data augmentation on the classified unlabeled samples by a preset data augmentation method to obtain augmented samples specifically comprises:
performing word segmentation processing on the classified unlabeled sample to obtain sample word segments;
and randomly deleting sample word segments, and importing the remaining word segments into a pre-trained text generation model to generate an augmented sample.
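Claim 4 leaves the text generation model unspecified, so the sketch below treats it as an opaque callable that completes a partially deleted sentence; both `generate` and the deletion rate are illustrative assumptions.

```python
import random

def generative_augment(tokens, generate, drop_p=0.15):
    """Claim-4 style augmentation: randomly delete word segments, then let an
    assumed pre-trained text generation model rewrite what remains.
    `generate` is a hypothetical callable: partial text -> completed text."""
    kept = [w for w in tokens if random.random() > drop_p] or tokens[:1]
    return generate(" ".join(kept))
```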
5. The text classification method according to claim 1, wherein the initial text classification model includes an embedding layer, a convolutional layer, and a fully connected layer, and the step of importing the second training sample into the initial text classification model for model training to generate a sample classification result specifically comprises:
performing vector feature conversion processing on the second training sample through the embedding layer to obtain an initial vector;
performing a convolution operation on the initial vector by using the convolutional layer to obtain feature data corresponding to the initial vector;
and importing the feature data into the fully connected layer for similarity calculation, and outputting the recognition result with the maximum similarity as the sample classification result corresponding to the second training sample.
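A minimal PyTorch sketch of the embedding-convolution-fully-connected pipeline of claim 5, reading the "similarity calculation" as class scores from which the highest-scoring class is returned; all layer sizes are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class TextClassifier(nn.Module):
    """Embedding -> convolution -> fully connected, per claim 5 (sizes assumed)."""
    def __init__(self, vocab_size, embed_dim=128, num_filters=100, num_classes=10):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)   # embedding layer
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(num_filters, num_classes)          # fully connected layer

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)              # initial vector: (batch, seq_len, embed_dim)
        x = self.conv(x.transpose(1, 2))           # convolution over the sequence
        x = F.relu(x).max(dim=-1).values           # pool to one feature vector per sample
        scores = self.fc(x)                        # per-class "similarity" scores
        return scores.argmax(dim=-1), scores       # highest-scoring class and logits
```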
6. The text classification method according to claim 1, wherein the step of calculating the error between the sample classification result and the preset standard classification result based on the supervised loss term and the unsupervised loss term to obtain the prediction error specifically comprises:
calculating an error between the classification result of the labeled sample and a first standard classification result based on the supervised loss term to obtain a first error;
and calculating an error between the classification result of the unlabeled sample and a second standard classification result based on the unsupervised loss term to obtain a second error.
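Claim 6 splits the prediction error by sample type but does not fix the form of either term. The sketch below assumes cross-entropy for the supervised term and, as in the usual UDA formulation, a KL-divergence consistency term between an unlabeled sample and its augmented version for the unsupervised term.

```python
import torch.nn.functional as F

def prediction_error(logits_labeled, y, logits_unlabeled, logits_augmented):
    """Claim-6 style split: first error from the supervised term on labeled
    samples, second error from the unsupervised term on unlabeled samples."""
    first_error = F.cross_entropy(logits_labeled, y)
    # Consistency between predictions on an unlabeled sample and its augmentation
    # (the UDA choice; the claim itself only names an unsupervised loss term).
    target = F.softmax(logits_unlabeled.detach(), dim=-1)
    second_error = F.kl_div(F.log_softmax(logits_augmented, dim=-1),
                            target, reduction="batchmean")
    return first_error + second_error
```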
7. The method according to any one of claims 1 to 6, wherein the step of iteratively updating the initial text classification model using a back propagation algorithm based on the prediction error to obtain a trained text classification model specifically comprises:
propagating the prediction error through the network layers of the initial text classification model based on the back propagation algorithm;
acquiring the recognition error of a network layer in the initial text classification model, and comparing the recognition error with a preset error threshold;
if the recognition error is larger than the preset error threshold, iteratively updating the initial text classification model until the recognition error is smaller than or equal to the preset error threshold;
and outputting the initial text classification model whose recognition error is smaller than or equal to the preset error threshold to obtain the trained text classification model.
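A sketch of claim 7's stopping rule, assuming hypothetical helpers: `train_step` performs one back-propagation update and `eval_error` estimates the current recognition error.

```python
def fit_until_threshold(model, train_step, eval_error, threshold=0.05, max_iters=1000):
    """Claim-7 style iteration: keep updating until the recognition error
    is at or below the preset error threshold (helpers are assumed)."""
    for _ in range(max_iters):
        if eval_error(model) <= threshold:
            break                    # error no longer exceeds the threshold
        train_step(model)            # back-propagate and update once
    return model                     # the trained text classification model
```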
8. An apparatus for text classification, comprising:
the model building module is used for receiving a classification model training instruction and establishing an initial text classification model, wherein a loss function of the initial text classification model comprises a supervised loss term and an unsupervised loss term;
the system comprises a sample acquisition module, a data processing module and a data processing module, wherein the sample acquisition module is used for acquiring a first training sample from a preset database, and the first training sample comprises a labeled sample and an unlabeled sample;
the sample labeling module is used for labeling the unlabeled sample based on a preset open source labeling model to obtain a pseudo label of the unlabeled sample, and combining the labeled sample and the unlabeled sample carrying the pseudo label to obtain a second training sample;
the model training module is used for importing the second training sample into the initial text classification model for model training to generate a sample classification result;
the loss prediction module is used for calculating the error between the sample classification result and a preset standard classification result based on the supervised loss term and the unsupervised loss term to obtain a prediction error;
the model iteration module is used for carrying out iteration updating on the initial text classification model by using a back propagation algorithm based on the prediction error to obtain a trained text classification model;
and the text classification module is used for receiving a text classification instruction, acquiring text data to be classified, importing the text data to be classified into the trained text classification model, and outputting a text classification result.
9. A computer device, comprising a memory in which computer-readable instructions are stored and a processor, wherein the processor, when executing the computer-readable instructions, implements the steps of the text classification method according to any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, implement the steps of the text classification method according to any one of claims 1 to 7.
CN202111436398.XA 2021-11-29 2021-11-29 Text classification method and device, computer equipment and storage medium Pending CN114117048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111436398.XA CN114117048A (en) 2021-11-29 2021-11-29 Text classification method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111436398.XA CN114117048A (en) 2021-11-29 2021-11-29 Text classification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114117048A true CN114117048A (en) 2022-03-01

Family

ID=80367723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111436398.XA Pending CN114117048A (en) 2021-11-29 2021-11-29 Text classification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114117048A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522958A (en) * 2020-05-28 2020-08-11 泰康保险集团股份有限公司 Text classification method and device
CN112232416A (en) * 2020-10-16 2021-01-15 浙江大学 Semi-supervised learning method based on pseudo label weighting
CN112528029A (en) * 2020-12-29 2021-03-19 平安普惠企业管理有限公司 Text classification model processing method and device, computer equipment and storage medium
CN113254599A (en) * 2021-06-28 2021-08-13 浙江大学 Multi-label microblog text classification method based on semi-supervised learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115267A (en) * 2020-09-28 2020-12-22 平安科技(深圳)有限公司 Training method, device and equipment of text classification model and storage medium
CN115346084A (en) * 2022-08-15 2022-11-15 腾讯科技(深圳)有限公司 Sample processing method, sample processing apparatus, electronic device, storage medium, and program product

Similar Documents

Publication Publication Date Title
Rehman et al. A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis
WO2022142006A1 (en) Semantic recognition-based verbal skill recommendation method and apparatus, device, and storage medium
CN112395979B (en) Image-based health state identification method, device, equipment and storage medium
CN112231569A (en) News recommendation method and device, computer equipment and storage medium
CN112463968B (en) Text classification method and device and electronic equipment
CN113434683B (en) Text classification method, device, medium and electronic equipment
CN114117048A (en) Text classification method and device, computer equipment and storage medium
CN112214601A (en) Social short text sentiment classification method and device and storage medium
CN113505601A (en) Positive and negative sample pair construction method and device, computer equipment and storage medium
CN115392237B (en) Emotion analysis model training method, device, equipment and storage medium
CN113420212A (en) Deep feature learning-based recommendation method, device, equipment and storage medium
CN117114063A (en) Method for training a generative large language model and for processing image tasks
CN113723077B (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
Liu et al. Few-shot short-text classification with language representations and centroid similarity
CN114091452A (en) Adapter-based transfer learning method, device, equipment and storage medium
Isaac et al. A Conceptual Enhancement of LSTM Using Knowledge Distillation for Hate Speech Detection
CN116563034A (en) Purchase prediction method, device, equipment and storage medium based on artificial intelligence
Huang et al. Target-Oriented Sentiment Classification with Sequential Cross-Modal Semantic Graph
CN115713386A (en) Multi-source information fusion commodity recommendation method and system
WO2023069244A1 (en) System, method, and computer program product for denoising sequential machine learning models
CN111459990B (en) Object processing method, system, computer readable storage medium and computer device
JP2022111020A (en) Transfer learning method of deep learning model based on document similarity learning and computer device
Kalangi et al. Sentiment Analysis using Machine Learning
CN114692715A (en) Sample labeling method and device
Wang Improved facial expression recognition method based on gan

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination