CN113297356A - Information classification method and system based on BERT model


Info

Publication number
CN113297356A
CN113297356A
Authority
CN
China
Prior art keywords
information
model
text
data
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110737020.7A
Other languages
Chinese (zh)
Inventor
徐晓健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority application: CN202110737020.7A
Publication: CN113297356A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an information classification method and system based on a BERT model, relating to the technical fields of natural language processing and machine learning. The method comprises the following steps: collecting information data and preprocessing it; extracting text features from the preprocessed information data with a BERT model, and extracting semantic information of the text features with an LSTM; according to the semantic information of the text features, enhancing semantic information whose weight is larger than a first preset value and suppressing semantic information whose weight is smaller than the preset value, and building an information classification model; setting a training set and a test set from the preprocessed information data, training the information classification model, and testing the trained model with the test set; and obtaining information data to be classified and classifying it with the trained information classification model to obtain a classification result.

Description

Information classification method and system based on BERT model
Technical Field
The invention relates to the technical field of natural language processing and machine learning, in particular to a method and a system for information classification based on a BERT model.
Background
As an important customer channel, the mobile banking APP plays an important role in banks' digital transformation. To further improve customer experience, the mobile banking APP has added an information (news feed) function. Considering that a large volume of new information is generated every day, the bank needs to classify the information by content in order to manage it better; at this volume of information data, relying solely on manual work is costly and inefficient.
In view of the above, a technical solution that can classify information efficiently and accurately and overcome the above drawbacks is needed.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an information classification method and system based on a BERT model. The invention uses artificial intelligence techniques to process information titles: it extracts the information carried by the title text, processes it, and classifies the information accordingly.
In a first aspect of an embodiment of the present invention, an information classification method based on a BERT model is provided, where the method includes:
collecting information data, and preprocessing the information data;
extracting text features by using a BERT model according to the preprocessed information data, and extracting semantic information of the text features by using an LSTM;
according to the semantic information of the text features, enhancing semantic information whose weight is larger than a first preset value and suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
setting a training set and a test set according to the preprocessed information data, training the information classification model, and testing the trained model by using the test set;
and obtaining the information data to be classified, and classifying the information data to be classified by using the trained information classification model to obtain a classification result.
In a second aspect of the embodiments of the present invention, an information classification system based on a BERT model is provided, the system including:
the preprocessing module is used for acquiring information data and preprocessing the information data;
the text feature processing module is used for extracting text features by using a BERT model according to the preprocessed information data and extracting semantic information of the text features by using an LSTM;
the model building module is used for enhancing semantic information whose weight is larger than a first preset value according to the semantic information of the text features, suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
the model training module is used for setting a training set and a test set according to the preprocessed information data, training the information classification model, and testing the trained model by using the test set;
and the information classification module is used for obtaining the information data to be classified and classifying it by using the trained information classification model to obtain a classification result.
In a third aspect of the embodiments of the present invention, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements a BERT model-based information classification method.
In a fourth aspect of embodiments of the present invention, a computer-readable storage medium is presented, which stores a computer program that, when executed by a processor, implements a BERT model-based information classification method.
Compared with the prior art, the information classification method and system based on the BERT model have at least the following advantages: text features are extracted by the BERT model, which is simple, convenient, and efficient to use and saves a large amount of time and labor cost; and the information enhancement processing mechanism fully extracts the features in the text, which improves information utilization, improves the performance of the information classification model, and effectively improves the accuracy of information classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flow chart of an information classification method based on a BERT model according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of an information classification process according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the architectural relationships according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of an information classification system based on a BERT model according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the invention, an information classification method and system based on a BERT model are provided, and the invention relates to the technical field of natural language processing and machine learning.
In the embodiments of the present invention, terms to be described include:
BERT: a pre-training model open-sourced by Google for natural language processing tasks. It learns good feature representations for words by running a self-supervised learning method on a massive corpus; self-supervised learning here means supervised learning on data without manually annotated labels. The feature representations learned by BERT serve as word-embedding features for NLP tasks, and BERT provides a model for transfer learning: depending on the task, it can be fine-tuned or frozen and then used as a feature extractor.
LSTM: long short-term memory, an artificial neural network suited to processing and predicting important events separated by very long intervals and delays in a time series.
Softmax: normalizing the exponential function to normalize the logarithm of the gradient of the finite discrete probability distribution. Softmax is widely used in probability-based multi-classification problem methods such as multinomial logistic regression, multinomial linear discriminant analysis, naive Bayes classifier, artificial neural network, and the like.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
FIG. 1 is a flow chart of an information classification method based on a BERT model according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step S101, collecting information data and preprocessing the information data;
step S102, extracting text features by using a BERT model according to the preprocessed information data, and extracting semantic information of the text features by using an LSTM;
step S103, according to the semantic information of the text features, enhancing semantic information whose weight is larger than a first preset value and suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
step S104, setting a training set and a test set according to the preprocessed information data, training the information classification model, and testing the trained model by using the test set;
step S105, obtaining the information data to be classified, and classifying the information data to be classified by using the trained information classification model to obtain a classification result.
For a more clear explanation of the information classification method based on the BERT model, each step is described in detail below.
Step S101, data preprocessing:
the preprocessing process comprises the steps of screening and labeling collected information data, extracting a title text of the information and labeling information types.
Step S102, extracting text features:
taking the BERT model as the text feature extraction model, inputting the title text of the information data into the text feature extraction model, and extracting text features; wherein the BERT model has been pre-trained on a Chinese data set;
and performing forward and backward feature extraction on the text features by using the LSTM to obtain forward and backward semantic information of the text features.
The forward and backward semantic information of the text features is then integrated and spliced along the feature dimension to obtain new features; each character in the text is thereby represented by its corresponding new feature, i.e. by the forward and backward semantic information of the text features.
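A minimal sketch of this feature extraction step, assuming PyTorch and the Hugging Face transformers package with the public bert-base-chinese checkpoint (the patent only states that BERT is pre-trained on a Chinese data set, without naming a checkpoint); a bidirectional nn.LSTM already concatenates the forward and backward hidden states along the feature dimension, yielding the "new features" described above:

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# Assumed checkpoint; the patent only says "pre-trained on a Chinese data set".
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

# Bidirectional LSTM: the forward and backward hidden states are concatenated,
# giving each character a [forward; backward] semantic representation.
H = 256  # hidden_size, an illustrative value
lstm = nn.LSTM(input_size=bert.config.hidden_size, hidden_size=H,
               batch_first=True, bidirectional=True)

titles = ["手机银行推出资讯分类新功能"]  # an example information title
enc = tokenizer(titles, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    text_feats = bert(**enc).last_hidden_state   # text features, [B, S, 768]
    new_feats, _ = lstm(text_feats)              # new features,  [B, S, H*2]
```

Here new_feats has shape [B, S, H×2] and feeds the information enhancement step described next.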
Step S103, information enhancement and establishment of the information classification model:
processing the new features to selectively increase the influence of effective features while suppressing the influence of ineffective features; the selection logic of the information enhancement is as follows (a code sketch follows these steps):
setting the input data dimension as [B, S, H×2], wherein B represents the data batch size, S represents the text length, and H represents the number of LSTM hidden-layer neurons;
initializing a learnable weight matrix w, wherein the dimension of w is [H×2, 1];
applying a nonlinear activation to the [B, S, H×2] dimensional data output by the LSTM, matrix-multiplying the result with w, and performing Softmax normalization to obtain a score at each moment with dimension [B, S, 1]; this score matrix represents the weight of each character in the text and is used to enhance character features;
multiplying the LSTM hidden-layer state [B, S, H×2] dimensional data at each moment by the corresponding score matrix of dimension [B, S, 1] and summing to obtain the final weighted-average hidden-layer value, with dimension [B, H×2];
applying a nonlinear activation to the final hidden-layer value [B, H×2] dimensional data and feeding it into the fully connected layers, which comprise 2 layers in total with B and N neurons respectively, N being the number of predicted categories;
by Softmax normalization, taking the category corresponding to the largest of the N values as the final prediction to obtain the final output result, with dimension [B, 1];
the features of each character are thus enhanced or suppressed according to feature importance, as are the features of all characters;
and building the information classification model from the text features after the information enhancement processing.
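A sketch of the selection logic above, under the same PyTorch assumptions. Two points are assumptions rather than statements of the patent: tanh stands in for the unspecified "nonlinear activation", and the first fully connected layer's width is passed in as an ordinary hyperparameter (fc_width) corresponding to the patent's "B" neurons, since tying a layer width to the runtime batch size would not be well defined:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InfoEnhanceClassifier(nn.Module):
    """Attention-style information enhancement over BiLSTM outputs, following
    the selection logic above. tanh is an assumed activation; the patent only
    says 'nonlinear activation'."""
    def __init__(self, hidden_size: int, fc_width: int, num_class: int):
        super().__init__()
        H2 = hidden_size * 2
        self.w = nn.Parameter(torch.randn(H2, 1))  # learnable weight matrix w, [H*2, 1]
        self.fc1 = nn.Linear(H2, fc_width)
        self.fc2 = nn.Linear(fc_width, num_class)

    def forward(self, lstm_out: torch.Tensor) -> torch.Tensor:
        # lstm_out: [B, S, H*2]
        scores = torch.matmul(torch.tanh(lstm_out), self.w)  # [B, S, 1]
        scores = F.softmax(scores, dim=1)                    # per-character weights
        # Weighted sum over time: high-weight characters are enhanced,
        # low-weight characters are suppressed.
        hidden = (lstm_out * scores).sum(dim=1)              # [B, H*2]
        logits = self.fc2(self.fc1(torch.tanh(hidden)))      # [B, N]
        return logits  # Softmax + argmax at inference gives the [B, 1] prediction
```

The module consumes the new_feats tensor from the previous sketch, e.g. InfoEnhanceClassifier(hidden_size=H, fc_width=64, num_class=N)(new_feats); applying F.softmax followed by argmax over the last dimension yields the [B, 1] output described above.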
Step S104, model training and model testing:
training the set up information classification model by using a training set;
testing the trained model according to the information titles of the test set, and judging whether the classification result is correct; and if the accuracy reaches a preset value, finishing the model training.
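A minimal sketch of this training-and-testing loop, assuming cross-entropy loss, an Adam optimizer, and a concrete accuracy threshold, none of which the patent specifies; `model` is taken to wrap the BERT + BiLSTM + enhancement pipeline so that each batch yields (inputs, labels):

```python
import torch
import torch.nn as nn

def train_and_test(model, train_loader, test_loader, target_acc=0.95, max_epochs=10):
    # target_acc and max_epochs are illustrative; the patent only requires the
    # test accuracy to reach "a preset value" before training finishes.
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(max_epochs):
        model.train()
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)  # model returns [B, N] logits
            loss.backward()
            optimizer.step()
        # Test on the information titles of the test set.
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for inputs, labels in test_loader:
                preds = model(inputs).argmax(dim=-1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
        if correct / total >= target_acc:  # accuracy reached the preset value
            break
    return model
```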
Step S105, information classification:
and obtaining the information data to be classified, and classifying the information data to be classified by using the trained information classification model to obtain a classification result.
Compared with the prior art, the information classification method based on the BERT model provided by the invention builds a dedicated model for information classification; each character in the text is represented by 2 parts of information, the forward and the backward semantic information between the character and the text; and the features of each character in the text, as well as the features of all characters in the text, are subject to feature enhancement and suppression. Text features are extracted by the pre-trained BERT model, so the method is simple to use, convenient, fast, and efficient, and can save a large amount of time and labor cost; and the data processing mechanism fully extracts the features in the text, so that information utilization and accuracy are higher.
It should be noted that although the operations of the method of the present invention have been described in the above embodiments and the accompanying drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the operations shown must be performed, to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
For a more clear explanation of the information classification method based on the BERT model, a specific embodiment is described below, but it should be noted that the embodiment is only for better explaining the present invention and should not be construed as an undue limitation to the present invention.
Referring to fig. 2, a schematic diagram of an information classification process according to an embodiment of the invention is shown. As shown in fig. 2, the specific process is as follows:
s201, training data preprocessing:
firstly, manually screening and labeling collected information data;
s202, establishing a neural network model:
and building a model for information classification.
Considering that the target input is an information title, the input text is short; the BERT model is therefore selected as the text feature extraction model. It should be noted that, unlike the conventional use of the BERT model, the present invention does not directly apply a Softmax operation to the extracted features for classification; instead, the features are further processed and information is further extracted in combination with the characteristics of the task before classification.
Referring to the schematic diagram of the architecture relationship shown in fig. 3, after the title text is input into the model, the model extracts the text features using the BERT model pre-trained on a Chinese data set. In FIG. 3, Text: the input text; Embedding: conversion of discrete variables into continuous vectors; Concat: the Concat layer splices its input data; Max Pooling: max pooling, i.e. taking the maximum feature value in a neighborhood; Average Pooling: global average pooling; Class: the output category.
Further, forward and backward feature extraction is performed with the LSTM on the text features extracted by BERT, so as to extract their forward and backward semantic information;
the 2 extracted parts of information are integrated, and the 2 parts of features are spliced along the feature dimension to obtain new features; that is, each character in the text is represented by 2 parts of information, the forward and the backward semantic information between the character and the text;
the extracted text features are then processed to selectively increase the influence of some effective features while suppressing the influence of some ineffective features.
The information enhancement specific selection logic is as follows:
assuming the input data dimension is [batch_size, seq_len, hidden_size×2], where batch_size represents the data batch size, seq_len represents the text length, and hidden_size represents the number of LSTM hidden-layer neurons;
a learnable weight matrix w is initialized, with dimension [hidden_size×2, 1].
First, a nonlinear activation is applied to the LSTM output of dimension [batch_size, seq_len, hidden_size×2]; the result is matrix-multiplied with w and normalized with Softmax, giving a score for each moment with dimension [batch_size, seq_len, 1]; this matrix represents the weight of each character in the text and is used to enhance character features;
the LSTM hidden-layer state data of dimension [batch_size, seq_len, hidden_size×2] at each moment is multiplied by the corresponding score matrix of dimension [batch_size, seq_len, 1] and summed, giving the final weighted-average hidden-layer value with dimension [batch_size, hidden_size×2];
a nonlinear activation is applied to the final hidden-layer value of dimension [batch_size, hidden_size×2], which is then fed into the fully connected layers; there are 2 fully connected layers, with batch_size and num_class neurons respectively; num_class is the number of predicted categories.
Finally, by Softmax normalization, the category corresponding to the largest of the num_class values is taken as the final prediction, giving a final output of dimension [batch_size, 1].
By the above operation on the text features, the features of each character are enhanced and suppressed according to the feature importance, and the features of all characters are enhanced and suppressed according to the importance. The performance of the model is further improved through effective characteristic enhancement and ineffective characteristic suppression.
S203, model training:
training the model established in the step S202 by using the data obtained in the step S201;
s204, information classification:
the model trained in S203 is used to classify the information data used for testing.
Having described the method of an exemplary embodiment of the present invention, the BERT model-based information classification system of an exemplary embodiment of the present invention is described next with reference to fig. 4.
The implementation of the information classification system based on the BERT model can be referred to the implementation of the above method, and repeated details are not repeated. The term "module" or "unit" used hereinafter may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Based on the same inventive concept, the invention also provides an information classification system based on the BERT model, as shown in FIG. 4, the system comprises:
the preprocessing module 410 is used for collecting information data and preprocessing the information data;
the text feature processing module 420 is configured to extract text features by using a BERT model according to the preprocessed information data, and extract semantic information of the text features by using an LSTM;
the model building module 430 is used for enhancing semantic information whose weight is larger than a first preset value according to the semantic information of the text features, suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
the model training module 440 is configured to set a training set and a test set according to the preprocessed information data, train the information classification model, and test the trained model by using the test set;
the information classification module 450 is configured to obtain information data to be classified, and classify it by using the trained information classification model to obtain a classification result.
In this embodiment, the preprocessing module 410 is specifically configured to:
and screening and labeling the collected information data, extracting a title text of the information and labeling the information category.
In this embodiment, the text feature processing module 420 is specifically configured to:
taking the BERT model as the text feature extraction model, inputting the title text of the information data into the text feature extraction model, and extracting text features; wherein the BERT model has been pre-trained on a Chinese data set;
and extracting forward and backward features of the text features by using the LSTM to obtain forward and backward semantic information of the text features.
In this embodiment, the text feature processing module 420 is specifically configured to:
integrating forward semantic information and backward semantic information of the text features, and splicing them along the feature dimension to obtain new features; wherein each character in the text is represented by its corresponding new feature, i.e. by the forward and backward semantic information of the text features.
In this embodiment, the model building module 430 is specifically configured to:
processing the new features to selectively increase the influence of effective features while suppressing the influence of ineffective features, wherein the selection logic of the information enhancement is as follows:
setting the input data dimension as [B, S, H×2], wherein B represents the data batch size, S represents the text length, and H represents the number of LSTM hidden-layer neurons;
initializing a learnable weight matrix w, wherein the dimension of w is [H×2, 1];
applying a nonlinear activation to the [B, S, H×2] dimensional data output by the LSTM, matrix-multiplying the result with w, and performing Softmax normalization to obtain a score at each moment with dimension [B, S, 1]; this score matrix represents the weight of each character in the text and is used to enhance character features;
multiplying the LSTM hidden-layer state [B, S, H×2] dimensional data at each moment by the corresponding score matrix of dimension [B, S, 1] and summing to obtain the final weighted-average hidden-layer value, with dimension [B, H×2];
applying a nonlinear activation to the final hidden-layer value [B, H×2] dimensional data and feeding it into the fully connected layers, which comprise 2 layers in total with B and N neurons respectively, N being the number of predicted categories;
by Softmax normalization, taking the category corresponding to the largest of the N values as the final prediction to obtain the final output result, with dimension [B, 1];
enhancing or suppressing the features of each character according to feature importance, and likewise enhancing or suppressing the features of all characters according to importance;
and building the information classification model from the text features after the information enhancement processing.
In this embodiment, the model training module 440 is specifically configured to:
training the set up information classification model by using a training set;
testing the trained model according to the information titles of the test set, and judging whether the classification result is correct; and if the accuracy reaches a preset value, finishing the model training.
It should be noted that although several modules of the BERT model-based information classification system are mentioned in the detailed description above, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functionality of two or more of the modules described above may be embodied in a single module; conversely, the features and functions of one module described above may be further divided into and embodied by a plurality of modules.
Based on the aforementioned inventive concept, as shown in fig. 5, the present invention further provides a computer apparatus 500, which includes a memory 510, a processor 520, and a computer program 530 stored in the memory 510 and executable on the processor 520, wherein the processor 520 executes the computer program 530 to implement the aforementioned information classification method based on the BERT model.
Based on the above inventive concept, the present invention provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the method for classifying information based on a BERT model.
Compared with the prior art, the information classification method and system based on the BERT model have at least the following advantages: text features are extracted by the BERT model, which is simple, convenient, and efficient to use and saves a large amount of time and labor cost; and the information enhancement processing mechanism fully extracts the features in the text, which improves information utilization, improves the performance of the information classification model, and effectively improves the accuracy of information classification.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art can still modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent substitutions for some of their technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A method for classifying information based on a BERT model is characterized by comprising the following steps:
collecting information data, and preprocessing the information data;
extracting text features by using a BERT model according to the preprocessed information data, and extracting semantic information of the text features by using an LSTM;
according to the semantic information of the text features, enhancing semantic information whose weight is larger than a first preset value and suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
setting a training set and a test set according to the preprocessed information data, training the information classification model, and testing the trained model by using the test set;
and obtaining the information data to be classified, and classifying the information data to be classified by using the trained information classification model to obtain a classification result.
2. The BERT model-based information classification method of claim 1, wherein collecting information data and preprocessing the information data comprises:
and screening and labeling the collected information data, extracting a title text of the information and labeling the information category.
3. The method of claim 2, wherein the extracting text features using the BERT model and semantic information of the text features using the LSTM according to the preprocessed information data comprises:
taking the BERT model as the text feature extraction model, inputting the title text of the information data into the text feature extraction model, and extracting text features; wherein the BERT model has been pre-trained on a Chinese data set;
and extracting forward and backward features of the text features by using the LSTM to obtain forward and backward semantic information of the text features.
4. The method of claim 3, further comprising:
integrating forward semantic information and backward semantic information of the text features, and splicing them along the feature dimension to obtain new features; wherein each character in the text is represented by its corresponding new feature, i.e. by the forward and backward semantic information of the text features.
5. The information classification method based on the BERT model as claimed in claim 4, wherein the information classification model is constructed by enhancing the semantic information with the weight larger than the first preset value and suppressing the semantic information with the weight smaller than the preset value according to the semantic information of the text features, and comprises the following steps:
processing the new features to selectively increase the influence of effective features while suppressing the influence of ineffective features, wherein the selection logic of the information enhancement is as follows:
setting the input data dimension as [B, S, H×2], wherein B represents the data batch size, S represents the text length, and H represents the number of LSTM hidden-layer neurons;
initializing a learnable weight matrix w, wherein the dimension of w is [H×2, 1];
applying a nonlinear activation to the [B, S, H×2] dimensional data output by the LSTM, matrix-multiplying the result with w, and performing Softmax normalization to obtain a score at each moment with dimension [B, S, 1]; this score matrix represents the weight of each character in the text and is used to enhance character features;
multiplying the LSTM hidden-layer state [B, S, H×2] dimensional data at each moment by the corresponding score matrix of dimension [B, S, 1] and summing to obtain the final weighted-average hidden-layer value, with dimension [B, H×2];
applying a nonlinear activation to the final hidden-layer value [B, H×2] dimensional data and feeding it into the fully connected layers, which comprise 2 layers in total with B and N neurons respectively, N being the number of predicted categories;
by Softmax normalization, taking the category corresponding to the largest of the N values as the final prediction to obtain the final output result, with dimension [B, 1];
enhancing or suppressing the features of each character according to feature importance, and likewise enhancing or suppressing the features of all characters according to importance;
and building the information classification model from the text features after the information enhancement processing.
6. The method of claim 5, wherein the information classification model is trained by setting a training set and a test set according to the preprocessed information data, and the training model is tested by using the test set, comprising:
training the set up information classification model by using a training set;
testing the trained model according to the information titles of the test set, and judging whether the classification result is correct; and if the accuracy reaches a preset value, finishing the model training.
7. A BERT model-based information classification system, comprising:
the preprocessing module is used for acquiring information data and preprocessing the information data;
the text feature processing module is used for extracting text features by using a BERT model according to the preprocessed information data and extracting semantic information of the text features by using an LSTM;
the model building module is used for enhancing semantic information whose weight is larger than a first preset value according to the semantic information of the text features, suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
the model training module is used for setting a training set and a test set according to the preprocessed information data, training the information classification model, and testing the trained model by using the test set;
and the information classification module is used for obtaining the information data to be classified and classifying it by using the trained information classification model to obtain a classification result.
8. The BERT model-based information classification system of claim 7, wherein the preprocessing module is specifically configured to:
and screening and labeling the collected information data, extracting a title text of the information and labeling the information category.
9. The BERT model-based information classification system of claim 8, wherein the text feature processing module is specifically configured to:
taking the BERT model as the text feature extraction model, inputting the title text of the information data into the text feature extraction model, and extracting text features; wherein the BERT model has been pre-trained on a Chinese data set;
and extracting forward and backward features of the text features by using the LSTM to obtain forward and backward semantic information of the text features.
10. The BERT model-based information classification system of claim 9, wherein the text feature processing module is specifically configured to:
integrating forward semantic information and backward semantic information of the text features, and splicing them along the feature dimension to obtain new features; wherein each character in the text is represented by its corresponding new feature, i.e. by the forward and backward semantic information of the text features.
11. The BERT model-based information classification system according to claim 10, wherein the model building module is specifically configured to:
processing the new features to selectively increase the influence of effective features while suppressing the influence of ineffective features, wherein the selection logic of the information enhancement is as follows:
setting the input data dimension as [B, S, H×2], wherein B represents the data batch size, S represents the text length, and H represents the number of LSTM hidden-layer neurons;
initializing a learnable weight matrix w, wherein the dimension of w is [H×2, 1];
applying a nonlinear activation to the [B, S, H×2] dimensional data output by the LSTM, matrix-multiplying the result with w, and performing Softmax normalization to obtain a score at each moment with dimension [B, S, 1]; this score matrix represents the weight of each character in the text and is used to enhance character features;
multiplying the LSTM hidden-layer state [B, S, H×2] dimensional data at each moment by the corresponding score matrix of dimension [B, S, 1] and summing to obtain the final weighted-average hidden-layer value, with dimension [B, H×2];
applying a nonlinear activation to the final hidden-layer value [B, H×2] dimensional data and feeding it into the fully connected layers, which comprise 2 layers in total with B and N neurons respectively, N being the number of predicted categories;
by Softmax normalization, taking the category corresponding to the largest of the N values as the final prediction to obtain the final output result, with dimension [B, 1];
enhancing or suppressing the features of each character according to feature importance, and likewise enhancing or suppressing the features of all characters according to importance;
and building the information classification model from the text features after the information enhancement processing.
12. The BERT model-based information classification system of claim 11, wherein the model training module is specifically configured to:
training the set up information classification model by using a training set;
testing the trained model according to the information titles of the test set, and judging whether the classification result is correct; and if the accuracy reaches a preset value, finishing the model training.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 6 when executing the computer program.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 6.
CN202110737020.7A, priority date 2021-06-30, filing date 2021-06-30: Information classification method and system based on BERT model. Publication: CN113297356A. Legal status: Pending.

Priority Applications (1)

CN202110737020.7A, published as CN113297356A: Information classification method and system based on BERT model


Publications (1)

Publication Number Publication Date
CN113297356A, published 2021-08-24

Family

ID=77329964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110737020.7A (pending), published as CN113297356A: Information classification method and system based on BERT model

Country Status (1)

Country Link
CN (1) CN113297356A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050216345A1 (en) * 2003-10-06 2005-09-29 Ebbe Altberg Methods and apparatuses for offline selection of pay-per-call advertisers
CN111428026A (en) * 2020-02-20 2020-07-17 西安电子科技大学 Multi-label text classification processing method and system and information data processing terminal
CN112347766A (en) * 2020-11-27 2021-02-09 北京工业大学 Multi-label classification method for processing microblog text cognition distortion


Similar Documents

Publication Publication Date Title
CN111859960B (en) Semantic matching method, device, computer equipment and medium based on knowledge distillation
CN110968660B (en) Information extraction method and system based on joint training model
CN109947931B (en) Method, system, device and medium for automatically abstracting text based on unsupervised learning
CN111783993A (en) Intelligent labeling method and device, intelligent platform and storage medium
CN112711953A (en) Text multi-label classification method and system based on attention mechanism and GCN
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN116361801B (en) Malicious software detection method and system based on semantic information of application program interface
CN111858878B (en) Method, system and storage medium for automatically extracting answer from natural language text
CN113434685B (en) Information classification processing method and system
CN112328475B (en) Defect positioning method for multiple suspicious code files
CN111653275A (en) Method and device for constructing voice recognition model based on LSTM-CTC tail convolution and voice recognition method
CN112507124A (en) Chapter-level event causal relationship extraction method based on graph model
CN112036705A (en) Quality inspection result data acquisition method, device and equipment
CN113988079A (en) Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method
CN112712166A (en) Prediction method and device based on time series
CN112307048A (en) Semantic matching model training method, matching device, equipment and storage medium
CN116627490A (en) Intelligent contract byte code similarity detection method
CN112966115B (en) Active learning event extraction method based on memory loss prediction and delay training
CN111160526A (en) Online testing method and device for deep learning system based on MAPE-D annular structure
CN116306606A (en) Financial contract term extraction method and system based on incremental learning
CN110472231A (en) It is a kind of identification legal documents case by method and apparatus
CN113297356A (en) Information classification method and system based on BERT model
US20230063686A1 (en) Fine-grained stochastic neural architecture search
Sethi et al. Deep Learning-based Binary Classification for Spam Detection in SMS Data: Addressing Imbalanced Data with Sampling Techniques
CN115062769A (en) Knowledge distillation-based model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination