CN113297356A - Information classification method and system based on BERT model


Info

Publication number
CN113297356A
CN113297356A
Authority
CN
China
Prior art keywords
information
model
text
data
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110737020.7A
Other languages
Chinese (zh)
Inventor
徐晓健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority application: CN202110737020.7A
Publication: CN113297356A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an information classification method and system based on a BERT model, relating to the technical fields of natural language processing and machine learning. The method comprises the following steps: collecting information data and preprocessing it; extracting text features from the preprocessed information data with a BERT model, and extracting semantic information of the text features with an LSTM; according to the semantic information of the text features, enhancing semantic information whose weight is larger than a first preset value and suppressing semantic information whose weight is smaller than the preset value, and building an information classification model; setting a training set and a test set from the preprocessed information data, training the information classification model, and testing the trained model with the test set; and obtaining information data to be classified and classifying it with the trained information classification model to obtain a classification result.

Description

Information classification method and system based on BERT model
Technical Field
The invention relates to the technical field of natural language processing and machine learning, in particular to a method and a system for information classification based on a BERT model.
Background
As an important customer channel, the mobile banking APP plays an important role in banks' digital transformation. To further improve customer experience, the mobile banking APP has added an information (news feed) function. Considering that a large volume of new information is generated every day, the bank needs to classify the information by content in order to manage it better; at this volume of information data, relying solely on manual work is costly and inefficient.
In view of the above, a technical solution that can classify information efficiently and accurately and overcome the above drawbacks is needed.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an information classification method and system based on a BERT model. The invention uses artificial intelligence techniques to process information titles: it extracts the information carried by the title text, processes it, and classifies the information accordingly.
In a first aspect of an embodiment of the present invention, an information classification method based on a BERT model is provided, where the method includes:
collecting information data, and preprocessing the information data;
extracting text features by using a BERT model according to the preprocessed information data, and extracting semantic information of the text features by using an LSTM;
according to the semantic information of the text features, enhancing semantic information whose weight is larger than a first preset value and suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
setting a training set and a test set according to the preprocessed information data, training the information classification model, and testing the trained model by using the test set;
and obtaining the information data to be classified, and classifying the information data to be classified by using the trained information classification model to obtain a classification result.
In a second aspect of the embodiments of the present invention, an information classification system based on a BERT model is provided, the system including:
the preprocessing module is used for acquiring information data and preprocessing the information data;
the text feature processing module is used for extracting text features by using a BERT model according to the preprocessed information data and extracting semantic information of the text features by using an LSTM;
the model building module is used for enhancing semantic information whose weight is larger than a first preset value according to the semantic information of the text features, suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
the model training module is used for setting a training set and a test set according to the preprocessed information data, training the information classification model, and testing the trained model by using the test set;
and the information classification module is used for obtaining the information data to be classified and classifying it by using the trained information classification model to obtain a classification result.
In a third aspect of the embodiments of the present invention, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements a BERT model-based information classification method.
In a fourth aspect of embodiments of the present invention, a computer-readable storage medium is presented, which stores a computer program that, when executed by a processor, implements a BERT model-based information classification method.
Compared with the prior art, the information classification method and system based on the BERT model have at least the following advantages: text features are extracted by the BERT model, which is simple, convenient, and efficient to use and saves a large amount of time and labor cost; and the information enhancement processing mechanism fully extracts the features in the text, which improves information utilization, improves the performance of the information classification model, and effectively improves the accuracy of information classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flow chart of an information classification method based on a BERT model according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of an information classification process according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the architectural relationships according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of an information classification system based on a BERT model according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the invention, an information classification method and system based on a BERT model are provided, and the invention relates to the technical field of natural language processing and machine learning.
In the embodiments of the present invention, terms to be described include:
BERT: a pre-training model open-sourced by Google for natural language processing tasks. It learns good feature representations for words by running a self-supervised learning method on a massive corpus; self-supervised learning here means supervised learning on data without manually annotated labels. The feature representations learned by BERT serve as word-embedding features for NLP tasks, and BERT provides a model for transfer learning: depending on the task, it can be fine-tuned or frozen and then used as a feature extractor.
LSTM: long short-term memory, an artificial neural network suited to processing and predicting important events separated by very long intervals and delays in a time series.
Softmax: normalizing the exponential function to normalize the logarithm of the gradient of the finite discrete probability distribution. Softmax is widely used in probability-based multi-classification problem methods such as multinomial logistic regression, multinomial linear discriminant analysis, naive Bayes classifier, artificial neural network, and the like.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
FIG. 1 is a flow chart of an information classification method based on a BERT model according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step S101, collecting information data and preprocessing the information data;
step S102, extracting text features by using a BERT model according to the preprocessed information data, and extracting semantic information of the text features by using an LSTM;
step S103, according to the semantic information of the text features, enhancing semantic information whose weight is larger than a first preset value and suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
step S104, setting a training set and a test set according to the preprocessed information data, training the information classification model, and testing the trained model by using the test set;
step S105, obtaining the information data to be classified, and classifying the information data to be classified by using the trained information classification model to obtain a classification result.
For a more clear explanation of the information classification method based on the BERT model, each step is described in detail below.
Step S101, data preprocessing:
the preprocessing process comprises the steps of screening and labeling collected information data, extracting a title text of the information and labeling information types.
Step S102, extracting text features:
taking the BERT model as the text feature extraction model, inputting the title text of the information data into the text feature extraction model, and extracting text features; wherein the BERT model has been pre-trained on a Chinese data set;
and performing forward and backward feature extraction on the text features by using the LSTM to obtain forward and backward semantic information of the text features.
The forward and backward semantic information of the text features is then integrated and spliced along the feature dimension to obtain new features; each character in the text is thereby represented by its corresponding new feature, i.e. by the forward and backward semantic information of the text features.
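A minimal sketch of this feature extraction step, assuming PyTorch and the Hugging Face transformers package with the public bert-base-chinese checkpoint (the patent only states that BERT is pre-trained on a Chinese data set, without naming a checkpoint); a bidirectional nn.LSTM already concatenates the forward and backward hidden states along the feature dimension, yielding the "new features" described above:

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# Assumed checkpoint; the patent only says "pre-trained on a Chinese data set".
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

# Bidirectional LSTM: the forward and backward hidden states are concatenated,
# giving each character a [forward; backward] semantic representation.
H = 256  # hidden_size, an illustrative value
lstm = nn.LSTM(input_size=bert.config.hidden_size, hidden_size=H,
               batch_first=True, bidirectional=True)

titles = ["手机银行推出资讯分类新功能"]  # an example information title
enc = tokenizer(titles, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    text_feats = bert(**enc).last_hidden_state   # text features, [B, S, 768]
    new_feats, _ = lstm(text_feats)              # new features,  [B, S, H*2]
```

Here new_feats has shape [B, S, H×2] and feeds the information enhancement step described next.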
Step S103, information enhancement and establishment of the information classification model:
processing the new features to selectively increase the influence of effective features while suppressing the influence of ineffective features; the selection logic of the information enhancement is as follows (a code sketch follows these steps):
setting the input data dimension as [B, S, H×2], wherein B represents the data batch size, S represents the text length, and H represents the number of LSTM hidden-layer neurons;
initializing a learnable weight matrix w, wherein the dimension of w is [H×2, 1];
applying a nonlinear activation to the [B, S, H×2] dimensional data output by the LSTM, matrix-multiplying the result with w, and performing Softmax normalization to obtain a score at each moment with dimension [B, S, 1]; this score matrix represents the weight of each character in the text and is used to enhance character features;
multiplying the LSTM hidden-layer state [B, S, H×2] dimensional data at each moment by the corresponding score matrix of dimension [B, S, 1] and summing to obtain the final weighted-average hidden-layer value, with dimension [B, H×2];
applying a nonlinear activation to the final hidden-layer value [B, H×2] dimensional data and feeding it into the fully connected layers, which comprise 2 layers in total with B and N neurons respectively, N being the number of predicted categories;
by Softmax normalization, taking the category corresponding to the largest of the N values as the final prediction to obtain the final output result, with dimension [B, 1];
the features of each character are thus enhanced or suppressed according to feature importance, as are the features of all characters;
and building the information classification model from the text features after the information enhancement processing.
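A sketch of the selection logic above, under the same PyTorch assumptions. Two points are assumptions rather than statements of the patent: tanh stands in for the unspecified "nonlinear activation", and the first fully connected layer's width is passed in as an ordinary hyperparameter (fc_width) corresponding to the patent's "B" neurons, since tying a layer width to the runtime batch size would not be well defined:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InfoEnhanceClassifier(nn.Module):
    """Attention-style information enhancement over BiLSTM outputs, following
    the selection logic above. tanh is an assumed activation; the patent only
    says 'nonlinear activation'."""
    def __init__(self, hidden_size: int, fc_width: int, num_class: int):
        super().__init__()
        H2 = hidden_size * 2
        self.w = nn.Parameter(torch.randn(H2, 1))  # learnable weight matrix w, [H*2, 1]
        self.fc1 = nn.Linear(H2, fc_width)
        self.fc2 = nn.Linear(fc_width, num_class)

    def forward(self, lstm_out: torch.Tensor) -> torch.Tensor:
        # lstm_out: [B, S, H*2]
        scores = torch.matmul(torch.tanh(lstm_out), self.w)  # [B, S, 1]
        scores = F.softmax(scores, dim=1)                    # per-character weights
        # Weighted sum over time: high-weight characters are enhanced,
        # low-weight characters are suppressed.
        hidden = (lstm_out * scores).sum(dim=1)              # [B, H*2]
        logits = self.fc2(self.fc1(torch.tanh(hidden)))      # [B, N]
        return logits  # Softmax + argmax at inference gives the [B, 1] prediction
```

The module consumes the new_feats tensor from the previous sketch, e.g. InfoEnhanceClassifier(hidden_size=H, fc_width=64, num_class=N)(new_feats); applying F.softmax followed by argmax over the last dimension yields the [B, 1] output described above.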
Step S104, model training and model testing:
training the set up information classification model by using a training set;
testing the trained model according to the information titles of the test set, and judging whether the classification result is correct; and if the accuracy reaches a preset value, finishing the model training.
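A minimal sketch of this training-and-testing loop, assuming cross-entropy loss, an Adam optimizer, and a concrete accuracy threshold, none of which the patent specifies; `model` is taken to wrap the BERT + BiLSTM + enhancement pipeline so that each batch yields (inputs, labels):

```python
import torch
import torch.nn as nn

def train_and_test(model, train_loader, test_loader, target_acc=0.95, max_epochs=10):
    # target_acc and max_epochs are illustrative; the patent only requires the
    # test accuracy to reach "a preset value" before training finishes.
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(max_epochs):
        model.train()
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)  # model returns [B, N] logits
            loss.backward()
            optimizer.step()
        # Test on the information titles of the test set.
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for inputs, labels in test_loader:
                preds = model(inputs).argmax(dim=-1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
        if correct / total >= target_acc:  # accuracy reached the preset value
            break
    return model
```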
Step S105, information classification:
and obtaining the information data to be classified, and classifying the information data to be classified by using the trained information classification model to obtain a classification result.
Compared with the prior art, the information classification method based on the BERT model provided by the invention builds a dedicated model for information classification; each character in the text is represented by 2 parts of information, the forward and the backward semantic information between the character and the text; and the features of each character in the text, as well as the features of all characters in the text, are subject to feature enhancement and suppression. Text features are extracted by the pre-trained BERT model, so the method is simple to use, convenient, fast, and efficient, and can save a large amount of time and labor cost; and the data processing mechanism fully extracts the features in the text, so that information utilization and accuracy are higher.
It should be noted that although the operations of the method of the present invention have been described in the above embodiments and the accompanying drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the operations shown must be performed, to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
For a more clear explanation of the information classification method based on the BERT model, a specific embodiment is described below, but it should be noted that the embodiment is only for better explaining the present invention and should not be construed as an undue limitation to the present invention.
Referring to fig. 2, a schematic diagram of an information classification process according to an embodiment of the invention is shown. As shown in fig. 2, the specific process is as follows:
s201, training data preprocessing:
firstly, manually screening and labeling collected information data;
s202, establishing a neural network model:
and building a model for information classification.
Considering that the target input is an information title, the input text is short; the BERT model is therefore selected as the text feature extraction model. It should be noted that, unlike the conventional use of the BERT model, the present invention does not directly apply a Softmax operation to the extracted features for classification; instead, the features are further processed and information is further extracted in combination with the characteristics of the task before classification.
Referring to the schematic diagram of the architecture relationship shown in fig. 3, after the title text is input into the model, the model extracts the text features using the BERT model pre-trained on a Chinese data set. In FIG. 3, Text: the input text; Embedding: conversion of discrete variables into continuous vectors; Concat: the Concat layer splices its input data; Max Pooling: max pooling, i.e. taking the maximum feature value in a neighborhood; Average Pooling: global average pooling; Class: the output category.
Further, forward and backward feature extraction is performed with the LSTM on the text features extracted by BERT, so as to extract their forward and backward semantic information;
the 2 extracted parts of information are integrated, and the 2 parts of features are spliced along the feature dimension to obtain new features; that is, each character in the text is represented by 2 parts of information, the forward and the backward semantic information between the character and the text;
the extracted text features are then processed to selectively increase the influence of some effective features while suppressing the influence of some ineffective features.
The information enhancement specific selection logic is as follows:
assuming the input data dimension is [batch_size, seq_len, hidden_size×2], where batch_size represents the data batch size, seq_len represents the text length, and hidden_size represents the number of LSTM hidden-layer neurons;
a learnable weight matrix w is initialized, with dimension [hidden_size×2, 1].
First, a nonlinear activation is applied to the LSTM output of dimension [batch_size, seq_len, hidden_size×2]; the result is matrix-multiplied with w and normalized with Softmax, giving a score for each moment with dimension [batch_size, seq_len, 1]; this matrix represents the weight of each character in the text and is used to enhance character features;
the LSTM hidden-layer state data of dimension [batch_size, seq_len, hidden_size×2] at each moment is multiplied by the corresponding score matrix of dimension [batch_size, seq_len, 1] and summed, giving the final weighted-average hidden-layer value with dimension [batch_size, hidden_size×2];
a nonlinear activation is applied to the final hidden-layer value of dimension [batch_size, hidden_size×2], which is then fed into the fully connected layers; there are 2 fully connected layers, with batch_size and num_class neurons respectively; num_class is the number of predicted categories.
Finally, by Softmax normalization, the category corresponding to the largest of the num_class values is taken as the final prediction, giving a final output of dimension [batch_size, 1].
By the above operation on the text features, the features of each character are enhanced and suppressed according to the feature importance, and the features of all characters are enhanced and suppressed according to the importance. The performance of the model is further improved through effective characteristic enhancement and ineffective characteristic suppression.
S203, model training:
training the model established in the step S202 by using the data obtained in the step S201;
s204, information classification:
the model trained in S203 is used to classify the information data used for testing.
Having described the method of an exemplary embodiment of the present invention, the BERT model-based information classification system of an exemplary embodiment of the present invention is described next with reference to fig. 4.
The implementation of the information classification system based on the BERT model can be referred to the implementation of the above method, and repeated details are not repeated. The term "module" or "unit" used hereinafter may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Based on the same inventive concept, the invention also provides an information classification system based on the BERT model, as shown in FIG. 4, the system comprises:
the preprocessing module 410 is used for collecting information data and preprocessing the information data;
the text feature processing module 420 is configured to extract text features by using a BERT model according to the preprocessed information data, and extract semantic information of the text features by using an LSTM;
the model building module 430 is used for enhancing semantic information whose weight is larger than a first preset value according to the semantic information of the text features, suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
the model training module 440 is configured to set a training set and a test set according to the preprocessed information data, train the information classification model, and test the trained model by using the test set;
the information classification module 450 is configured to obtain information data to be classified, and classify it by using the trained information classification model to obtain a classification result.
In this embodiment, the preprocessing module 410 is specifically configured to:
and screening and labeling the collected information data, extracting a title text of the information and labeling the information category.
In this embodiment, the text feature processing module 420 is specifically configured to:
taking the BERT model as the text feature extraction model, inputting the title text of the information data into the text feature extraction model, and extracting text features; wherein the BERT model has been pre-trained on a Chinese data set;
and extracting forward and backward features of the text features by using the LSTM to obtain forward and backward semantic information of the text features.
In this embodiment, the text feature processing module 420 is specifically configured to:
integrating forward semantic information and backward semantic information of the text features, and splicing them along the feature dimension to obtain new features; wherein each character in the text is represented by its corresponding new feature, i.e. by the forward and backward semantic information of the text features.
In this embodiment, the model building module 430 is specifically configured to:
processing the new features to selectively increase the influence of effective features while suppressing the influence of ineffective features, wherein the selection logic of the information enhancement is as follows:
setting the input data dimension as [B, S, H×2], wherein B represents the data batch size, S represents the text length, and H represents the number of LSTM hidden-layer neurons;
initializing a learnable weight matrix w, wherein the dimension of w is [H×2, 1];
applying a nonlinear activation to the [B, S, H×2] dimensional data output by the LSTM, matrix-multiplying the result with w, and performing Softmax normalization to obtain a score at each moment with dimension [B, S, 1]; this score matrix represents the weight of each character in the text and is used to enhance character features;
multiplying the LSTM hidden-layer state [B, S, H×2] dimensional data at each moment by the corresponding score matrix of dimension [B, S, 1] and summing to obtain the final weighted-average hidden-layer value, with dimension [B, H×2];
applying a nonlinear activation to the final hidden-layer value [B, H×2] dimensional data and feeding it into the fully connected layers, which comprise 2 layers in total with B and N neurons respectively, N being the number of predicted categories;
by Softmax normalization, taking the category corresponding to the largest of the N values as the final prediction to obtain the final output result, with dimension [B, 1];
enhancing or suppressing the features of each character according to feature importance, and likewise enhancing or suppressing the features of all characters according to importance;
and building the information classification model from the text features after the information enhancement processing.
In this embodiment, the model training module 440 is specifically configured to:
training the set up information classification model by using a training set;
testing the trained model according to the information titles of the test set, and judging whether the classification result is correct; and if the accuracy reaches a preset value, finishing the model training.
It should be noted that although several modules of the BERT model-based information classification system are mentioned in the detailed description above, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functionality of two or more of the modules described above may be embodied in a single module; conversely, the features and functions of one module described above may be further divided into and embodied by a plurality of modules.
Based on the aforementioned inventive concept, as shown in fig. 5, the present invention further provides a computer apparatus 500, which includes a memory 510, a processor 520, and a computer program 530 stored in the memory 510 and executable on the processor 520, wherein the processor 520 executes the computer program 530 to implement the aforementioned information classification method based on the BERT model.
Based on the above inventive concept, the present invention provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the method for classifying information based on a BERT model.
Compared with the prior art, the information classification method and system based on the BERT model have at least the following advantages: text features are extracted by the BERT model, which is simple, convenient, and efficient to use and saves a large amount of time and labor cost; and the information enhancement processing mechanism fully extracts the features in the text, which improves information utilization, improves the performance of the information classification model, and effectively improves the accuracy of information classification.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art can still modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent substitutions for some of their technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A method for classifying information based on a BERT model is characterized by comprising the following steps:
collecting information data, and preprocessing the information data;
extracting text features by using a BERT model according to the preprocessed information data, and extracting semantic information of the text features by using an LSTM;
according to the semantic information of the text features, enhancing semantic information whose weight is larger than a first preset value and suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
setting a training set and a test set according to the preprocessed information data, training the information classification model, and testing the trained model by using the test set;
and obtaining the information data to be classified, and classifying the information data to be classified by using the trained information classification model to obtain a classification result.
2. The BERT model-based information classification method of claim 1, wherein collecting information data and preprocessing the information data comprises:
and screening and labeling the collected information data, extracting a title text of the information and labeling the information category.
3. The method of claim 2, wherein the extracting text features using the BERT model and semantic information of the text features using the LSTM according to the preprocessed information data comprises:
taking the BERT model as the text feature extraction model, inputting the title text of the information data into the text feature extraction model, and extracting text features; wherein the BERT model has been pre-trained on a Chinese data set;
and extracting forward and backward features of the text features by using the LSTM to obtain forward and backward semantic information of the text features.
4. The method of claim 3, further comprising:
integrating forward semantic information and backward semantic information of the text features, and splicing them along the feature dimension to obtain new features; wherein each character in the text is represented by its corresponding new feature, i.e. by the forward and backward semantic information of the text features.
5. The information classification method based on the BERT model as claimed in claim 4, wherein the information classification model is constructed by enhancing the semantic information with the weight larger than the first preset value and suppressing the semantic information with the weight smaller than the preset value according to the semantic information of the text features, and comprises the following steps:
processing the new features to selectively increase the influence of effective features while suppressing the influence of ineffective features, wherein the selection logic of the information enhancement is as follows:
setting the input data dimension as [B, S, H×2], wherein B represents the data batch size, S represents the text length, and H represents the number of LSTM hidden-layer neurons;
initializing a learnable weight matrix w, wherein the dimension of w is [H×2, 1];
applying a nonlinear activation to the [B, S, H×2] dimensional data output by the LSTM, matrix-multiplying the result with w, and performing Softmax normalization to obtain a score at each moment with dimension [B, S, 1]; this score matrix represents the weight of each character in the text and is used to enhance character features;
multiplying the LSTM hidden-layer state [B, S, H×2] dimensional data at each moment by the corresponding score matrix of dimension [B, S, 1] and summing to obtain the final weighted-average hidden-layer value, with dimension [B, H×2];
applying a nonlinear activation to the final hidden-layer value [B, H×2] dimensional data and feeding it into the fully connected layers, which comprise 2 layers in total with B and N neurons respectively, N being the number of predicted categories;
by Softmax normalization, taking the category corresponding to the largest of the N values as the final prediction to obtain the final output result, with dimension [B, 1];
enhancing or suppressing the features of each character according to feature importance, and likewise enhancing or suppressing the features of all characters according to importance;
and building the information classification model from the text features after the information enhancement processing.
6. The method of claim 5, wherein the information classification model is trained by setting a training set and a test set according to the preprocessed information data, and the training model is tested by using the test set, comprising:
training the set up information classification model by using a training set;
testing the trained model according to the information titles of the test set, and judging whether the classification result is correct; and if the accuracy reaches a preset value, finishing the model training.
7. A BERT model-based information classification system, comprising:
the preprocessing module is used for acquiring information data and preprocessing the information data;
the text feature processing module is used for extracting text features by using a BERT model according to the preprocessed information data and extracting semantic information of the text features by using an LSTM;
the model building module is used for enhancing semantic information whose weight is larger than a first preset value according to the semantic information of the text features, suppressing semantic information whose weight is smaller than the preset value, and building an information classification model;
the model training module is used for setting a training set and a test set according to the preprocessed information data, training the information classification model, and testing the trained model by using the test set;
and the information classification module is used for obtaining the information data to be classified and classifying it by using the trained information classification model to obtain a classification result.
8. The BERT model-based information classification system of claim 7, wherein the preprocessing module is specifically configured to:
and screening and labeling the collected information data, extracting a title text of the information and labeling the information category.
9. The BERT model-based information classification system of claim 8, wherein the text feature processing module is specifically configured to:
taking the BERT model as the text feature extraction model, inputting the title text of the information data into the text feature extraction model, and extracting text features; wherein the BERT model has been pre-trained on a Chinese data set;
and extracting forward and backward features of the text features by using the LSTM to obtain forward and backward semantic information of the text features.
10. The BERT model-based information classification system of claim 9, wherein the text feature processing module is specifically configured to:
integrating forward semantic information and backward semantic information of the text features, and splicing them along the feature dimension to obtain new features; wherein each character in the text is represented by its corresponding new feature, i.e. by the forward and backward semantic information of the text features.
11. The BERT model-based information classification system according to claim 10, wherein the model building module is specifically configured to:
processing the new features to selectively increase the influence of effective features while suppressing the influence of ineffective features, wherein the selection logic of the information enhancement is as follows:
setting the input data dimension as [B, S, H×2], wherein B represents the data batch size, S represents the text length, and H represents the number of LSTM hidden-layer neurons;
initializing a learnable weight matrix w, wherein the dimension of w is [H×2, 1];
applying a nonlinear activation to the [B, S, H×2] dimensional data output by the LSTM, matrix-multiplying the result with w, and performing Softmax normalization to obtain a score at each moment with dimension [B, S, 1]; this score matrix represents the weight of each character in the text and is used to enhance character features;
multiplying the LSTM hidden-layer state [B, S, H×2] dimensional data at each moment by the corresponding score matrix of dimension [B, S, 1] and summing to obtain the final weighted-average hidden-layer value, with dimension [B, H×2];
applying a nonlinear activation to the final hidden-layer value [B, H×2] dimensional data and feeding it into the fully connected layers, which comprise 2 layers in total with B and N neurons respectively, N being the number of predicted categories;
by Softmax normalization, taking the category corresponding to the largest of the N values as the final prediction to obtain the final output result, with dimension [B, 1];
enhancing or suppressing the features of each character according to feature importance, and likewise enhancing or suppressing the features of all characters according to importance;
and building the information classification model from the text features after the information enhancement processing.
12. The BERT model-based information classification system of claim 11, wherein the model training module is specifically configured to:
training the set up information classification model by using a training set;
testing the trained model according to the information titles of the test set, and judging whether the classification result is correct; and if the accuracy reaches a preset value, finishing the model training.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 6 when executing the computer program.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 6.
CN202110737020.7A, priority date 2021-06-30, filing date 2021-06-30: Information classification method and system based on BERT model. Publication: CN113297356A. Legal status: Pending.

Priority Applications (1)

CN202110737020.7A, published as CN113297356A: Information classification method and system based on BERT model


Publications (1)

Publication Number Publication Date
CN113297356A, published 2021-08-24

Family

ID=77329964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110737020.7A (pending), published as CN113297356A: Information classification method and system based on BERT model

Country Status (1)

Country Link
CN (1) CN113297356A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050216345A1 (en) * 2003-10-06 2005-09-29 Ebbe Altberg Methods and apparatuses for offline selection of pay-per-call advertisers
CN111428026A (en) * 2020-02-20 2020-07-17 西安电子科技大学 Multi-label text classification processing method and system and information data processing terminal
CN112347766A (en) * 2020-11-27 2021-02-09 北京工业大学 Multi-label classification method for processing microblog text cognition distortion


Similar Documents

Publication Publication Date Title
CN111859960B (en) Semantic matching method, device, computer equipment and medium based on knowledge distillation
CN110968660B (en) Information extraction method and system based on joint training model
CN109947931B (en) Method, system, device and medium for automatically abstracting text based on unsupervised learning
CN111783993A (en) Intelligent labeling method and device, intelligent platform and storage medium
CN112711953A (en) Text multi-label classification method and system based on attention mechanism and GCN
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN116361801B (en) Malicious software detection method and system based on semantic information of application program interface
CN111858878B (en) Method, system and storage medium for automatically extracting answer from natural language text
CN113434685B (en) Information classification processing method and system
CN112328475B (en) Defect positioning method for multiple suspicious code files
CN111653275A (en) Method and device for constructing voice recognition model based on LSTM-CTC tail convolution and voice recognition method
CN112507124A (en) Chapter-level event causal relationship extraction method based on graph model
CN112036705A (en) Quality inspection result data acquisition method, device and equipment
CN113988079A (en) Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method
CN112712166A (en) Prediction method and device based on time series
CN112307048A (en) Semantic matching model training method, matching device, equipment and storage medium
CN116627490A (en) Intelligent contract byte code similarity detection method
CN112966115B (en) Active learning event extraction method based on memory loss prediction and delay training
CN111160526A (en) Online testing method and device for deep learning system based on MAPE-D annular structure
CN116306606A (en) Financial contract term extraction method and system based on incremental learning
CN110472231A (en) It is a kind of identification legal documents case by method and apparatus
CN113297356A (en) Information classification method and system based on BERT model
US20230063686A1 (en) Fine-grained stochastic neural architecture search
Sethi et al. Deep Learning-based Binary Classification for Spam Detection in SMS Data: Addressing Imbalanced Data with Sampling Techniques
CN115062769A (en) Knowledge distillation-based model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination