CN114443845A - BERT-based multi-feature fine-granularity Chinese short text sentiment classification method - Google Patents

BERT-based multi-feature fine-granularity Chinese short text sentiment classification method

Info

Publication number
CN114443845A
CN114443845A (application CN202210066218.1A)
Authority
CN
China
Prior art keywords
bert
model
features
probability
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210066218.1A
Other languages
Chinese (zh)
Inventor
丁晓静
卓胜祥
范华俊
左宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuxu Network Technology Shanghai Co ltd
Original Assignee
Xuxu Network Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuxu Network Technology Shanghai Co ltd filed Critical Xuxu Network Technology Shanghai Co ltd
Priority to CN202210066218.1A priority Critical patent/CN114443845A/en
Publication of CN114443845A publication Critical patent/CN114443845A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a BERT-based multi-feature fine-grained Chinese short text sentiment classification method, which comprises the following steps: step A, comprehensive expression of multi-dimensional features: the valid input features of the model include 4 types: one-hot encoding features, position encoding features, glyph (character-shape) features and pinyin features; the four features have the same dimension and are further averaged to obtain a comprehensive feature expression, and this feature is passed through a BERT model to obtain the final feature expression; the BERT Transformer stacks multiple multi-head self-attention and feed-forward neural network modules. The added glyph and pinyin features make the model tolerant, to a certain extent, of similar-glyph or homophone errors in the input text, so that the relevant semantics can still be extracted correctly even when such errors occur; the model can therefore adapt to erroneous real-world texts, and the prediction accuracy of the model is improved.

Description

BERT-based multi-feature fine-granularity Chinese short text sentiment classification method
Technical Field
The invention relates to the technical field of networks, in particular to a BERT-based multi-feature fine-grained Chinese short text sentiment classification method.
Background
The goal of sentiment analysis is to analyze, from text, the emotional tendency that people express toward entities and their attributes; the earliest research on this technology dates back to the 2003 work on article reviews by the two scholars Nasukawa and Yi. With the development of social media such as microblogs and of e-commerce platforms, a large amount of content with emotional tendency is generated, providing the data basis required for sentiment analysis. Today, sentiment analysis is widely used in many fields. For example: in commodity retail, user reviews are very important feedback for retailers and manufacturers; performing sentiment analysis on massive user reviews makes it possible to quantify how strongly users praise or criticize a product and its competitors, revealing the appeal of the product and how it compares with competing products. In the field of social public opinion, analyzing public opinion on hot social events makes it possible to grasp public-opinion trends effectively. For enterprise public opinion, sentiment analysis allows a company to quickly learn how society evaluates it, providing a basis for strategic planning and improving the company's competitiveness in the market. In financial trading, analyzing traders' attitudes toward stocks and other financial derivatives provides auxiliary evidence for market trading.
Existing popular sentiment analysis models can be roughly divided into two parts:
1. Feature extraction, i.e., producing an encoded representation of the text. Encoding methods fall into two categories: autoregressive and autoencoding. Autoregression is a unidirectional model based on the decoder part of the Transformer model; autoencoding is a bidirectional model based on the encoder part of the Transformer model.
The Transformer was proposed by Ashish Vaswani et al. of the Google team in June 2017 in the paper Attention Is All You Need and has become a classic in NLP tasks. Its model structure is shown in fig. 1.
2. Mapping from features to emotion categories: generally, a fully connected layer and a softmax layer are appended, which transform the features into a vector whose dimension equals the number of emotion categories and then normalize it to obtain the probability of each category.
In the prior art, a classification layer is added directly on top of the original BERT pre-trained model for fine-tuning. The BERT model obtained by pre-training on a large amount of general corpus is fine-tuned with domain-specific corpus and task-specific labeled corpus, so as to fully extract the meaning of each token of the specific corpus under the specific task.
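As a rough illustration of this prior-art fine-tuning scheme, a minimal Python sketch using the open-source HuggingFace transformers library is given below; the model name, number of labels, example sentences and hyperparameters are assumptions chosen for illustration only and are not part of the disclosed method.

```python
# Illustrative sketch of the prior-art scheme: a pre-trained BERT with an added
# classification head, fine-tuned on labeled sentences. The model name,
# num_labels, learning rate and toy data are assumptions for illustration.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=3)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["我喜欢它。", "太糟糕了。"]        # toy labeled corpus
labels = torch.tensor([1, 0])               # e.g. 1 = positive, 0 = negative

inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
outputs = model(**inputs, labels=labels)    # classification layer over the [CLS] vector
outputs.loss.backward()                     # cross-entropy loss on the labels
optimizer.step()
```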
The first prior art has the following disadvantages:
a) The features of the original BERT model are too limited: the features of the input encoding part are only a one-hot encoding vector, a position encoding vector and a token-type vector, and the token-type vector is fixed and carries no useful information, because only a single sentence is input in the sentiment analysis scenario.
b) Directly fine-tuning with the prior art when labeled data are scarce easily leads to overfitting, and the robustness of the model cannot be guaranteed.
c) Influenced by the initialization of the classification layer and by hyperparameters such as the learning rate, batch size and weight decay rate, the model may fall into different local optima, and these optima perform differently on different test sets; if only a single model's result is finally adopted, the effect may therefore be biased.
d) Most sentiment classification in industry is labeled with 2-3 categories, such as positive, negative and neutral. In practical applications this classification is too coarse; human emotional expression and tendency are far more detailed and complex, so coarse-grained sentiment categories carry too little information and are not conducive to subsequent in-depth analysis.
Disclosure of Invention
The invention aims to provide a BERT-based multi-feature fine-grained Chinese short text sentiment classification method to solve the problems raised in the background art above.
In order to achieve the purpose, the invention provides the following technical scheme:
a BERT-based multi-feature fine-grained Chinese short text sentiment classification method comprises the following steps:
step A, comprehensive expression of multi-dimensional features: the valid input features of the model include 4 types: one-hot encoding features, position encoding features, glyph (character-shape) features and pinyin features; the four features have the same dimension and are further averaged to obtain a comprehensive feature expression, and this feature is passed through a BERT model to obtain the final feature expression; the BERT Transformer stacks multiple multi-head self-attention and feed-forward neural network modules, where the self-attention modules use a bidirectional attention mechanism, that is, each token attends to the context on both its left and right sides; the multi-head self-attention formula is MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W0, with head_i = Attention(Q·WiQ, K·WiK, V·WiV), where W0 is the weight matrix used for dimensionality reduction after the heads are concatenated and WiQ, WiK, WiV are the weight matrices of Q, K and V respectively; the formula of the Attention is
Attention(Q, K, V) = softmax(Q·K^T / √dk)·V
where Q, K and V are the input query, key and value vectors respectively and dk is the vector dimension; the multi-head self-attention module reduces the computational resources consumed by reducing the dimensionality;
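For illustration only, the attention formulas above can be sketched with a few lines of numpy; the sequence length, dimensions and random weight matrices below are assumed values and do not correspond to the actual BERT parameters.

```python
# Minimal sketch of scaled dot-product attention and multi-head self-attention.
# Dimensions and random weights are illustrative assumptions only.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # Q·K^T / sqrt(d_k)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # softmax over the keys
    return w @ V

def multi_head(X, W_q, W_k, W_v, W_o):
    # one projection per head; W_o reduces the concatenated heads back to d_model
    heads = [attention(X @ wq, X @ wk, X @ wv) for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o

seq_len, d_model, n_heads, d_head = 8, 768, 12, 64
X = np.random.randn(seq_len, d_model)                 # averaged token features
W_q = [np.random.randn(d_model, d_head) for _ in range(n_heads)]
W_k = [np.random.randn(d_model, d_head) for _ in range(n_heads)]
W_v = [np.random.randn(d_model, d_head) for _ in range(n_heads)]
W_o = np.random.randn(n_heads * d_head, d_model)
out = multi_head(X, W_q, W_k, W_v, W_o)               # shape (seq_len, d_model)
```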
step B, mapping text vector features to probabilistic emotion classification features: the text vector features obtained in the previous step are mapped to emotion classification features through a classification layer, giving a feature representation of dimension class_size; this representation is turned into probabilities by a softmax layer, i.e., each dimension's value lies between 0 and 1 and all dimension values sum to 1; the classification layer formula is S = W^T·X + b, where W is the n×j fully connected weight matrix, b is a bias term and X is the vector output by the feature extraction layer; the resulting S enters the softmax layer
Pi = e^(Si) / Σ_k e^(Sk) (k = 1, …, j)
where Pi is the probability of text category i, Si is the value of the i-th neuron output by the classification layer, and j is the number of prediction categories;
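A minimal sketch of step B, assuming an illustrative embedding_size of 768 and class_size of 6 (neither value is specified by the method itself), is:

```python
# Sketch of step B: classification layer S = W^T·X + b followed by softmax.
# embedding_size and class_size are illustrative assumptions.
import numpy as np

embedding_size, class_size = 768, 6
W = np.random.randn(embedding_size, class_size)    # fully connected weight matrix
b = np.zeros(class_size)                           # bias term

x = np.random.randn(embedding_size)                # text vector from the feature extraction layer
S = W.T @ x + b                                    # class_size-dimensional scores
P = np.exp(S - S.max())
P /= P.sum()                                       # softmax: each P_i in (0, 1), all sum to 1
```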
step C, model fusion: the predictions of the top 3 models obtained under different parameter settings are weighted-averaged.
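Step C can be sketched as follows; the number of classes, the class probabilities and the fusion weights are assumed values for illustration.

```python
# Sketch of step C: weighted average of the class probabilities predicted by
# the three best models. All numbers are illustrative assumptions.
import numpy as np

probs = np.array([
    [0.10, 0.70, 0.20],   # model 1 class probabilities
    [0.15, 0.60, 0.25],   # model 2
    [0.05, 0.80, 0.15],   # model 3
])
weights = np.array([0.4, 0.3, 0.3])    # e.g. proportional to validation performance
fused = weights @ probs                # weighted average per class
prediction = int(fused.argmax())       # fused emotion category
```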
As a further technical scheme of the invention: the one-hot encoding feature is obtained by generating a vocab_size × embedding_size encoding matrix and looking up each token's dictionary id in this matrix.
As a further technical scheme of the invention: the position encoding features are inherited from the 512 × embedding_size encoding matrix in the BERT pre-trained model and can encode texts of at most 512 tokens.
As a further technical scheme of the invention: the glyph features use three fonts: FangSong (imitation Song), running script and clerical script, and are obtained by applying convolution and pooling operations to the graphical rendering of each character.
As a further technical scheme of the invention: the pinyin features are obtained by mapping the full pinyin letters of each Chinese character through an embedding and then averaging.
As a further technical scheme of the invention: the BERT model structure is a stack of Transformer encoders, aiming to pre-train deep bidirectional representations by jointly conditioning on context in all layers.
As a further technical scheme of the invention: the classification layer is a feed-forward network of size embedding_size × class_size.
As a further technical scheme of the invention: the training process of the model comprises two steps:
1) Mask ML unsupervised training: for unlabeled text in the specific field, training data are constructed through the Mask ML strategy and the model is pre-trained, that is, for each token in each sentence (see the sketch after this list):
with 85% probability, the original token is kept unchanged;
with 15% probability, the token is selected and handled as follows:
with 80% probability, the current token is replaced by the character [MASK];
with 10% probability, the current token is replaced by a token randomly drawn from the vocabulary;
with 10% probability, the original token is kept unchanged;
2) Supervised training for text classification: according to the annotation labels, the cross-entropy loss between the probability result output at the [CLS] position and the true annotation is calculated, the gradient of each parameter is computed by back-propagation, and the parameters are updated.
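The masking strategy of step 1) can be sketched as follows; the toy sentence, the vocabulary and the character-level tokenisation are assumptions for illustration.

```python
# Sketch of the Mask ML data construction: 85% of tokens are kept unchanged;
# 15% are selected, of which 80% become [MASK], 10% a random vocabulary token
# and 10% stay unchanged. Sentence and vocabulary are illustrative assumptions.
import random

def mask_tokens(tokens, vocab):
    inputs, targets = [], []
    for tok in tokens:
        if random.random() < 0.15:                   # 15%: selected for prediction
            targets.append(tok)
            r = random.random()
            if r < 0.8:
                inputs.append("[MASK]")              # 80% of the selected tokens
            elif r < 0.9:
                inputs.append(random.choice(vocab))  # 10%: random vocabulary token
            else:
                inputs.append(tok)                   # 10%: keep the original token
        else:                                        # 85%: keep unchanged, not predicted
            inputs.append(tok)
            targets.append(None)
    return inputs, targets

tokens = ["我", "喜", "欢", "它", "。"]
vocab = ["我", "你", "他", "喜", "欢", "它", "好", "。"]
masked, targets = mask_tokens(tokens, vocab)
```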
Compared with the prior art, the invention has the following beneficial effects: the added glyph and pinyin features make the model tolerant, to a certain extent, of similar-glyph or homophone errors in the input text, so that the relevant semantics can still be extracted correctly even when such errors occur; the model can therefore adapt to erroneous real-world texts, and the prediction accuracy of the model is improved. In the model training process, unsupervised text pre-training is fully exploited as the basis of supervised training: the semantic expression characteristics of the domain-specific text are learned from unlabeled text, which avoids overfitting of the model when labeled text is scarce and improves the robustness of the model. The final model fusion takes into account the bias of a single model caused by initialization and hyperparameter selection, and averaging the models' results makes the overall effect more stable.
Drawings
FIG. 1 is a diagram of a current popular emotion analysis model.
FIG. 2 is a schematic diagram of fine tuning by adding a classification layer directly on the basis of an original BERT pre-training model.
FIG. 3 is a schematic diagram of a stack of Transformer encoders.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1: a BERT-based multi-feature fine-grained Chinese short text sentiment classification method comprises the following steps:
Step A: comprehensive expression of the multi-dimensional features: the valid input features of the model include 4 types: one-hot encoding features, position encoding features, glyph features and pinyin features. The one-hot encoding feature is obtained by generating a vocab_size × embedding_size encoding matrix and looking up each token's dictionary id in this matrix; the position encoding features are inherited from the 512 × embedding_size encoding matrix in the BERT pre-trained model and can encode texts of at most 512 tokens; the glyph features use three fonts: FangSong (imitation Song), running script and clerical script, with convolution and pooling operations applied to the graphical rendering of each character; the pinyin features are obtained by mapping the full pinyin letters of each Chinese character through an embedding and then averaging. The four features have the same dimension and are further averaged to obtain a comprehensive feature expression. This feature is passed through a BERT model to obtain the final feature expression. The BERT model structure is a stack of Transformer encoders (as shown in FIG. 3), aiming to pre-train deep bidirectional representations by jointly conditioning on context in all layers.
The BERT Transformer stacks multiple multi-head self-attention and feed-forward neural network modules, where the self-attention modules use a bidirectional attention mechanism, that is, each token attends to the context on both its left and right sides. The multi-head self-attention formula is MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W0, with head_i = Attention(Q·WiQ, K·WiK, V·WiV), where W0 is the weight matrix used for dimensionality reduction after the heads are concatenated and WiQ, WiK, WiV are the weight matrices of Q, K and V respectively. The Attention formula is
Attention(Q, K, V) = softmax(Q·K^T / √dk)·V
where Q, K and V are the input query, key and value vectors respectively and dk is the vector dimension; the multi-head self-attention module reduces the computational resources consumed by reducing the dimensionality.
Step B, mapping text vector features to probabilistic emotion classification features:
The text vector features obtained in the previous step are mapped to emotion classification features through the classification layer (a feed-forward network of size embedding_size × class_size), giving a feature representation of dimension class_size; this representation is turned into probabilities by a softmax layer, i.e., each dimension's value lies between 0 and 1 and all dimension values sum to 1. The classification layer formula is S = W^T·X + b, where W is the n×j fully connected weight matrix, b is a bias term and X is the vector output by the feature extraction layer. The resulting S enters the softmax layer, whose formula is
Pi = e^(Si) / Σ_k e^(Sk) (k = 1, …, j)
where Pi is the probability of text category i, Si is the value of the i-th neuron output by the classification layer, and j is the number of prediction categories.
Step C, model fusion: the predictions of the top 3 models obtained under different parameter settings are weighted-averaged.
A softmax layer over all emotion categories is added at the [CLS] position of the model output, and the predicted emotion category is output, e.g.: happy.
Example 2: on the basis of Example 1, the training process of the model is divided into two steps:
1) Mask ML unsupervised training: for unlabeled text in the specific field, training data are constructed through the Mask ML strategy and the model is pre-trained, that is, for each token in each sentence:
with 85% probability, the original token is kept unchanged;
with 15% probability, the token is selected and handled as follows:
with 80% probability, the current token is replaced by the character [MASK];
with 10% probability, the current token is replaced by a token randomly drawn from the vocabulary;
with 10% probability, the original token is kept unchanged;
For example:
Original sentence: I like it.
Model input: [CLS] I like [MASK].
A softmax layer over the whole vocabulary is added at the [MASK] position of the model output, and the predicted output is: it;
2) Supervised training for text classification: according to the annotation labels, the cross-entropy loss between the probability result output at the [CLS] position and the true annotation is calculated, the gradient of each parameter is computed by back-propagation, and the parameters are updated.
For example:
Original sentence: I like it.
Model input: [CLS] I like it.
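For this example, the cross-entropy computation of step 2) at the [CLS] position can be sketched as follows; the logit values and the label index are assumed for illustration, and in practice the gradient is back-propagated through the whole model.

```python
# Sketch of the supervised loss: softmax over the classification-layer output
# at [CLS], then cross-entropy against the annotated label. Values are
# illustrative assumptions.
import numpy as np

logits = np.array([0.3, 2.1, -0.5])    # classification-layer output at the [CLS] position
probs = np.exp(logits - logits.max())
probs /= probs.sum()                    # predicted probability distribution
true_label = 1                          # annotated emotion category
loss = -np.log(probs[true_label])       # cross-entropy loss for this sample
```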
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description is given in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted only for clarity, and those skilled in the art should take the description as a whole; the technical solutions in the embodiments may also be combined as appropriate to form other embodiments understandable by those skilled in the art.

Claims (8)

1. A BERT-based multi-feature fine-grained Chinese short text sentiment classification method is characterized by comprising the following steps:
step A, comprehensive expression of multi-dimensional features: the valid input features of the model include 4 types: one-hot encoding features, position encoding features, glyph (character-shape) features and pinyin features; the four features have the same dimension and are further averaged to obtain a comprehensive feature expression, and this feature is passed through a BERT model to obtain the final feature expression; the BERT Transformer stacks multiple multi-head self-attention and feed-forward neural network modules, where the self-attention modules use a bidirectional attention mechanism, that is, each token attends to the context on both its left and right sides; the multi-head self-attention formula is MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W0, with head_i = Attention(Q·WiQ, K·WiK, V·WiV), where W0 is the weight matrix used for dimensionality reduction after the heads are concatenated and WiQ, WiK, WiV are the weight matrices of Q, K and V respectively, and the Attention formula is
Attention(Q, K, V) = softmax(Q·K^T / √dk)·V
where Q, K and V are the input query, key and value vectors respectively and dk is the vector dimension; the multi-head self-attention module reduces the computational resources consumed by reducing the dimensionality;
step B, mapping text vector features to probabilistic emotion classification features: the text vector features obtained in the previous step are mapped to emotion classification features through a classification layer, giving a feature representation of dimension class_size; this representation is turned into probabilities by a softmax layer, i.e., each dimension's value lies between 0 and 1 and all dimension values sum to 1; the classification layer formula is S = W^T·X + b, where W is the n×j fully connected weight matrix, b is a bias term and X is the vector output by the feature extraction layer; the resulting S enters the softmax layer
Pi = e^(Si) / Σ_k e^(Sk) (k = 1, …, j)
where Pi is the probability of text category i, Si is the value of the i-th neuron output by the classification layer, and j is the number of prediction categories;
step C, model fusion: the predictions of the top 3 models obtained under different parameter settings are weighted-averaged.
2. The method as claimed in claim 1, wherein the one-hot encoding feature is obtained by generating a vocab_size × embedding_size encoding matrix and looking up each token's dictionary id in this matrix.
3. The method as claimed in claim 1, wherein the position encoding features are inherited from the 512 × embedding_size encoding matrix in the BERT pre-trained model and can encode texts of at most 512 tokens.
4. The BERT-based multi-feature fine-grained Chinese short text sentiment classification method of claim 3, wherein the glyph features use three fonts: FangSong (imitation Song), running script and clerical script, and are obtained by applying convolution and pooling operations to the graphical rendering of each character.
5. The BERT-based multi-feature fine-grained Chinese short text sentiment classification method of claim 4, wherein the pinyin features are obtained by mapping the full pinyin letters of each Chinese character through an embedding and then averaging.
6. The method of claim 1, wherein the BERT model structure is a stack of Transformer encoders, aiming to pre-train deep bidirectional representations by jointly conditioning on context in all layers.
7. The BERT-based multi-feature fine-grained Chinese short text sentiment classification method of claim 1, wherein the classification layer is a feed-forward network of size embedding_size × class_size.
8. The BERT-based multi-feature fine-grained Chinese short text sentiment classification method according to claim 1, wherein the model training process is divided into two steps:
1) Mask ML unsupervised training: for unlabeled text in the specific field, training data are constructed through the Mask ML strategy and the model is pre-trained, that is, for each token in each sentence:
with 85% probability, the original token is kept unchanged;
with 15% probability, the token is selected and handled as follows:
with 80% probability, the current token is replaced by the character [MASK];
with 10% probability, the current token is replaced by a token randomly drawn from the vocabulary;
with 10% probability, the original token is kept unchanged;
2) Supervised training for text classification: according to the annotation labels, the cross-entropy loss between the probability result output at the [CLS] position and the true annotation is calculated, the gradient of each parameter is computed by back-propagation, and the parameters are updated.
CN202210066218.1A 2022-01-20 2022-01-20 BERT-based multi-feature fine-granularity Chinese short text sentiment classification method Pending CN114443845A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210066218.1A CN114443845A (en) 2022-01-20 2022-01-20 BERT-based multi-feature fine-granularity Chinese short text sentiment classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210066218.1A CN114443845A (en) 2022-01-20 2022-01-20 BERT-based multi-feature fine-granularity Chinese short text sentiment classification method

Publications (1)

Publication Number Publication Date
CN114443845A true CN114443845A (en) 2022-05-06

Family

ID=81367463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210066218.1A Pending CN114443845A (en) 2022-01-20 2022-01-20 BERT-based multi-feature fine-granularity Chinese short text sentiment classification method

Country Status (1)

Country Link
CN (1) CN114443845A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210012199A1 (en) * 2019-07-04 2021-01-14 Zhejiang University Address information feature extraction method based on deep neural network model
KR20210040851A (en) * 2020-06-03 2021-04-14 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Text recognition method, electronic device, and storage medium
CN112395417A (en) * 2020-11-18 2021-02-23 长沙学院 Network public opinion evolution simulation method and system based on deep learning
CN113239690A (en) * 2021-03-24 2021-08-10 浙江工业大学 Chinese text intention identification method based on integration of Bert and fully-connected neural network

Similar Documents

Publication Publication Date Title
Poria et al. Aspect extraction for opinion mining with a deep convolutional neural network
CN113128229B (en) Chinese entity relation joint extraction method
CN111797898B (en) Online comment automatic reply method based on deep semantic matching
CN109598586B (en) Recommendation method based on attention model
CN110807324A (en) Video entity identification method based on IDCNN-crf and knowledge graph
CN112417854A (en) Chinese document abstraction type abstract method
CN110781290A (en) Extraction method of structured text abstract of long chapter
CN115687626A (en) Legal document classification method based on prompt learning fusion key words
CN114462420A (en) False news detection method based on feature fusion model
CN113869055A (en) Power grid project characteristic attribute identification method based on deep learning
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
Zeng et al. Pyramid hybrid pooling quantization for efficient fine-grained image retrieval
CN116029305A (en) Chinese attribute-level emotion analysis method, system, equipment and medium based on multitask learning
CN115169349A (en) Chinese electronic resume named entity recognition method based on ALBERT
Shao et al. Controllable image caption with an encoder-decoder optimization structure
Yong et al. A new emotion analysis fusion and complementary model based on online food reviews
CN114004220A (en) Text emotion reason identification method based on CPC-ANN
CN111858930A (en) Method for establishing social e-commerce user portrait
CN116663539A (en) Chinese entity and relationship joint extraction method and system based on Roberta and pointer network
CN116663566A (en) Aspect-level emotion analysis method and system based on commodity evaluation
CN114443845A (en) BERT-based multi-feature fine-granularity Chinese short text sentiment classification method
CN113032558B (en) Variable semi-supervised hundred degree encyclopedia classification method integrating wiki knowledge
CN112733526B (en) Extraction method for automatically identifying tax collection object in financial file
CN114925689A (en) Medical text classification method and device based on BI-LSTM-MHSA
CN114925658A (en) Open text generation method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination