CN115292490A - Analysis algorithm for policy interpretation semantics - Google Patents

Analysis algorithm for policy interpretation semantics

Info

Publication number
CN115292490A
Authority
CN
China
Prior art keywords
model, word, analysis, policy, analysis algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210921753.0A
Other languages
Chinese (zh)
Inventor
黄明明
施东晓
廖晓洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Kelifang Technology Co ltd
Original Assignee
Fujian Kelifang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Kelifang Technology Co ltd
Priority to CN202210921753.0A
Publication of CN115292490A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G06F40/30: Semantic analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an analysis algorithm for policy interpretation semantics. The algorithm comprises an analysis model, and the analysis model comprises a BERT language model, a TCN temporal sequence model, and a CRF probability model. The analysis algorithm comprises the following steps: 1. input the policy document to be recognized into the analysis model; 2. the BERT language model converts the document input in step 1 into word vectors containing context information; 3. the TCN temporal sequence model classifies the word vectors obtained in step 2; 4. the CRF probability model adjusts the sentence order of the word vectors classified in step 3; 5. clean the results output by the model using regular-expression matching; 6. extract and display the recognized entities, completing the analysis of the policy interpretation semantics. The method can analyze and study policy documents using named entity recognition technology, automatically recognize and classify valuable information in policies, solve the cleaning and warehousing of recognition results, and flag and record incorrectly recognized fields.

Description

Analysis algorithm for policy interpretation semantics
Technical Field
The invention belongs to the technical field of computers, and particularly relates to an analysis algorithm for policy interpretation semantics.
Background
In various regions, a large number of policy documents for supporting and administering enterprises have been issued. These documents contain much information that is very important to enterprises, such as subsidy conditions, loan policies, and project declaration requirements, and with the development of society and technology, more and more enterprises need to read policy documents. Since policy documents are mostly semi-structured or unstructured, their analysis, processing, and data mining are severely constrained.
In recent years, deep learning has made significant progress in NLP, image recognition, and other fields, and many researchers have applied it to named entity recognition. Deep-learning-based named entity recognition requires converting text into serialized vectors through a word embedding method. Existing word embedding methods such as Word2Vec cannot handle the polysemy of Chinese characters: the same word may mean "disease" in one context and serve as an adjective meaning "fast" in another. To address this problem, many scholars have proposed context-dependent word embedding methods, such as ELMo (Embeddings from Language Models) and OpenAI GPT (Generative Pre-Training). However, the language representations of these context-aware word embedding methods are unidirectional and cannot capture the preceding and following semantics simultaneously.
The present invention therefore analyzes and studies policy documents using named entity recognition technology, automatically recognizes and classifies valuable information in policies, solves the cleaning and warehousing of recognition results, and flags and records incorrectly recognized fields.
Disclosure of Invention
The invention discloses an analysis algorithm for policy interpretation semantics, the main aim of which is to overcome the defects and shortcomings of the prior art.
The technical scheme adopted by the invention is as follows:
An analysis algorithm for policy interpretation semantics comprises an analysis model, the analysis model comprising a BERT language model, a TCN temporal sequence model, and a CRF probability model. The analysis algorithm comprises the following specific analysis steps:
Step one: input the policy document to be recognized into the analysis model;
Step two: the BERT language model converts the document input in step one into word vectors containing context information;
Step three: the TCN temporal sequence model classifies the word vectors obtained in step two;
Step four: the CRF probability model adjusts the sentence order of the word vectors classified in step three;
Step five: clean the results output by the model using regular-expression matching;
Step six: extract and display the recognized entities, completing the analysis of the policy interpretation semantics.
Further, the conversion process in step two comprises:
(1) Label the data using the BIOES labeling method, where B marks the beginning of an entity, I marks a position inside an entity, O marks irrelevant content, E marks the end of an entity, and S marks an entity consisting of a single character;
(2) Train the BERT language model with the data labeled in step (1). The training process is as follows: the labeled data first passes through the BERT network, which converts the input data into embedded word vectors containing contextual semantics.
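For illustration, a minimal Python sketch of BIOES-labeled training data; the sentence and the entity types (SUBSIDY, REGION) are hypothetical examples, not labels defined by the invention:

    # One (character, tag) pair per token; "五十万元" is tagged as a SUBSIDY amount.
    sentence = ["补", "贴", "金", "额", "为", "五", "十", "万", "元"]
    tags = ["O", "O", "O", "O", "O", "B-SUBSIDY", "I-SUBSIDY", "I-SUBSIDY", "E-SUBSIDY"]

    # S- marks a single-character entity, e.g. a one-character region abbreviation.
    single = [("闽", "S-REGION")]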
Furthermore, the overall framework of the BERT network in step (2) is formed by stacking multiple layers of Transformer encoders. Each encoder layer consists of one multi-head attention layer and one feed-forward layer, and each attention head re-encodes the target word according to its relevance to all words in the sentence, yielding a new encoding for each word.
Further, the attention computation comprises the following three steps:
Step one: compute the relevance between words. The input sequence vectors (512 × 768) are linearly transformed by three weight matrices to generate three new sequence vectors, query, key, and value; the query vector of each word is multiplied with the key vectors of all words in the sequence to obtain the relevance between words;
Step two: normalize the relevance. The relevance obtained in step one is normalized by softmax;
Step three: compute a weighted sum over the encodings of all words. The normalized weights from step two are used in a weighted sum with the value vectors, yielding a new encoding for each word.
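For illustration, a minimal NumPy sketch of this three-step computation for a single attention head; the 1/sqrt(d) scaling is the standard Transformer convention and is assumed here, as are the randomly initialized weight matrices:

    import numpy as np

    def attention(X, Wq, Wk, Wv):
        # Step one: linear transforms produce query, key, value sequences,
        # then each word's query is matched against all keys for relevance.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # Step two: softmax-normalize the relevance scores.
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)
        # Step three: weighted sum over the value vectors gives new encodings.
        return weights @ V

    rng = np.random.default_rng(0)
    X = rng.normal(size=(512, 768))                        # input sequence vectors
    Wq, Wk, Wv = (rng.normal(size=(768, 64)) * 0.02 for _ in range(3))
    new_encodings = attention(X, Wq, Wk, Wv)               # shape (512, 64)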
Still further, the BERT network comprises 24 Transformer encoder layers, each with 16 attention heads.
Further, the entities recognized in step six are entities belonging to class I.
Furthermore, the specific process by which the TCN temporal sequence model classifies word vectors in step three comprises:
(1) First, input the word vectors produced in step two into the TCN network;
(2) Classify the word vectors from step (1) using the TCN temporal convolutional network.
Furthermore, the sentence order in step four is adjusted as follows: input the classified word vectors into the CRF conditional random field, which then smooths and adjusts them to satisfy the sentence-order requirements, completing the sentence-order adjustment.
Furthermore, the CRF conditional random field is a discriminative probability distribution model: a Markov random field over a set of output random variables Y conditioned on a set of input random variables X.
As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:
according to the invention, the BERT-TCN-CRF model is used for realizing named entity recognition of the policy document, the BERT network pre-training model is used, and the static word vector generated by the traditional method is replaced by the dynamic word vector obtained by training in the large-scale corpus, so that the problem of word ambiguity existing in the traditional word embedding method is effectively solved, and the semantic representation is more accurate. The F1 value of the BERT network pre-training model in the policy document corpus marked by the BERT network pre-training model reaches 94.72%, and compared with other models, the BERT network pre-training model has a better recognition effect, can well complete the task of recognizing the policy document named entities, and can meet the requirements of enterprises on the aspect of recognizing the policy text named entities. Meanwhile, the invention provides complete data cleaning and warehousing work, and can clean the identification result with finer granularity.
A TCN network is used: traditional named entity recognition models usually adopt an LSTM, but experiments show that the TCN network retains a longer effective memory, and its performance in the recognition model exceeds that of the LSTM model.
A CRF conditional random field is used, considering that in the sequence labeling task adjacent words or labels must follow certain rules, for example an I label must be preceded by a B label and cannot follow an O label. The CRF model properly accounts for the dependencies between labels and models the tag sequence to obtain the optimal sequence.
Meanwhile, the invention also solves the cleaning and warehousing of recognition results. For example, for a growth rate recognized by the model, regular-expression matching determines whether it is the growth rate over the previous year or over the previous two years, the judgment result is stored in the corresponding database field, and incorrectly recognized fields are flagged and recorded.
Drawings
Fig. 1 is a schematic diagram of the architecture of the BERT network of the present invention.
Fig. 2 is a schematic diagram of the TCN network of the present invention.
Fig. 3 is a schematic diagram of the structure of the CRF conditional random field of the present invention.
Detailed Description
Embodiments of the present invention will be further described with reference to the accompanying drawings.
As shown in Fig. 1, Fig. 2, and Fig. 3, an analysis algorithm for policy interpretation semantics comprises an analysis model, the analysis model comprising a BERT language model, a TCN temporal sequence model, and a CRF probability model. The analysis algorithm comprises the following specific analysis steps:
Step one: input the policy document to be recognized into the analysis model;
Step two: the BERT language model converts the document input in step one into word vectors containing context information;
Step three: the TCN temporal sequence model classifies the word vectors obtained in step two;
Step four: the CRF probability model adjusts the sentence order of the word vectors classified in step three;
Step five: clean the results output by the model using regular-expression matching;
Step six: extract and display the recognized entities, completing the analysis of the policy interpretation semantics.
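Before each component is described in detail, the following is a minimal end-to-end sketch of the BERT-TCN-CRF analysis model in PyTorch. It is an illustrative reconstruction rather than the patented implementation: the HuggingFace transformers and pytorch-crf packages are assumed, bert-base-chinese stands in for the 24-layer network described below, and the TCN is reduced to a single causal convolution layer.

    import torch
    import torch.nn as nn
    from transformers import BertModel   # assumed: HuggingFace transformers
    from torchcrf import CRF             # assumed: pytorch-crf package

    class BertTcnCrf(nn.Module):
        def __init__(self, num_tags, hidden=768, channels=256):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-chinese")
            # Stand-in TCN: one causal conv layer (the full model stacks residual blocks).
            self.pad = nn.ConstantPad1d((2, 0), 0.0)    # left-pad for causality (k=3)
            self.tcn = nn.Conv1d(hidden, channels, kernel_size=3)
            self.proj = nn.Linear(channels, num_tags)   # per-token tag scores
            self.crf = CRF(num_tags, batch_first=True)  # enforces tag-order rules

        def forward(self, input_ids, attention_mask, tags=None):
            h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
            h = self.tcn(self.pad(h.transpose(1, 2))).transpose(1, 2)
            emissions = self.proj(torch.relu(h))
            mask = attention_mask.bool()
            if tags is not None:                           # training: NLL loss
                return -self.crf(emissions, tags, mask=mask)
            return self.crf.decode(emissions, mask=mask)   # inference: best tag path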
Further, the conversion process in step two comprises:
(1) Label the data using the BIOES labeling method, where B marks the beginning of an entity, I marks a position inside an entity, O marks irrelevant content, E marks the end of an entity, and S marks an entity consisting of a single character;
(2) Train the BERT language model with the data labeled in step (1). The training process is as follows: the labeled data first passes through the BERT network, which converts the input data into embedded word vectors containing contextual semantics.
Furthermore, the overall framework of the BERT network in step (2) is formed by stacking multiple layers of Transformer encoders. Each encoder layer consists of one multi-head attention layer and one feed-forward layer, and each attention head re-encodes the target word according to its relevance to all words in the sentence, yielding a new encoding for each word.
Further, the attention computation comprises the following three steps:
Step one: compute the relevance between words. The input sequence vectors (512 × 768) are linearly transformed by three weight matrices to generate three new sequence vectors, query, key, and value; the query vector of each word is multiplied with the key vectors of all words in the sequence to obtain the relevance between words;
Step two: normalize the relevance. The relevance obtained in step one is normalized by softmax;
Step three: compute a weighted sum over the encodings of all words. The normalized weights from step two are used in a weighted sum with the value vectors, yielding a new encoding for each word.
Still further, the BERT network comprises 24 Transformer encoder layers, each with 16 attention heads.
Further, the entities recognized in step six are entities belonging to class I.
Furthermore, the specific process by which the TCN temporal sequence model classifies word vectors in step three comprises:
(1) First, input the word vectors produced in step two into the TCN network;
(2) Classify the word vectors from step (1) using the TCN temporal convolutional network.
Furthermore, the sentence order in step four is adjusted as follows: input the classified word vectors into the CRF conditional random field, which then smooths and adjusts them to satisfy the sentence-order requirements, completing the sentence-order adjustment.
Furthermore, the CRF conditional random field is a discriminative probability distribution model: a Markov random field over a set of output random variables Y conditioned on a set of input random variables X.
The following is a detailed description of each model of the present embodiment:
1. high quality data sets labeled using BIOS labeling
In this embodiment, a BIOS text labeling method is used to perform entity labeling on a large number of policy documents for training and testing of models.
2. BERT network
The first part of the model in this embodiment uses the BERT network for word embedding. BERT (Bidirectional Encoder Representations from Transformers) is a pre-training model whose two training tasks are predicting words masked out of sentences and judging whether two input sentences are adjacent. For a specific task, a corresponding network is added after the pre-trained BERT model to complete NLP downstream tasks such as text classification and machine translation.
Although BERT is based on the Transformer, it uses only the Transformer's encoder part, and its overall framework is formed by stacking multiple layers of Transformer encoders. Each encoder layer consists of one multi-head attention layer and one feed-forward layer; our model uses the larger BERT network with 24 layers and 16 attention heads per layer. The main role of each attention head is to re-encode the target word according to its relevance to all words in the sentence. The computation of each attention head therefore includes three steps: compute the relevance between words, normalize the relevance, and take a weighted sum over the encodings of all words to obtain the encoding of the target word. To compute the relevance between words, the input sequence vectors (512 × 768) are first linearly transformed by three weight matrices to generate three new sequence vectors, query, key, and value; the query vector of each word is multiplied with the key vectors of all words in the sequence to obtain the relevance between words; the relevance is then normalized by softmax, and the normalized weights are used in a weighted sum with the value vectors to obtain a new encoding for each word.
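As a sketch of this word-embedding stage, contextual vectors can be obtained from a pretrained BERT via the HuggingFace transformers library; bert-base-chinese is an assumed stand-in (12 layers of 12 heads, rather than the 24-layer, 16-head network described above):

    import torch
    from transformers import BertTokenizerFast, BertModel

    tok = BertTokenizerFast.from_pretrained("bert-base-chinese")
    bert = BertModel.from_pretrained("bert-base-chinese")

    text = "企业申报项目可获得补贴"        # hypothetical policy sentence
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = bert(**inputs)

    # One contextual vector per character: the same character receives a different
    # vector in a different sentence, unlike static Word2Vec embeddings.
    embeddings = out.last_hidden_state     # shape (1, sequence_length, 768)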
3. TCN network
The TCN, short for Temporal Convolutional Network, is a model proposed in 2018 for processing time-series data. Convolutional networks have proven good at extracting high-level features from structured data. The temporal convolutional network is a neural network model that uses causal convolution and dilated convolution; it respects the temporal ordering of sequence data and provides a large receptive field for sequence modeling.
(1) Causal convolution
Causal convolution means that the value at time t in a given layer depends only on the values at time t and earlier in the layer below. Unlike a conventional convolutional neural network, causal convolution cannot see future data: it is a unidirectional rather than bidirectional structure. In other words, causes precede effects; it is a model with strict temporal constraints, hence the name causal convolution.
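A minimal PyTorch sketch of this constraint: left-padding a 1-D convolution by (k - 1) * d zeros so that the output at time t depends only on inputs at or before t:

    import torch
    import torch.nn as nn

    k, d = 3, 1                                # kernel size, dilation
    conv = nn.Conv1d(1, 1, kernel_size=k, dilation=d)
    x = torch.randn(1, 1, 10)                  # (batch, channels, time)

    # Pad zeros on the left only, so position t never sees t+1, t+2, ...
    y = conv(nn.functional.pad(x, ((k - 1) * d, 0)))
    assert y.shape[-1] == x.shape[-1]          # same length, strictly causal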
(2) Dilated convolution
Unlike conventional convolution, dilated convolution samples the input at intervals during convolution, controlled by the sampling-rate parameter d. At the bottom layer d = 1, meaning every point is sampled; at a middle layer d = 2, meaning every second point is sampled as input. In general, the higher the layer, the larger the value of d. Dilated convolution thus makes the effective window size grow exponentially with the number of layers, so the convolutional network obtains a large receptive field with fewer layers.
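As a quick check of this exponential growth, the receptive field of a stack of such layers can be computed directly; the convention d = 2^level is an assumption (a common TCN choice), not fixed by the text:

    # Receptive field of stacked dilated causal convolutions with kernel size k.
    def receptive_field(k: int, levels: int) -> int:
        return 1 + sum((k - 1) * 2 ** level for level in range(levels))

    print(receptive_field(3, 1))   # 3
    print(receptive_field(3, 4))   # 31 -- grows exponentially with depth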
(3) Residual connection
This scheme uses residual connections in the TCN network, constructing residual blocks to replace plain convolutional layers. A residual block contains two convolutional layers and nonlinear mappings, with weight normalization and Dropout added to each layer to regularize the network.
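A sketch of one such residual block under the description above; the channel sizes, kernel size, and the 1x1 convolution on the skip path are conventional TCN choices assumed for illustration:

    import torch.nn as nn
    from torch.nn.utils import weight_norm

    class ResidualBlock(nn.Module):
        def __init__(self, c_in, c_out, k=3, d=1, p_drop=0.1):
            super().__init__()
            pad = (k - 1) * d                  # causal left padding
            self.net = nn.Sequential(
                nn.ConstantPad1d((pad, 0), 0.0),
                weight_norm(nn.Conv1d(c_in, c_out, k, dilation=d)),
                nn.ReLU(), nn.Dropout(p_drop),
                nn.ConstantPad1d((pad, 0), 0.0),
                weight_norm(nn.Conv1d(c_out, c_out, k, dilation=d)),
                nn.ReLU(), nn.Dropout(p_drop),
            )
            # 1x1 conv so the skip connection matches the output channel count.
            self.skip = nn.Conv1d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

        def forward(self, x):                  # x: (batch, channels, time)
            return nn.functional.relu(self.net(x) + self.skip(x))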
4. CRF conditional random field
The conditional random field is a discriminative probability model, a type of random field commonly used to label or analyze sequence data such as natural language text or biological sequences. A conditional random field is a conditional probability distribution model P(Y | X) representing a Markov random field over a set of output random variables Y given a set of input random variables X; that is, the CRF is characterized by the assumption that the output random variables form a Markov random field. Conditional random fields can be viewed as a generalization of the maximum-entropy Markov model to the labeling problem.
Like a Markov random field, a conditional random field is an undirected graphical model in which the distribution of the random variables Y is a conditional probability given the observed random variables X. In principle the graph layout of a conditional random field can be arbitrary; a common layout is the chain structure, which admits more efficient algorithms for training, inference, and decoding. The conditional random field is a typical discriminative model, and its joint probability can be written as a product of several potential functions, the most common form being the linear-chain conditional random field. The model of the invention uses the conditional random field to ensure that the classification data output by the model follows the same ordering rules as the BIOES labeling method.
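A sketch of how the chain structure enforces label-order rules at decoding time: Viterbi search over per-token emission scores and pairwise transition scores, where an illegal move such as O to I carries a strongly negative score (toy scores, NumPy assumed):

    import numpy as np

    def viterbi(emissions, transitions):
        # emissions: (T, K) per-token tag scores; transitions: (K, K) pairwise scores.
        T, K = emissions.shape
        score = emissions[0].copy()
        back = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            total = score[:, None] + transitions + emissions[t]   # (K, K)
            back[t] = total.argmax(axis=0)
            score = total.max(axis=0)
        path = [int(score.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return path[::-1]                      # best-scoring tag sequence

    # Toy tags [B, I, O]: with the -1e4 score, O -> I is effectively never chosen.
    trans = np.array([[0.0, 1.0, 0.0],         # from B
                      [0.0, 0.5, 0.0],         # from I
                      [0.5, -1e4, 0.0]])       # from O
    emis = np.random.default_rng(1).normal(size=(6, 3))
    print(viterbi(emis, trans))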
5. Regular-expression matching
The invention uses regular-expression matching to clean the model's output. For example, when an entity is judged to be of the growth-rate category, the system locates the paragraph containing the sentence and then extracts the specific period by regular matching, such as the growth rate over the previous year or over the previous two years.
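A sketch of this cleaning step; the patterns, field names, and example sentence are hypothetical, for illustration only:

    import re

    def clean_growth_rate(sentence):
        # Extract the percentage value; if absent, flag the span as a
        # recognition error to be recorded.
        value = re.search(r"(\d+(?:\.\d+)?)\s*%", sentence)
        if value is None:
            return None
        # Decide which database field the entity belongs to.
        if re.search(r"近两年|前两年", sentence):        # "previous two years"
            return ("growth_rate_two_years", value.group(1))
        if re.search(r"上一年|去年", sentence):          # "previous year"
            return ("growth_rate_last_year", value.group(1))
        return None

    print(clean_growth_rate("企业近两年营收增长率不低于20%"))
    # -> ('growth_rate_two_years', '20')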
As can be seen from the above description of the present invention, compared with the prior art, the advantages of the present invention are:
the method realizes named entity recognition of the policy document by using the BERT-TCN-CRF model, uses the BERT network pre-training model, and uses the dynamic word vectors obtained by training in the large-scale corpus to replace the static word vectors generated by the traditional method, thereby effectively solving the problem of word ambiguity existing in the traditional word embedding method and ensuring that the semantic representation is more accurate. The F1 value of the BERT network pre-training model in the policy document corpus marked by the BERT network pre-training model reaches 94.72%, and compared with other models, the BERT network pre-training model has a better recognition effect, can well complete the task of recognizing the policy document named entities, and can meet the requirements of enterprises on the aspect of recognizing the policy text named entities. Meanwhile, the invention provides complete data cleaning and warehousing work, and can clean the identification result with finer granularity.
A TCN network is used: traditional named entity recognition models usually adopt an LSTM, but experiments show that the TCN network retains a longer effective memory, and its performance in the recognition model exceeds that of the LSTM model.
A CRF conditional random field is used, considering that in the sequence labeling task adjacent words or labels must follow certain rules, for example an I label must be preceded by a B label and cannot follow an O label. The CRF model properly accounts for the dependencies between labels and models the tag sequence to obtain the optimal sequence.
Meanwhile, the invention also solves the cleaning and warehousing of recognition results. For example, for a growth rate recognized by the model, regular-expression matching determines whether it is the growth rate over the previous year or over the previous two years, the judgment result is stored in the corresponding database field, and incorrectly recognized fields are flagged and recorded.
The above description is only an embodiment of the present invention, but the design concept of the present invention is not limited thereto; any insubstantial modification of the present invention made using this concept shall fall within the scope of protection of the present invention.

Claims (9)

1. An analysis algorithm for policy interpretation semantics, characterized in that: it comprises an analysis model, the analysis model comprising a BERT language model, a TCN temporal sequence model, and a CRF probability model, and the analysis algorithm comprises the following specific analysis steps:
Step one: input the policy document to be recognized into the analysis model;
Step two: the BERT language model converts the document input in step one into word vectors containing context information;
Step three: the TCN temporal sequence model classifies the word vectors obtained in step two;
Step four: the CRF probability model adjusts the sentence order of the word vectors classified in step three;
Step five: clean the results output by the model using regular-expression matching;
Step six: extract and display the recognized entities, completing the analysis of the policy interpretation semantics.
2. The analysis algorithm for policy interpretation semantics according to claim 1, wherein the conversion process in step two comprises:
(1) Label the data using the BIOES labeling method, where B marks the beginning of an entity, I marks a position inside an entity, O marks irrelevant content, E marks the end of an entity, and S marks an entity consisting of a single character;
(2) Train the BERT language model with the data labeled in step (1). The training process is as follows: the labeled data first passes through the BERT network, which converts the input data into embedded word vectors containing contextual semantics.
3. The analysis algorithm for policy interpretation semantics according to claim 2, wherein the overall framework of the BERT network in step (2) is formed by stacking multiple layers of Transformer encoders; each encoder layer consists of one multi-head attention layer and one feed-forward layer, and each attention head re-encodes the target word according to its relevance to all words in the sentence, yielding a new encoding for each word.
4. The analysis algorithm for policy interpretation semantics according to claim 3, wherein the attention computation comprises the following three steps:
Step one: compute the relevance between words. The input sequence vectors (512 × 768) are linearly transformed by three weight matrices to generate three new sequence vectors, query, key, and value; the query vector of each word is multiplied with the key vectors of all words in the sequence to obtain the relevance between words;
Step two: normalize the relevance. The relevance obtained in step one is normalized by softmax;
Step three: compute a weighted sum over the encodings of all words. The normalized weights from step two are used in a weighted sum with the value vectors, yielding a new encoding for each word.
5. The analysis algorithm for policy interpretation semantics according to claim 3, wherein the BERT network comprises 24 Transformer encoder layers, each with 16 attention heads.
6. The analysis algorithm for policy interpretation semantics according to claim 2, wherein the entities recognized in step six are entities belonging to class I.
7. The analysis algorithm for policy interpretation semantics according to claim 1, wherein the specific process by which the TCN temporal sequence model classifies word vectors in step three comprises:
(1) First, input the word vectors produced in step two into the TCN network;
(2) Classify the word vectors from step (1) using the TCN temporal convolutional network.
8. The analysis algorithm for policy interpretation semantics according to claim 1, wherein the sentence order in step four is adjusted as follows: input the classified word vectors into the CRF conditional random field, which then smooths and adjusts them to satisfy the sentence-order requirements, completing the sentence-order adjustment.
9. The analysis algorithm for policy interpretation semantics according to claim 5, wherein the CRF conditional random field is a discriminative probability distribution model: a Markov random field over a set of output random variables Y conditioned on a set of input random variables X.
CN202210921753.0A 2022-08-02 2022-08-02 Analysis algorithm for policy interpretation semantics Pending CN115292490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210921753.0A CN115292490A (en) 2022-08-02 2022-08-02 Analysis algorithm for policy interpretation semantics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210921753.0A CN115292490A (en) 2022-08-02 2022-08-02 Analysis algorithm for policy interpretation semantics

Publications (1)

Publication Number Publication Date
CN115292490A true CN115292490A (en) 2022-11-04

Family

ID=83827068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210921753.0A Pending CN115292490A (en) 2022-08-02 2022-08-02 Analysis algorithm for policy interpretation semantics

Country Status (1)

Country Link
CN (1) CN115292490A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117077682A (en) * 2023-05-06 2023-11-17 西安公路研究院南京院 Document analysis method and system based on semantic recognition
CN117077682B (en) * 2023-05-06 2024-06-07 西安公路研究院南京院 Document analysis method and system based on semantic recognition
CN116562265A (en) * 2023-07-04 2023-08-08 南京航空航天大学 Information intelligent analysis method, system and storage medium
CN116562265B (en) * 2023-07-04 2023-12-01 南京航空航天大学 Information intelligent analysis method, system and storage medium


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination