CN112101009A - Knowledge-graph-based method for judging the similarity of character relationship frameworks with Dream of Red Mansions - Google Patents


Info

Publication number
CN112101009A
Authority
CN
China
Prior art keywords
layer
cnn
bilstm
character
dream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011008324.1A
Other languages
Chinese (zh)
Other versions
CN112101009B (en)
Inventor
郑丽敏
吕庆
Current Assignee
China Agricultural University
Original Assignee
China Agricultural University
Priority date
Filing date
Publication date
Application filed by China Agricultural University
Priority to CN202011008324.1A
Publication of CN112101009A
Application granted; publication of CN112101009B
Legal status: Active

Classifications

    • G06F 40/242: Handling natural language data; lexical tools; dictionaries
    • G06F 16/288: Information retrieval of structured data; relational databases; entity relationship models
    • G06F 16/367: Creation of semantic tools for unstructured textual data; ontology
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06F 40/216: Natural language analysis; parsing using statistical methods
    • G06F 40/295: Natural language analysis; recognition of textual entities; named entity recognition
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Animal Behavior & Ethology (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a knowledge-graph-based method for judging the similarity of a novel's character relationship framework with that of Dream of Red Mansions, comprising the following steps: collecting and processing data; adding an attention mechanism to BERT to obtain WBERT; constructing a named entity recognition model of WBERT + BiLSTM + CNN + attention mechanism + CRF; constructing a relation extraction model of WBERT + dynamic IDCNN + FC; training the named entity recognition model and the relation extraction model to obtain optimal models and using them to extract Dream of Red Mansions character-relationship triples; numbering each entity according to its occurrence frequency and in/out degree, dividing the entities into levels, and continually updating the entity numbers according to the character relationships; adding Weights (the importance of a relationship) to the triples to form quadruples; storing the quadruples in Neo4j and running an alignment algorithm to fuse entities; extracting the quadruples of a novel to be compared and measuring their similarity against the Dream of Red Mansions quadruples. Compared with comparing two novels sentence by sentence to obtain their similarity, the method obtains similarity by comparing the character relationship frameworks.

Description

Knowledge-graph-based method for judging the similarity of character relationship frameworks with Dream of Red Mansions
Technical Field
The invention relates to a method for judging the similarity between the character relationship framework of a novel and that of Dream of Red Mansions, and in particular to a knowledge-graph-based method for judging the similarity of other novels' character relationship frameworks with that of Dream of Red Mansions.
Background
In recent years, online plagiarism incidents have been frequent. They seriously dampen the drive and enthusiasm for literary innovation, crowd out values such as integrity and originality, and harm a nation's cultural creativity.
As one of China's Four Great Classical Novels, Dream of Red Mansions is a literary masterpiece and has become the target of many plagiarists. Plagiarists modify the character relationship framework of Dream of Red Mansions; some even change only a single name. By comparing the character relationship framework of such a novel with that of Dream of Red Mansions and analysing their degree of similarity, the similarity of the character relationship structures can be judged.
With the rapid development of machine learning, NLP techniques are applied in more and more fields. To detect a novel that copies the framework of Dream of Red Mansions, a knowledge graph of the Dream of Red Mansions character relationships must first be constructed, and the input text is then compared with that character relationship framework for similarity. In this process, data must be collected, a character-relationship dictionary built, and data labelled to obtain training data for a named entity recognition model and a relation extraction model; the key lies in building the named entity recognition and relation extraction models, after which the knowledge graph is constructed and the novel to be compared is measured against the Dream of Red Mansions character framework.
However, extracting names and relations from a new text has low accuracy in the absence of a large amount of training data; the accuracy of named entity recognition and relation extraction models in this specific domain can be further improved; and traditional text-similarity comparison compares the two novels sentence by sentence, which cannot yield the similarity of their character relationship frameworks.
Disclosure of Invention
The invention aims to construct a knowledge graph from the character relationship framework of Dream of Red Mansions, provides a new entity-relation extraction method that can extract the entity-relation framework of an untrained novel, and then judges the degree of plagiarism with a self-defined similarity comparison method, comprising the following steps:
1. Gathering data
Collect the characters, relationships and main places of Dream of Red Mansions, sort and integrate data from multiple sources, and check for omissions to obtain relatively comprehensive data. Collect common surnames and the first character of names that occur frequently in the novel; if the first character of a frequent name is not among the common surnames, add it.
2. Data processing
(1) Construct a character dictionary for the sorted Dream of Red Mansions characters, places and newly added surnames, specifically: build the dictionary with character + PER label and place + LOC label entries; add B-PER labels to the surnames among the new family names and add them to the dictionary; write Python code that matches the dictionary and converts the full Dream of Red Mansions txt file into a txt file in standard BIO form; split the BIO file into a training set and a test set at a 7:3 ratio, and use k-fold splitting to take different parts of the training set as validation sets;
(2) add an "unknown" relation to the collected Dream of Red Mansions relations and construct a relation dictionary in number + relation form; label sentences with a number + sentence method according to the relation dictionary, with person names in the sentences represented by masks and wildcards; split the labelled data set into a training set and a test set at an 8:2 ratio, and use k-fold splitting to take different parts of the training set as validation sets;
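The dictionary-matching conversion to BIO form in step (1) can be sketched as follows (a minimal illustration, not the patent's actual code; the greedy longest-match strategy, the 6-character cap and the sample dictionary entries are assumptions):

```python
# Minimal sketch of converting raw text to BIO labels by longest-match
# lookup in a character/location dictionary, per step 2(1).
def bio_tag(text, dictionary):
    """dictionary maps entity string -> type ('PER' or 'LOC')."""
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        match = None
        # try the longest dictionary entry starting at position i (cap: 6 chars)
        for length in range(min(len(text) - i, 6), 0, -1):
            cand = text[i:i + length]
            if cand in dictionary:
                match = (cand, dictionary[cand])
                break
        if match:
            name, etype = match
            tags[i] = f"B-{etype}"                # entity beginning
            for j in range(i + 1, i + len(name)):
                tags[j] = f"I-{etype}"            # entity continuation
            i += len(name)
        else:
            i += 1                                # non-entity character -> O
    return list(zip(text, tags))

pairs = bio_tag("贾宝玉来到大观园", {"贾宝玉": "PER", "大观园": "LOC"})
```

Each output pair is one character with its BIO tag, matching the one-token-per-line BIO txt format the patent describes.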
3. Building models
(1) Constructing the WBERT model: experiments show that each layer of BERT understands the text differently, so the BERT model is fine-tuned as follows;
1) give the representation produced by each of BERT's 12 Transformer layers a weight, initialised as a_i = Dense_{unit=1}(represent_i), where a_i denotes the initial weight of the i-th layer, Dense denotes a fully connected layer, represent_i is the output of the i-th layer, and unit = 1 means the vector is finally reduced to one dimension, yielding the 12 initialisation weights a_1 to a_12;
2) determine the weight values by training, and compare the 12 weights a_1 to a_12 to obtain the largest weight value, denoted a_0;
3) pass each weighted output a_i(represent_i) (i ≠ 0, where a_i is the weight of the i-th layer and represent_i its output) through one max-pooling layer with a 3 × 3 × 768 kernel;
4) concatenate a_0(represent_0) (where a_0 is the largest of a_1 to a_12 and represent_0 the corresponding output) with the pooled vector;
5) reduce the concatenated vector from step 4) to 512 dimensions through one fully connected layer: output = Dense_{unit=512}(concat), where output is the final output, Dense is a fully connected layer, and unit = 512 means the vector is finally reduced to 512 dimensions;
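The five WBERT steps above can be sketched numerically (a simplified NumPy illustration with assumed shapes; the scalar layer weight here is computed from a mean-pooled layer output, and the 3 × 3 × 768 pooling kernel is approximated by an element-wise max across the remaining weighted layers):

```python
import numpy as np

# Illustrative sketch of the WBERT fusion in step 3(1): each of the 12
# Transformer layer outputs gets a scalar weight from a 1-unit dense layer;
# the highest-weighted layer's output is concatenated with a max-pooling of
# the remaining weighted outputs, then projected to 512 dimensions.
rng = np.random.default_rng(0)
seq_len, hidden = 8, 768
layers = [rng.standard_normal((seq_len, hidden)) for _ in range(12)]

w_dense = rng.standard_normal(hidden)                 # 1-unit dense: hidden -> scalar
weights = np.array([l.mean(axis=0) @ w_dense for l in layers])  # a_1 .. a_12

best = int(np.argmax(weights))                        # layer with largest weight (a_0)
rest = np.stack([weights[i] * layers[i] for i in range(12) if i != best])
pooled = rest.max(axis=0)                             # simplified max-pooling

concat = np.concatenate([weights[best] * layers[best], pooled], axis=1)  # (seq, 1536)
proj = rng.standard_normal((concat.shape[1], 512))    # final fully connected layer
output = concat @ proj                                # (seq, 512) representation
```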
(2) constructing the named entity recognition model:
1) the input part is WBERT (the model obtained by fine-tuning BERT in step 3(1)), whose encoding of the input sequence is concatenated with the output of the named entity recognition model; that output is converted through an argmax function into a fixed-dimension sequence of the same length as the input sequence;
2) process the BIO text (the training set from step 2(1)) with WBERT to obtain word-vector encodings;
3) feed the word vectors from step 2) into a CNN and a BiLSTM in parallel, the CNN extracting local features and the BiLSTM extracting global features; since some texts are better represented by local features and others by global features, the features extracted by the CNN and BiLSTM are each given a weight, initialised as a_{CNN/BiLSTM} = Dense_{unit=1}(represent_{CNN/BiLSTM}), where a_{CNN/BiLSTM} denotes the initial weight of the CNN/BiLSTM branch, Dense denotes a fully connected layer, represent_{CNN/BiLSTM} is the output of the CNN/BiLSTM layer, and unit = 1 means the vector is finally reduced to one dimension;
4) determine the weight values by training, and max-pool a_{CNN}(represent_{CNN}) and a_{BiLSTM}(represent_{BiLSTM}) separately with a pooling layer whose kernel size is 3 × 3 × 512 (where a_{CNN/BiLSTM} is the weight after training and represent_{CNN/BiLSTM} the output of the corresponding layer);
5) concatenate the pooling-layer outputs from step 4);
6) a CRF layer adds constraints to the final predicted labels to ensure they are legal: the highest-scoring (most probable) sequence is not obtained by taking the label with the maximum probability at every position independently; the transition probabilities must also be considered so that the output obeys the labelling rules (e.g. B-PER cannot be followed by I-LOC, where B-PER marks the beginning of a person name, which must be followed by the rest of that name, and I-LOC marks the inside of a place name). For example, position-wise argmax might output the sequence (I-L, I-P, O, I-L, I-P), where I-L marks the inside of a place name, I-P the inside of a person name, and O an irrelevant character; but because the probability of the transition O -> I-P in the transition matrix is very small or even negative, such a sequence does not obtain the highest overall score (probability), i.e. it is not the desired sequence. To enforce this, a CRF layer is added after the concatenation layer;
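The role of the CRF constraints in step 6) can be illustrated with a toy transition matrix and Viterbi decoder (labels, emission scores and the forbidden transitions are invented for the example; a real CRF layer learns its transition matrix during training):

```python
import numpy as np

# Toy illustration of the CRF constraint: a transition matrix forbids illegal
# label bigrams (e.g. B-PER followed by I-LOC), so Viterbi decoding never
# emits them even when the per-position scores alone would prefer them.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
L = len(labels)
trans = np.zeros((L, L))
trans[labels.index("B-PER"), labels.index("I-LOC")] = -1e9  # B-PER -/-> I-LOC
trans[labels.index("O"), labels.index("I-PER")] = -1e9      # I-* cannot follow O
trans[labels.index("O"), labels.index("I-LOC")] = -1e9

def viterbi(emissions, trans):
    T, L = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        total = score[:, None] + trans + emissions[t][None, :]
        back[t] = total.argmax(axis=0)      # best previous label per current label
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [labels[i] for i in reversed(path)]

# emissions at position 1 prefer the illegal I-LOC; the CRF overrides it
em = np.array([[0.0, 5.0, 0.0, 0.0, 0.0],   # position 0: B-PER
               [0.0, 0.0, 3.0, 0.0, 4.0]])  # position 1: I-LOC scores highest alone
best_path = viterbi(em, trans)
```

Position-wise argmax would yield (B-PER, I-LOC); with the transition constraints the decoder returns the legal sequence (B-PER, I-PER).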
(3) constructing the relation extraction model:
1) concatenate the WBERT encoding of the input sequence with the output of the named entity recognition model: process the training data set (obtained in step 2(2)) with WBERT to obtain a feature sequence, then concatenate it with the output of the named entity recognition model (the model of step 3(2)), which is converted through an argmax function into a fixed-dimension sequence of the same length as the input sequence;
2) extract features with a dynamic IDCNN layer, which treats the dilation coefficient of the IDCNN as a variable and obtains its optimal value through training; on the basis of a CNN, the IDCNN widens the receptive field of feature extraction through dilated convolution, and unlike a CNN layer, which concatenates extracted features and then pools them, the IDCNN needs no pooling operation, reducing feature loss; however, the dilation coefficient of the IDCNN has different effects at different values on different texts, so the initial dilation value is set to i = 1 (at i = 1 it is equivalent to CNN feature extraction), a loop i = i + 1 is run, and the optimal i is found through training;
3) an FC layer concatenates the local features;
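The dynamic-dilation idea in step 2) can be sketched as follows (an illustrative 1-D dilated convolution, with an assumed variance-based selection score standing in for the patent's training-driven search):

```python
import numpy as np

# Sketch of the "dynamic IDCNN" idea: a 1-D dilated convolution whose
# dilation i is a searchable hyperparameter; i = 1 reduces to an ordinary
# CNN, larger i widens the receptive field without any pooling.
def dilated_conv1d(x, kernel, dilation):
    """x: (T, d) sequence; kernel: (k, d); returns a (T,) feature map."""
    T, k = x.shape[0], kernel.shape[0]
    span = (k - 1) * dilation                    # receptive-field width - 1
    out = np.zeros(T)
    for t in range(T - span):
        taps = x[t : t + span + 1 : dilation]    # k taps, `dilation` apart
        out[t] = float((taps * kernel).sum())
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((20, 16))
kernel = rng.standard_normal((3, 16))

# caricature of the search loop i = i + 1: try dilations 1..4 and keep the
# best under an illustrative score (the patent selects i via training)
scores = {i: dilated_conv1d(x, kernel, i).var() for i in range(1, 5)}
best_i = max(scores, key=scores.get)
```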
4. Constructing the Dream of Red Mansions knowledge graph
(1) Take the full-text txt of Dream of Red Mansions as input to the named entity recognition model and the relation extraction model to extract characters, places and relations, obtaining character-relationship triples;
(2) number the entities according to their occurrence frequency and in/out degree, divide them into levels 1 to 5, and continually update the entity numbers according to the character relationships; on the basis of the triples, add Weights (the importance of a character relationship, determined by the levels of the two entities in the triple) to form quadruples:
1) define character importance by frequency and in/out degree, and number the characters from 1 to n by importance (n is determined by the number of extracted entities);
2) assign each character an importance parameter; the characters numbered 1 to n receive parameters n down to 1 respectively;
3) divide the characters numbered 1 to n into levels 1 to 5 in the ratio 1:2:3:4:5;
4) increase each character's importance parameter according to the character-relationship triples: being related to a character of level 1 to 5 adds 5 down to 1 to the original parameter; for example, if entity 1 has importance parameter n and a triple shows that entity 1 is related to entity 2, which is a level-1 character, then 5 is added to entity 1's parameter n;
5) re-sort the characters by importance parameter;
6) repeat steps 2) to 5) until the character numbering no longer changes;
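Steps 1) to 6) above can be sketched in pure Python (toy entities and triples; the 1:2:3:4:5 level split and the +5 to +1 bonuses follow the text, while the iteration cap is an added safeguard, since the text only requires looping until the numbering is stable):

```python
# Pure-Python sketch of the iterative entity-ranking loop of step 4(2).
def rank_entities(entities, triples, max_iter=50):
    order = list(entities)                       # initial order = importance order
    n = len(order)
    # cumulative level boundaries for a 1:2:3:4:5 split of n entities
    bounds = [sum(range(1, k + 1)) * n // 15 for k in range(1, 6)]
    for _ in range(max_iter):
        level = {}
        for rank, e in enumerate(order):
            level[e] = next(k + 1 for k, b in enumerate(bounds) if rank < b)
        score = {e: n - r for r, e in enumerate(order)}   # base parameters n .. 1
        for head, _rel, tail in triples:
            score[head] += 6 - level[tail]       # partner at level k adds 6 - k
            score[tail] += 6 - level[head]
        new_order = sorted(order, key=lambda e: -score[e])
        if new_order == order:
            break                                # numbering unchanged: converged
        order = new_order
    return order

ents = [f"e{i}" for i in range(15)]
trips = [("e14", "friend", "e0"), ("e13", "cousin", "e1")]
ranking = rank_entities(ents, trips)
```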
(3) storing the obtained quadruple into an NEO4J graph database, compiling an alignment algorithm, setting a threshold value to be 70%, and performing entity fusion if the similarity is greater than 70% (the similarity threshold value can be set according to the requirement);
5. Similarity discrimination
(1) For the novel to be compared, extract the characters, their related places, and the relations with the trained named entity recognition model and relation extraction model (obtained by feeding data to and training the models constructed in steps 3(2) and 3(3) respectively);
(2) number the entities according to their occurrence frequency and in/out degree, divide them into levels 1 to 5, and continually update the entity numbers according to the character relationships; add Weights to the triples to form quadruples (the entity numbers are obtained by the same procedure as in step 4(2); Weights are the importance of a character relationship, determined by the levels of the two entities in the triple);
(3) find the quadruples in which each entity occurs, and from the entity levels and the Weights of all those quadruples compare the framework-relation similarity, obtaining a similarity percentage between 0% and 100%.
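One way to realise the 0%-100% comparison is a weight-normalised overlap of quadruples (an assumed formula for illustration; the patent leaves the exact computation unspecified, and the sample relations are invented):

```python
# Illustrative comparison of two relationship frames stored as quadruples
# (head, relation, tail, weight): matched quadruples contribute their weight,
# and the score is normalised to a 0-100% similarity percentage.
def frame_similarity(quads_a, quads_b):
    index_b = {(h, r, t): w for h, r, t, w in quads_b}
    total = sum(w for _, _, _, w in quads_a) or 1
    matched = sum(w for h, r, t, w in quads_a if (h, r, t) in index_b)
    return round(100.0 * matched / total, 1)

reference = [("A", "mother", "B", 9), ("A", "servant", "C", 4), ("B", "cousin", "D", 6)]
suspect   = [("A", "mother", "B", 9), ("B", "cousin", "D", 6)]
sim = frame_similarity(reference, suspect)   # 15 of 19 weighted relations match
```

Weighting by relationship importance makes a copied core framework (high-weight relations between high-level characters) count far more than coincidental minor relations.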
It is worth noting that the method can be used to detect plagiarism of Dream of Red Mansions and, with slight modification, can be applied to detecting plagiarism of the frameworks of other novels.
Drawings
The invention may be better understood by reference to the following detailed description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification.
FIG. 1 is a flow chart of the knowledge-graph-based method for judging the similarity of character relationship frameworks with Dream of Red Mansions;
FIG. 2 is a diagram of the WBERT model;
FIG. 3 is a diagram of the named entity recognition model;
FIG. 4 is a diagram of the relation extraction model.
Detailed Description
Embodiments of the present invention will be further described with reference to the accompanying drawings.
FIG. 1 is the flow chart of the knowledge-graph-based method for judging the similarity of character relationship frameworks with Dream of Red Mansions, which is explained as follows:
The knowledge-graph-based method for judging the similarity of character relationship frameworks with Dream of Red Mansions mainly comprises the following five parts: gathering data, processing data, building models, constructing the Dream of Red Mansions knowledge graph, and discriminating similarity;
1. Gathering data
(1) Collect the characters, relationships and main places of Dream of Red Mansions, sort and integrate data from multiple sources, and check for omissions to obtain relatively comprehensive data that can express the Dream of Red Mansions character relationship framework;
(2) collect common surnames, and collect the first character of names that occur frequently in the novel; for example, a name whose first character meaning "solitary" is not found among the common surnames yet occurs frequently in the novel, so that single character alone is taken and added to the common surnames.
2. Data processing
(1) Add an "unknown" relation to the collected Dream of Red Mansions relations and construct a relation dictionary in number + relation form; label sentences with a number + sentence method according to the relation dictionary, with person names in the sentences represented by masks and wildcards; split the labelled data set into a training set and a test set at an 8:2 ratio, and use k-fold splitting to take different parts of the training set as validation sets;
(2) construct a character dictionary for the sorted Dream of Red Mansions characters, places and newly added surnames, specifically: build the dictionary with character + PER label and place + LOC label entries; add B-PER labels to the surnames among the new family names and add them to the dictionary; write Python code that matches the dictionary and converts the full Dream of Red Mansions txt file into a txt file in standard BIO form; split the BIO file into a training set and a test set at a 7:3 ratio, and use k-fold splitting to take different parts of the training set as validation sets;
3. Building models
(1) Constructing the WBERT model: experiments show that each layer of BERT understands the text differently, so the BERT model is fine-tuned as follows;
1) give the representation produced by each of BERT's 12 Transformer layers a weight, initialised as a_i = Dense_{unit=1}(represent_i), where a_i denotes the initial weight of the i-th layer, Dense denotes a fully connected layer, represent_i is the output of the i-th layer, and unit = 1 means the vector is finally reduced to one dimension, yielding the 12 initialisation weights a_1 to a_12;
2) determine the weight values by training, and compare the 12 weights a_1 to a_12 to obtain the largest weight value, denoted a_0;
3) pass each weighted output a_i(represent_i) (i ≠ 0, where a_i is the weight of the i-th layer and represent_i its output) through one max-pooling layer with a 3 × 3 × 768 kernel;
4) concatenate a_0(represent_0) (where a_0 is the largest of a_1 to a_12 and represent_0 the corresponding output) with the pooled vector;
5) reduce the concatenated vector from step 4) to 512 dimensions through one fully connected layer: output = Dense_{unit=512}(concat), where output is the final output, Dense is a fully connected layer, and unit = 512 means the vector is finally reduced to 512 dimensions;
(2) constructing the named entity recognition model:
1) the input part is WBERT, whose encoding of the input sequence is concatenated with the output of the named entity recognition model; that output is converted through an argmax function into a fixed-dimension sequence of the same length as the input sequence;
2) process the BIO text (the training set from step 2(2) in the description of FIG. 1) with WBERT to obtain word-vector encodings (WBERT is obtained by fine-tuning BERT in step 3(1) in the description of FIG. 1);
3) feed the word vectors from step 2) into a CNN and a BiLSTM in parallel, the CNN extracting local features and the BiLSTM extracting global features; since some texts are better represented by local features and others by global features, the features extracted by the CNN and BiLSTM are each given a weight, initialised as a_{CNN/BiLSTM} = Dense_{unit=1}(represent_{CNN/BiLSTM}), where a_{CNN/BiLSTM} denotes the initial weight of the CNN/BiLSTM branch, Dense denotes a fully connected layer, represent_{CNN/BiLSTM} is the output of the CNN/BiLSTM layer, and unit = 1 means the vector is finally reduced to one dimension;
4) determine the weight values by training, and max-pool a_{CNN}(represent_{CNN}) and a_{BiLSTM}(represent_{BiLSTM}) separately with a pooling layer whose kernel size is 3 × 3 × 512 (where a_{CNN/BiLSTM} is the weight after training and represent_{CNN/BiLSTM} the output of the corresponding layer);
5) concatenate the pooling-layer outputs from step 4);
6) a CRF layer adds constraints to the final predicted labels to ensure they are legal: the highest-scoring (most probable) sequence is not obtained by taking the label with the maximum probability at every position independently; the transition probabilities must also be considered so that the output obeys the labelling rules (e.g. B-PER cannot be followed by I-LOC, where B-PER marks the beginning of a person name, which must be followed by the rest of that name, and I-LOC marks the inside of a place name). For example, position-wise argmax might output the sequence (I-L, I-P, O, I-L, I-P), where I-L marks the inside of a place name, I-P the inside of a person name, and O an irrelevant character; but because the probability of the transition O -> I-P in the transition matrix is very small or even negative, such a sequence does not obtain the highest overall score (probability), i.e. it is not the desired sequence. To enforce this, a CRF layer is added after the concatenation layer;
(3) constructing the relation extraction model
1) Concatenate the WBERT encoding of the input sequence with the output of the named entity recognition model: process the training data set (obtained in step 2(1) in the description of FIG. 1; WBERT is obtained by fine-tuning BERT in step 3(1) in the description of FIG. 1) with WBERT to obtain a feature sequence, then concatenate it with the output of the named entity recognition model, which is converted through an argmax function into a fixed-dimension sequence of the same length as the input sequence;
2) extract features with a dynamic IDCNN layer, which treats the dilation coefficient of the IDCNN as a variable and obtains its optimal value through training; on the basis of a CNN, the IDCNN widens the receptive field of feature extraction through dilated convolution, and unlike a CNN layer, which concatenates extracted features and then pools them, the IDCNN needs no pooling operation, reducing feature loss; however, the dilation coefficient of the IDCNN has different effects at different values on different texts, so the initial dilation value is set to i = 1 (at i = 1 it is equivalent to CNN feature extraction), a loop i = i + 1 is run, and the optimal i is found through training;
3) an FC layer concatenates the local features.
4. Construction of dream of Red mansions knowledge map
(1) Taking the full text of the novel (the dream of Red mansions txt file) as the input of the named entity recognition model and the relation extraction model to extract people, places and relations, obtaining person-relationship triples (the named entity recognition model and the relation extraction model are (2) and (3), respectively, in part 3 of the description of FIG. 1);
(2) Numbering the entities according to their occurrence frequency and degree (the number of relations they participate in), dividing the entities into levels 1 to 5, and continuously updating the entity numbers according to the character relationships; on the basis of the triples, Weights are added to form quadruples (a Weight is the importance of a person-to-person relationship and is determined by the levels of the two entities in the triple):
1) defining the importance of the characters according to their frequency and degree, and numbering the characters from 1 to n in order of importance (n is determined by the number of extracted entities);
2) assigning each character an importance parameter: the characters numbered 1 to n receive importance parameters n down to 1, respectively;
3) dividing the characters numbered 1 to n into levels 1 to 5 in the ratio 1:2:3:4:5;
4) increasing the importance parameters of the characters according to the character-relationship triples: a character related to a character of level 1 to 5 has 5 down to 1, respectively, added to its original parameter; for example, if the importance parameter of entity 1 is n and a triple involving entity 1 shows that it is related to entity 2, where entity 2 is a level-1 character, then the importance parameter of entity 1 becomes n + 5;
5) re-sorting the characters by their importance parameters;
6) repeating 2) to 5) until the character numbers no longer change.
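The iterative numbering procedure in steps 1) to 6) can be sketched as follows; this is a hypothetical minimal implementation, since the patent only fixes the n-to-1 importance parameters, the 1:2:3:4:5 level split, and the +5 to +1 level bonuses. The entity names, degree handling, and tie-breaking below are illustrative assumptions.

```python
def assign_levels(ranked):
    """Split a ranked character list into levels 1-5 in a 1:2:3:4:5 ratio."""
    n = len(ranked)
    bounds = [n * c // 15 for c in (1, 3, 6, 10, 15)]  # cumulative 1:3:6:10:15
    levels, start = {}, 0
    for level, end in enumerate(bounds, start=1):
        for name in ranked[start:end]:
            levels[name] = level
        start = end
    return levels

def rank_characters(frequency, triples, max_iters=100):
    """frequency: {name: occurrence count}; triples: [(head, relation, tail)]."""
    ranked = sorted(frequency, key=frequency.get, reverse=True)
    n = len(ranked)
    for _ in range(max_iters):
        # Step 2): serial numbers 1..n map to importance parameters n..1.
        importance = {name: n - i for i, name in enumerate(ranked)}
        levels = assign_levels(ranked)                      # step 3)
        for head, _, tail in triples:                       # step 4)
            importance[head] += 6 - levels[tail]            # level 1 adds 5, level 5 adds 1
            importance[tail] += 6 - levels[head]
        new_ranked = sorted(importance, key=importance.get, reverse=True)  # step 5)
        if new_ranked == ranked:                            # step 6): numbers unchanged
            return ranked, levels
        ranked = new_ranked
    return ranked, assign_levels(ranked)
```

Each pass re-derives the parameters from the current ordering, so the loop stops exactly when re-sorting leaves the serial numbers unchanged, matching step 6).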
(3) Storing the obtained quadruples in a NEO4J graph database, writing an alignment algorithm, and setting a threshold of 70%: entity fusion is performed when the similarity exceeds 70% (the similarity threshold can be set as required).
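A minimal in-memory sketch of the 70% alignment step; the character-level string similarity used here (difflib's SequenceMatcher) is an illustrative stand-in for the patent's unspecified alignment algorithm, the names are hypothetical variant spellings, and persisting the fused quadruples to NEO4J (e.g. one Cypher MERGE per edge) is omitted.

```python
from difflib import SequenceMatcher

def align_entities(names, threshold=0.7):
    """Map each entity name to a canonical name; a name whose similarity to an
    existing canonical name exceeds the threshold is fused into it."""
    canonical, merged = {}, []
    for name in names:
        for rep in merged:
            if SequenceMatcher(None, name, rep).ratio() > threshold:
                canonical[name] = rep
                break
        else:
            merged.append(name)      # no close match: new canonical entity
            canonical[name] = name
    return canonical

quads = [("Jia Baoyu", "cousin", "Lin Daiyu", 8),
         ("Jia Bao-yu", "master", "Xiren", 6)]   # "Jia Bao-yu": variant spelling
names = [h for h, _, _, _ in quads] + [t for _, _, t, _ in quads]
mapping = align_entities(names)
fused = [(mapping[h], r, mapping[t], w) for h, r, t, w in quads]
```

After fusion, both quadruples refer to the same canonical head entity, which is what makes the downstream frame comparison meaningful.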
5. Similarity discrimination
(1) Using the trained named entity recognition model and relation extraction model to extract the characters, the places related to the characters, and the relations from the novels to be compared (the trained models are obtained by feeding data into, and training, (2) and (3) in part 3 of the description of FIG. 1);
(2) numbering the entities according to their occurrence frequency and degree, dividing them into levels 1 to 5, and continuously updating the entity numbers according to the character relationships; adding Weights on the basis of the triples to form quadruples (the entity numbers are obtained in the same way as in process (2) of part 4 of the description of FIG. 1; a Weight is the importance of a person-to-person relationship and is determined by the levels of the two entities in the triple);
(3) finding the quadruples in which each entity appears, and comparing the entity levels and Weights of all such quadruples against the framework's relationships to obtain a similarity percentage between 0% and 100%.
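Since the exact scoring formula is not given, one plausible reading of step (3) can be sketched: reduce every quadruple to a (head level, relation, tail level) frame edge and report the weighted overlap of the two edge sets as a 0-100% similarity. The reduction and the min/max overlap rule are assumptions for illustration.

```python
def frame_similarity(quads_a, quads_b, levels_a, levels_b):
    """quads: [(head, relation, tail, weight)]; levels: {entity: 1..5}."""
    def edge_weights(quads, levels):
        counts = {}
        for head, rel, tail, w in quads:
            key = (levels[head], rel, levels[tail])   # frame edge pattern
            counts[key] = counts.get(key, 0) + w
        return counts
    ea = edge_weights(quads_a, levels_a)
    eb = edge_weights(quads_b, levels_b)
    shared = sum(min(ea[k], eb[k]) for k in ea.keys() & eb.keys())
    total = max(sum(ea.values()), sum(eb.values()))
    return 100.0 * shared / total if total else 0.0
```

Comparing by level patterns rather than literal names is what lets a different novel with the same relationship framework score highly.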
FIG. 2 is a WBERT model diagram illustrating the improvement to the native BERT model:
constructing the WBERT model: experiments show that each layer of BERT understands the text differently, so the BERT model is fine-tuned;
1. each of the representations generated by BERT's 12 transformer layers is given a weight, initialized as a_i = Dense_unit=1(represent_i), where a_i denotes the initial weight of the i-th layer, Dense denotes a fully connected layer, represent_i denotes the output of the i-th layer, and unit = 1 means the vector is finally reduced to one dimension, yielding the 12 initialization weights a_1 to a_12;
2. the weight values are determined through training, and the 12 initialization weights a_1 to a_12 are compared to obtain the maximum weight value, denoted a_0;
3. each a_i(represent_i) with i ≠ 0 (a_i denotes the weight of the i-th layer, represent_i the output of the i-th layer) is max-pooled by one pooling layer with a 3 × 3 × 768 kernel;
4. a_0(represent_0) (a_0 denotes the maximum weight value among a_1 to a_12, represent_0 the corresponding output) is spliced with the pooled vectors;
5. the spliced vector is then reduced to 512 dimensions through one fully connected layer: output = Dense_unit=512(spliced vector from step 4), where output denotes the final output, Dense the fully connected layer, and unit = 512 means the vector is finally reduced to 512 dimensions.
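The five WBERT steps can be illustrated with a shape-level numpy sketch. Random tensors stand in for trained Dense layers and real transformer outputs, and the 3 × 3 × 768 pooling is simplified to an element-wise max over the weighted layers, since the patent does not fix the exact tensor layout.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 8, 768
# Stand-ins for the 12 transformer-layer representations represent_1..12.
layers = [rng.standard_normal((seq_len, hidden)) for _ in range(12)]

# Step 1: a_i = Dense_unit=1(represent_i) -> one scalar weight per layer.
w_dense = rng.standard_normal(hidden)
a = np.array([float((layer @ w_dense).mean()) for layer in layers])

# Step 2: the layer with the maximum weight plays the role of a_0.
best = int(np.argmax(a))

# Step 3: max-pool the remaining weighted representations.
pooled = np.max(np.stack([a[i] * layers[i] for i in range(12) if i != best]),
                axis=0)

# Step 4: splice the best weighted layer with the pooled vector.
concat = np.concatenate([a[best] * layers[best], pooled], axis=-1)

# Step 5: output = Dense_unit=512(concat), here a random projection.
proj = rng.standard_normal((concat.shape[-1], 512))
output = concat @ proj
```

The splice doubles the feature width to 1536 before the final layer brings it down to the fixed 512-dimensional output.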
FIG. 3 is a diagram of a named entity recognition model illustrating the structure of the named entity recognition model:
1. the input part is WBERT (the model of FIG. 2); the input sequence encoding is spliced with the output of the named entity recognition model, which is converted by an argmax function into a fixed-dimension sequence of the same length as the input sequence;
2. the BIO text (the training set of (2) in part 2 of the description of FIG. 1) is processed by WBERT to obtain word-vector encodings;
3. the word vectors obtained in step 2 are fed into a CNN and a BILSTM in parallel, the CNN extracting local features and the BILSTM extracting global features; since some features are better represented locally and others globally, the features extracted by the CNN and the BILSTM are each given a weight, initialized as a_CNN/BILSTM = Dense_unit=1(represent_CNN/BILSTM), where a_CNN/BILSTM denotes the initial weight of the CNN/BILSTM branch, Dense denotes a fully connected layer, represent_CNN/BILSTM denotes the output of the CNN/BILSTM layer, and unit = 1 means the vector is finally reduced to one dimension;
4. the weight values are determined through training, and a_CNN(represent_CNN) and a_BILSTM(represent_BILSTM) are max-pooled separately with a pooling layer whose kernel size is 3 × 3 × 512 (a_CNN/BILSTM denotes the weight after CNN/BILSTM training, and represent_CNN/BILSTM the output of the CNN/BILSTM layer);
5. the outputs of the pooling layers from step 4 are spliced;
6. the CRF layer adds constraints to the final predicted labels to ensure that they are legal: when scoring a predicted sequence, the label with the maximum output probability is not simply taken at every position; instead, the transition probabilities are added in so that the output obeys the labelling rules (B-Per cannot be followed by I-Loc, where B-Per denotes the beginning of a person name, which must be followed by the continuation of that name, and I-Loc denotes the continuation of a place name); for example, a greedy decode might output the sequence (I-L, I-P, O, I-L, I-P), where I-L denotes the continuation of a place name, I-P the continuation of a person name, and O an irrelevant character, but because the probability of O -> I-P in the transition probability matrix is very small or even negative, such a sequence does not receive the highest overall score (probability) and is therefore not the desired sequence; to enforce this, a CRF layer is added after the splicing layer.
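The benefit of decoding with transition scores can be shown with a toy Viterbi decode; the tag set, scores, and forbidden transitions below are made-up illustrations, not trained CRF parameters.

```python
import numpy as np

tags = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
NEG = -1e9  # effectively forbids a transition
trans = np.zeros((5, 5))
trans[tags.index("B-PER"), tags.index("I-LOC")] = NEG  # B-PER cannot precede I-LOC
trans[tags.index("O"), tags.index("I-PER")] = NEG      # O cannot precede I-PER
trans[tags.index("O"), tags.index("I-LOC")] = NEG

def viterbi(emissions, trans):
    """Best tag path under emission + transition scores (Viterbi decode)."""
    n, k = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        total = score[:, None] + trans + emissions[t][None, :]
        back[t] = np.argmax(total, axis=0)
        score = np.max(total, axis=0)
    path = [int(np.argmax(score))]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [tags[i] for i in reversed(path)]

# Greedy per-position decoding would pick the illegal pair (B-PER, I-LOC);
# the transition-aware decode falls back to the legal (B-PER, I-PER).
emissions = np.array([[0.0, 5.0, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 4.0, 0.0, 5.0]])
```

Here the position-2 emission prefers I-LOC, but the huge negative transition score for B-PER -> I-LOC makes the legal I-PER path win overall, which is exactly the constraint the CRF layer enforces.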
FIG. 4 is a relational extraction model illustrating the structure of the relational extraction model:
1. splicing the input sequence encoding with the output of the named entity recognition model, using WBERT (the model of FIG. 2):
(1) the training data set (obtained in (2) of part 2 of the description of FIG. 1) is processed by WBERT to obtain a feature sequence;
(2) it is spliced with the output of the named entity recognition model, which is converted by an argmax function into a fixed-dimension sequence of the same length as the input sequence;
2. extracting features with a dynamic IDCNN: the dynamic IDCNN layer treats the dilation (expansion) coefficient of the IDCNN as a variable and finds its optimal value through training; building on CNN, the IDCNN layer widens the receptive field of feature extraction through dilated convolution and, unlike a CNN layer that pools after feature extraction and splicing, needs no pooling operation, which reduces the loss of features; however, different values of the IDCNN dilation coefficient perform differently on different texts:
(1) the initial dilation value of the IDCNN is set to i = 1 (with i equal to 1 the feature extraction is equivalent to that of a CNN);
(2) a loop is set: i = i + 1;
(3) the optimal value of i is found through training.
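The dilation loop in steps (1) to (3) can be illustrated with a toy 1-D dilated convolution; this is a sketch of the general IDCNN idea, not the patent's trained network, and "finding the optimal i through training" would in practice mean comparing validation scores across the loop.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D convolution whose taps are spaced `dilation` apart."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # receptive field of one filter application
    out = [sum(kernel[j] * x[t + j * dilation] for j in range(k))
           for t in range(len(x) - span + 1)]
    return np.array(out), span

x = np.arange(10, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])
# The i = i + 1 loop: i = 1 behaves like a plain CNN; larger i widens the
# receptive field (span) without any pooling step.
results = {i: dilated_conv1d(x, kernel, i) for i in (1, 2, 3)}
```

With a 3-tap kernel the receptive field grows as 2i + 1, so widening coverage costs no pooling, which is the feature-loss argument made above.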
3. The FC layer splices the local features;
the embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A method for judging the similarity of a character relationship framework of a dream of Red mansions based on a knowledge graph, characterized by comprising: collecting and completing the data, customizing the structures of the common-surname dictionary and the character dictionary, and labeling the data set, as follows:
(1) collecting the people and the relation of the dream of Red mansions and the places related to the people, sorting and integrating the data of multiple sources, and checking for missing and filling leaks to obtain relatively comprehensive data;
(2) collecting common surnames: the first character of each character name that appears with high frequency in the novel is collected; if that character is not already among the common surnames, it is added, e.g. a frequently appearing character name whose first character is absent from the common surnames has that single character added to them;
(3) constructing a character dictionary from the sorted people, places and new surnames in the dream of Red mansions, specifically: building the dictionary with entries of the form character + PER label and location + LOC label for the characters and the locations; adding the surnames among the new family names, with B-PER labels, to the character dictionary; writing python code to match against the character dictionary and convert the txt file of the whole dream of Red mansions into a txt file in standard BIO form; dividing the BIO-form txt file into a training set and a test set in the ratio 7:3, and using k-fold splitting to take different parts of the training set as the validation set;
(4) adding an "unknown" relation to the collected dream of Red mansions relations, and constructing a relation dictionary in the form number + relation; labeling sentences in the form number + sentence according to the relation dictionary, with the person names in the sentences represented by masks and wildcards; dividing the labeled data set files into a training set and a test set in the ratio 8:2, and using k-fold splitting to take different parts of the training set as the validation set.
2. A method for judging similarity of a character relationship framework of a dream of Red mansions based on a knowledge graph, characterized in that the native BERT model is improved to obtain WBERT:
(1) experiments show that each layer of BERT understands the text differently, so the BERT model is fine-tuned;
(2) each of the representations generated by BERT's 12 transformer layers is given a weight, initialized as a_i = Dense_unit=1(represent_i), where a_i denotes the initial weight of the i-th layer, Dense denotes a fully connected layer, represent_i denotes the output of the i-th layer, and unit = 1 means the vector is finally reduced to one dimension, yielding the 12 initialization weights a_1 to a_12;
(3) the weight values are determined through training, and the 12 initialization weights a_1 to a_12 are compared to obtain the maximum weight value, denoted a_0;
(4) each a_i(represent_i) with i ≠ 0 (a_i denotes the weight of the i-th layer, represent_i the output of the i-th layer) is max-pooled by one pooling layer with a 3 × 3 × 768 kernel;
(5) a_0(represent_0) (a_0 denotes the maximum weight value among a_1 to a_12, represent_0 the corresponding output) is spliced with the pooled vectors;
(6) the spliced vector obtained in (5) is reduced to 512 dimensions through a fully connected layer: output = Dense_unit=512(spliced vector from (5)), where output denotes the final output, Dense the fully connected layer, and unit = 512 means the vector is finally reduced to 512 dimensions.
3. A method for judging similarity of a character relationship framework of a dream of Red mansions based on a knowledge graph is characterized by comprising the following steps: the named entity recognition model consists of four parts: WBERT, BILSTM + CNN, ATTENTION mechanism, CRF layer:
(1) processing the BIO text (the training set of (3) in claim 1) with WBERT to obtain word-vector encodings (WBERT is obtained by fine-tuning BERT as in claim 2);
(2) inputting the word vectors obtained in (1) into a CNN and a BILSTM in parallel, the CNN extracting local features and the BILSTM extracting global features; since some features are better represented locally and others globally, the features extracted by the CNN and the BILSTM are each given a weight, initialized as a_CNN/BILSTM = Dense_unit=1(represent_CNN/BILSTM), where a_CNN/BILSTM denotes the initial weight of the CNN/BILSTM branch, Dense denotes a fully connected layer, represent_CNN/BILSTM denotes the output of the CNN/BILSTM layer, and unit = 1 means the vector is finally reduced to one dimension;
(3) determining the weight values through training, and max-pooling a_CNN(represent_CNN) and a_BILSTM(represent_BILSTM) separately with a pooling layer whose kernel size is 3 × 3 × 512 (a_CNN/BILSTM denotes the weight after CNN/BILSTM training, and represent_CNN/BILSTM the output of the CNN/BILSTM layer);
(4) splicing the output of the pooling layer obtained in the step (3);
(5) the CRF layer adds constraints to the final predicted labels to ensure that they are legal: when scoring a predicted sequence, the label with the maximum output probability is not simply taken at every position; instead, the transition probabilities are added in so that the output obeys the labelling rules (B-Per cannot be followed by I-Loc, where B-Per denotes the beginning of a person name, which must be followed by the continuation of that name, and I-Loc denotes the continuation of a place name); for example, a greedy decode might output the sequence (I-L, I-P, O, I-L, I-P), where I-L denotes the continuation of a place name, I-P the continuation of a person name, and O an irrelevant character, but because the probability of O -> I-P in the transition probability matrix is very small or even negative, such a sequence does not receive the highest overall score (probability) and is therefore not the desired sequence; to enforce this, a CRF layer is added after the splicing layer.
4. A method for judging similarity of a character relationship framework of a dream of Red mansions based on a knowledge graph is characterized by comprising the following steps: the relation extraction model consists of three parts: input layer, dynamic IDCNN layer, FC:
(1) the input part is WBERT (obtained by fine-tuning BERT as in claim 2); the input sequence encoding is spliced with the output of the named entity recognition module (the model constructed in claim 3):
1) the training data set (obtained in (4) of claim 1) is processed by WBERT to obtain a feature sequence;
2) it is spliced with the output of the named entity recognition model (constructed in claim 3), which is converted by an argmax function into a fixed-dimension sequence of the same length as the input sequence;
(2) the dynamic IDCNN layer treats the dilation (expansion) coefficient of the IDCNN as a variable and finds its optimal value through training; building on CNN, the IDCNN layer widens the receptive field of feature extraction through dilated convolution and, unlike a CNN layer that pools after feature extraction and splicing, needs no pooling operation, which reduces the loss of features; however, different values of the IDCNN dilation coefficient perform differently on different texts:
1) the initial dilation value of the IDCNN is set to i = 1 (with i equal to 1 the feature extraction is equivalent to that of a CNN);
2) a loop is set: i = i + 1;
3) finding an optimal i value through training;
(3) the FC layer splices the local features.
5. A method for judging similarity of a character relationship framework of a dream of Red mansions based on a knowledge graph is characterized by comprising the following steps: constructing a dream of Red mansions knowledge graph:
(1) taking the full text of the dream of Red mansions as the input of the named entity recognition model (constructed in claim 3) and the relation extraction model (constructed in claim 4) to extract people, places and relations, obtaining person-relationship triples;
(2) numbering the entities according to their occurrence frequency and degree, dividing the entities into levels 1 to 5, and continuously updating the entity numbers according to the character relationships; on the basis of the triples, Weights are added to form quadruples (a Weight is the importance of a person-to-person relationship and is determined by the levels of the two entities in the triple):
1) defining the importance of the characters according to their frequency and degree, and numbering the characters from 1 to n in order of importance (n is determined by the number of extracted entities);
2) assigning each character an importance parameter: the characters numbered 1 to n receive importance parameters n down to 1, respectively;
3) dividing the characters numbered 1 to n into levels 1 to 5 in the ratio 1:2:3:4:5;
4) increasing the importance parameters of the characters according to the character-relationship triples: a character related to a character of level 1 to 5 has 5 down to 1, respectively, added to its original parameter; for example, if the importance parameter of entity 1 is n and a triple involving entity 1 shows that it is related to entity 2, where entity 2 is a level-1 character, then the importance parameter of entity 1 becomes n + 5;
5) re-sorting the characters by their importance parameters;
6) repeating steps 2) to 5) until the character numbers no longer change;
(3) storing the obtained quadruples in a NEO4J graph database, writing an alignment algorithm, and setting a threshold of 70%: entity fusion is performed when the similarity exceeds 70% (the similarity threshold can be set as required).
6. A method for judging similarity of a character relationship framework of a dream of Red mansions based on a knowledge graph is characterized by comprising the following steps: the method for comparing the similarity with the frame of the dream of red mansions comprises the following steps:
(1) extracting the characters, the places related to the characters, and the relations from the novel to be compared, using the trained named entity recognition model and relation extraction model (obtained by feeding data into, and training, the models constructed in claims 3 and 4, respectively);
(2) numbering the entities according to their occurrence frequency and degree, dividing them into levels 1 to 5, and continuously updating the entity numbers according to the character relationships; adding Weights on the basis of the triples to form quadruples (the entity numbers are obtained in the same way as in process (2) of claim 5; a Weight is the importance of a person-to-person relationship and is determined by the levels of the two entities in the triple);
(3) finding the quadruples in which each entity appears, and comparing the entity levels and Weights of all such quadruples against the framework's relationships to obtain a similarity percentage between 0% and 100%.
CN202011008324.1A 2020-09-23 2020-09-23 Method for judging similarity of red-building dream character relationship frames based on knowledge graph Active CN112101009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011008324.1A CN112101009B (en) 2020-09-23 2020-09-23 Method for judging similarity of red-building dream character relationship frames based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011008324.1A CN112101009B (en) 2020-09-23 2020-09-23 Method for judging similarity of red-building dream character relationship frames based on knowledge graph

Publications (2)

Publication Number Publication Date
CN112101009A true CN112101009A (en) 2020-12-18
CN112101009B CN112101009B (en) 2024-03-26

Family

ID=73755934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011008324.1A Active CN112101009B (en) 2020-09-23 2020-09-23 Method for judging similarity of red-building dream character relationship frames based on knowledge graph

Country Status (1)

Country Link
CN (1) CN112101009B (en)


Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875051A (en) * 2018-06-28 2018-11-23 中译语通科技股份有限公司 Knowledge mapping method for auto constructing and system towards magnanimity non-structured text
WO2019024704A1 (en) * 2017-08-03 2019-02-07 阿里巴巴集团控股有限公司 Entity annotation method, intention recognition method and corresponding devices, and computer storage medium
CN110222199A (en) * 2019-06-20 2019-09-10 青岛大学 A kind of character relation map construction method based on ontology and a variety of Artificial neural network ensembles
CN110287334A (en) * 2019-06-13 2019-09-27 淮阴工学院 A kind of school's domain knowledge map construction method based on Entity recognition and attribute extraction model
CN110298042A (en) * 2019-06-26 2019-10-01 四川长虹电器股份有限公司 Based on Bilstm-crf and knowledge mapping video display entity recognition method
CN110334210A (en) * 2019-05-30 2019-10-15 哈尔滨理工大学 A kind of Chinese sentiment analysis method merged based on BERT with LSTM, CNN
CN110377903A (en) * 2019-06-24 2019-10-25 浙江大学 A kind of Sentence-level entity and relationship combine abstracting method
CN110516256A (en) * 2019-08-30 2019-11-29 的卢技术有限公司 A kind of Chinese name entity extraction method and its system
CN110781683A (en) * 2019-11-04 2020-02-11 河海大学 Entity relation joint extraction method
US20200073933A1 (en) * 2018-08-29 2020-03-05 National University Of Defense Technology Multi-triplet extraction method based on entity-relation joint extraction model
CN110969020A (en) * 2019-11-21 2020-04-07 中国人民解放军国防科技大学 CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN111125367A (en) * 2019-12-26 2020-05-08 华南理工大学 Multi-character relation extraction method based on multi-level attention mechanism
CN111339318A (en) * 2020-02-29 2020-06-26 西安理工大学 University computer basic knowledge graph construction method based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG Peixin; ZHAO Xiang; FANG Yang; ZHU Huiming; XIAO Weidong: "End-to-End Joint Extraction of Knowledge Triples Incorporating Adversarial Training", Journal of Computer Research and Development (计算机研究与发展), no. 12, 15 December 2019 (2019-12-15) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221569A (en) * 2021-05-27 2021-08-06 中国人民解放军军事科学院国防工程研究院工程防护研究所 Method for extracting text information of damage test
CN113220871A (en) * 2021-05-31 2021-08-06 北京语言大学 Literature character relation identification method based on deep learning
CN113220871B (en) * 2021-05-31 2023-10-20 山东外国语职业技术大学 Literature character relation recognition method based on deep learning
CN113204970A (en) * 2021-06-07 2021-08-03 吉林大学 BERT-BilSTM-CRF named entity detection model and device
CN113535979A (en) * 2021-07-14 2021-10-22 中国地质大学(北京) Method and system for constructing knowledge graph in mineral field
CN113836943A (en) * 2021-11-25 2021-12-24 中国电子科技集团公司第二十八研究所 Relation extraction method and device based on semantic level
CN113836943B (en) * 2021-11-25 2022-03-04 中国电子科技集团公司第二十八研究所 Relation extraction method and device based on semantic level
CN114610819A (en) * 2022-03-17 2022-06-10 中科世通亨奇(北京)科技有限公司 Establishment method of character attribute relation extraction database in long text, entity extraction method, device and database

Also Published As

Publication number Publication date
CN112101009B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
CN112101009A (en) Knowledge graph-based method for judging similarity of people relationship frame of dream of Red mansions
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
WO2021139424A1 (en) Text content quality evaluation method, apparatus and device, and storage medium
CN110196906B (en) Deep learning text similarity detection method oriented to financial industry
CN112417153B (en) Text classification method, apparatus, terminal device and readable storage medium
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN114429132A (en) Named entity identification method and device based on mixed lattice self-attention network
CN112100212A (en) Case scenario extraction method based on machine learning and rule matching
CN113065349A (en) Named entity recognition method based on conditional random field
CN112131453A (en) Method, device and storage medium for detecting network bad short text based on BERT
CN114925702A (en) Text similarity recognition method and device, electronic equipment and storage medium
CN115017879A (en) Text comparison method, computer device and computer storage medium
CN114510946A (en) Chinese named entity recognition method and system based on deep neural network
CN111723572B (en) Chinese short text correlation measurement method based on CNN convolutional layer and BilSTM
CN112632978A (en) End-to-end-based substation multi-event relation extraction method
CN116680407A (en) Knowledge graph construction method and device
CN113342982B (en) Enterprise industry classification method integrating Roberta and external knowledge base
CN113434698B (en) Relation extraction model establishing method based on full-hierarchy attention and application thereof
CN115840815A (en) Automatic abstract generation method based on pointer key information
CN115358227A (en) Open domain relation joint extraction method and system based on phrase enhancement
CN114610882A (en) Abnormal equipment code detection method and system based on electric power short text classification
CN113282746B (en) Method for generating variant comment countermeasure text of network media platform
CN114611489A (en) Text logic condition extraction AI model construction method, extraction method and system
CN111431863B (en) Host intrusion detection method based on relational network
CN114943229B (en) Multi-level feature fusion-based software defect named entity identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant