CN112487812A - Nested entity identification method and system based on boundary identification - Google Patents

Nested entity identification method and system based on boundary identification

Info

Publication number
CN112487812A
Authority
CN
China
Prior art keywords
entity
boundary
vector
information
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011134652.6A
Other languages
Chinese (zh)
Other versions
CN112487812B (en)
Inventor
姜华
田济东
郦一天
姜晨昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minpu Technology Co ltd
Original Assignee
Shanghai Minpu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Minpu Technology Co ltd
Priority to CN202011134652.6A
Publication of CN112487812A
Application granted
Publication of CN112487812B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a nested entity recognition method and system based on boundary recognition. The method preprocesses input text data and converts the preprocessed text data into a multi-dimensional vector; performs feature coding on the multi-dimensional vector to obtain a coding vector with context information; extracts entity boundary information from the coding vector and decodes it to identify the boundaries of entity segments; masks the coding vector with the identified entity boundary information to obtain alternative entity fragment vectors, and classifies the features of the alternative entity fragments by entity classification decoding to obtain entity classification information; and combines the entity classification information with the entity boundary information to obtain the nested entities to be extracted. By flattening the nested structure and applying a two-layer boundary recognition method, the invention recognizes nested entities while maintaining recognition accuracy and generalization capability.

Description

Nested entity identification method and system based on boundary identification
Technical Field
The invention relates to the technical field of natural language processing, in particular to a nested entity identification method and a nested entity identification system based on boundary identification, which are used for identifying nested entities in natural language.
Background
Named entities are the basic units that carry information in natural language, and entity identification is a foundational task for natural language processing applications such as information extraction and reading comprehension. In-depth research on the accurate extraction of entities is therefore of great significance in natural language processing.
Generally, a named entity refers to a noun of special significance in the text, such as a person's name (PER), location (LOC), geographic area (GPE), organization (ORG), and other proper or special nouns. Conventional entity recognition can be realized through a sequence labeling model in deep learning (such as a long short-term memory conditional random field model), which labels each semantic unit with a unique label; entity fragments are then obtained by combining the labels. However, named entities can be nested, so a one-to-one relationship cannot be established between a word and an entity label. Therefore, the existing mature sequence labeling models cannot be directly applied to the identification of nested entities.
For the identification of nested entities, there are currently two main types of methods:
One identifies the nested entities layer by layer according to certain rules. This approach has three serious defects: 1) errors produced when identifying entities at different levels accumulate, so the model's recognition performance degrades as the levels deepen; 2) the ambiguous definition of levels makes the distribution of entities within the same layer vary widely, so the model struggles to identify them accurately; 3) repeated recognition of the same text segment introduces unnecessary computation and increases the computational cost. These drawbacks make such methods impractical.
The other flattens the nested structure by means of external knowledge and then extracts the entities with a sequence labeling method. This external knowledge, including regular expressions, annotation rules and the like, is a generalization of the prior knowledge contained in the entities of the text. In practice, however, the entity distributions and patterns involved in different domains vary, so different external knowledge has to be formulated for extraction on different data sets. Such methods therefore tend to perform well only on a particular data set and lack generalization capability.
Based on the above background, the main tension in current nested entity identification lies in balancing accuracy and generalization capability. Constructing a method with generalization capability while ensuring the identification accuracy of nested entities is therefore of great significance for the practical application of nested entity identification.
At present, no description or report of technology similar to the invention has been found, and no similar material has been collected at home or abroad.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a nested entity identification method and a nested entity identification system based on boundary identification.
The invention is realized by the following technical scheme.
According to one aspect of the invention, a nested entity identification method based on boundary identification is provided, which comprises the following steps:
carrying out data preprocessing on an input text, and converting the preprocessed text data into a multi-dimensional vector;
performing feature coding on the obtained multidimensional vector to obtain a coding vector with context information;
extracting entity boundary information from the coding vector with the context information, then decoding the extracted entity boundary information, identifying the boundary of the obtained entity segment, and obtaining entity boundary information;
masking the coding vector with the context information by adopting the entity boundary information obtained by identification to obtain alternative entity fragment vectors, and classifying the characteristics of the alternative entity fragments by entity classification decoding to obtain entity classification information;
and combining the obtained entity classification information and the entity boundary information to further obtain the nested entity to be extracted.
Preferably, the data preprocessing of the input text includes: text preprocessing and vector embedding; wherein:
the text preprocessing is used for capturing the internal information of the input text, including word segmentation, part-of-speech tagging, grammar parsing and semantic parsing, and obtaining text segments taking words as units and grammar dependency trees and semantic parsing trees corresponding to the text segments;
the vector embedding is to embed vocabulary, characters, parts of speech, semantics and grammar on the basis of text preprocessing; wherein:
vocabulary embedding is vectorized through a pre-trained language model, including: calling a pre-trained Chinese language model, coding each vocabulary item according to the interface provided by the model to serve as the input of the model, and finally obtaining a vocabulary vector through BERT calculation;
the character embedding is realized by a convolutional neural network learning embedding mode, and the method comprises the following steps: randomly initializing a character embedding table, coding each character, obtaining an initial vector through the embedding table, performing convolution on the vector through a convolution neural network, and obtaining a character-level vector by adopting a maximum pooling method;
part-of-speech embedding is obtained by randomly initializing vectors and training, and comprises the following steps: randomly initializing a part-of-speech embedding table, coding each type of part-of-speech, and obtaining a part-of-speech vector through the embedding table;
embedding semantics and grammar, and convolving the semantic parse tree and the grammar dependency tree through a graph convolution network to obtain corresponding semantic vectors and grammar vectors;
the input text is converted into a multi-dimensional vector through text preprocessing and vector embedding.
Preferably, the performing feature coding on the obtained multidimensional vector to obtain a coded vector with context information includes:
performing linear transformation and nonlinear transformation on the obtained multidimensional vector by using a bidirectional long short-term memory (BiLSTM) network, so that the resulting coded vector contains context information, namely the coded vector with the context information.
Preferably, the extracting information related to the entity boundary for the coding vector with the context information, then decoding the extracted information related to the entity boundary, and identifying the boundary where the entity segment is obtained, to obtain the entity boundary information includes:
using a two-level pointer network to identify, in a lattice-like manner, the left boundary group and the right boundary sequences from the coding vector with context information, which are then decoded into the corresponding entity boundaries.
Preferably, the two-level pointer network comprises a group sequence pointer network for identifying a left boundary group and an entity sequence pointer network for identifying a right boundary sequence; wherein:
for the group sequence pointer network, the input is the coding vector e with context information and the left boundary vector o obtained at the previous moment, and an attention operation is performed on the coding vector e using the left boundary vector o to obtain an unnormalized localization probability; for time j, the left boundary localization probability is:
Figure BDA0002736268070000031
where $u_{j,i}$ is the unnormalized localization probability of the left boundary, $v$ and $W$ are trainable parameters, the subscript $l$ denotes the left boundary, and the superscript $T$ denotes vector transposition;
at this time, the left boundary vector $o_j$ selected at time $j$ is:
$o_j = \arg\max_i (u_{j,i})$;
for the entity sequence pointer network, the input is the coding vector, the right boundary vector obtained at the previous moment and the left boundary vector of the corresponding group; the left boundary vector and the corresponding right boundary vector are concatenated, and an attention operation is then performed on the coding vector:
Figure BDA0002736268070000032
where $u_{j,k,i}$ is the unnormalized localization probability of the right boundary, the subscripts $r$ and $k$ denote the right boundary and the corresponding $k$-th left boundary respectively, and the superscript $T$ denotes vector transposition;
the right boundary vector finally obtained is $o_{j,k} = \arg\max_i (u_{j,k,i})$.
Preferably, the masking the coding vector with the context information by using the entity boundary information obtained by the identification to obtain the candidate entity fragment vector, and classifying the features of the candidate entity fragment by entity classification decoding to obtain the entity classification information includes:
masking the coding vector with the context information by adopting the entity boundary information obtained by identification to obtain an alternative entity fragment vector, learning the alternative entity fragment vector through a convolutional neural network, and classifying the obtained characteristics to obtain the category of the entity, namely the entity classification information.
Preferably, the method further comprises: optimizing the entity boundary information extraction process and the entity classification information extraction process.
Preferably, the optimizing the entity boundary information extraction process and the entity classification information extraction process includes:
alternately training the entity boundary information extraction process and the entity classification information extraction process with a cross-entropy loss function in a recall-priority manner, thereby optimizing the extraction processes.
Preferably, in the process of optimizing the entity classification information extraction process, a null value class and a negative sample are also added; wherein:
the null value class is used for secondary entity screening, so that the accuracy is improved;
the negative examples are used to ensure that a representation of the null value class can be learned;
the negative examples are generated by an entity boundary information extraction process.
According to another aspect of the present invention, there is provided a nested entity recognition system based on boundary recognition, including:
a data preprocessing module: carrying out data preprocessing on an input text, and converting the preprocessed text data into a multi-dimensional vector;
the characteristic coding module is used for carrying out characteristic coding on the multidimensional vector obtained by the data preprocessing module to obtain a coding vector with context information;
the boundary identification decoding module is used for extracting and obtaining entity boundary information by taking the coding vector with the context information obtained by the characteristic coding module as input, then decoding the extracted entity boundary information, identifying and obtaining the boundary of an entity segment, and outputting and obtaining the entity boundary information;
the entity classification decoding module is used for taking the entity boundary information obtained by the boundary identification decoding module and the coding vector with the context information obtained by the characteristic coding module as input, masking the coding vector with the context information by adopting the entity boundary information to obtain an alternative entity fragment vector, classifying the characteristics of the alternative entity fragment through entity classification decoding, and outputting to obtain entity classification information;
and the entity prediction module is used for combining the entity classification information obtained by the entity classification decoding module and the entity boundary information obtained by the boundary identification decoding module so as to obtain the nested entity to be extracted.
Preferably, the system further comprises:
and the model training module is used for respectively optimizing the boundary recognition decoding module and the entity classification decoding module.
According to a third aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform any of the methods described above.
According to a fourth aspect of the invention, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, is operable to perform the method of any of the above.
Due to the adoption of the technical scheme, compared with the prior art, the invention has at least one of the following beneficial effects:
1. according to the nested entity identification method and system based on boundary identification, negative effects caused by accumulated errors and entity distribution differences generated by nested structure layering are avoided through a boundary flattening mode (a mode of obtaining a coding vector with context information and obtaining entity boundary information), and the nested entity identification method and system based on boundary identification have good identification capability in the identification of nested entities at different levels;
2. the nested entity identification method and system based on boundary identification do not need to introduce regular expressions or other annotation rules to flatten entities, can be used on different data in different fields, and have strong generalization capability;
3. the nested entity identification method and the nested entity identification system based on the boundary identification provided by the invention bring other additional gains, such as avoiding repeated operation on texts, improving the identification efficiency and the like.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flowchart of a nested entity identification method based on boundary identification in a preferred embodiment of the present invention;
FIG. 2 is a diagram illustrating a boundary identification decoding process according to a preferred embodiment of the present invention;
FIG. 3 is a diagram illustrating an entity classification decoding process according to a preferred embodiment of the present invention.
Detailed Description
The following examples illustrate the invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and a detailed implementation mode and a specific operation process are given. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.
Existing layer-by-layer identification of nested entities introduces a large amount of accumulated error and computational cost, so its effectiveness is difficult to guarantee and it is impractical. An effective approach to nested entity identification therefore first needs to flatten the nested structure. The biggest challenge of this approach, however, is that flattening text data usually depends on the structure of the data itself, so a great deal of prior knowledge is needed for assistance and generalization is difficult.
An embodiment of the invention provides a nested entity identification method based on boundary identification.
The nested entity identification method based on boundary identification provided by the embodiment comprises the following steps:
step 1, performing data preprocessing on an input text, and converting the preprocessed text data into a multi-dimensional vector;
step 2, performing feature coding on the obtained multidimensional vector to obtain a coded vector with context information;
step 3, extracting entity boundary information from the coded vector with the context information, then decoding the extracted entity boundary information, identifying the boundary of the obtained entity segment, and obtaining entity boundary information;
step 4, masking the coding vector with the context information by adopting the entity boundary information obtained by identification to obtain an alternative entity fragment vector, and classifying the characteristics of the alternative entity fragment by entity classification decoding to obtain entity classification information;
and 5, combining the obtained entity classification information and the entity boundary information to further obtain the nested entity to be extracted.
In this embodiment, the extraction method first extracts the left boundaries to obtain a list, and then extracts the corresponding right boundaries for each value in that list to obtain a series of lists. Each left boundary is combined with each right boundary in its corresponding list to form a boundary pair, which represents one nested entity span; each boundary pair is then combined with its entity class to form a triple that represents a nested entity.
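The pairing step described above can be sketched in Python as follows. This is a minimal illustration, assuming boundaries are inclusive token indices; the function names `build_boundary_pairs` and `build_triples`, the dictionary layout, and the class labels in the usage note are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]         # (left, right), inclusive token indices (assumption)
Triple = Tuple[int, int, str]  # (left, right, entity class)


def build_boundary_pairs(right_lists: Dict[int, List[int]]) -> List[Span]:
    """Pair each left boundary with every right boundary in its own list."""
    pairs: List[Span] = []
    for left in sorted(right_lists):
        for right in sorted(right_lists[left]):
            if right >= left:  # skip malformed spans that end before they start
                pairs.append((left, right))
    return pairs


def build_triples(pairs: List[Span], classes: Dict[Span, str]) -> List[Triple]:
    """Attach the predicted class to each boundary pair that has one."""
    return [(l, r, classes[(l, r)]) for (l, r) in pairs if (l, r) in classes]
```

For instance, a left boundary at token 3 with right boundaries 3 and 4 yields the two overlapping spans (3, 3) and (3, 4), which is how one group of entities sharing a left boundary is flattened into a list of pairs.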
As a preferred embodiment, in step 1, the data preprocessing is performed on the input text, and includes: text preprocessing and vector embedding; wherein:
text preprocessing, namely capturing the internal information of an input text, including word segmentation, part-of-speech tagging, grammar parsing and semantic parsing, and obtaining a text segment taking a word as a unit and a grammar dependency tree and a semantic parsing tree corresponding to the text segment;
vector embedding, namely embedding vocabulary, characters, parts of speech, semantics and grammar on the basis of text preprocessing; wherein:
vocabulary embedding is vectorized through a pre-trained language model, specifically: calling a pre-trained Chinese language model, coding each vocabulary item according to the interface provided by the model to serve as the input of the model, and finally obtaining a vocabulary vector through BERT calculation;
character embedding is realized in a convolutional neural network learning embedding mode, specifically, a character embedding table is initialized randomly, each character is coded, an initial vector is obtained through the embedding table, the vector is convolved through a convolutional neural network, and a character level vector is obtained by adopting a maximum pooling method;
the part-of-speech embedding is obtained by randomly initializing vectors and training, specifically, randomly initializing a part-of-speech embedding table, encoding each type of part-of-speech, and obtaining part-of-speech vectors through the embedding table;
embedding semantics and grammar, and convolving the semantic parse tree and the grammar dependency tree through a graph convolution network to obtain corresponding semantic vectors and grammar vectors;
the input text is converted into a multi-dimensional vector through text preprocessing and vector embedding.
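The character embedding step above (embedding table lookup, convolution, maximum pooling) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the function name `char_cnn_embed` is hypothetical, the table and kernel weights are passed in as fixed values rather than randomly initialized and trained, and short words are right-padded with zero vectors, which the patent does not specify.

```python
from typing import Dict, List

Vector = List[float]
Kernel = List[Vector]  # one weight vector per position in the convolution window


def char_cnn_embed(word: str, table: Dict[str, Vector], kernels: List[Kernel]) -> Vector:
    """Return one max-pooled feature per kernel for the given word."""
    dim = len(next(iter(table.values())))
    seq = [table[ch] for ch in word]  # initial character vectors from the table
    features: Vector = []
    for kernel in kernels:
        width = len(kernel)
        # Right-pad short words with zero vectors so at least one window fits.
        padded = seq + [[0.0] * dim for _ in range(max(0, width - len(seq)))]
        activations = []
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            activations.append(
                sum(w * x for wv, xv in zip(kernel, window) for w, x in zip(wv, xv))
            )
        features.append(max(activations))  # maximum pooling over window positions
    return features
```

Each kernel contributes one dimension of the character-level vector, so the output length equals the number of kernels regardless of word length.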
As a preferred embodiment, step 2 comprises:
performing linear transformation and nonlinear transformation on the obtained multidimensional vector by using a bidirectional long short-term memory (BiLSTM) network, so that the resulting coded vector contains context information, namely the coded vector with the context information.
As a preferred embodiment, step 3 comprises:
using a two-level pointer network to identify, in a lattice-like manner, the left boundary group and the right boundary sequences from the coding vector with context information, which are then decoded into the corresponding entity boundaries.
As a preferred embodiment, the two-level pointer network comprises a group sequence pointer network for identifying the left boundary group and an entity sequence pointer network for identifying the right boundary sequence; wherein:
for the group sequence pointer network, the input is the coding vector e with context information and the left boundary vector o obtained at the previous moment, and an attention operation is performed on the coding vector e using the left boundary vector o to obtain an unnormalized localization probability; for time j, the left boundary localization probability is:
Figure BDA0002736268070000071
where $u_{j,i}$ is the unnormalized localization probability of the left boundary, $v$ and $W$ are trainable parameters, the subscript $l$ denotes the left boundary, and the superscript $T$ denotes vector transposition;
at this time, the left boundary vector $o_j$ selected at time $j$ is:
$o_j = \arg\max_i (u_{j,i})$;
for the entity sequence pointer network, the input is the coding vector, the right boundary vector obtained at the previous moment and the left boundary vector of the corresponding group; the left boundary vector and the corresponding right boundary vector are concatenated, and an attention operation is then performed on the coding vector:
Figure BDA0002736268070000072
where $u_{j,k,i}$ is the unnormalized localization probability of the right boundary, the subscripts $r$ and $k$ denote the right boundary and the corresponding $k$-th left boundary respectively, and the superscript $T$ denotes vector transposition;
the right boundary vector finally obtained is $o_{j,k} = \arg\max_i (u_{j,k,i})$.
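The two attention formulas are given only as figures in the published text, so their exact form is not reproduced here. The sketch below assumes the standard additive pointer-network scoring, $u_i = v^T \tanh(W_1 e_i + W_2 q)$, followed by an argmax over positions; this form, the function names, and the parameter layout are assumptions for illustration and may differ from the patent's equations. For the left boundary the query `q` would be the previous left boundary vector; for the right boundary it would be the concatenation of the group's left boundary vector and the previous right boundary vector.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def pointer_scores(enc: List[Vector], query: Vector,
                   w_enc: Matrix, w_query: Matrix, v: Vector) -> Vector:
    """Unnormalized score for each encoded position (assumed additive attention)."""
    q = matvec(w_query, query)
    scores = []
    for e in enc:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(w_enc, e), q)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return scores


def argmax(xs: Vector) -> int:
    """Index of the largest score (first one on ties)."""
    return max(range(len(xs)), key=xs.__getitem__)
```

In a trained model `w_enc`, `w_query` and `v` would be learned, with separate parameter sets for the left and right boundary networks.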
As a preferred embodiment, step 4 comprises:
masking the coding vector with the context information by using the identified entity boundary information to obtain alternative entity fragment vectors, learning the features of the alternative entity fragments from these vectors through a convolutional neural network, and classifying the obtained features to obtain the entity category, namely the entity classification information. (Entity identification consists of two tasks: first determining which part of the text is an entity, and then judging the category of that entity, such as person, place or organization.)
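The masking step can be sketched as follows, assuming that masking means zeroing every position outside the identified span and that boundaries are inclusive token indices. Both assumptions, and the names `mask_encoding` and `candidate_fragments`, are illustrative; the patent does not state how the mask is applied.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Span = Tuple[int, int]  # (left, right), inclusive token indices (assumption)


def mask_encoding(enc: List[Vector], left: int, right: int) -> List[Vector]:
    """Keep the coded vectors inside [left, right]; zero out all other positions."""
    return [
        list(vec) if left <= i <= right else [0.0] * len(vec)
        for i, vec in enumerate(enc)
    ]


def candidate_fragments(enc: List[Vector], spans: List[Span]) -> Dict[Span, List[Vector]]:
    """Build one masked copy of the encoding per candidate boundary pair."""
    return {(l, r): mask_encoding(enc, l, r) for (l, r) in spans}
```

Because every candidate is masked from the same coded vectors, the text is encoded only once and each masked copy can be passed to the classifier independently.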
As a preferred embodiment, the method provided in this embodiment further includes the following steps:
and optimizing the entity boundary information extraction process and the entity classification information extraction process.
As a preferred embodiment, the optimization method comprises the following steps:
alternately training the entity boundary information extraction process and the entity classification information extraction process with a cross-entropy loss function in a recall-priority manner, thereby optimizing the extraction processes.
As a preferred embodiment, in the process of optimizing the entity classification information extraction process, a null value class and a negative sample are also added; wherein:
the null value class is used for secondary entity screening, so that the accuracy is improved;
the negative examples are used to ensure that a representation of the null value class can be learned;
the negative examples are generated by an entity boundary information extraction process.
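One way to read the null value class and negative samples is sketched below: candidate spans produced by the boundary extraction process that do not match any annotated entity are labeled with the null class for training, and spans predicted as null are discarded at inference (the secondary screening). The label string `"NULL"` and the function names are hypothetical, and the patent does not specify how negatives are selected beyond their coming from the boundary extraction process.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]
NULL_CLASS = "NULL"  # hypothetical name for the null value class


def label_candidates(candidates: List[Span], gold: Dict[Span, str]) -> List[Tuple[Span, str]]:
    """Label boundary-module candidates; unmatched spans become negative samples."""
    return [(span, gold.get(span, NULL_CLASS)) for span in candidates]


def screen_predictions(predictions: List[Tuple[Span, str]]) -> List[Tuple[int, int, str]]:
    """Secondary screening: drop every span the classifier assigned to the null class."""
    return [(l, r, c) for (l, r), c in predictions if c != NULL_CLASS]
```

Under this reading, a boundary module tuned for recall can over-generate candidates, since spurious spans are filtered out by the classifier.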
The nested entity identification method based on boundary identification provided by the embodiment mainly comprises the following steps: data preprocessing, text feature encoding, boundary recognition decoding, entity classification decoding, process optimization (training), and entity prediction.
In some embodiments of the invention:
the method comprises the steps of comprehensively capturing the internal information of an input text by text preprocessing methods such as word segmentation, part of speech tagging, grammar parsing, semantic parsing and the like; obtaining a distributed representation with rich semantics by means of a pre-training language model; a multi-dimensional vector is obtained.
The obtained distributed representation is encoded with a bidirectional long short-term memory (BiLSTM) network, so that the encoded representation contains context information. The encoded representation serves as the input to the subsequent boundary identification decoding and entity classification decoding.
A boundary identification decoding model is constructed with a two-level pointer network, which identifies the left boundary group and the right boundary sequences in a lattice-like manner and then decodes them into the corresponding entity boundaries.
The vectors obtained after feature coding are masked with the candidate boundaries produced by entity boundary decoding, and the candidate entities are classified by a convolutional recurrent network; this process is called entity classification decoding.
The boundary recognition decoding process and the entity classification decoding process are trained alternately in a recall-priority manner to realize process optimization.
The process parameters (model parameters) obtained after training are reused, the boundary recognition decoding process (boundary recognition decoding model) and the entity classification decoding process (entity classification decoding model) are connected in a cascade, and the nested entities in the text to be tested are extracted.
In the nested entity identification method based on boundary identification provided by this embodiment, the entities are first grouped by their left boundaries, and each group of entities is then characterized by its right boundaries. This yields a boundary-based, two-layer, partially flattened structure and thereby flattens the nested structure.
Data preprocessing: this mainly comprises two steps, text processing and vector embedding, which together vectorize the text data. Basic natural language processing methods are first used to segment, tag and parse the text, and the features are then combined through different embedding methods to obtain a distributed text vector.
Feature coding: feature coding further encodes the text on the basis of the distributed text vector and captures the context information of the text through a recurrent network, thereby obtaining a coding vector containing the context information. The coding vector is used as the input to the two decoding processes.
Boundary identification decoding: this is the core of the whole method. On the one hand, boundary identification decoding needs to capture the positioning information of entities from the coding vector; on the other hand, it also needs to flatten the nested structure according to a certain strategy. Finally, the boundary identification decoding process decodes the candidate boundaries of the entities.
Entity classification decoding: after boundary identification, a classifier is constructed to classify the candidate boundaries, determining whether each candidate entity is a real entity and, if so, its entity class.
Process optimization (training): the parameters of the feature coding, boundary recognition decoding and entity classification decoding processes are optimized with the Adam optimizer used in deep learning. During optimization, a recall-priority approach is adopted to effectively reduce the accumulated error produced when the processes are cascaded.
Entity prediction: the feature coding, boundary identification decoding and entity classification decoding processes are cascaded directly, and the trained process parameters are loaded to extract the nested entities from the text to be tested.
The technical solution provided by the present embodiment is further described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic workflow diagram of a nested entity identification method based on boundary identification. The method mainly comprises six processes which are respectively as follows: data preprocessing, feature coding, boundary identification decoding, entity classification decoding, process training and entity prediction.
The data preprocessing comprises two sub-processes: text preprocessing and vector embedding. First, text preprocessing comprises word segmentation, part-of-speech tagging, grammar parsing and semantic parsing. Through these steps, text preprocessing outputs the text as a sequence of words. Taking "a cameraman has funeral when a us tank attacks a basistein hotel" as an example, text preprocessing outputs the part-of-speech tagged text fragment "a m/cameraman n/in p/us ns/tank n/attack v/a m/basistein/hotel n/time n/funeral v". In addition, text preprocessing outputs the grammar dependency tree and the semantic parse tree corresponding to the text. Second, vector embedding covers the embedding of words, characters, parts of speech, semantics and grammar. Words are vectorized by a pre-trained language model; character-level information is learned and embedded by a convolutional neural network; part-of-speech embeddings are obtained by randomly initializing vectors and training them together with the process; and semantic and grammar embeddings are obtained by applying a graph convolution network to the semantic parse tree and the grammar dependency tree, yielding the corresponding semantic vectors and grammar vectors.
Feature coding further encodes the preprocessed text vectors and provides shared context information for both decoding processes.
The left side of Fig. 2 shows a schematic diagram of the feature coding process: a bidirectional recurrent neural network (specifically, a bidirectional long short-term memory model) encodes the text word by word to obtain the coding vector e.
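The vector embedding step can be illustrated with a minimal sketch. It is not the embodiment's implementation: the lookup tables and their values are hypothetical toy data, the character-level convolution is omitted (only the max pooling is shown), and the semantic and grammar vectors from the graph convolution network are left out. It only shows how per-word feature vectors are concatenated into one multi-dimensional vector.

```python
def char_vector(word, char_table):
    """Max-pool character embeddings over a word (convolution step omitted)."""
    rows = [char_table[c] for c in word]
    return [max(col) for col in zip(*rows)]


def embed(tokens, pos_tags, word_table, char_table, pos_table):
    """Concatenate word, character-level and part-of-speech vectors per token."""
    vectors = []
    for tok, tag in zip(tokens, pos_tags):
        vectors.append(word_table[tok] + char_vector(tok, char_table) + pos_table[tag])
    return vectors


# Hypothetical toy tables; a real system would use a pre-trained language
# model for words and trained embedding tables for characters and tags.
word_table = {"us": [0.1, 0.2], "tank": [0.3, 0.4]}
char_table = {
    "u": [0.0, 1.0], "s": [1.0, 0.0], "t": [0.5, 0.5],
    "a": [0.2, 0.9], "n": [0.7, 0.1], "k": [0.3, 0.3],
}
pos_table = {"ns": [1.0], "n": [0.0]}

vectors = embed(["us", "tank"], ["ns", "n"], word_table, char_table, pos_table)
```

Each token ends up with a five-dimensional vector here (two word dimensions, two character dimensions, one part-of-speech dimension); the feature coding process would take such vectors as input.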
The right side of Fig. 2 shows a schematic diagram of the boundary identification decoding process. As shown in the figure, boundary identification decoding uses two pointer networks to compute, in a grid-like (mesh) manner, the group sequence based on left boundaries and the intra-group entity sequence based on right boundaries. First, the group sequence pointer network takes as input the coding vector e and the left boundary vector o obtained at the previous time step; the unnormalized positioning probability is obtained by an attention operation of o over e. Thus, for time step j, the left boundary positioning probability is given by formula (1):
Figure BDA0002736268070000101
where v and W are trainable parameters. The left boundary selected at time step j is then $o_j = \arg\max_i(u_{j,i})$ (2). Similarly, the right boundary sequence of the entities is computed with the coding vector, the right boundary vector obtained at the previous time step and the left boundary vector of the group concerned as input. Compared with formula (1), the entity sequence pointer network performs the attention operation on the coding vector after splicing (concatenating) the left boundary vector with the corresponding right boundary vector:
Figure BDA0002736268070000102
Figure BDA0002736268070000103
The right boundary finally obtained is $o_{j,k} = \arg\max_i(u_{j,k,i})$ (4).
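The two pointer steps can be sketched as follows. The exact attention formulas are given only in the formula images, so this sketch assumes the common additive pointer-network form $u_i = v^T \tanh(W[e_i; q])$, where q is the query (the previous left boundary vector for the group sequence network, or the left boundary vector spliced with the previous right boundary vector for the entity sequence network). All weights and encodings are made-up toy values, and taking the encoding at the selected position as the boundary vector is likewise an assumption.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def pointer_scores(E, query, W, v):
    """Unnormalized scores u_i = v^T tanh(W [e_i ; query]) for every position i."""
    scores = []
    for e_i in E:
        hidden = [math.tanh(h) for h in matvec(W, e_i + query)]
        scores.append(sum(vk * hk for vk, hk in zip(v, hidden)))
    return scores


def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)


# Toy coding vectors for three positions (dimension 2).
E = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
o_prev = [0.0, 0.0]  # boundary vector from the previous time step

# Group sequence pointer network: query is the previous left boundary vector.
W_l = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0]]
v_l = [1.0, 0.0]
u = pointer_scores(E, o_prev, W_l, v_l)
left = argmax(u)  # analogue of formula (2)

# Entity sequence pointer network: query is the left boundary vector
# spliced with the previous right boundary vector.
W_r = [[0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
v_r = [1.0, 0.0]
u_r = pointer_scores(E, E[left] + o_prev, W_r, v_r)
right = argmax(u_r)  # analogue of formula (4)
```

With these toy weights the left pointer selects position 0 and the right pointer selects position 1, giving one candidate span from position 0 to position 1 (0-based).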
Fig. 3 shows a schematic diagram of the entity classification decoding process. First, the text information x and the candidate boundaries y produced by the boundary recognition decoding process are taken as input. The coding vector e is obtained from the text information through data preprocessing and feature coding; e is masked with the boundary y to obtain the vector of the segment corresponding to the candidate entity; segment features are then learned by a convolutional neural network and classified to obtain the category of the entity.
Process training provides a scheme for training the entire process. Loss functions are defined separately for the boundary identification part and the entity classification part; the method uses the cross-entropy loss function for learning. During learning, optimization is performed by stochastic gradient-based updates. Because boundary identification and entity classification are connected in series (the output of process 1 is the input of process 2), the whole nested entity identification process based on boundary identification is trained by alternately training the two processes. In addition, to ensure the accuracy of training, the entity classification decoding process requires two additional operations in the training phase: 1) a null class is added to the classification model for secondary entity screening, which improves accuracy; 2) 10% negative examples are added artificially so that a representation of the null class can be learned; these negative examples can be generated by the boundary recognition decoding process.
Entity prediction identifies the nested entities in unlabeled text and outputs the nested entity fragments together with their classification information.
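The masking and classification steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the embodiment's model: the kernels, weights, labels and coding vectors are hypothetical toy values, the convolution has a fixed width of 2, and a single linear layer stands in for the classifier.

```python
def mask_segment(E, left, right):
    """Zero out every position outside [left, right] (0-based, inclusive)."""
    dim = len(E[0])
    return [e if left <= i <= right else [0.0] * dim for i, e in enumerate(E)]


def conv1d_maxpool(X, kernels):
    """Width-2 convolution over positions, then max pooling per kernel."""
    feats = []
    for k in kernels:  # each kernel has shape 2 x dim
        flat_k = k[0] + k[1]
        acts = [
            sum(a * b for a, b in zip(X[i] + X[i + 1], flat_k))
            for i in range(len(X) - 1)
        ]
        feats.append(max(acts))
    return feats


def classify(feats, weights, labels):
    """Linear scoring followed by argmax over the label set."""
    scores = [sum(w * f for w, f in zip(row, feats)) for row in weights]
    return labels[max(range(len(scores)), key=scores.__getitem__)]


# Toy coding vectors for four positions (dimension 2).
E = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
kernels = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
labels = ["LOC", "ORG", "NULL"]
weights = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]

label_a = classify(conv1d_maxpool(mask_segment(E, 0, 1), kernels), weights, labels)
label_b = classify(conv1d_maxpool(mask_segment(E, 2, 3), kernels), weights, labels)
```

Masking keeps the sequence length fixed while removing information outside the candidate boundary, so the same convolutional classifier can be applied to every candidate span.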
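The two additional training operations can be sketched as a routine that builds the training examples for the entity classification process. Two points are assumptions and not stated in the text: the 10% is taken relative to the number of positive examples, and negative examples are drawn at random from the candidate boundaries produced by the boundary recognition decoding process that do not match any labelled entity. The function name, the `NULL` label and the toy data are hypothetical.

```python
import math
import random

NULL = "NULL"  # hypothetical name for the null class


def build_classifier_examples(gold, candidates, ratio=0.10, seed=0):
    """Return (span, label) pairs: all gold spans plus sampled null-class negatives.

    gold:       dict mapping (left, right) spans to entity labels
    candidates: spans proposed by the boundary recognition decoding process
    ratio:      number of negatives relative to the number of positives
    """
    positives = list(gold.items())
    pool = sorted(span for span in set(candidates) if span not in gold)
    n_neg = min(len(pool), math.ceil(ratio * len(positives)))
    rng = random.Random(seed)
    negatives = [(span, NULL) for span in rng.sample(pool, n_neg)]
    return positives + negatives


# Toy data: ten labelled spans and three false candidates from the decoder.
gold = {(i, i + 1): "LOC" for i in range(10)}
candidates = list(gold) + [(0, 5), (2, 7), (4, 9)]
examples = build_classifier_examples(gold, candidates)
```

Drawing negatives from the decoder's own false candidates means the null class is learned on the kind of errors the classifier will actually meet when the two processes are cascaded.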
The method provided by the above embodiment of the present invention is further described in detail below with reference to a specific application example.
Taking "Shanghai Jiao Tong University" (上海交通大学, six characters) as an example, the fragment contains two entities: the geographic entity "Shanghai" (上海) and the organization entity "Shanghai Jiao Tong University" (上海交通大学). During identification, position 1 ("上") is first identified as a left boundary; position 2 ("海") and position 6 ("学") are then identified as right boundaries, which yields the two entities. This flattening scheme uses only the inherent properties of the nested structure and requires no prior knowledge extracted from the data, so the method can be applied to different data sets in different fields, which ensures its generalization capability.
Based on the above analysis, the nested entity identification method based on boundary identification solves the following technical problems through a boundary identification approach that flattens the nested structure:
1) partial flattening of the nested structure based on boundaries;
2) encoding of text data;
3) construction of a decoder based on boundary identification;
4) training of the boundary identification decoding model and the entity classification decoding model.
The results were evaluated with the F1 score; compared with the prior art, the F1 score of the obtained results is improved by 1.3 percentage points.
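The flattening in this example can be written out as a short sketch. It assumes the example refers to the six-character Chinese string 上海交通大学 with 1-based character positions; the function names are hypothetical. Entities are grouped by left boundary, each group lists its right boundaries, and decoding expands the two-layer structure back into spans.

```python
from collections import defaultdict


def flatten_by_left_boundary(entities):
    """Group (left, right) spans by left boundary; each group lists its right boundaries."""
    groups = defaultdict(list)
    for left, right in entities:
        groups[left].append(right)
    return [(left, sorted(rights)) for left, rights in sorted(groups.items())]


def decode_groups(groups):
    """Expand the two-layer structure back into (left, right) spans."""
    return [(left, right) for left, rights in groups for right in rights]


text = "上海交通大学"
entities = [(1, 6), (1, 2)]  # the organization and the nested geographic entity

groups = flatten_by_left_boundary(entities)
spans = decode_groups(groups)
surface = [text[left - 1:right] for left, right in spans]
```

Here `groups` holds a single group with left boundary 1 and right boundaries 2 and 6, so both nested entities are recovered from one left boundary without any knowledge of the entity types.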
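For reference, the F1 score over extracted entities is typically computed as below. The text does not state the matching criterion, so this sketch assumes exact match on left boundary, right boundary and class; the toy spans are made up.

```python
def span_prf(gold, pred):
    """Precision, recall and F1 over exactly matching (left, right, label) triples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [(1, 2, "LOC"), (1, 6, "ORG")]
pred = [(1, 2, "LOC"), (3, 6, "ORG")]  # second span has a wrong left boundary
precision, recall, f1 = span_prf(gold, pred)
```

Under exact matching, a nested entity with one wrong boundary counts as both a false positive and a false negative, which is why boundary accuracy directly limits the achievable F1.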
Another embodiment of the present invention provides a nested entity recognition system based on boundary recognition, including:
a data preprocessing module, used for carrying out data preprocessing on an input text and converting the preprocessed text data into a multi-dimensional vector;
a feature coding module, used for carrying out feature coding on the multi-dimensional vector obtained by the data preprocessing module to obtain a coding vector with context information;
a boundary identification decoding module, used for taking the coding vector with context information obtained by the feature coding module as input, extracting entity boundary related information, decoding the extracted information, identifying the boundaries of entity segments, and outputting entity boundary information;
an entity classification decoding module, used for taking the entity boundary information obtained by the boundary identification decoding module and the coding vector with context information obtained by the feature coding module as input, masking the coding vector with the entity boundary information to obtain candidate entity fragment vectors, classifying the features of the candidate entity fragments through entity classification decoding, and outputting entity classification information;
an entity prediction module, used for combining the entity classification information obtained by the entity classification decoding module with the entity boundary information obtained by the boundary identification decoding module to obtain the nested entities to be extracted.
As a preferred embodiment, the system provided in this embodiment further includes:
a model training module, used for optimizing the boundary recognition decoding module and the entity classification decoding module respectively.
A third embodiment of the present invention provides a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor being operable to execute the method according to any one of the above embodiments of the present invention when executing the program.
Optionally, the memory is used for storing a program. The memory may include a volatile memory, such as a random access memory (RAM), for example a static random access memory (SRAM) or a double data rate synchronous dynamic random access memory (DDR SDRAM); the memory may also include a non-volatile memory, such as a flash memory. The memory is used to store computer programs (e.g., applications or functional modules that implement the above-described methods), computer instructions and the like, which may be stored in partitions in one or more memories. The computer programs, computer instructions, data and the like described above may be invoked by a processor.
A processor for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments. Reference may be made in particular to the description relating to the preceding method embodiment.
The processor and the memory may be separate structures or may be an integrated structure integrated together. When the processor and the memory are separate structures, the memory, the processor may be coupled by a bus.
A fourth embodiment of the invention provides a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any of the above-mentioned embodiments of the invention.
In the method and system for identifying nested entities based on boundary identification provided in the embodiments of the present invention, a two-layer extraction scheme based on entity boundaries is used: entities are grouped according to their left boundaries, and the entity sequence in each group is characterized by right boundaries. The text is encoded by a recurrent neural network. Taking the left boundary generated in the previous step as input, the group sequence pointer network iteratively generates the sequence of entity groups; taking the left boundary of each group together with the right boundary generated in the previous step as input, the entity sequence pointer network iteratively generates the entity sequence within the group. The two-layer structure is decoded to obtain candidate entities, and the entities are classified by a convolutional neural network. Without introducing external knowledge, the method and system effectively flatten the nested structure in the nested information through this simple two-layer structure, improve the ability to resolve deeply nested structures while preserving accurate extraction of shallow information, and ensure the accuracy of extracting deeply nested information. By flattening the nested structure, nested entity recognition is achieved through a two-layer boundary recognition method that generalizes while maintaining recognition accuracy.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the system, and those skilled in the art may implement the composition of the system by referring to the technical solution of the method, that is, the embodiment in the method may be understood as a preferred example for constructing the system, and will not be described herein again.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (12)

1. A nested entity identification method based on boundary identification is characterized by comprising the following steps:
carrying out data preprocessing on an input text, and converting the preprocessed text data into a multi-dimensional vector;
performing feature coding on the obtained multidimensional vector to obtain a coding vector with context information;
extracting entity boundary information from the coding vector with the context information, then decoding the extracted entity boundary information, identifying the boundary of the obtained entity segment, and obtaining entity boundary information;
masking the coding vector with the context information by adopting the entity boundary information obtained by identification to obtain alternative entity fragment vectors, and classifying the characteristics of the alternative entity fragments by entity classification decoding to obtain entity classification information;
and combining the obtained entity classification information and the entity boundary information to further obtain the nested entity to be extracted.
2. The boundary recognition-based nested entity recognition method of claim 1, wherein the data preprocessing of the input text comprises: text preprocessing and vector embedding; wherein:
the text preprocessing is used for capturing the internal information of the input text, including word segmentation, part-of-speech tagging, grammar parsing and semantic parsing, and obtaining text segments taking words as units and grammar dependency trees and semantic parsing trees corresponding to the text segments;
the vector embedding is to embed vocabulary, characters, parts of speech, semantics and grammar on the basis of text preprocessing; wherein:
vocabulary embedding is vectorized through a pre-trained language model, including: calling a pre-trained Chinese pre-training model, coding each vocabulary according to an interface provided by the model to be used as the input of the model, and finally obtaining a vocabulary vector through BERT calculation;
the character embedding is realized by a convolutional neural network learning embedding mode, and the method comprises the following steps: randomly initializing a character embedding table, coding each character, obtaining an initial vector through the embedding table, performing convolution on the vector through a convolution neural network, and obtaining a character-level vector by adopting a maximum pooling method;
part-of-speech embedding is obtained by randomly initializing vectors and training, and comprises the following steps: randomly initializing a part-of-speech embedding table, coding each type of part-of-speech, and obtaining a part-of-speech vector through the embedding table;
embedding semantics and grammar, and convolving the semantic parse tree and the grammar dependency tree through a graph convolution network to obtain corresponding semantic vectors and grammar vectors;
the input text is converted into a multi-dimensional vector through text preprocessing and vector embedding.
3. The boundary identification-based nested entity identification method of claim 1, wherein the feature coding the obtained multidimensional vector to obtain a coded vector with context information comprises:
performing linear transformation and nonlinear transformation on the obtained multidimensional vector by using a bidirectional long short-term memory network, wherein the resulting coded vector contains context information, namely the coded vector with the context information.
4. The method for identifying nested entities based on boundary identification according to claim 1, wherein the extracting entity boundary related information from the coded vector with context information, then decoding the extracted entity boundary related information, and identifying the boundary of the obtained entity segment to obtain entity boundary information comprises:
using a two-level pointer network to identify, in a grid-like (mesh) manner, the left and right boundary groups from the coded vectors with context information, which are then decoded into the corresponding entity boundaries.
5. The boundary identification-based nested entity identification method of claim 4, wherein the two-level pointer network comprises a group sequence pointer network for identifying a left boundary group and an entity sequence pointer network for identifying a right boundary sequence; wherein:
for the group sequence pointer network, the input is the coding vector e with context information and the left boundary vector o obtained at the previous time step, and an attention operation of the left boundary vector o over the coding vector e yields the unnormalized positioning probability; for time j, the left boundary positioning probability is:
Figure FDA0002736268060000021
wherein $u_{j,i}$ is the unnormalized positioning probability of the left boundary, v and W are trainable parameters, the subscript l denotes the left boundary, and the superscript T denotes vector transposition;
at this time, the left boundary vector $o_j$ selected at the j-th time is:
$o_j = \arg\max_i(u_{j,i})$;
for the entity sequence pointer network, the input is the coding vector, the right boundary vector obtained at the last moment and the left boundary vector corresponding to the group, the left boundary vector and the corresponding right boundary vector are spliced, and then the attention operation is performed on the coding vector:
Figure FDA0002736268060000022
wherein, for $u_{j,k,i}$, the subscripts r and k denote the right boundary and the corresponding k-th left boundary respectively, and the superscript T denotes vector transposition;
the right boundary vector finally obtained is $o_{j,k} = \arg\max_i(u_{j,k,i})$.
6. The method of claim 1, wherein the masking a coding vector with context information by using entity boundary information obtained by identification to obtain a candidate entity fragment vector, and classifying features of the candidate entity fragment by entity classification decoding to obtain entity classification information comprises:
masking the coding vector with the context information by adopting the entity boundary information obtained by identification to obtain an alternative entity fragment vector, learning the alternative entity fragment vector through a convolutional neural network, and classifying the obtained characteristics to obtain the category of the entity, namely the entity classification information.
7. A nested entity recognition method based on boundary recognition according to any one of claims 1 to 6, characterized by further comprising: and optimizing the entity boundary information extraction process and the entity classification information extraction process.
8. The method for identifying nested entities based on boundary identification according to claim 7, wherein the optimizing the entity boundary information extraction process and the entity classification information extraction process comprises:
alternately training an entity boundary information extraction process and an entity classification information extraction process by adopting a cross entropy loss function in a recall rate priority mode to realize the optimization of the extraction process; wherein:
in the process of optimizing the entity classification information extraction process, adding a null value class and a negative sample; wherein:
the null value class is used for secondary entity screening, so that the accuracy is improved;
the negative examples are used to ensure that a representation of the null value class can be learned;
the negative examples are generated by an entity boundary information extraction process.
9. A nested entity recognition system based on boundary recognition, comprising:
a data preprocessing module: carrying out data preprocessing on an input text, and converting the preprocessed text data into a multi-dimensional vector;
the characteristic coding module is used for carrying out characteristic coding on the multidimensional vector obtained by the data preprocessing module to obtain a coding vector with context information;
the boundary identification decoding module is used for extracting and obtaining entity boundary information by taking the coding vector with the context information obtained by the characteristic coding module as input, then decoding the extracted entity boundary information, identifying and obtaining the boundary of an entity segment, and outputting and obtaining the entity boundary information;
the entity classification decoding module is used for taking the entity boundary information obtained by the boundary identification decoding module and the coding vector with the context information obtained by the characteristic coding module as input, masking the coding vector with the context information by adopting the entity boundary information to obtain an alternative entity fragment vector, classifying the characteristics of the alternative entity fragment through entity classification decoding, and outputting to obtain entity classification information;
and the entity prediction module is used for combining the entity classification information obtained by the entity classification decoding module and the entity boundary information obtained by the boundary identification decoding module so as to obtain the nested entity to be extracted.
10. The boundary recognition-based nested entity recognition system of claim 9, further comprising:
and the model training module is used for respectively optimizing the boundary recognition decoding module and the entity classification decoding module.
11. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, is operative to perform the method of any of claims 1-8.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 8.
CN202011134652.6A 2020-10-21 2020-10-21 Nested entity identification method and system based on boundary identification Active CN112487812B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011134652.6A CN112487812B (en) 2020-10-21 2020-10-21 Nested entity identification method and system based on boundary identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011134652.6A CN112487812B (en) 2020-10-21 2020-10-21 Nested entity identification method and system based on boundary identification

Publications (2)

Publication Number Publication Date
CN112487812A true CN112487812A (en) 2021-03-12
CN112487812B CN112487812B (en) 2021-07-06

Family

ID=74926922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011134652.6A Active CN112487812B (en) 2020-10-21 2020-10-21 Nested entity identification method and system based on boundary identification

Country Status (1)

Country Link
CN (1) CN112487812B (en)

Also Published As

Publication number Publication date
CN112487812B (en) 2021-07-06

Similar Documents

Publication Publication Date Title
CN112487812B (en) Nested entity identification method and system based on boundary identification
CN111897908B (en) Event extraction method and system integrating dependency information and pre-training language model
CN111476023B (en) Method and device for identifying entity relationship
CN111666588B (en) Emotion differential privacy protection method based on generative adversarial network
CN111738169B (en) Handwritten formula recognition method based on end-to-end network model
CN112733866A (en) Network construction method for improving text description correctness of controllable image
CN112507039A (en) Text understanding method based on external knowledge embedding
CN111782768A (en) Fine-grained entity identification method based on hyperbolic space representation and label text interaction
CN112163092B (en) Entity and relation extraction method, system, device and medium
CN110349229A (en) Image description method and device
CN114861600A (en) NER-oriented Chinese clinical text data enhancement method and device
CN110992943B (en) Semantic understanding method and system based on word confusion network
Tian et al. Multi-scale hierarchical residual network for dense captioning
CN114091450A (en) Judicial domain relation extraction method and system based on graph convolution network
CN115146279A (en) Program vulnerability detection method, terminal device and storage medium
Zhao et al. Aligned visual semantic scene graph for image captioning
CN117828024A (en) Plug-in retrieval method, device, storage medium and equipment
CN113609857A (en) Legal named entity identification method and system based on cascade model and data enhancement
CN116680407A (en) Knowledge graph construction method and device
CN116595189A (en) Zero sample relation triplet extraction method and system based on two stages
CN116258147A (en) Multimodal comment emotion analysis method and system based on heterogeneous graph convolution
CN114648005A (en) Multi-fragment machine reading understanding method and device for multitask joint learning
CN115203388A (en) Machine reading understanding method and device, computer equipment and storage medium
CN115358227A (en) Open domain relation joint extraction method and system based on phrase enhancement
CN113626574A (en) Information query method, system, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant