CN112463988A - Method for extracting Chinese classical garden information - Google Patents

Method for extracting Chinese classical garden information

Info

Publication number
CN112463988A
CN112463988A
Authority
CN
China
Prior art keywords
state
entity
conversion
sigma
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011450290.1A
Other languages
Chinese (zh)
Inventor
刘耀忠
黄亦工
王亚弟
常少辉
吕洁
孙萌
费晓飞
谢帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Bayi Space Information Engineering Co ltd
Beijing Preparatory Office Of Museum Of Chinese Gardens And Landscape Architecture
Original Assignee
Beijing Bayi Space Information Engineering Co ltd
Beijing Preparatory Office Of Museum Of Chinese Gardens And Landscape Architecture
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Bayi Space Information Engineering Co ltd, Beijing Preparatory Office Of Museum Of Chinese Gardens And Landscape Architecture filed Critical Beijing Bayi Space Information Engineering Co ltd
Priority to CN202011450290.1A priority Critical patent/CN112463988A/en
Publication of CN112463988A publication Critical patent/CN112463988A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/335 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for extracting Chinese classical garden information, comprising the following steps: 1. computing a word-vector embedding sequence from the input; 2. applying Bi-LSTM (bidirectional long short-term memory) encoding to the sequence; 3. executing a state transition; when the final state is reached, the entity and relation information has been extracted and the process ends, otherwise the next step is chosen by probability calculation; 4. selecting an entity-extraction state-transition action or a relation-extraction state-transition action; 5. returning to step 3 after the action is executed, finally obtaining the extracted entities and relations. The technical scheme of the invention mainly has the following advantages: 1. an information extraction algorithm for knowledge in the field of Chinese classical gardens is proposed for the first time; 2. the utilization rate and execution efficiency of the information are improved; 3. the method can be widely applied to classical gardens nationwide.

Description

Method for extracting Chinese classical garden information
Technical Field
The invention relates to the technical field of processing natural language data, information retrieval and database structures thereof, in particular to a method for extracting Chinese classical garden information.
Background
Chinese classical gardens are world-renowned for their exquisite gardening craft and profound cultural connotations, and are an important component of traditional Chinese culture. An effective means of protecting and inheriting them is to apply modern information technology to realize digitization. One important basis for digitization is storing the relevant information in a computer. Computers already hold large amounts of data such as garden historical archives, videos, pictures and text materials; the greatest challenge is how to organize this massive unstructured data so as to support efficient information retrieval. At present, the data storage technology best able to support efficient information retrieval is the knowledge graph. A Knowledge Graph (KG) stores knowledge using graph structures, describing the various entities or concepts existing in the real world and the relations between them to form a huge semantic network, in which nodes represent entities or concepts and edges are formed by attributes or relations. A knowledge graph is usually represented by triples, whose basic forms mainly include (entity1, relation, entity2) and (entity, attribute, value).
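As a concrete illustration of the triple forms just described, a minimal Python sketch follows; the garden names and the relation and attribute labels are invented for illustration and are not taken from the patent:

```python
# Minimal sketch of the two basic triple forms: (entity1, relation, entity2)
# and (entity, attribute, value). Names and labels are illustrative only.
triples = [
    ("拙政园", "located_in", "苏州"),        # entity - relation - entity
    ("拙政园", "garden_type", "私家园林"),   # entity - attribute - value
]

# A knowledge graph can be viewed as an adjacency structure over such triples:
# each node maps to its outgoing (edge label, neighbour/value) pairs.
graph = {}
for head, rel, tail in triples:
    graph.setdefault(head, []).append((rel, tail))

print(graph["拙政园"])  # → [('located_in', '苏州'), ('garden_type', '私家园林')]
```

Storing both relation and attribute edges in the same adjacency structure reflects how the edges of the semantic network are formed by either attributes or relations.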
Well-known existing knowledge graphs include those established by Google, Microsoft, Baidu, Sogou and other companies, as well as the Open Knowledge Graph for Chinese (OpenKG); they store entities on the order of hundreds of millions. The CASIA-KB project of the Chinese Academy of Sciences extracted Baidu Baike and Hudong Baike to construct a Chinese knowledge graph of Chinese tourist attractions, applicable to geography, daily life, entertainment and so on. The Clinga project of Nanjing University used Chinese Wikipedia as a data source, manually constructed a new geographic ontology, classified various natural-geographic and human-geographic entities and automatically linked them with existing knowledge bases; the resulting Chinese geographic knowledge graph contains more than 500,000 Chinese geographic entities and is publicly accessible.
However, a search shows that existing knowledge graphs do not contain systematic knowledge of Chinese classical gardens. A Chinese classical garden knowledge graph must therefore be constructed from scratch.
The core of knowledge graph construction is information extraction. Many tools exist to extract information from structured, semi-structured, and unstructured data to obtain knowledge.
D2RQ is a tool for exposing a relational database as a virtual RDF database; it comprises three components: D2R Server, D2RQ Engine and D2RQ Mapping Language. However, it is difficult to combine with knowledge-modeling results for mapping, difficult to fuse with other types of knowledge, and difficult to scale to large or incremental mappings.
Lixto and WIE can generate web-page wrappers to obtain knowledge from web-page data, but they were mainly developed for early static pages and need to be extended to support dynamic pages.
DeepDive and Snorkel provide extraction frameworks for specific relations based on distant supervision: an existing knowledge base and rule definitions are used to generate training corpora automatically, model training is completed automatically, machine learning is used to reduce noise and uncertainty, and user-supplied rules influence the learning process to improve result quality. DeepKE, a relation extraction tool developed at Zhejiang University, uses a variety of deep learning algorithms such as convolutional neural networks, recurrent neural networks, attention networks, graph convolutional networks, capsule networks and pre-trained language models. However, DeepDive, Snorkel and DeepKE only perform relation extraction and provide no extraction of concepts, entities, events and the like.
Existing knowledge element (entity and relation) extraction technologies and methods are usually applied to data sets of limited domains and subjects. Although good results have been obtained, the many restrictions make these methods insufficiently extensible, so they cannot well meet the requirements of extracting Chinese classical garden information.
The primary task of knowledge extraction is named entity recognition. The prior art generally recognizes named entities of three major categories (entity, time and number) and seven minor categories (person name, organization name, place name, time, date, currency and percentage) in the text to be processed. Most research focuses on recognizing names of people, places, organizations, proper nouns, etc.
Therefore, existing knowledge extraction technology cannot meet the requirement of building a Chinese classical garden knowledge graph.
In summary, given that no Chinese classical garden knowledge graph exists and that current information extraction technology does not meet the requirement of automatically constructing one, the invention aims to provide a Chinese classical garden information extraction algorithm and lay a solid foundation for constructing the Chinese classical garden knowledge graph.
The knowledge system of the classical garden knowledge graph mainly comprises core content in three aspects: classification of concepts, description of concept attributes, and definition of the interrelations among concepts. Its basic form comprises five levels: vocabulary, concepts, classification relations, non-classification relations and axioms. Based on a combination of automatic and manual construction, knowledge-learning methods and technologies for unstructured, structured and semi-structured data are studied; entity recognition, taxonomy construction and concept attribute and relation extraction in the classical garden field are achieved with natural language processing tools, and the knowledge system of the classical garden knowledge graph is thereby constructed.
The entity is the basic unit of the knowledge graph and an important language unit carrying information in text. Entity recognition and analysis are key technologies supporting knowledge graph construction and application. The application of machine-learning-based entity recognition to classical garden knowledge graph construction is studied, focusing on neural network methods: based on deep learning, a neural network automatically captures effective features from the text and then completes named entity recognition. The main steps are as follows. Model design and construction: character symbols are represented as distributed features, with the characters and words in the text represented by a bidirectional LSTM. Model training: network parameters are optimized with labeled data, using training methods such as stochastic gradient descent to train the whole network. Model classification: the trained model classifies new samples to complete entity recognition. The bidirectional LSTM produces a feature representation of the input text, which is fed into a CRF; each word in the sentence is classified, the sentence is scored as a whole, and the final classification result completes entity recognition.
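The BiLSTM-CRF pipeline above ends with a per-word classification result. A minimal sketch of how such per-token labels could be turned into recognized entities, assuming a standard BIO labeling scheme; the tag names BUILDING and GARDEN and the example sentence are hypothetical, not defined by the patent:

```python
def bio_to_entities(tokens, labels):
    """Collect (entity_text, type) spans from per-token BIO labels."""
    entities, cur, cur_type = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if cur:                               # close any open span
                entities.append(("".join(cur), cur_type))
            cur, cur_type = [tok], lab[2:]
        elif lab.startswith("I-") and cur:
            cur.append(tok)                       # extend the open span
        else:
            if cur:
                entities.append(("".join(cur), cur_type))
            cur, cur_type = [], None
    if cur:
        entities.append(("".join(cur), cur_type))
    return entities

# Hypothetical labelling of a garden sentence (tags are assumed examples).
tokens = ["留", "听", "阁", "在", "拙", "政", "园"]
labels = ["B-BUILDING", "I-BUILDING", "I-BUILDING", "O",
          "B-GARDEN", "I-GARDEN", "I-GARDEN"]
print(bio_to_entities(tokens, labels))
# → [('留听阁', 'BUILDING'), ('拙政园', 'GARDEN')]
```

In the full method this decoding step would consume the CRF's output labels rather than hand-written ones.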
Classical garden entity linking associates entities across multiple data sources through links, so as to better represent the semantic associations among entities in different sources and achieve multi-source data fusion for semantic understanding and semantic analysis in classical garden artificial intelligence. In the various classical garden text sources, expressions of the entities of the four major elements (mountains and water, buildings, plants, terrain) are diverse; irregular expressions such as entity abbreviations and unclear contextual references bring great difficulty to entity linking. According to the correlation calculation adopted, entity linking methods fall into two main categories. Entity-based methods compute mainly over the entity's surface characters, for example the Jaro-Winkler string edit distance and the Smith-Waterman algorithm. Methods based on entity background information generally include cosine similarity, Jaccard coefficients, topic models, word vectors, SimRank and graph structures. For the concrete reality of classical garden knowledge graph construction, this work focuses on entity linking over multiple knowledge bases: multi-source entity linking fuses the same entity from different data sources, solving the low coverage of a single-source knowledge graph and fundamentally promoting garden data fusion.
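Of the measures listed above, the Jaccard coefficient is the simplest to sketch. A hedged example over character sets follows; the mention and candidate names are illustrative, and a real linker would combine several such measures with background information:

```python
def jaccard(a: str, b: str) -> float:
    """Character-set Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Linking an abbreviated mention to candidate knowledge-base entries.
mention = "拙政园"
candidates = ["苏州拙政园", "颐和园"]
best = max(candidates, key=lambda c: jaccard(mention, c))
print(best)  # → 苏州拙政园
```

The abbreviation shares all three of its characters with the first candidate (Jaccard 3/5 = 0.6) but only one with the second (1/5 = 0.2), so the fuller name is selected.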
How to identify relations between entities in structured or unstructured text is one of the core tasks of knowledge graph construction, and relation extraction is an important supporting technology for text content understanding. Entity relation extraction is the key link in constructing the classical garden knowledge graph. Entity relation mining methods can be classified as pattern-matching-based, semantic-dictionary-based, feature-based and machine-learning-based. Pattern matching builds on the entity recognition result: taking sentences as units, patterns are formulated according to indicator words, and the relation between the corresponding entities is then determined by pattern matching. Dictionary-based methods determine entity relations from semantic dictionary resources according to the associations between entities. Feature-based methods use features such as entity type, part of speech, positions between words, and the words and parts of speech before and after the entity; through continuous iteration and aggregation, entity groups (usually two non-homogeneous entities) with the same features are regarded as the same type, and relation mining is then performed. Machine-learning methods are most common in current entity relation mining; their idea is to convert relation mining into a classification problem.
Knowledge graph construction starts from the most original data (structured, semi-structured and unstructured), extracts knowledge facts from the original database and third-party databases by a series of automatic or semi-automatic technical means, and stores them in the data layer and schema layer of the knowledge base. The process comprises four stages, repeated in each update iteration: information extraction, knowledge representation, knowledge fusion and knowledge reasoning. There are two main construction modes: top-down and bottom-up. Top-down means first defining the ontology and data schema of the knowledge graph and then adding entities to the knowledge base. Bottom-up means extracting entities from open linked data, selecting those with higher confidence for addition to the knowledge base, and then constructing the top-level ontology schema.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a method for extracting Chinese classical garden information.
In order to achieve this purpose, the invention adopts the following technical scheme: a method for extracting Chinese classical garden information comprising the following steps: 1. computing a word-vector embedding sequence from the input; 2. applying Bi-LSTM (bidirectional long short-term memory) encoding to the sequence; 3. executing a state transition; when the final state is reached, the entity and relation information has been extracted and the process ends, otherwise the next step is chosen by probability calculation; 4. selecting an entity-extraction state-transition action or a relation-extraction state-transition action; 5. returning to step 3 after the action is executed, finally obtaining the extracted entities and relations.
The specific method of the input calculation described in step 1 of this patent is as follows:
Word vector embedding:
For each input token, the vector embedding is calculated by the following formula:
x_i = V [w_i ; w̃_i]
where w_i is the learned word vector, w̃_i is a fixed word vector, and V is the matrix applied to the concatenation of the two vectors.
The vector embedding sequence is obtained by calculation:
x = (x_1, x_2, …, x_i, …, x_n).
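A minimal numeric sketch of the embedding formula, under the assumption that x_i = V[w_i ; w̃_i], i.e. V linearly maps the concatenation of the learned and fixed vectors; the dimensions and all values are arbitrary illustrations:

```python
# Sketch of x_i = V [w_i ; w~_i]: a learned vector and a fixed pretrained
# vector are concatenated and linearly mapped. Toy 2-dim vectors, 2x4 matrix.
def matvec(V, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in V]

w_i       = [0.1, 0.2]        # learned word vector (trainable in practice)
w_i_fixed = [0.3, 0.4]        # fixed pretrained word vector
V = [[1, 0, 0, 0],            # V maps the 4-dim concatenation to 2 dims
     [0, 0, 0, 1]]

x_i = matvec(V, w_i + w_i_fixed)   # list "+" is concatenation here
print(x_i)  # → [0.1, 0.4]
```

Applying this per token yields the sequence x = (x_1, …, x_n) that is fed to the Bi-LSTM encoder in step 2.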
the specific method of Bi-LSTM encoding described in step 2 of this patent is as follows:
performing Bi-LSTM encoding on the sequence x obtained in the step (1), namely bidirectional long-short term memory encoding, firstly according to the sequence x1To xnIn order of forward LSTM encoding
Figure BDA0002826558410000071
Then according to the following from xnTo x1Sequentially backward LSTM encoding
Figure BDA0002826558410000072
Each LSTM encoding includes the following six steps:
1) State training:
The current input x_t and the hidden state h_{t-1} passed from the previous step are concatenated and trained to yield four states. Three of them are gating states z_f, z_i, z_o: the concatenated vector is multiplied by a weight matrix and passed through a sigmoid activation to a value between 0 and 1, serving as a gate. The fourth state z is obtained by passing the result through a tanh activation to a value between -1 and 1; it is a candidate state rather than a gating signal.
2) Forgetting:
The state z_f acts as the forget gate, controlling by elementwise multiplication which parts of the previous long-term memory c_{t-1} are kept (the important) and which are dropped (the unimportant), calculated as:
z_f ⊙ c_{t-1}
3) Selective memory:
The state z_i acts as the gating signal, controlling by elementwise multiplication which parts of the candidate state z, and hence of the input x_t, are memorized, the important fully and the unimportant only slightly, calculated as:
z_i ⊙ z
4) Calculating long-term memory:
The results of the previous two steps are added to obtain the long-term memory c_t passed to the next step, calculated as:
c_t = z_f ⊙ c_{t-1} + z_i ⊙ z
5) Calculating short-term memory:
The state z_o gates the value of c_t scaled by a tanh activation, yielding the short-term memory h_t, calculated as:
h_t = z_o ⊙ tanh(c_t)
6) Output:
Finally h_t is transformed into the output y_t, calculated as:
y_t = σ(W' h_t)
The forward LSTM encoding results are recorded as →h_t, and the backward LSTM encoding results as ←h_t. The two results are concatenated as
h_t = [→h_t ; ←h_t]
which is the Bi-LSTM encoding result.
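The six steps above can be sketched as a single LSTM step in Python. This is a toy one-dimensional version with an assumed weight layout (one gate per row of W over the pair [x_t, h_{t-1}]); the output step y_t = σ(W'h_t) and the bidirectional concatenation are omitted:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One scalar LSTM step following the six steps in the text.
    W has four rows of weights over the spliced pair [x_t, h_prev]:
    rows 0..2 feed the gates z_f, z_i, z_o; row 3 feeds the candidate z."""
    pre = [row[0] * x_t + row[1] * h_prev + bi for row, bi in zip(W, b)]
    z_f = sigmoid(pre[0])            # step 1: forget gate in (0, 1)
    z_i = sigmoid(pre[1])            #         input (select-memory) gate
    z_o = sigmoid(pre[2])            #         output gate
    z   = math.tanh(pre[3])          #         candidate state in (-1, 1)
    c_t = z_f * c_prev + z_i * z     # steps 2-4: forget + select -> long-term memory
    h_t = z_o * math.tanh(c_t)       # step 5: short-term memory
    return h_t, c_t

h_t, c_t = lstm_step(1.0, 0.0, 0.0, [[0.5, 0.5]] * 4, [0.0] * 4)
print(round(h_t, 4), round(c_t, 4))
```

Running the step forward over x_1..x_n and backward over x_n..x_1 and concatenating the two hidden sequences gives the Bi-LSTM result described above.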
The specific method of the state transition described in step 3 of this patent is as follows:
A six-tuple (σ, δ, e, β, E, R) representing the state at each moment is defined, where σ is a stack storing generated entities, δ is a stack storing entities that are pushed again after being temporarily popped from σ, e stores the partial entity block being processed, β is a buffer containing the unprocessed words, E stores the set of generated entities, and R stores the set of generated relations.
The information extraction task can then be expressed as the state-transition process from the initial state
([], [], [], w, ∅, ∅)
to the final state
(σ, δ, [], [], E, R)
where [] denotes an empty stack and ∅ denotes an empty set.
For the state at time t:
m_t = max{0, W[s_t; b_t; p_t; e_t; a_t] + d}
The probability of each candidate action is calculated from m_t by a softmax:
p(z_t | m_t) = exp(g_{z_t}ᵀ m_t + q_{z_t}) / Σ_{z'∈A(t)} exp(g_{z'}ᵀ m_t + q_{z'})
which predicts the state-transition action to be selected at time t. According to the prediction result, an entity-extraction or a relation-extraction transition of step 4 is executed, after which the process returns to step 3, until the final state is reached.
Given an input w, the probability of any reasonable sequence of state-transition actions z can be expressed as:
p(z | w) = ∏_t p(z_t | m_t)
Therefore:
z* = argmax_z ∏_t p(z_t | m_t)
When e and β in the state six-tuple are empty, the final state is reached and the state transition ends; at this moment the extracted entities and relations are in the sets E and R respectively, and can be output as the result of the algorithm.
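The transition loop of step 3 can be sketched as follows. The probability-based action choice is replaced by a caller-supplied stub (the trained scorer over m_t is not reproduced here), and the trivial stub shown simply treats every buffered word as an entity, purely to exercise the loop:

```python
from collections import namedtuple

# The six-tuple (sigma, delta, e, beta, E, R) from the text.
State = namedtuple("State", "sigma delta e beta E R")

def is_final(s: State) -> bool:
    # Final state: the partial-entity block e and the buffer beta are empty.
    return not s.e and not s.beta

def run(words, choose_action):
    state = State([], [], [], list(words), set(), set())   # initial state
    while not is_final(state):
        state = choose_action(state)   # one entity- or relation-extraction action
    return state.E, state.R

# Trivial stub standing in for the learned action predictor (illustration only).
def shift_all(s: State) -> State:
    j, *rest = s.beta
    return State(s.sigma + [j], s.delta, s.e, rest, s.E | {j}, s.R)

E, R = run(["亭", "山"], shift_all)
print(sorted(E), sorted(R))  # → ['亭', '山'] []
```

In the actual method, `choose_action` would score the actions of step 4 with the softmax over m_t and execute the most probable legal one.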
The state-transition actions of entity extraction in step 4 of this patent include the following three types:
1) Delete
Transition condition: j ∉ E and e = []
State before transition: ([σ|i], δ, e, [j|β], E, R)
State after transition: ([σ|i], δ, e, β, E, R)
When this transition is selected and executed: the currently processed word j is not in the entity set E and the partial entity block e is an empty stack, indicating that j is not target information to be extracted, so j is deleted from the buffer β.
2) Transfer
Transition condition: j ∉ E
State before transition: ([σ|i], δ, e, [j|β], E, R)
State after transition: ([σ|i], δ, [j|e], β, E, R)
When this transition is selected and executed: the currently processed word j is not in the entity set E but is selected for further processing, so j is transferred from the buffer β to the partial entity block e.
3) Entity recognition
Transition condition: j ∉ E and e ≠ []
State before transition: ([σ|i], δ, [j|e], β, E, R)
State after transition: ([σ|i], δ, [], [j|β], E ∪ {j}, R)
When this transition is selected and executed: the currently processed word j is not in the entity set E and the partial entity block e is not empty, so j is marked and moved back to the buffer β, and the new entity j is merged into the entity set E.
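The three entity actions can be sketched as functions on a dict-based state. The transition conditions are simplified to an assert or omitted, and the entity-recognition step follows one plausible reading of "marking j and moving it back to the buffer": the characters collected in e are joined into a single entity unit that re-enters β. The garden name is illustrative:

```python
def delete(s):
    """DELETE: j not in E and e empty -> drop j from the buffer."""
    assert s["beta"][0] not in s["E"] and not s["e"]
    s["beta"] = s["beta"][1:]
    return s

def transfer(s):
    """TRANSFER: move buffer-front j into the partial-entity block e."""
    j = s["beta"][0]
    s["e"] = [j] + s["e"]
    s["beta"] = s["beta"][1:]
    return s

def recognize(s):
    """ENTITY RECOGNITION: close the non-empty block e as one entity j,
    push j back onto the buffer, and merge j into the entity set E."""
    j = "".join(reversed(s["e"]))       # characters collected in e, in order
    s["e"] = []
    s["beta"] = [j] + s["beta"]
    s["E"] = s["E"] | {j}
    return s

s = {"sigma": [], "delta": [], "e": [], "beta": ["网", "师", "园"],
     "E": set(), "R": set()}
s = transfer(s); s = transfer(s); s = transfer(s); s = recognize(s)
print(s["beta"], sorted(s["E"]))  # → ['网师园'] ['网师园']
```

Three transfers collect the characters of one garden name into e, and one recognition emits it as an entity.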
The state-transition actions of relation extraction in step 4 of this patent include the following seven types:
1) Extract a left relation and pop the endpoint entity
Transition condition: a left relation r from j to i exists, and i takes part in no further relations
State before transition: ([σ|i], δ, e, [j|β], E, R)
State after transition: (σ, δ, e, [j|β], E, R ∪ {(j, r, i)})
When this transition is selected and executed: a left relation has been found; the relation is merged into the relation set R, and the relation endpoint entity i is popped from the generated-entity stack σ.
2) Extract a right relation and transfer the endpoint entity
Transition condition: a right relation r from i to j exists
State before transition: ([σ|i], δ, e, [j|β], E, R)
State after transition: ([σ|i|j], δ, e, β, E, R ∪ {(i, r, j)})
When this transition is selected and executed: a right relation has been found; the relation is merged into the relation set R, and the relation endpoint entity j is transferred to the generated-entity stack σ.
3) Extract no relation and transfer the entity
Transition condition: no relation between i and j is extracted
State before transition: ([σ|i], δ, e, [j|β], E, R)
State after transition: ([σ|i|δ|j], [], e, β, E, R)
When this transition is selected and executed: no relation is extracted; the contents of δ are pushed back onto σ, and the entity j is transferred to the generated-entity stack σ.
4) Extract no relation and pop the entity
Transition condition: no relation between i and j is extracted
State before transition: ([σ|i], δ, e, [j|β], E, R)
State after transition: (σ, δ, e, [j|β], E, R)
When this transition is selected and executed: no relation is extracted, and the entity i is popped from the generated-entity stack σ.
5) Extract a left relation and put the endpoint entity into the temporary stack
Transition condition: a left relation r from j to i exists, and i may take part in further relations
State before transition: ([σ|i], δ, e, [j|β], E, R)
State after transition: (σ, [i|δ], e, [j|β], E, R ∪ {(j, r, i)})
When this transition is selected and executed: a left relation has been found; the relation is merged into the relation set R, and the relation endpoint entity i is popped from the generated-entity stack σ and then pushed onto the temporary stack δ.
6) Extract a right relation and put the start entity into the temporary stack
Transition condition: a right relation r from i to j exists, and i may take part in further relations
State before transition: ([σ|i], δ, e, [j|β], E, R)
State after transition: (σ, [i|δ], e, [j|β], E, R ∪ {(i, r, j)})
When this transition is selected and executed: a right relation has been found; the relation is merged into the relation set R, and the relation start entity i is popped from the generated-entity stack σ and then pushed onto the temporary stack δ.
7) Extract no relation and put the entity into the temporary stack
Transition condition: none
State before transition: ([σ|i], δ, e, [j|β], E, R)
State after transition: (σ, [i|δ], e, [j|β], E, R)
When this transition is selected and executed: the entity i is directly popped from the generated-entity stack σ and then pushed onto the temporary stack δ.
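Two of the seven relation actions (types 1 and 2) can be sketched in Python on the same dict-based state, with relations recorded as (head, relation, tail) triples. The relation label "located_in" and the example entities are illustrative, not labels defined by the patent:

```python
def left_reduce(s, rel):
    """Type 1: left relation j -> i; record the triple, pop endpoint i off sigma."""
    i, j = s["sigma"][-1], s["beta"][0]
    s["R"] = s["R"] | {(j, rel, i)}
    s["sigma"] = s["sigma"][:-1]
    return s

def right_shift(s, rel):
    """Type 2: right relation i -> j; record the triple, move endpoint j onto sigma."""
    i, j = s["sigma"][-1], s["beta"][0]
    s["R"] = s["R"] | {(i, rel, j)}
    s["sigma"] = s["sigma"] + [j]
    s["beta"] = s["beta"][1:]
    return s

# One right relation between two already-recognized entities (names illustrative).
s = {"sigma": ["拙政园"], "delta": [], "e": [], "beta": ["苏州"],
     "E": {"拙政园", "苏州"}, "R": set()}
s = right_shift(s, "located_in")
print(s["sigma"], sorted(s["R"]))  # → ['拙政园', '苏州'] [('拙政园', 'located_in', '苏州')]
```

The remaining five actions differ only in whether the popped entity goes to the temporary stack δ and whether a triple is recorded, following the transition tables above.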
The technical scheme of the invention mainly has the following technical advantages.
1. The information extraction algorithm aiming at the knowledge in the field of Chinese classical gardens is put forward for the first time
The method for constructing a knowledge graph differs with the application field and requirements. Existing methods target general knowledge or other specific fields, not the field of Chinese classical gardens; they cannot meet the application requirements and cannot be applied directly. A search found no existing information extraction algorithm for Chinese classical garden domain knowledge.
The technical scheme of the invention is built closely around Chinese classical garden knowledge: a domain knowledge ontology model is established, entities and relation types are defined according to the Chinese classical garden concept model, and an automatic extraction algorithm for classical garden knowledge information is formed.
The invention finally designs five algorithm steps, proposes for the first time an information extraction algorithm for Chinese classical garden domain knowledge, and verifies through an application example that the method is effective and feasible.
2. Improving the utilization rate and the execution efficiency of information
In the mainstream three-layer models (word vector + BiLSTM + CRF and BERT + BiLSTM + CRF), the output-layer CRF (conditional random field) handles a sequence-labeling problem; it is difficult to extract entity and relation information simultaneously, and a large amount of useful information directly relating entities and relations is discarded.
The algorithm of the invention replaces the CRF output layer with a state-transition layer, turning the sequence-labeling problem into the problem of generating a directed graph through state transitions. The association information between entities and relations is fully used during processing, so the algorithm not only extracts both entities and relations but also improves the utilization rate and execution efficiency of the information.
3. Widely applicable to classical gardens nationwide
The Chinese classical garden information extraction algorithm is scientifically designed, rigorously structured, and standard in format, and has been applied and verified during the construction of the Chinese classical garden knowledge graph. The invention can therefore be widely applied to classical gardens nationwide.
Detailed Description
A method for extracting Chinese classical garden information comprises the following steps: 1. calculate the word vector embedding sequence from the input; 2. apply Bi-LSTM encoding, i.e. bidirectional long short-term memory encoding, to the sequence; 3. execute state transitions; if the final state is reached, the entity and relation information has been extracted and the process ends, otherwise proceed to the next step according to the probability calculation; 4. select an entity extraction state transition action or a relation extraction state transition action; 5. after execution, return to step 3; the extracted entities and relations are finally obtained.
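The five-step control flow above can be sketched as a small transition loop. The following Python sketch is a drastically simplified stand-in: real actions are chosen by the learned model described in steps 3 and 4, whereas here a hand-written rule based on an invented `entity_words` set plays that role, and relation actions are omitted entirely.

```python
def extract(tokens, entity_words):
    """Toy transition loop over the six-tuple state; relations omitted."""
    sigma, delta, e = [], [], []   # entity stack, temporary stack, partial block
    beta = list(tokens)            # buffer of unprocessed words
    E, R = set(), set()            # extracted entity set / relation set
    while e or beta:               # final state: e and beta both empty
        if beta and beta[0] in entity_words:
            e.append(beta.pop(0))  # "transfer": move word into block e
        elif e:
            entity = " ".join(e)   # "entity identification": close block e
            E.add(entity)
            sigma.append(entity)
            e.clear()
        else:
            beta.pop(0)            # "delete": not target information
    return E, R

ents, rels = extract("the Summer Palace is a garden".split(),
                     {"Summer", "Palace", "garden"})
```

Running this on the toy sentence yields the entities "Summer Palace" and "garden"; in the patented method the branch taken at each step is instead the action with the highest predicted probability.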
The specific method of input calculation described in step 1 of this patent is as follows:
word vector embedding:
for each input token, the vector embedding is calculated as:
x_i = V[w_i ; w̄_i]
where w_i is the learned word vector, w̄_i is a fixed (pretrained) word vector, and V is the matrix applied to the concatenation of the two vectors;
the calculation yields the vector embedding sequence:
x = (x_1, x_2, ......, x_i, ...... x_n).
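A minimal numpy sketch of this embedding step, assuming x_i is formed by projecting the concatenation of a learned vector and a fixed pretrained vector; all dimensions, the toy vocabulary, and the random initialization are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["拙政园", "位于", "苏州"]                      # toy token sequence
learned = {t: rng.normal(size=50) for t in vocab}     # trainable embeddings w_i
fixed = {t: rng.normal(size=100) for t in vocab}      # frozen pretrained w̄_i
V = rng.normal(size=(64, 150))                        # projection of [w_i ; w̄_i]

def embed(token):
    concat = np.concatenate([learned[token], fixed[token]])  # [w_i ; w̄_i]
    return V @ concat                                        # x_i = V[w_i ; w̄_i]

x = np.stack([embed(t) for t in vocab])  # embedding sequence x = (x_1, ..., x_n)
```

In practice the learned table would be updated during training while the pretrained table stays fixed, which is the usual motivation for keeping the two vectors separate before concatenation.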
the specific method of Bi-LSTM encoding described in step 2 of this patent is as follows:
performing Bi-LSTM encoding, i.e. bidirectional long short-term memory encoding, on the sequence x obtained in step 1: first encode forward with an LSTM in the order x_1 to x_n, recording the results →h_t; then encode backward with an LSTM in the order x_n to x_1, recording the results ←h_t.
Each LSTM encoding includes the following six steps:
1) State training:
The current input x_t and the hidden state h_{t-1} passed from the previous step are concatenated and trained to yield four states. Three of them are gating states z_f, z_i, z_o: the concatenated vector is multiplied by a weight matrix and then mapped by a sigmoid activation function to a value between 0 and 1 that serves as a gating state. The remaining state z is not a gating signal; it is obtained by mapping the result through a tanh activation function to a value between -1 and 1;
2) Forgetting:
The state z_f serves as the forget gate and is applied by element-wise (Hadamard) multiplication to the long-term memory c_{t-1} of the previous step, controlling which parts are kept (important) and which are forgotten (unimportant), calculated as:
z_f ⊙ c_{t-1}
3) Selective memory:
The state z_i serves as the gating signal and is applied by element-wise multiplication to the state z, selectively memorizing the input x_t; important content is recorded strongly and unimportant content weakly, calculated as:
z_i ⊙ z
4) Calculating long-term memory:
The results of the previous two steps are added element-wise to obtain the long-term memory c_t passed to the next step, calculated as:
c_t = z_f ⊙ c_{t-1} + z_i ⊙ z
5) Calculating short-term memory:
The state z_o gates the result of applying the tanh activation function to the c_t just obtained, yielding the short-term memory h_t, calculated as:
h_t = z_o ⊙ tanh(c_t)
6) Output:
Finally the output y_t is obtained from h_t, calculated as:
y_t = σ(W' h_t)
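The six sub-steps above can be written as one function. This numpy sketch packs the four states into a single weight matrix, as is conventional; the matrix layout, the dimensions, and the random initialization are all invented, and the final output projection y_t = σ(W'h_t) is left as a comment:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following the six sub-steps in the text.
    W has shape (4*hidden, input+hidden); b has shape (4*hidden,)."""
    hid = h_prev.size
    s = W @ np.concatenate([x_t, h_prev]) + b   # 1) joint state training
    z_f = sigmoid(s[:hid])                      #    forget gate, in (0, 1)
    z_i = sigmoid(s[hid:2 * hid])               #    input gate
    z_o = sigmoid(s[2 * hid:3 * hid])           #    output gate
    z = np.tanh(s[3 * hid:])                    #    candidate state, in (-1, 1)
    c_t = z_f * c_prev + z_i * z                # 2)-4) long-term memory
    h_t = z_o * np.tanh(c_t)                    # 5) short-term memory
    return h_t, c_t                             # 6) y_t = sigmoid(W' @ h_t) omitted

rng = np.random.default_rng(0)
hid, inp = 8, 4
W = rng.normal(size=(4 * hid, inp + hid))
h_t, c_t = lstm_step(rng.normal(size=inp),
                     np.zeros(hid), np.zeros(hid), W, np.zeros(4 * hid))
```

Because z_o lies in (0, 1) and tanh(c_t) lies in (-1, 1), every component of the short-term memory h_t is bounded in magnitude by 1.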
The forward LSTM encoding results are recorded as →h_t, and the backward LSTM encoding results are recorded as ←h_t. The two results are concatenated as
h_t = [→h_t ; ←h_t]
which is the Bi-LSTM encoding result.
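The bidirectional pass itself reduces to running a recurrent step in both directions and concatenating per position. The sketch below uses a trivial stand-in for the full LSTM step so that only the direction and concatenation logic is shown; all sizes are invented:

```python
import numpy as np

def stub_step(x_t, h_prev):
    return np.tanh(x_t + h_prev)        # stand-in for a full LSTM step

def bi_encode(xs, hidden=4):
    h = np.zeros(hidden)
    fwd = []
    for x_t in xs:                      # forward pass: x_1 -> x_n
        h = stub_step(x_t, h)
        fwd.append(h)
    h = np.zeros(hidden)
    bwd = [None] * len(xs)
    for i in range(len(xs) - 1, -1, -1):  # backward pass: x_n -> x_1
        h = stub_step(xs[i], h)
        bwd[i] = h
    # per-position concatenation h_t = [->h_t ; <-h_t]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

xs = [np.ones(4) * i for i in range(3)]
H = bi_encode(xs)
```

Each output vector is twice the hidden size, since each position carries both the forward context (everything before it) and the backward context (everything after it).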
The specific method for state transition described in step 3 of this patent is as follows:
defining a six-tuple (σ, δ, e, β, E, R) representing the state at each moment, wherein σ is a stack storing generated entities, δ is a stack storing entities that are temporarily popped from σ and later pushed back, e stores the partial entity block being processed, β is a buffer containing the unprocessed words, E stores the set of generated entities, and R stores the set of generated relations;
the information extraction task can then be expressed as the state transition process from the initial state ([], [], [], w, ∅, ∅) to the final state (σ, δ, [], [], E, R), where [] denotes an empty stack and ∅ denotes an empty set;
for the state at time t:
m_t = max{0, W[s_t; b_t; p_t; e_t; a_t] + d}
the probability of each candidate action is calculated by:
p(z_t | m_t) = exp(g_{z_t}ᵀ m_t + q_{z_t}) / Σ_{z′} exp(g_{z′}ᵀ m_t + q_{z′})
to predict the state transition action selected at time t; according to the prediction result, go to step 4 or step 5, execute one state transition, and return to step 3, until the final state is reached;
given an input w, the probability of any valid state transition action sequence z can be expressed as:
p(z | w) = ∏_t p(z_t | m_t)
and therefore the decoded action sequence is:
z* = argmax_z ∏_t p(z_t | m_t)
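The action-scoring step can be illustrated numerically. In this sketch the feature vector standing in for [s_t; b_t; p_t; e_t; a_t], the parameters W, d, and the per-action vectors g_z and scalars q_z are all randomly invented; the point is the ReLU followed by a softmax over candidate actions:

```python
import numpy as np

rng = np.random.default_rng(1)
feat = rng.normal(size=20)                  # stand-in for [s_t; b_t; p_t; e_t; a_t]
W = rng.normal(size=(16, 20))
d = rng.normal(size=16)
m_t = np.maximum(0, W @ feat + d)           # m_t = max{0, W[...] + d}

actions = ["DELETE", "TRANSFER", "ENTITY"]  # candidate transition actions
g = {a: rng.normal(size=16) for a in actions}
q = {a: rng.normal() for a in actions}
scores = np.array([g[a] @ m_t + q[a] for a in actions])
probs = np.exp(scores - scores.max())       # numerically stable softmax
probs /= probs.sum()                        # p(z_t | m_t) over candidate actions
best = actions[int(np.argmax(probs))]       # action executed at time t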
and when E and beta in the state six-tuple are empty stacks, the final state is reached, the state conversion is finished, and at the moment, the extracted entities and the extracted relations are respectively in the sets E and R, and the extracted entities and the extracted relations can be used as algorithm execution results to be output.
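The six-tuple state and its final-state test translate directly into a small data structure. This is purely illustrative; field names follow the text (σ, δ, e, β, E, R):

```python
from dataclasses import dataclass, field

@dataclass
class State:
    sigma: list = field(default_factory=list)  # stack of generated entities
    delta: list = field(default_factory=list)  # temporary stack
    e: list = field(default_factory=list)      # partial entity block in progress
    beta: list = field(default_factory=list)   # buffer of unprocessed words
    E: set = field(default_factory=set)        # extracted entity set
    R: set = field(default_factory=set)        # extracted relation set

    def is_final(self):
        # final state: e and beta are both empty; E and R hold the results
        return not self.e and not self.beta

s = State(beta=list("颐和园在北京"))
```

A freshly constructed empty state is trivially final, while any state with unprocessed buffer content is not.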
The state transition actions for entity extraction in step 4 of this patent include the following three:
1) deleting
Transition condition: j ∉ E and e = []
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: ([σ|i], δ, e, β, E, R)
When this transition is selected and executed, the currently processed word j is not in the entity set E and the partial entity block e is an empty stack, indicating that j is not target information to be extracted; the word j is deleted from the buffer β;
2) Transfer
Transition condition: j ∉ E
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: ([σ|i], δ, [j|e], β, E, R)
When this transition is selected and executed, the currently processed word j is not in the entity set E but is selected for further processing; j is transferred from the buffer β to the partial entity block e;
3) entity identification
Transition condition: j ∉ E and e ≠ []
The state before transition: ([σ|i], δ, [j|e], β, E, R)
The state after transition: ([σ|i], δ, [], [j|β], E ∪ {j}, R)
When this transition is selected and executed, the currently processed word j is not in the entity set E and the partial entity block e is not an empty stack; j is marked as an entity and moved back to the buffer β, and the new entity j is merged into the entity set E.
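The three entity actions can be sketched as operations on a dict-based state. The conditions are the ones stated above; the demo sentence and the asserts are invented, and the real system would choose among these actions by the learned probabilities:

```python
def new_state(words):
    return {"sigma": [], "delta": [], "e": [], "beta": list(words),
            "E": set(), "R": set()}

def delete(s):                 # 1) j is not an entity word: drop it from beta
    assert s["beta"][0] not in s["E"] and not s["e"]
    s["beta"].pop(0)

def transfer(s):               # 2) push j onto the front of the partial block e
    assert s["beta"][0] not in s["E"]
    s["e"].insert(0, s["beta"].pop(0))

def entity_gen(s):             # 3) close block e as a new entity j
    assert s["e"]
    j = " ".join(reversed(s["e"]))   # restore original word order
    s["e"].clear()
    s["E"].add(j)
    s["beta"].insert(0, j)           # the marked entity returns to the buffer

s = new_state(["the", "Summer", "Palace"])
delete(s)                      # "the" is not target information
transfer(s); transfer(s)       # block e now holds "Summer", "Palace"
entity_gen(s)                  # entity "Summer Palace" joins E and beta
```

After entity identification the recognized entity sits back at the head of the buffer, which is what allows the relation actions of the next section to operate on it.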
The state transition actions for relation extraction in step 4 of this patent include the following seven:
1) extracting left-hand relation and popping the end-point entity
Judging conversion conditions:
Figure BDA0002826558410000182
Figure BDA0002826558410000183
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: (σ, δ, e, [j|β], E, R ∪ {(j, r, i)})
When this transition is selected and executed, a left-hand relation has been found; the relation (j, r, i) is merged into the relation set R, and the relation end-point entity i is popped from the generated entity stack σ;
2) extracting right-hand relationships and transferring end-point entities
Judging conversion conditions:
Figure BDA0002826558410000185
Figure BDA0002826558410000186
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: ([σ|i|j], δ, e, β, E, R ∪ {(i, r, j)})
When this transition is selected and executed, a right-hand relation has been found; the relation (i, r, j) is merged into the relation set R, and the relation end-point entity j is transferred onto the generated entity stack σ;
3) No relation extracted; transfer the entity
Judging conversion conditions:
Figure BDA0002826558410000191
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: ([σ|i|δ|j], [], e, β, E, R)
When this transition is selected and executed, no relation is extracted; the entities in the temporary stack δ are pushed back onto σ, and the entity j is transferred onto the generated entity stack σ;
4) No relation extracted; pop the entity from the stack
Judging conversion conditions:
Figure BDA0002826558410000192
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: (σ, δ, e, [j|β], E, R)
When this transition is selected and executed, no relation is extracted; the entity i is popped from the generated entity stack σ;
5) extracting left-hand relation and putting end-point entity into temporary stack
Judging conversion conditions:
Figure BDA0002826558410000193
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: (σ, [i|δ], e, [j|β], E, R ∪ {(j, r, i)})
When this transition is selected and executed, a left-hand relation has been found; the relation (j, r, i) is merged into the relation set R, and the relation end-point entity i is popped from the generated entity stack σ and then pushed onto the temporary stack δ;
6) Extracting a right-hand relation and putting the start-point entity into the temporary stack
Judging conversion conditions:
Figure BDA0002826558410000201
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: (σ, [i|δ], e, [j|β], E, R ∪ {(i, r, j)})
When this transition is selected and executed, a right-hand relation has been found; the relation (i, r, j) is merged into the relation set R, and the relation start-point entity i is popped from the generated entity stack σ and then pushed onto the temporary stack δ;
7) No relation extracted; put the entity into the temporary stack
Transition condition: none
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: (σ, [i|δ], e, [j|β], E, R)
When this transition is selected and executed, the entity i is directly popped from the generated entity stack σ and then pushed onto the temporary stack δ.
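Two of the seven relation actions, the left-hand relation (pop the end-point i from σ) and the right-hand relation (push the end-point j onto σ), can be sketched as follows. The relation label, the demo entities, and the hand-picked action sequence are invented; the real method predicts both the action and the label:

```python
def new_state(entities):
    # demo starts after entity extraction: the buffer holds recognized entities
    return {"sigma": [], "delta": [], "e": [], "beta": list(entities),
            "E": set(entities), "R": set()}

def shift(s):                        # no relation: move j onto sigma
    s["sigma"].append(s["beta"].pop(0))

def left_relation(s, label):         # relation (j, r, i): pop end point i
    i, j = s["sigma"][-1], s["beta"][0]
    s["R"].add((j, label, i))
    s["sigma"].pop()

def right_relation(s, label):        # relation (i, r, j): push end point j
    i, j = s["sigma"][-1], s["beta"][0]
    s["R"].add((i, label, j))
    s["sigma"].append(s["beta"].pop(0))

s = new_state(["拙政园", "苏州"])
shift(s)                             # sigma: [拙政园]
right_relation(s, "位于")            # extracts (拙政园, 位于, 苏州)
```

The temporary stack δ (actions 5 through 7) exists so that an entity can participate in further relations with later buffer items before being discarded; it is omitted here for brevity.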
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any other form. Any person skilled in the art may, without departing from the technical spirit of the present invention, modify or change the above embodiment into an equivalent embodiment; any simple modification or equivalent change made to the above embodiment according to the technical spirit of the present invention still falls within the protection scope of the present invention.

Claims (6)

1. A method for extracting Chinese classical garden information comprises the following steps: 1) calculate the word vector embedding sequence from the input; 2) apply Bi-LSTM encoding, i.e. bidirectional long short-term memory encoding, to the sequence; 3) execute state transitions; if the final state is reached, the entity and relation information has been extracted and the process ends, otherwise proceed to the next step according to the probability calculation; 4) select an entity extraction state transition action or a relation extraction state transition action; 5) after execution, return to step 3); the extracted entities and relations are finally obtained.
2. The method for extracting classical garden information in China according to claim 1, wherein the specific method of input calculation in step 1) of the present patent is as follows:
word vector embedding:
for each input token, the vector embedding is calculated as:
x_i = V[w_i ; w̄_i]
where w_i is the learned word vector, w̄_i is a fixed (pretrained) word vector, and V is the matrix applied to the concatenation of the two vectors;
the calculation yields the vector embedding sequence:
x = (x_1, x_2, ......, x_i, ...... x_n).
3. the method for extracting classical garden information in China according to claim 1, wherein the specific method of Bi-LSTM encoding in step 2) of this patent is as follows:
performing Bi-LSTM encoding, i.e. bidirectional long short-term memory encoding, on the sequence x obtained in step 1): first encode forward with an LSTM in the order x_1 to x_n, recording the results →h_t; then encode backward with an LSTM in the order x_n to x_1, recording the results ←h_t.
Each LSTM encoding includes the following six steps:
1) State training:
The current input x_t and the hidden state h_{t-1} passed from the previous step are concatenated and trained to yield four states. Three of them are gating states z_f, z_i, z_o: the concatenated vector is multiplied by a weight matrix and then mapped by a sigmoid activation function to a value between 0 and 1 that serves as a gating state. The remaining state z is not a gating signal; it is obtained by mapping the result through a tanh activation function to a value between -1 and 1;
2) Forgetting:
The state z_f serves as the forget gate and is applied by element-wise (Hadamard) multiplication to the long-term memory c_{t-1} of the previous step, controlling which parts are kept (important) and which are forgotten (unimportant), calculated as:
z_f ⊙ c_{t-1}
3) Selective memory:
The state z_i serves as the gating signal and is applied by element-wise multiplication to the state z, selectively memorizing the input x_t; important content is recorded strongly and unimportant content weakly, calculated as:
z_i ⊙ z
4) Calculating long-term memory:
The results of the previous two steps are added element-wise to obtain the long-term memory c_t passed to the next step, calculated as:
c_t = z_f ⊙ c_{t-1} + z_i ⊙ z
5) Calculating short-term memory:
The state z_o gates the result of applying the tanh activation function to the c_t just obtained, yielding the short-term memory h_t, calculated as:
h_t = z_o ⊙ tanh(c_t)
6) Output:
Finally the output y_t is obtained from h_t, calculated as:
y_t = σ(W' h_t)
The forward LSTM encoding results are recorded as →h_t, and the backward LSTM encoding results are recorded as ←h_t. The two results are concatenated as
h_t = [→h_t ; ←h_t]
which is the Bi-LSTM encoding result.
4. The method for extracting classical garden information in China according to claim 1, wherein the specific method of state transition in step 3) of the present patent is as follows:
defining a six-tuple (σ, δ, e, β, E, R) representing the state at each moment, wherein σ is a stack storing generated entities, δ is a stack storing entities that are temporarily popped from σ and later pushed back, e stores the partial entity block being processed, β is a buffer containing the unprocessed words, E stores the set of generated entities, and R stores the set of generated relations;
the information extraction task can then be expressed as the state transition process from the initial state ([], [], [], w, ∅, ∅) to the final state (σ, δ, [], [], E, R), where [] denotes an empty stack and ∅ denotes an empty set;
for the state at time t:
m_t = max{0, W[s_t; b_t; p_t; e_t; a_t] + d}
the probability of each candidate action is calculated by:
p(z_t | m_t) = exp(g_{z_t}ᵀ m_t + q_{z_t}) / Σ_{z′} exp(g_{z′}ᵀ m_t + q_{z′})
to predict the state transition action selected at time t; according to the prediction result, go to step 4) or step 5), execute one state transition, and return to step 3), until the final state is reached;
given an input w, the probability of any valid state transition action sequence z can be expressed as:
p(z | w) = ∏_t p(z_t | m_t)
and therefore the decoded action sequence is:
z* = argmax_z ∏_t p(z_t | m_t)
when e and β in the state six-tuple are both empty, the final state is reached and the state transition ends; at this moment the extracted entities and relations are stored in the sets E and R respectively, and can be output as the execution result of the algorithm.
5. The method for extracting Chinese classical garden information according to claim 1, wherein the state transition actions for entity extraction in step 4) include the following three:
1) deleting
Transition condition: j ∉ E and e = []
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: ([σ|i], δ, e, β, E, R)
When this transition is selected and executed, the currently processed word j is not in the entity set E and the partial entity block e is an empty stack, indicating that j is not target information to be extracted; the word j is deleted from the buffer β;
2) Transfer
Transition condition: j ∉ E
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: ([σ|i], δ, [j|e], β, E, R)
When this transition is selected and executed, the currently processed word j is not in the entity set E but is selected for further processing; j is transferred from the buffer β to the partial entity block e;
3) entity identification
Transition condition: j ∉ E and e ≠ []
The state before transition: ([σ|i], δ, [j|e], β, E, R)
The state after transition: ([σ|i], δ, [], [j|β], E ∪ {j}, R)
When this transition is selected and executed, the currently processed word j is not in the entity set E and the partial entity block e is not an empty stack; j is marked as an entity and moved back to the buffer β, and the new entity j is merged into the entity set E.
6. The method for extracting Chinese classical garden information according to claim 1, wherein the state transition actions for relation extraction in step 4) include the following seven:
1) extracting left-hand relation and popping the end-point entity
Judging conversion conditions:
Figure FDA0002826558400000052
Figure FDA0002826558400000053
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: (σ, δ, e, [j|β], E, R ∪ {(j, r, i)})
When this transition is selected and executed, a left-hand relation has been found; the relation (j, r, i) is merged into the relation set R, and the relation end-point entity i is popped from the generated entity stack σ;
2) extracting right-hand relationships and transferring end-point entities
Judging conversion conditions:
Figure FDA0002826558400000055
Figure FDA0002826558400000056
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: ([σ|i|j], δ, e, β, E, R ∪ {(i, r, j)})
When this transition is selected and executed, a right-hand relation has been found; the relation (i, r, j) is merged into the relation set R, and the relation end-point entity j is transferred onto the generated entity stack σ;
3) No relation extracted; transfer the entity
Judging conversion conditions:
Figure FDA0002826558400000062
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: ([σ|i|δ|j], [], e, β, E, R)
When this transition is selected and executed, no relation is extracted; the entities in the temporary stack δ are pushed back onto σ, and the entity j is transferred onto the generated entity stack σ;
4) No relation extracted; pop the entity from the stack
Judging conversion conditions:
Figure FDA0002826558400000063
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: (σ, δ, e, [j|β], E, R)
When this transition is selected and executed, no relation is extracted; the entity i is popped from the generated entity stack σ;
5) extracting left-hand relation and putting end-point entity into temporary stack
Judging conversion conditions:
Figure FDA0002826558400000065
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: (σ, [i|δ], e, [j|β], E, R ∪ {(j, r, i)})
When this transition is selected and executed, a left-hand relation has been found; the relation (j, r, i) is merged into the relation set R, and the relation end-point entity i is popped from the generated entity stack σ and then pushed onto the temporary stack δ;
6) Extracting a right-hand relation and putting the start-point entity into the temporary stack
Judging conversion conditions:
Figure FDA0002826558400000071
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: (σ, [i|δ], e, [j|β], E, R ∪ {(i, r, j)})
When this transition is selected and executed, a right-hand relation has been found; the relation (i, r, j) is merged into the relation set R, and the relation start-point entity i is popped from the generated entity stack σ and then pushed onto the temporary stack δ;
7) No relation extracted; put the entity into the temporary stack
Transition condition: none
The state before transition: ([σ|i], δ, e, [j|β], E, R)
The state after transition: (σ, [i|δ], e, [j|β], E, R)
When this transition is selected and executed, the entity i is directly popped from the generated entity stack σ and then pushed onto the temporary stack δ.
CN202011450290.1A 2020-12-09 2020-12-09 Method for extracting Chinese classical garden information Pending CN112463988A (en)
