CN111414483B - Document processing device and method

Info

Publication number
CN111414483B
CN111414483B
Authority
CN
China
Prior art keywords
entity
document
predetermined
relationship
word
Prior art date
Legal status
Active
Application number
CN201910009364.9A
Other languages
Chinese (zh)
Other versions
CN111414483A (en)
Inventor
吴山产
贺一帆
张琼
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910009364.9A
Publication of CN111414483A
Application granted
Publication of CN111414483B
Active legal status (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri


Abstract

The invention discloses a document processing method adapted to find, in a document, a target entity that forms a predetermined relationship with a first entity. The method comprises the steps of: finding, based on a deep learning model, a second entity in the document that forms the predetermined relationship with the first entity; finding, based on a predetermined rule, a third entity in the document that forms the predetermined relationship with the first entity; and determining the target entity based on the second entity and the third entity. The invention also discloses a corresponding document processing device and a computing apparatus for implementing the deep learning model.

Description

Document processing device and method
Technical Field
The present invention relates to the field of natural language processing, and more particularly to slot filling, i.e., determining two entities that form a specific relationship in a document.
Background
With the development and popularization of Internet technology, more and more documents and conversations are stored and used on networks in electronic form, and natural language processing techniques are increasingly used to process document and dialog content. Within natural language processing, the problem of slot filling has received growing attention. Slot filling is defined as follows: given a relationship and one entity participating in that relationship, find in a document repository another entity that forms the given relationship with it.
Solving the slot filling problem is of great importance to natural language processing, natural language understanding, and the creation of knowledge bases. If the problem can be solved well, a knowledge base of relationships between entities can be built on top of the solution, which in turn supports a variety of tasks in natural language processing, natural language understanding, and knowledge mining.
Among the existing solutions, schemes based on traditional machine learning such as Support Vector Machines (SVM) classify poorly, while deep learning schemes lack sufficient labeled data, which hurts the recall or accuracy of prediction. A rule system alone cannot cover all relationship types, and some relationship types are difficult to handle with rules. The accuracy of a question answering (QA) method alone is also limited.
In other words, the existing solutions suffer from two main problems. First, there is a lack of sufficiently large labeled data sets for training. Second, they cannot reliably find an accurate result, i.e., the target entity, for the slot filling problem.
Therefore, there is a need for a slot filling scheme that provides higher accuracy and recall with fewer labeled data.
Disclosure of Invention
To this end, the present invention provides a new solution in an attempt to solve or at least alleviate at least one of the problems presented above.
According to one aspect of the present invention, there is provided a machine-learning-based computing device adapted to determine whether a first entity and a predetermined entity have a predetermined relationship in a document fragment. The computing device comprises: a recurrent neural network processing unit adapted to receive the word feature vector of each word in the document fragment and to compute, using a recurrent-neural-network-based algorithm, a global word feature vector corresponding to each word, the global word feature vector of a word characterizing the influence of the word's context in the document fragment on that word's features; an attention mechanism processing unit adapted to receive the position vector of each word in the document fragment and the global word feature vectors output by the recurrent neural network processing unit, and to compute a weight vector for each word using an attention-mechanism-based algorithm, wherein the position vector encodes the positional relation of each word to the first entity and the predetermined entity, and the first entity and the predetermined entity each comprise one or more consecutive words in the document fragment; a combination processing unit adapted to perform a weighted combination of the global word feature vectors output by the recurrent neural network processing unit based on the weight vectors of the words, so as to generate a document vector; and a classification output unit adapted to determine, based on the document vector, the probability that the first entity and the predetermined entity have the predetermined relationship, so as to determine whether the first entity and the predetermined entity have the predetermined relationship in the document fragment.
Optionally, the computing apparatus according to the present invention further comprises an entity position embedding processing unit adapted to map the positional relation of each word in the document fragment to the first entity and the predetermined entity into a position vector of that word.
Optionally, the computing device according to the present invention further comprises a word embedding processing unit adapted to map each word in the document fragment to the word feature vector corresponding to that word.
Optionally, in the computing apparatus according to the present invention, the recurrent neural network employed in the recurrent neural network processing unit is a long-short term memory (LSTM) network.
Optionally, in the computing apparatus according to the present invention, the word embedding processing unit determines the named entity tag (NER) and part of speech (POS) corresponding to each word; maps each word, named entity tag, and part of speech to a corresponding vector; and combines the individual vectors to generate the word feature vector of each word.
Optionally, the computing device according to the present invention further comprises a multi-layer perceptron processing unit connected between the combination processing unit and the classification output unit.
Optionally, in the computing device according to the present invention, the multi-layer perceptron processing unit is a fully connected processing unit.
According to another aspect of the present invention, there is provided a document processing method adapted to find a target entity in a document, the target entity forming a predetermined relationship with a first entity, comprising the steps of: locating a first entity in a document; extracting predetermined entities within a predetermined distance of the first entity location; determining a document fragment based on the first entity and a predetermined entity; determining, with a computing device according to the present invention, whether the first entity and the predetermined entity have a predetermined relationship in the document fragment; and determining the predetermined entity having the predetermined relationship as the target entity.
According to yet another aspect of the present invention, there is provided a model training method adapted to construct a training set for training the computing device. The method comprises the steps of: obtaining a first entity and a target entity that form a predetermined relationship, and a training document containing the first entity and the target entity; determining whether the first entity and the target entity both fall within a predetermined document fragment of the training document; and, if so, determining the document fragment containing the first entity and the target entity as positive training data.
Optionally, the training method according to the present invention further comprises the steps of: selecting, in a document from which positive training data has been obtained, a document fragment of a predetermined length centered on the first entity; and determining the selected document fragment as negative training data if it does not contain the target entity.
According to still another aspect of the present invention, there is provided a document processing method adapted to find a target entity in a document, the target entity forming a predetermined relationship with a first entity, comprising the steps of: finding, in the document, a second entity forming the predetermined relationship with the first entity by using the document processing method described above; finding, in the document, a third entity forming the predetermined relationship with the first entity based on a predetermined rule; and determining the target entity based on the second entity and the third entity.
Optionally, in the document processing method according to the present invention, the step of determining the target entity based on the second entity and the third entity includes: determining the third entity as the target entity if a third entity forming the predetermined relationship with the first entity is found in the document based on the predetermined rule; and determining the second entity as the target entity if no third entity forming the predetermined relationship with the first entity can be found in the document based on the predetermined rule.
Optionally, the document processing method according to the present invention further includes: constructing a query statement based on the first entity and the predetermined relationship, and searching the document for a query result satisfying the query statement as a fourth entity; and determining the fourth entity as the target entity if no second entity forming the predetermined relationship with the first entity can be found in the document based on the deep learning model.
According to still another aspect of the present invention, there is provided a document processing apparatus adapted to find a target entity in a document, the target entity forming a predetermined relationship with a first entity, comprising: a deep learning calculation unit, which is suitable for searching a second entity forming a predetermined relation with the first entity in the document by utilizing the calculation device; a rule calculation unit adapted to find a third entity in the document, which forms a predetermined relationship with the first entity, based on a predetermined rule; and a discriminating unit adapted to determine the target entity based on the second entity and the third entity.
Optionally, in the document processing apparatus according to the present invention, the discrimination unit is adapted to determine the third entity as the target entity when the rule calculation unit outputs the third entity, and to determine the second entity output by the deep learning calculation unit as the target entity if the rule calculation unit cannot output a third entity.
Optionally, the document processing apparatus according to the present invention further includes a query calculation unit adapted to construct a query statement based on the first entity and the predetermined relationship and to search the document for a query result satisfying the query statement as a fourth entity; and the discrimination unit is adapted to determine the fourth entity as the target entity when the deep learning calculation unit cannot output the second entity.
According to still another aspect of the present invention, there is also provided a computing device. The computing device includes at least one processor and a memory storing program instructions, wherein the program instructions are configured to be adapted for execution by the at least one processor and include instructions for performing the above-described document processing method or training method.
According to still another aspect of the present invention, there is also provided a readable storage medium storing program instructions, which, when read and executed by a computing device, cause the computing device to execute the above-described document processing method or training method.
In the scheme of the invention, the constructed deep learning model can perform slot filling across multiple sentences, which significantly improves the accuracy and recall of the method.
In addition, the scheme of the invention improves the way a training set is constructed from a relatively small labeled data set, which substantially mitigates the loss of deep learning effectiveness caused by insufficient training data.
Furthermore, the scheme of the invention exploits the complementarity between the deep-learning-based method, the rule-system-based method, and the machine-reading-based method, and fuses these different methods, thereby significantly improving the accuracy and recall of the slot filling solution.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a document processing system 100 according to one embodiment of the invention;
FIG. 2 shows a schematic diagram of a computing device 200, according to one embodiment of the invention;
FIG. 3 shows a schematic diagram of a computing device 300, according to one embodiment of the invention;
FIG. 4 shows a schematic diagram of a deep learning model 400 according to one embodiment of the invention;
FIG. 5 illustrates a flow diagram of a document processing method 500 according to one embodiment of the invention;
FIG. 6 illustrates a flow diagram of a model training method 600 according to one embodiment of the invention; and
FIG. 7 shows a flow diagram of a document processing method 700 according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 shows a schematic diagram of a document processing system 100 according to one embodiment of the invention. As shown in FIG. 1, the document processing system 100 includes a processing front end 110 and a document processing device 800.
Processing front end 110 is any requester that needs the result of a slot filling problem. For example, in one scenario the processing front end 110 may be part of a financial analysis system. The financial analysis system may need to determine certain relationships from documents, such as using financial documents to determine shareholding relationships between various stakeholders. The processing front end 110 may therefore send a company name, a financial document, and the shareholding relationship to the document processing device 800 for slot filling, so as to obtain the shareholder names.
The processing front end 110 may also be part of a machine reading system. The machine reading system can read the content of the document and analyze the relationship among various entities in the document. In this case, the processing front end 110 may issue a slot fill request to the document processing device 800 for processing.
The invention is not limited to a particular form of the processing front end 110. The document processing device 800 may also receive requests from the processing front end 110 in various ways. For example, the document processing device 800 may provide an application programming interface (API) with a predetermined format definition, so that the processing front end 110 can organize slot filling requests according to that definition and send them to the document processing device 800.
The document processing apparatus 800 receives the request, obtains from it the first entity to be searched, the predetermined relationship, and the target document to be searched, and then finds in the target document a target entity that forms the predetermined relationship with the first entity, i.e., performs slot filling. As described above, in the field of natural language processing and in the description that follows, an entity refers to textual content having a specific meaning, generally comprising one or more words as its basic unit of text.
The document processing apparatus 800 includes a deep learning calculation unit 810, a rule calculation unit 820, and a discrimination unit 830. The document processing apparatus 800 sends the slot filling request to the deep learning calculation unit 810 and the rule calculation unit 820, respectively. The deep learning calculation unit 810 processes the slot filling problem using a calculation method based on a deep learning algorithm. The rule calculation unit 820 processes the slot fill request based on known rules. The discrimination unit 830 receives the processing results output by the deep learning calculation unit 810 and the rule calculation unit 820, and determines a target entity to be finally output based on these processing results.
In addition, the document processing apparatus 800 may further include a query calculation unit 840. The query computation unit 840 also receives slot fill requests and performs the request processing using a query-answer (QA) computation to obtain processing results. In the QA calculation manner, a query statement is first constructed based on a first entity and a predetermined relationship, and then a processing result is obtained by finding a query result satisfying the query statement in a target document.
The discrimination unit 830 also receives the processing result output by the query calculation unit 840 and determines the final target entity based on the processing results of the three calculation units 810, 820, and 840.
Considering the inherent properties of the deep learning algorithm, the rule matching approach, and the QA approach, in one embodiment, if the processing result of the rule calculation unit 820 indicates an explicit target entity, the discrimination unit 830 takes the output of the rule calculation unit 820 as the final target entity; if the processing result of the rule calculation unit 820 does not provide an explicit target entity, the processing result output by the deep learning calculation unit 810 is taken as the final target entity.
Alternatively, when the query calculation unit 840 is included in the document processing apparatus 800 and the processing result of the deep learning calculation unit 810 indicates that an explicit target entity cannot be provided, the discrimination unit 830 may determine the candidate target entity output by the query calculation unit 840 as the final target entity.
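The priority among the three calculation units can be summarized as a small decision function. The following is a minimal sketch of the embodiment just described; the function and variable names are illustrative, not part of the patent:

```python
from typing import Optional

def discriminate(rule_entity: Optional[str],
                 dl_entity: Optional[str],
                 qa_entity: Optional[str]) -> Optional[str]:
    """Fuse the outputs of the three calculation units.

    Priority follows the embodiment above: an explicit rule-based result wins,
    the deep learning result is the fallback, and the QA result is used only
    when the deep learning model cannot provide an explicit target entity.
    """
    if rule_entity is not None:      # rule calculation unit 820 gave an explicit entity
        return rule_entity
    if dl_entity is not None:        # deep learning calculation unit 810 gave an entity
        return dl_entity
    return qa_entity                 # query (QA) calculation unit 840 as last resort
```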
The deep learning calculation unit 810 includes the calculation device 300. Computing device 300 includes a plurality of processing units. Together, these processing units form a computational module to implement the deep learning model envisioned. The computing device 300 may utilize a deep learning model to determine whether a first entity and a predetermined entity have a predetermined relationship, or a probability of having a predetermined relationship, in a document snippet.
To this end, before processing by the computing device 300, the deep learning calculation unit 810 also needs to select candidate entities in the document, construct a document fragment containing the first entity and a candidate entity, and then send the first entity, the candidate entity, and the document fragment to the computing device 300 for processing. The deep learning calculation unit 810 may also need to perform some post-processing: for example, after the computing device 300 produces the computation result for one document fragment and candidate entity, it may continue to select a new document fragment or a new candidate entity and send it to the computing device 300, and finally obtain the output by considering all the computation results together.
Deep learning computation models have a large number of computation parameters that need to be adjusted through training in order to achieve the best results in practical use. Accordingly, each processing unit in the computing device 300 includes a large number of parameters awaiting training. As shown in FIG. 1, the document processing system 100 also includes a training corpus 130. The computing device 300 may be trained using the corpora in the training corpus 130. The training corpus 130 contains already-labeled corpora, i.e., corpora with specified first entities, target entities, predetermined relationships, and document fragment contents. The corpora typically include positive corpora and negative corpora: a positive corpus indicates that the first entity and the target entity form the predetermined relationship, and a negative corpus indicates that they do not. The training corpus 130 needs to contain a predetermined proportion of positive and negative corpora so that the computing device 300 can be trained more fully.
The specific structures of the respective computing units, computing devices, processing units, and the like mentioned above, and the corresponding processing methods will be described below with reference to the accompanying drawings.
According to an embodiment of the present invention, various components in the document processing system 100 described above, such as various computing units, computing devices, processing units, and the like, may be implemented by the computing apparatus 200 described below. FIG. 2 shows a schematic diagram of a computing device 200, according to one embodiment of the invention.
As shown in FIG. 2, in a basic configuration 202, computing device 200 typically includes system memory 206 and one or more processors 204. A memory bus 208 may be used for communication between the processor 204 and the system memory 206.
Depending on the desired configuration, the processor 204 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 204 may include one or more levels of cache, such as a level one cache 210 and a level two cache 212, a processor core 214, and registers 216. An example processor core 214 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 218 may be used with the processor 204, or in some implementations the memory controller 218 may be an internal part of the processor 204.
Depending on the desired configuration, system memory 206 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 206 may include an operating system 220, one or more applications 222, and program data 224. In some implementations, the application 222 can be arranged to execute instructions on the operating system with the program data 224 by the one or more processors 204.
Computing device 200 may also include an interface bus 240 that facilitates communication from various interface devices (e.g., output devices 242, peripheral interfaces 244, and communication devices 246) to the basic configuration 202 via the bus/interface controller 230. The example output device 242 includes a graphics processing unit 248 and an audio processing unit 250. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more a/V ports 252. Example peripheral interfaces 244 can include a serial interface controller 254 and a parallel interface controller 256, which can be configured to facilitate communications with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 258. An example communication device 246 may include a network controller 260, which may be arranged to facilitate communications with one or more other computing devices 262 over a network communication link via one or more communication ports 264.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or dedicated wired connection, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 200 may be implemented as a server, such as a database server, an application server, a WEB server, and the like, or as a personal computer including desktop and notebook computer configurations. Of course, computing device 200 may also be implemented as part of a small-sized portable (or mobile) electronic device.
In an embodiment in accordance with the invention, the computing device 200 is implemented as a document processing device 800 and is configured to perform a document processing method 700 in accordance with the invention. Where application 222 of computing device 200 includes a plurality of program instructions therein that perform document processing method 700 in accordance with the present invention, program data 224 may also store configuration information for document processing system 100, and the like.
FIG. 3 shows a schematic diagram of a computing device 300, according to one embodiment of the invention. The computing device 300 implements a deep learning computing model based on a deep learning algorithm for determining whether a first entity and a target entity have a predetermined relationship in a document fragment. The computing apparatus 300 includes a plurality of processing units, each of which implements a part of functions of the deep-learning computing model, thereby implementing the entire deep-learning computing model.
As shown in fig. 3, the computing device 300 includes an entity location embedding processing unit 310, a word embedding processing unit 320, a recurrent neural network processing unit 330, an attention mechanism processing unit 340, a combination processing unit 350, and a classification output unit 360.
The entity position embedding processing unit 310 and the word embedding processing unit 320 prepare respective word vectors required for subsequent processing, including word feature vectors and position vectors of respective words. The present invention is not limited to the specific form given in the processing units 310 and 320, and any specific way in which word feature vectors and position vectors of respective words in document segments can be constructed is within the scope of the present invention.
According to one embodiment, entity location embedding processing unit 310 performs a location embedding process. The embedding process is mathematically a functional mapping process for mapping information in one space, e.g., discrete information, to vector information in another space. The mapping function employed for embedding is typically an injective function and can preserve structure. The entity position embedding processing unit 310 encodes the position of each word in the document fragment relative to the first entity and the predetermined entity, and then maps it to a corresponding spatial vector through an embedding process.
According to one embodiment, assume that the first entity is s and the predetermined entity is o, and that s starts at position s_1 and ends at position s_2 in the document fragment. Then, for a document fragment X = {x_1, x_2, …, x_n}, each word x_i in the fragment has, with respect to entity s, relative position information corresponding to it:

$$p_i^s = \begin{cases} i - s_1, & i < s_1 \\ 0, & s_1 \le i \le s_2 \\ i - s_2, & i > s_2 \end{cases}$$

Likewise, with respect to entity o (with start position o_1 and end position o_2), each word x_i has relative position information corresponding to it:

$$p_i^o = \begin{cases} i - o_1, & i < o_1 \\ 0, & o_1 \le i \le o_2 \\ i - o_2, & i > o_2 \end{cases}$$

After the relative position information p_i^s and p_i^o of each word in the fragment is obtained, it is embedded and mapped, and the corresponding space vector of each value can be found in an embedding space.
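The piecewise relative-position definition is reconstructed above from the entity start/end positions; a minimal Python sketch of it follows, with the function name and 0-based indexing chosen purely for illustration:

```python
def relative_positions(n: int, start: int, end: int) -> list[int]:
    """Relative position of each word index i (0-based) with respect to an
    entity occupying positions start..end, following the piecewise
    definition above."""
    positions = []
    for i in range(n):
        if i < start:
            positions.append(i - start)   # word precedes the entity: negative offset
        elif i <= end:
            positions.append(0)           # word lies inside the entity span
        else:
            positions.append(i - end)     # word follows the entity: positive offset
    return positions

# Example: fragment of 8 words, entity s occupying positions 2..3
p_s = relative_positions(8, 2, 3)   # [-2, -1, 0, 0, 1, 2, 3, 4]
```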
The word embedding processing unit 320 performs embedding processing on each word in the document fragment so as to map it to a corresponding spatial feature vector. In general, a word embedding process constructs a set of features for each word and then performs an embedding mapping on the feature set. Toolkits such as word2vec provide a number of word embedding methods. The present invention is not limited to a specific word embedding method; any method by which each word in a document fragment can be embedded to obtain a corresponding space vector falls within the scope of the present invention.
In one embodiment, the document fragment may be processed with various natural language processing techniques to obtain the named entity tag (NER) and part of speech (POS) corresponding to each word. For each word, in addition to embedding the word itself, the corresponding NER and POS tags are also embedded, yielding three different spatial feature vectors. The three spatial feature vectors are combined to form the feature vector corresponding to the word; the combination may, for example, be a direct concatenation of the three vectors. The present invention is not limited to a particular combination method or to a particular way of obtaining the NER and POS tags.
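As a sketch of how the per-word word/NER/POS features might be obtained, assuming spaCy is used (the patent does not prescribe a specific NLP toolkit):

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes the small English model is installed
doc = nlp("John Smith joined Acme Corp in 2010.")

# (word, NER tag, POS tag) triple for each token; "O" marks tokens outside any entity
features = [(tok.text, tok.ent_type_ or "O", tok.pos_) for tok in doc]
# e.g. [('John', 'PERSON', 'PROPN'), ('Smith', 'PERSON', 'PROPN'), ('joined', 'O', 'VERB'), ...]
# Each of the three features is then mapped to its own embedding vector, and the three
# vectors are concatenated to form the word feature vector x_i of the token.
```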
The recurrent neural network processing unit 330 receives the feature vectors of the words output by the word embedding processing unit 320 and processes these space vectors one by one with a recurrent neural network algorithm to generate a global feature vector corresponding to each word. The word embedding processing unit 320 does not substantially take into account the influence of a word's surrounding words, in particular the preceding and following words, on that word's features. The recurrent neural network processing applies the feature vectors of the surrounding words to each word, so that the newly generated global feature vector already incorporates the influence of the word's context.
According to one embodiment of the present invention, the recurrent neural network employed in the recurrent neural network (RNN) processing unit 330 is a long short-term memory (LSTM) network. Each node in the LSTM network typically employs a three-gate structure: it receives the spatial feature vector of a word together with the output vector of the previous LSTM node, and outputs a global feature vector characterizing the word as well as a vector passed on to the next LSTM node.
The present invention is not limited to a specific LSTM network architecture; other RNN architectures, such as a bidirectional LSTM (Bi-LSTM), may also be employed. Any method that can superimpose the features of a word's context onto the spatial feature vector characterizing the word itself to generate a global feature vector falls within the scope of the present invention.
The attention mechanism processing unit 340 receives the global feature vector of each word output by the recurrent neural network (RNN) processing unit 330 and the spatial position vectors, output by the entity position embedding processing unit 310, that characterize each word's position relative to the first entity and the candidate entity, and applies an attention-based network to these vectors to output, for each word, a weight that reflects the influence of the first entity and the candidate entity.
In the RNN processing unit 330, particularly when an LSTM network is employed, the output vector corresponding to each word contains feature content from the preceding words, and the last output vector summarizes the features of all words in the entire document fragment. However, the output of the RNN processing unit 330 does not yet take the first entity and the candidate entity into account. With the attention-based network, a weight is output for each word that takes into account the word's own features, the word's context features, and the first entity and candidate entity (i.e., different weight parameters).
According to one embodiment of the invention, the attention mechanism processing unit 340 constructs a weight output a_i for each word:

$$u_i = v^\top \tanh\left(W_h h_i + W_q q + W_s p_i^s + W_o p_i^o\right)$$

$$a_i = \frac{\exp(u_i)}{\sum_{j=1}^{n} \exp(u_j)}$$

where W_h, W_q, W_s, W_o and v are the parameters to be learned, h_i is the global feature vector of the i-th word, q is the last output of the recurrent neural network, and p_i^s and p_i^o are the position vectors of the i-th word.

The combination processing unit 350 receives the weight a_i constructed by the attention mechanism processing unit 340 for each word and performs a weighted combination of the feature vectors output by the RNN processing unit 330 for the words. According to one embodiment of the present invention, the combination processing unit 350 performs the vector combination in the following manner:

$$z = \sum_{i=1}^{n} a_i h_i$$
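A minimal numerical sketch of these formulas follows; the toy dimensions and random parameter values are purely illustrative (in the model these parameters are learned):

```python
import numpy as np

rng = np.random.default_rng(0)
n, hidden, pos_dim, attn = 6, 8, 4, 5              # toy sizes, purely illustrative
h = rng.normal(size=(n, hidden))                   # global word feature vectors h_i
q = h[-1]                                          # last RNN output used as q
p_s = rng.normal(size=(n, pos_dim))                # position vectors w.r.t. the first entity
p_o = rng.normal(size=(n, pos_dim))                # position vectors w.r.t. the predetermined entity
W_h = rng.normal(size=(attn, hidden)); W_q = rng.normal(size=(attn, hidden))
W_s = rng.normal(size=(attn, pos_dim)); W_o = rng.normal(size=(attn, pos_dim))
v = rng.normal(size=attn)                          # parameters (random here for illustration)

u = np.tanh(h @ W_h.T + q @ W_q.T + p_s @ W_s.T + p_o @ W_o.T) @ v   # u_i for every word
a = np.exp(u) / np.exp(u).sum()                                       # attention weights a_i
z = a @ h                                                             # document vector z
```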
the classification output unit 360 is coupled to the combination processing unit 350, and performs classification processing on the weighted document vector output by the combination processing unit 350, so as to determine a probability that the first entity and the target entity have a predetermined relationship in the document fragment. Alternatively, the classification output unit 360 may determine a plurality of relationships for the first entity and the target entity and probabilities of these belonging to these relationships. In this way, flexible determination can be made using the output of the computing apparatus 300.
According to an embodiment of the present invention, the classification output unit 360 may perform the classification process using a Softmax regression algorithm.
In some cases, the classification output unit 360 cannot directly process the vector output by the combination processing unit 350; for this reason, the computing apparatus 300 may deploy a multi-layer perceptron processing unit 370 between the combination processing unit 350 and the classification output unit 360. The multi-layer perceptron processing unit 370 implements a multi-layer perceptron network to reduce the multi-dimensional vector output by the combination processing unit 350 to the dimensions required by the classification output unit 360. According to one implementation, the multi-layer perceptron processing unit 370 includes one or more fully connected layers to construct the multi-layer perceptron network.
The computing device 300 constructs a deep learning computation model through cooperation of the plurality of computing units to compute the probability that the first entity and the predetermined entity have the predetermined relationship in the document fragment.
FIG. 4 shows a schematic diagram of a deep learning model 400 according to one embodiment of the invention. The deep learning model 400 shown in FIG. 4 is constructed by the computing apparatus 300 shown in FIG. 3. As shown in FIG. 4, the deep learning model 400 includes a position embedding layer 410, a word embedding layer 420, an LSTM layer 430, an attention mechanism layer 440, a combination layer 450, a fully connected layer 460, and a classification output layer 470, which are implemented respectively by the entity position embedding processing unit 310, the word embedding processing unit 320, the recurrent neural network processing unit 330, the attention mechanism processing unit 340, the combination processing unit 350, the multi-layer perceptron processing unit 370, and the classification output unit 360 of FIG. 3.
The position embedding layer 410 generates, for each word in the document fragment, position vectors relative to the first entity and the candidate entity, {p_1^s, p_2^s, …, p_n^s} and {p_1^o, p_2^o, …, p_n^o}. The word embedding layer 420 generates, for each word in the document fragment together with its NER and POS tags, an embedded combination vector, yielding {x_1, x_2, …, x_n}. The LSTM layer 430 receives the vectors {x_1, x_2, …, x_n} from the word embedding layer 420 and produces corresponding outputs {h_1, h_2, …, h_n}:

$$\{h_1, h_2, \dots, h_n\} = \mathrm{LSTM}(x_1, x_2, \dots, x_n)$$

The last output h_n of the LSTM layer 430 is also taken as q.

The attention mechanism layer 440 receives {h_1, h_2, …, h_n} and q output by the LSTM layer 430 and {p_1^s, …, p_n^s}, {p_1^o, …, p_n^o} output by the position embedding layer 410, and computes a set of weights {a_1, a_2, …, a_n} using the attention mechanism network:

$$u_i = v^\top \tanh\left(W_h h_i + W_q q + W_s p_i^s + W_o p_i^o\right)$$

$$a_i = \frac{\exp(u_i)}{\sum_{j=1}^{n} \exp(u_j)}$$

The combination layer 450 uses the weights {a_1, a_2, …, a_n} output by the attention mechanism layer 440 to perform a weighted summation of the {h_1, h_2, …, h_n} output by the LSTM layer 430:

$$z = \sum_{i=1}^{n} a_i h_i$$

The fully connected layer 460 performs multi-layer perceptron processing (MLP) on z, and the classification output layer 470 then performs classification using softmax.
The slot filling problem may be solved using the computing device described in fig. 3 to construct a deep learning model, for example as given in fig. 4.
FIG. 5 shows a flow diagram of a document processing method 500 according to one embodiment of the invention. The document processing method 500 may utilize a deep learning model as shown in FIG. 4 constructed by the computing device described in FIG. 3 to find a target entity in a document that constitutes a predetermined relationship with a first entity.
The document processing method 500 begins at step S510. In step S510, the first entity is located in the document. A document typically contains a plurality of sentences and may contain multiple occurrences of the first entity, and the slot filling process is typically performed on document fragments of limited length, which may cover only a portion of the entire document. Slot filling must be performed for each occurrence of the first entity, so in step S510 a plurality of first entity occurrences may be found in the document and their positions located respectively.
Subsequently, in step S520, for the first entity position determined in step S510, it is determined whether other entities exist within a document range at a predetermined distance from that position, and the entities found are extracted as predetermined entities. According to one embodiment, the document search range may be determined in units of sentences: after the position of the first entity is determined, the sentence containing the first entity is located, and the document search range is formed by a predetermined number of sentences around that sentence. According to one embodiment, the predetermined number may be 0, i.e., the sentence containing the first entity is itself the document search range. According to another embodiment, the predetermined number may be 1 or 2, i.e., the document search range is 1 or 2 sentences on each side plus the sentence containing the first entity.
The predetermined entities may be extracted within the document search range using any method known in the art. For example, predetermined entity recognition may be performed using named entity recognition (NER) techniques from natural language processing, or the entities may be looked up in a predefined entity library.
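For example, assuming spaCy is used for entity recognition (one possible realization, not prescribed by the patent), candidate entities in the search range might be extracted as follows:

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes the small English model is installed

def extract_candidate_entities(search_range_text: str, first_entity: str):
    """Return entity mentions found in the search range, excluding the first entity."""
    doc = nlp(search_range_text)
    return [ent.text for ent in doc.ents if ent.text != first_entity]
```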
According to the process of step S520, if no predetermined entity can be extracted in the document search range, the method may return to step S510 to select the next first entity position for processing. If a plurality of predetermined entities are found in step S520, the subsequent steps may be performed for each predetermined entity in turn.
Subsequently, in step S530, a document fragment is determined based on the first entity and the predetermined entity. After the position of the first entity is determined in step S510 and the content and position of the predetermined entity are determined in step S520, the document fragment for the slot filling process is determined in step S530, i.e., how much document content should be analyzed to decide whether the first entity and the predetermined entity have the predetermined relationship. A fragment that is too short may fail to reveal the predetermined relationship, while a fragment that is too long may yield too many predetermined entities satisfying the predetermined relationship, reducing the accuracy of the method.
According to one embodiment of the present invention, document fragments are determined in units of sentences. That is, the sentences containing the first entity and the predetermined entity are obtained first, and a predetermined number of sentences are expanded at both ends of the sentences containing the first entity and the predetermined entity. This predetermined number is, for example, 1, i.e., one sentence is expanded on each side. The document content covered by the expanded sentences is then taken as the document fragment: the document fragment comprises the expanded sentences, the sentences containing the first entity and the predetermined entity, and the sentences between them.
In addition, according to an embodiment of the present invention, the total length of the document fragment may be limited not only by the number of sentences it contains but also by the total number of words. If the number of words in the document fragment exceeds a predetermined number, the range of the document fragment can be reduced, for example by dropping expanded sentences.
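A minimal sketch of this fragment construction (steps S520/S530), including the word-count cap; the names and the concrete limit are illustrative assumptions:

```python
def build_fragment(sentences, idx_first, idx_cand, expand=1, max_words=100):
    """Take the sentences containing the first entity (idx_first) and the
    candidate entity (idx_cand), everything between them, plus `expand`
    extra sentences on each side; shrink the expansion while the word
    budget (an illustrative limit) is exceeded."""
    lo, hi = min(idx_first, idx_cand), max(idx_first, idx_cand)
    expand = max(expand, 0)
    while True:
        start = max(0, lo - expand)
        end = min(len(sentences), hi + expand + 1)
        fragment = " ".join(sentences[start:end])
        if len(fragment.split()) <= max_words or expand == 0:
            return fragment
        expand -= 1   # too many words: drop one expanded sentence on each side
```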
Subsequently, in step S540, the first entity, the predetermined entity and the document fragment determined in the above steps are transmitted to the computing device 300 to determine whether the first entity and the predetermined entity have a predetermined relationship in the document fragment based on the deep learning algorithm.
As described above, the computing device 300 may provide the various relationships the first entity has with the predetermined entity and the probabilities of having those relationships. If one of these relationships happens to be the predetermined relationship and has a high probability (e.g., more than 50%), the first entity and the predetermined entity may be considered to have the predetermined relationship.
Next, in step S550, if it is determined in step S540 that a predetermined entity and the first entity have a predetermined relationship, the predetermined entity is determined as a target entity.
Since a plurality of first entity positions may be located in step S510 and a plurality of predetermined entities may be extracted near a given first entity position in step S520, the method may, after a predetermined entity has been handled in step S550, return to step S520 and perform steps S530 and S540 on the other predetermined entities at that first entity position; after all the predetermined entities at a first entity position have been processed, the method returns to step S510 to select the next first entity position for processing.
It is possible to determine that a plurality of target entities and the first entity form a predetermined relationship in one document, and according to one embodiment of the present invention, all of the plurality of target entities may be returned as a result of the method processing, or the target entity with the highest probability may be returned as a result. The present invention is not limited thereto, and all ways of returning the target entity as the processing result are within the scope of the present invention.
In some cases, the relationships determined in step S540 do not directly indicate the predetermined relationship; for this reason, step S540 may further include steps S542 and S544. In step S542, the computing device 300 is used to determine the various relationships the first entity and the predetermined entity have in the document fragment and the probabilities of having those relationships. Subsequently, in step S544, relationship identification is performed, i.e., it is determined whether the first entity and the predetermined entity have the predetermined relationship based on the association between these relationships and the predetermined relationship.
The processing in step S542 is similar to that described previously for step S540 and is not described again here.
The relationship identification in step S544 may be performed in a variety of ways. According to one embodiment, in step S544, the approximate relationships that are similar or identical to the predetermined relationship and their probabilities are identified, as are the contrary relationships that do not belong to the predetermined relationship and their probabilities. If the sum of the probabilities of the approximate relationships exceeds the sum of the probabilities of the contrary relationships, the first entity and the predetermined entity are determined to have the predetermined relationship.
Specifically, assume that the relationship to be determined is employment or membership in an organization (org:employee_or_members). Step S542 outputs a plurality of relationships and their probabilities. In step S544, only two of the categories are of interest: the probability of no relationship (no-relationship) and that of employment or membership (org:employee_or_members). As long as the probability of org:employee_or_members exceeds the probability of no-relationship, the first entity and the predetermined entity are considered to have the predetermined relationship.
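A minimal sketch of this comparison; the label strings follow the example above and the probability values are hypothetical:

```python
def has_predetermined_relationship(probabilities: dict,
                                   target_relation: str = "org:employee_or_members") -> bool:
    """Step S544 sketch: only the target relation and the no-relationship class
    are compared."""
    return probabilities.get(target_relation, 0.0) > probabilities.get("no-relationship", 0.0)

# Example with hypothetical model output:
probs = {"org:employee_or_members": 0.47, "no-relationship": 0.31, "org:founded_by": 0.22}
assert has_predetermined_relationship(probs)
```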
In addition, according to another embodiment, in step S544, whether the first entity and the predetermined entity have the predetermined relationship may be determined according to an inclusion relationship between a determined relationship and the predetermined relationship. For example, a relationship indicating top employees (org:top_members_articles) is included in the relationship indicating employees (org:members_articles).
According to another embodiment, in step S544, whether the first entity and the predetermined entity have the predetermined relationship may also be determined according to the directionality between a determined relationship and the predetermined relationship. For example, if A is the parent of B, then B is the child of A. Thus, if the relationship output in step S542 indicates a child relationship between the first entity and the predetermined entity, it may also be determined that there is a parent relationship between the predetermined entity and the first entity.
Using the document processing method 500, the slot filling problem can be handled with a deep-learning-based algorithm. Because deep learning requires a large amount of training data for parameter training, the more complete the training data, the better the learning effect. How to efficiently provide training data for the deep learning model is therefore another problem to be solved. FIG. 6 illustrates a flowchart of a model training method 600 for training the computing device 300, according to one embodiment of the invention.
The computing apparatus 300 constructs a deep learning computation model from a plurality of processing units. Training the computing apparatus 300 therefore means training each of its processing units, namely the entity position embedding processing unit 310, the word embedding processing unit 320, the recurrent neural network processing unit 330, the attention mechanism processing unit 340, the combination processing unit 350, the classification output unit 360, and the multi-layer perceptron processing unit 370, so as to update the parameter values in these units and thereby provide results of high accuracy and recall for the slot filling process.
The training method 600 begins at step S610. In step S610, a training document that has been labeled is first obtained. The labeled training document generally includes a first entity, a target entity, and a predetermined relationship formed.
The training document is typically an entire document, while the computing device 300 performs prediction on document fragments, so the training method 600 proceeds to step S620. In step S620, the positions of the first entity occurrences and of the target entity in the training document are determined, and it is then determined, for each first entity occurrence, whether the target entity is contained within the predetermined document fragment.
The document fragment may be determined in the manner described above for steps S520 and S530, i.e., for the first entity, the predetermined search range is determined first. According to one embodiment, the document search range may be determined in units of sentences: after the position of the first entity is determined, the sentence containing the first entity is located, and the document search range is formed by a predetermined number of sentences around that sentence. According to one embodiment, the predetermined number may be 0, i.e., the sentence containing the first entity is itself the document search range. According to another embodiment, the predetermined number may be 1 or 2, i.e., the document search range is 1 or 2 sentences on each side plus the sentence containing the first entity.
Subsequently, it is determined whether a target entity exists within the determined search range. If the target entity does not exist within the search range, a next first entity is selected for searching.
If the target entity is included within the search range, a document fragment containing the first entity and the target entity is constructed, i.e., a predetermined number of sentences are expanded at both ends of the sentences containing the first entity and the target entity. This predetermined number is, for example, 1, i.e., one sentence is expanded on each side. The document content covered by the expanded sentences is then taken as the document fragment: the document fragment comprises the expanded sentences, the sentences containing the first entity and the target entity, and the sentences between them.
Subsequently, in step S630, the document fragment found in step S620, which includes the first entity and the target entity, is determined as positive training data.
In addition to constructing positive training data, the training method 600 optionally also includes the step of constructing negative training data. To this end, the training method 600 may further include step S640. In step S640, for each first entity position in the training document, it is determined whether other entities exist within a search range at a predetermined distance from that position, and each entity found is extracted. According to one embodiment, the document search range may be determined in units of sentences: after the position of the first entity is determined, the sentence containing the first entity is located, and the document search range is formed by a predetermined number of sentences around that sentence. According to one embodiment, the predetermined number may be 0, i.e., the sentence containing the first entity is itself the document search range. According to another embodiment, the predetermined number may be 1 or 2, i.e., the document search range is 1 or 2 sentences on each side plus the sentence containing the first entity.
The other entities may be extracted within the document search range using any method known in the art, for example using named entity recognition (NER) from natural language processing, or by lookup in a predefined entity library.
Subsequently, in step S650, if the target entity is different from the extracted other entities, a document fragment is determined based on the first entity and the other entities, and the document fragment is determined as negative training data.
Document fragments may be determined for the first entity and the other entities in a manner similar to that used for the first entity and the target entity in step S620, i.e., the sentences containing the first entity and the other entity are expanded by a predetermined number of sentences at both ends. This predetermined number is, for example, 1, i.e., one additional sentence is included at each end. The document content covered by the expanded sentences is then taken as the document fragment, so that the fragment comprises the expanding sentences, the sentences in which the first entity and the other entity are located, and any sentences between them.
The processes of steps S640 and S650 may be performed repeatedly until the numbers of positive and negative corpora obtained satisfy a designated ratio.
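The loop over steps S640 and S650 might look like the following sketch; the candidate pairs, the target ratio, and the data layout are assumptions made for illustration only.

def build_negative_data(candidate_pairs, target_entity, num_positives, ratio=1.0):
    # candidate_pairs: (document_fragment, other_entity) tuples produced by the
    # search-range and fragment-building steps above; keep a pair as negative
    # data when the other entity differs from the target entity, and stop once
    # the negative corpus reaches the designated ratio to the positive corpus.
    negatives = []
    for fragment, other in candidate_pairs:
        if other != target_entity:
            negatives.append(fragment)
        if len(negatives) >= ratio * num_positives:
            break
    return negatives

pairs = [("... Alice ... Paris ...", "Paris"), ("... Alice ... 1995 ...", "1995")]
print(build_negative_data(pairs, target_entity="Acme Corp", num_positives=2, ratio=1.0))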
After a sufficient number of positive and negative corpora have been obtained, the computing device 300 may be trained with these corpora. Many techniques for training deep learning models exist in the prior art; the present invention is not limited to any particular training technique, and all such training techniques fall within the scope of the present invention.
With the method 600 described above, a sufficient number of training corpora can be generated from a limited number of training documents, thereby substantially easing the requirement of the computing device 300 for a large amount of training data.
The invention also provides a solution that comprehensively considers a plurality of slot filling approaches, including the deep-learning-model-based approach. FIG. 7 shows a flow diagram of a document processing method 700 according to one embodiment of the invention. The document processing method 700 is used to find a target entity in a document that forms a predetermined relationship with a first entity.
As shown in FIG. 7, the document processing method 700 begins at step S710. In step S710, a first candidate entity that forms a predetermined relationship with the first entity is found in the document using the deep-learning-model-based method 500 described above with reference to FIG. 5. The deep-learning-based document processing method 500 has been described in detail above and is not repeated here. The document processing method 500 is also suitable for execution by the deep learning computation unit 810 of FIG. 1.
Next, in step S720, a second candidate entity that forms a predetermined relationship with the first entity is found in the document based on a predetermined rule. Step S720 may be performed by the rule calculation unit 820 in FIG. 1.
To perform the slot filling process based on a predetermined rule, a rule base first needs to be constructed. One way to construct a rule base is from world knowledge and linguistic knowledge. Some relationships can be inferred directly from world knowledge. For example, to infer the religious affiliation relationship of an organization (org), if a common religion name appears in the organization name, the corresponding relationship may be extracted directly. Another way to construct a rule base is through linguistic inference. For example, the relationship for finding the alternate names of an organization or person (org/per:alternate_names) can be inferred directly by coreference resolution. In this way, a rule base based on world knowledge and linguistic knowledge is obtained.
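A toy world-knowledge rule of this kind might look as follows; the religion keyword list, the relation label, and the example organization name are assumptions made for the sketch and do not reproduce any actual rule base of the disclosure.

RELIGION_KEYWORDS = {"Catholic", "Islamic", "Buddhist", "Jewish", "Lutheran"}  # illustrative list only

def religious_affiliation_rule(org_name):
    # If a common religion name appears inside the organization name,
    # emit the corresponding relation directly.
    for religion in RELIGION_KEYWORDS:
        if religion.lower() in org_name.lower():
            return ("org:religious_affiliation", religion)  # hypothetical relation label
    return None

print(religious_affiliation_rule("Catholic Relief Services"))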
According to another embodiment, rules may be constructed from syntactic information. For example, for a statement, a dependency parse tree may first be constructed. A dependency parse tree is a syntactic tree that captures the interrelationships between words. Subsequently, the dependency path, i.e., the shortest path between two entities on the constructed dependency parse tree, is obtained, and that shortest path is taken as the extraction rule for the relationship between the entities.
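The shortest dependency path can be found with an ordinary breadth-first search over the parse tree; in the sketch below the dependency edges are written by hand, whereas in practice they would come from a dependency parser.

from collections import deque

def shortest_path(edges, start, goal):
    # edges: (head, dependent) pairs of a dependency parse, treated as undirected.
    graph = {}
    for h, d in edges:
        graph.setdefault(h, set()).add(d)
        graph.setdefault(d, set()).add(h)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hand-written edges for "Alice founded Acme in Paris."
edges = [("founded", "Alice"), ("founded", "Acme"), ("founded", "in"), ("in", "Paris")]
print(shortest_path(edges, "Alice", "Acme"))  # ['Alice', 'founded', 'Acme'] becomes the extraction rule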
Next, in step S730, a target entity is determined based on the first candidate entity and the second candidate entity. Step S730 may be performed by the discrimination unit 830 in FIG. 1.
The slot filling process based on the predetermined rule in step S720 generally has high accuracy but a low recall rate. Therefore, according to one implementation, in step S730, if the process of step S720 provides an explicit second candidate entity, the second candidate entity is determined as the target entity; conversely, if the process of step S720 cannot provide a second candidate entity, the first candidate entity is determined as the target entity.
Optionally, the document processing method 700 also contemplates a slot filling solution based on a question answering (QA) approach. To this end, the document processing method 700 further includes step S740. In step S740, question answering (QA) is used to find a third candidate entity in the document that forms a predetermined relationship with the first entity.
In the QA-based method, a corresponding template is first created for each relationship, and the entity placeholder ([S]) in the template is then replaced by the first entity so as to generate a specific question. For example, a template for finding the date-of-birth relationship can be created as follows:
per:date_of_birth ==> what date is the birthday of [S]
The answer to the question is then looked up in the target document. The answer found by the QA method serves as the prediction for the slot filling problem.
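The template-filling step and the hand-off to a question answering model might look like the sketch below; the template dictionary and the qa_model callable are placeholders, not an actual implementation of the disclosure.

TEMPLATES = {
    "per:date_of_birth": "what date is the birthday of [S]",  # template taken from the example above
}

def build_question(relation, first_entity):
    # Replace the [S] placeholder in the relation template with the first entity.
    return TEMPLATES[relation].replace("[S]", first_entity)

def qa_slot_fill(relation, first_entity, document, qa_model):
    # qa_model is any question-answering function that returns an answer span
    # from the document; its choice is outside the scope of this sketch.
    return qa_model(question=build_question(relation, first_entity), context=document)

print(build_question("per:date_of_birth", "Alice"))  # "what date is the birthday of Alice"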
When the method 700 includes step S740, the processing in step S730 may also take the output of step S740 into account, i.e., if step S710 cannot output an explicit first candidate entity, the third candidate entity obtained in step S740 is determined as the target entity.
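One possible form of the discrimination logic of step S730, with the QA result of step S740 as a final fallback, is sketched below; the precedence follows the description above, but the function itself is only an illustration.

def choose_target(first_candidate, second_candidate, third_candidate=None):
    # first_candidate: deep learning result of step S710;
    # second_candidate: rule-based result of step S720 (high precision, preferred);
    # third_candidate: QA result of step S740, used when the other methods abstain.
    if second_candidate is not None:
        return second_candidate
    if first_candidate is not None:
        return first_candidate
    return third_candidate

print(choose_target(first_candidate=None, second_candidate=None, third_candidate="1970-05-01"))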
Further optionally, according to an embodiment of the present invention, in step S710, in view of the characteristics of the deep learning computation model, a plurality of computing devices 300 with different configurations may be constructed, and the slot filling process of method 500 may be performed with each of these differently configured computing devices 300 to obtain different processing results.
Differently configured computing devices 300 may be produced by using different configurations when constructing the document fragments. As indicated above, these configuration parameters include: the maximum number of sentences separating the first entity and the candidate entity; the number of sentences supplemented at the two ends of the sentences in which the first entity and the candidate entity are located; and the maximum number of words included in the divided document fragment.
After the different processing results are obtained, they may be integrated (ensembled) to generate the first candidate entity. The integration may be performed in a variety of ways, including, for example but not limited to, averaging, weighted averaging, and voting.
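For example, a majority vote over the entities predicted by the differently configured computing devices could be implemented as follows; averaging or weighted averaging of predicted probabilities would work analogously. The candidate list is invented for the example.

from collections import Counter

def vote(candidates):
    # candidates: one predicted entity (or None) per model configuration.
    votes = Counter(c for c in candidates if c is not None)
    if not votes:
        return None
    entity, _count = votes.most_common(1)[0]
    return entity

print(vote(["Paris", "Paris", "London", None]))  # -> "Paris"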
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the devices in an embodiment may be adaptively changed and arranged in one or more devices different from the embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore, may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor with the necessary instructions for carrying out the method or the method elements thus forms a device for carrying out the method or the method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (29)

1. A computing device based on machine learning, the computing device comprising:
the cyclic neural network processing unit is suitable for receiving the word feature vectors of all words in the document segments and calculating to obtain global word feature vectors corresponding to all the words by utilizing an algorithm based on the cyclic neural network, wherein the global word feature vectors of all the words represent the influence of the features of all the words in the word context in the document segments on the word features;
an attention mechanism processing unit adapted to receive a position vector of each word in the document fragment and a global word feature vector output by the recurrent neural network processing unit, calculate a weight vector corresponding to each word by using an attention mechanism-based algorithm, wherein the position vector represents a positional relationship mapping of each word in the document fragment with a first entity and a predetermined entity, and the first entity and the predetermined entity respectively contain one or more words that are continuous in the document fragment, the attention mechanism algorithm calculates a weight vector of each word, and takes a ratio of a specific operation result of the weight vector of the word to a sum of specific operation results of the weight vectors of all words as a weight vector corresponding to the word, the weight vector of each word is obtained by weighting a sum of the global feature vector of the word output by the recurrent neural network processing unit, a last global word feature vector output by the recurrent neural network processing unit, a position vector of the word relative to the first entity, and a position vector of the word relative to the predetermined entity, and a parameter to be learned;
the combination processing unit is suitable for performing weighted combination on the global word feature vectors output by the recurrent neural network processing unit based on the weight vectors of the words so as to generate document vectors; and
a classification output unit adapted to determine a probability of having a predetermined relationship between the first entity and the predetermined entity based on the document vector to determine whether the first entity and the predetermined entity have the predetermined relationship in the document fragment.
2. The computing device of claim 1, further comprising:
an entity position embedding processing unit, adapted to map the position relationship between each word in the document segment and the first entity and the predetermined entity as the position vector;
and the word embedding processing unit is suitable for mapping each word in the document segment into the word feature vector corresponding to each word.
3. The computing device of claim 2, wherein the word embedding processing unit is adapted to:
determining a Named Entity (NER) and a part of speech (POS) corresponding to each word;
mapping each word, named entity and part of speech into a corresponding vector; and
combining the vectors to generate the word feature vector.
4. The computing device of any of claims 1-3, further comprising:
and the multi-layer perception processing unit is connected between the combination processing unit and the classification output unit.
5. The computing device of claim 4, wherein the multi-tier aware processing unit is a fully connected processing unit.
6. The computing device of claim 1, wherein the recurrent neural network employed in the recurrent neural network processing unit is a Long Short Term Memory (LSTM) network.
7. A method of document processing comprising the steps of:
locating a first entity in the document;
extracting predetermined entities within a predetermined distance of the first entity location;
determining a document fragment based on the first entity and a predetermined entity;
determining, with the computing device of any of claims 1-6, whether the first entity and the predetermined entity have the predetermined relationship in the document snippet; and
and determining the predetermined entity with the predetermined relation as a target entity.
8. The document processing method of claim 7, wherein determining, with the computing device, whether the first entity and the predetermined entity have the predetermined relationship in the document snippet comprises:
determining, with the computing device, a first relationship that the first entity and the predetermined entity have in the document snippet; and
determining whether the first entity and a predetermined entity have a predetermined relationship based on an association between the first relationship and the predetermined relationship.
9. The document processing method of claim 8, wherein the first relationship determined using the computing device includes a plurality of first relationships and probabilities corresponding to the respective first relationships, the step of determining whether the first entity and a predetermined entity have a predetermined relationship based on the association between the first relationships and the predetermined relationship comprising:
in each first relationship, it is determined that the first entity and the predetermined entity have the predetermined relationship if the probability of the first relationship indicating the predetermined relationship exceeds the probability of the first relationship indicating a non-predetermined relationship.
10. The document processing method according to claim 8 or 9, wherein said step of determining whether said first entity and a predetermined entity have a predetermined relationship based on an association between said first relationship and said predetermined relationship comprises:
determining whether the first entity and a predetermined entity have a predetermined relationship based on an inclusion relationship between the first relationship and the predetermined relationship.
11. The document processing method of claim 8, wherein the step of determining whether the first entity and a predetermined entity have a predetermined relationship based on the association between the first relationship and the predetermined relationship comprises:
determining whether the first entity and a predetermined entity have a predetermined relationship according to directionality between the first relationship and the predetermined relationship.
12. The document processing method of claim 7, wherein the predetermined distance comprises:
the distance between the first entity and the sentence is up to a first preset sentence number.
13. The document processing method of claim 7, wherein the step of determining a document fragment based on the first entity and a predetermined entity comprises:
supplementing a second preset number of sentences at the two ends of the sentences where the first entity and the candidate entity are located; and
the predetermined document fragment is composed of the supplemented sentence, the sentence containing the first entity and the predetermined entity, and the sentence between the first entity and the predetermined entity.
14. The document processing method of claim 7, wherein the document fragment contains no more than a third predetermined number of words.
15. A method of model training adapted to construct a training set to train a computing device as claimed in any one of claims 1 to 6, the method comprising the steps of:
acquiring a first entity and a target entity which form a preset relation, and a training document containing the first entity and the target entity;
determining whether the first entity and a target entity are both within a predetermined document fragment of the training document; and
if yes, determining the document segment comprising the first entity and the target entity as positive training data.
16. The training method of claim 15, further comprising the steps of:
selecting a document segment with a preset length by taking the first entity as a center in the document with the positive training data; and
determining the document snippet as negative training data if the target entity is not included in the selected document snippet.
17. A document processing method adapted to find a target entity in a document, which forms a predetermined relationship with a first entity, comprising the steps of:
finding a second entity in said document forming a predetermined relationship with said first entity using the method of any of claims 7-14;
searching a third entity forming a predetermined relation with the first entity in the document based on a predetermined rule; and
determining the target entity based on the second entity and a third entity.
18. The document processing method of claim 17, wherein the step of determining the target entity based on the second entity and a third entity comprises:
if a third entity forming a predetermined relation with the first entity is found in the document based on a predetermined rule, determining the third entity as the target entity; and
and if a third entity which forms a preset relation with the first entity cannot be found in the document based on the preset rule, determining the second entity as the target entity.
19. The document processing method of claim 17, further comprising:
constructing a query statement based on the first entity and a predetermined relationship, and searching a query result satisfying the query statement in the document as a fourth entity; and
determining the fourth entity as the target entity if a second entity forming a predetermined relationship with the first entity cannot be found in the document based on a deep learning model.
20. The document processing method of claim 17, wherein the predetermined rules include one or more of:
rules determined based on knowledge and linguistics; and
rules are constructed based on the syntactic information.
21. The document processing method of claim 17, wherein the step of locating a second entity in the document that forms a predetermined relationship with the first entity comprises:
constructing the computing device of any of claims 1-6 with different parameter configurations;
utilizing the method according to any of the claims 7-14, based on each structured computing means, to find from the document each fifth entity constituting a predetermined relation with the first entity, respectively; and
integrating the respective fifth entities to form the second entity.
22. The document processing method of claim 21, wherein in the step of integrating the respective fifth entities to form the second entity, the integration includes one or more of:
average, weighted average, and vote.
23. The document processing method of claim 21, wherein the parameter configuration includes one or more of:
a number of furthest spaced sentences between the first entity and a candidate entity;
the number of sentences supplemented at two ends of the sentence where the first entity and the candidate entity are located; and
the maximum number of words included in the divided document fragment.
24. A document processing apparatus adapted to find a target entity in a document that constitutes a predetermined relationship with a first entity, comprising:
a deep learning calculation unit adapted to find in the document a second entity constituting a predetermined relationship with the first entity using the computing apparatus of any of claims 1-6;
a rule calculation unit adapted to find a third entity in the document forming a predetermined relationship with the first entity based on a predetermined rule; and
a discriminating unit adapted to determine the target entity based on the second entity and a third entity.
25. The document processing apparatus according to claim 24, wherein the discrimination unit is adapted to determine a third entity as the target entity when the rule calculation unit outputs the third entity; and
determining the second entity output by the deep learning calculation unit as the target entity if the rule calculation unit cannot output a third entity.
26. The document processing apparatus of claim 24, further comprising:
a query calculation unit adapted to compose a query statement based on the first entity and a predetermined relationship, and to search for a query result satisfying the query statement in the document as a fourth entity; and
the discriminating unit is adapted to determine the fourth entity as the target entity when the deep learning calculation unit cannot output the second entity.
27. The document processing apparatus of claim 24, wherein the deep learning computation unit is adapted to:
constructing a plurality of computing devices with different parameter configurations;
searching for fifth entities which form a predetermined relationship with the first entity from the document respectively based on the computing devices; and
integrating the respective fifth entities to form the second entity.
28. A computing device, comprising:
at least one processor; and
a memory storing program instructions configured for execution by the at least one processor, the program instructions comprising instructions for performing the method of any of claims 7-23.
29. A readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the method of any of claims 7-23.
CN201910009364.9A 2019-01-04 2019-01-04 Document processing device and method Active CN111414483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910009364.9A CN111414483B (en) 2019-01-04 2019-01-04 Document processing device and method

Publications (2)

Publication Number Publication Date
CN111414483A CN111414483A (en) 2020-07-14
CN111414483B true CN111414483B (en) 2023-03-28

Family

ID=71492547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910009364.9A Active CN111414483B (en) 2019-01-04 2019-01-04 Document processing device and method

Country Status (1)

Country Link
CN (1) CN111414483B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10372821B2 (en) * 2017-03-17 2019-08-06 Adobe Inc. Identification of reading order text segments with a probabilistic language model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170736A (en) * 2017-12-15 2018-06-15 南瑞集团有限公司 A kind of document based on cycle attention mechanism quickly scans qualitative method
CN108446275A (en) * 2018-03-21 2018-08-24 北京理工大学 Long text emotional orientation analytical method based on attention bilayer LSTM
CN108829678A (en) * 2018-06-20 2018-11-16 广东外语外贸大学 Name entity recognition method in a kind of Chinese international education field
CN108984526A (en) * 2018-07-10 2018-12-11 北京理工大学 A kind of document subject matter vector abstracting method based on deep learning
CN109086423A (en) * 2018-08-08 2018-12-25 北京神州泰岳软件股份有限公司 A kind of text matching technique and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Runyan Zhang et al. Relation classification via recurrent neural network with attention and tensor layers. Big Data Mining and Analytics. 2018, full text. *
Li Fenglin; Ke Jia. Research progress on entity relation extraction based on deep learning frameworks. Information Science. 2018, (03), full text. *
Qi Aiqin; Xu Weiran. Entity linking method based on word vectors. Journal of Data Acquisition and Processing. 2017, (03), full text. *

Also Published As

Publication number Publication date
CN111414483A (en) 2020-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant