CN108182295A - Enterprise knowledge graph attribute extraction method and system - Google Patents

Enterprise knowledge graph attribute extraction method and system

Info

Publication number
CN108182295A
Authority
CN
China
Prior art keywords
attribute
event
entity
word
neural network
Prior art date
Legal status
Granted
Application number
CN201810136568.4A
Other languages
Chinese (zh)
Other versions
CN108182295B (en)
Inventor
孙世通
刘德彬
严开
陈玮
Current Assignee
China Telecom Yijin Technology Co.,Ltd.
Chongqing Yucun Technology Co ltd
Original Assignee
Chongqing Yu Yu Da Data Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Yu Yu Da Data Technology Co Ltd
Priority to CN201810136568.4A
Publication of CN108182295A
Application granted
Publication of CN108182295B
Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

The present invention provides an enterprise knowledge graph attribute extraction method comprising the following steps: defining entity categories and event categories; defining an attribute structure for each entity category; preparing and annotating a corpus; extracting entity attributes; and fusing entity attributes. The invention combines the objectivity of expert classification of domain-specific entity-attribute knowledge with the efficiency of machine-learning extraction from text, applies them to the full Chinese corpus of enterprise data, recognizes all target attribute classes with a comparatively small amount of annotation, and solves the problems of extracting node entity attributes in a knowledge graph and of fusing attributes from multiple sources.

Description

Enterprise knowledge graph attribute extraction method and system
Technical field
The present invention relates to an information processing method and system, and in particular to an enterprise knowledge graph attribute extraction method and system.
Background technology
A knowledge graph is a semantic network built on a graph data structure whose basic units are nodes and edges. In an enterprise knowledge graph, nodes represent event entities and business entities, and edges represent the relationships between entities. Focusing on a single enterprise within the full graph reveals its basic information, the development history formed by chaining its event nodes, and the clusters of enterprises associated with it at each level ("association" here includes, but is not limited to, equity investment, cooperation, upstream/downstream relationships, and subsidiaries).
Knowledge graphs are applied to enterprise-information discovery and business-risk analysis; their core value is to organically link enterprise information of every category so that risk models can uncover hidden correlated risks, identify group-level risks, and so on. In the step of structuring node data, two major problems arise: 1) extracting different attributes from different data sources, and 2) reasonably fusing attributes from different sources for the same entity.
At the technical level, building such an enterprise knowledge graph requires overcoming two difficulties:
entity attribute extraction, and multi-source attribute fusion together with establishing relationships between different entities.
The prior art uses attribute extraction and fusion based on industry experience rules and dictionaries, or based on supervised learning and pattern matching.
The drawback of rule-and-dictionary approaches is that defining industry attributes for entities in different industries requires senior domain experts, yet purely manual work cannot overcome low annotation efficiency and inconsistent annotation standards. Although a standardized dictionary can recover verb-centered word relations in text, relations such as noun appositives are easily misjudged, and the method cannot effectively process or judge out-of-vocabulary words.
The prior art also uses attribute extraction and fusion based on supervised learning and pattern matching: a classifier is built on manually annotated corpora, but its main bottleneck is that many annotations are needed and the demands on data quality are high.
Prior-art enterprise knowledge graph attribute extraction is based on text data; when images, audio/video, and text appear together, cross-source processing is restricted. Modeling also fails to account for extracting entities and relationships at different levels and granularities.
Prior-art enterprise knowledge graph attribute extraction processes target text with manual annotation, which is inefficient and costly and cannot handle massive text quickly.
Prior-art enterprise knowledge graph attribute extraction cannot perform correlation analysis and reasoning across texts, nor achieve end-to-end adaptive learning and relationship establishment.
Summary of the invention
The present invention provides a method for efficient, automatic, and accurate enterprise knowledge graph attribute extraction, comprising the following steps:
Define the entity categories, event categories, and entity attribute structures of the training samples;
Prepare and annotate the training corpus;
Train an entity attribute extraction model;
Feed target text into the entity attribute extraction model to obtain the target text's entity attributes;
Perform entity attribute fusion on the target text.
Further, defining the entity categories, event categories, and entity attribute structures of the training samples includes:
defining the entity category as enterprise or/and individual;
defining the event category as one or more of judgment document, court announcement, hearing announcement, bidding, equity, strategy, personnel, finance, debt, product, marketing, brand, and accident;
defining the attribute fields as one or more of a type field, time field, tag field, and body field.
Training corpus preparation and annotation includes annotating the event category and entity attribute structure of each text in the training sample database.
Further, training the entity attribute extraction model includes the following steps:
S1: With word-level labels, feed the N*K word-vector matrix into a first bidirectional long short-term memory (BiLSTM) recurrent neural network to obtain an N*T label-class probability distribution matrix for each word, where N is the batch size, K is the word-embedding vector length, and T is the number of word label classes; the position of the maximum value corresponds to the current word's label, and the word embedding of each word is also obtained;
S2: Determine the subject information of the training sample;
S3: Define the event vector by the following formula, where eventEmbedding is the event vector, w_j denotes the vector of the j-th word in the sentence, and n denotes that sentences within distance n of the subject are considered;
With event-level labels, feed the N*K event-vector matrix into a second BiLSTM recurrent neural network as its initial input, where N is the batch size, K is the word-embedding vector length, and L is the number of event label classes; the position of the maximum value corresponds to the current event's label.
Define the Bayesian network as:
P(A, B, C, D) = P(D|A, B) · P(C|A) · P(B|A) · P(A)
where A is the probability that the text describes a certain class of event,
B is the probability that event extraction succeeds,
C is the probability that the text contains time information,
D is the probability that the text contains domain-specific vocabulary.
The value of B is determined by whether the label output by the N*L label-class probability distribution matrix matches the training-sample annotation: B is assigned 1 if they match and 0 otherwise.
Obtain a first N*L matrix from the second BiLSTM network and feed it into the Bayesian network; perform feature fusion on the second N*L matrix output by the Bayesian network and the first N*L matrix, and feed the fusion result back to the second BiLSTM network;
S4: Define the loss function as the mean square error between the output of each time step of the BiLSTM network and the training-sample annotations, and repeat step S3 until the loss function converges.
Further, the entity attribute extraction model includes either of the following:
the first N*L matrix is obtained from the forward hidden layer of the second BiLSTM recurrent neural network and fed into the Bayesian network; feature fusion is performed on the second N*L matrix output by the Bayesian network and the first N*L matrix; and the fusion result serves as input to the backward hidden layer of the second BiLSTM network;
or,
the first N*L matrix is obtained from the output layer of the second BiLSTM recurrent neural network and fed into the Bayesian network; feature fusion is performed on the second N*L matrix output by the Bayesian network and the first N*L matrix; and the fusion result serves as input to the input layer of the second BiLSTM network.
Further, performing entity attribute fusion on the target text includes the following steps:
A. Select the base structure of an event entity's data as the base value according to its similarity to the standard template;
B. Traverse the candidate-set events, matching attributes pairwise in tree depth-first order;
C. When two events are compared, follow these rules:
if a node attribute value in the base structure is missing, supplement it directly;
if corresponding node attribute values in the base structure conflict, replace the base's non-null value when the quality evaluation function scores the candidate set's attribute value higher;
if the base attribute is in list format, append the candidate set's elements that are not already in the base's list;
D. Repeat steps B and C until the attributes can no longer be improved.
To implement the above method, the present invention further provides an enterprise knowledge graph attribute extraction system, comprising the following units:
a definition unit for defining the entity categories, event categories, and entity attribute structures of the training samples;
an annotation unit for preparing and annotating the training corpus;
a training unit for training the entity attribute extraction model;
an entity attribute extraction unit for feeding target text into the entity attribute extraction model to obtain the target text's entity attributes;
an attribute fusion unit for performing entity attribute fusion on the target text.
Further, the definition unit defining the entity categories, event categories, and entity attribute structures of the training samples includes:
defining the entity category as enterprise or/and individual;
defining the event category as one or more of judgment document, court announcement, hearing announcement, bidding, equity, strategy, personnel, finance, debt, product, marketing, brand, and accident;
defining the attribute fields as one or more of a type field, time field, tag field, and body field.
The training corpus preparation and annotation includes annotating the event category and entity attribute structure of each text in the training sample database.
Further, the training unit trains the entity attribute extraction model using the following steps:
S1: With word-level labels, feed the N*K word-vector matrix into a first bidirectional long short-term memory (BiLSTM) recurrent neural network to obtain an N*T label-class probability distribution matrix for each word, where N is the batch size, K is the word-embedding vector length, and T is the number of word label classes; the position of the maximum value corresponds to the current word's label, and the word embedding of each word is also obtained;
S2: Determine the subject information of the training sample;
S3: Define the event vector by the following formula, where eventEmbedding is the event vector, w_j denotes the vector of the j-th word in the sentence, and n denotes that sentences within distance n of the subject are considered;
With event-level labels, feed the N*K event-vector matrix into a second BiLSTM recurrent neural network as its initial input, where N is the batch size, K is the word-embedding vector length, and L is the number of event label classes; the position of the maximum value corresponds to the current event's label.
Define the Bayesian network as:
P(A, B, C, D) = P(D|A, B) · P(C|A) · P(B|A) · P(A)
where A is the probability that the text describes a certain class of event,
B is the probability that event extraction succeeds,
C is the probability that the text contains time information,
D is the probability that the text contains domain-specific vocabulary.
The value of B is determined by whether the label output by the N*L label-class probability distribution matrix matches the training-sample annotation: B is assigned 1 if they match and 0 otherwise.
Obtain a first N*L matrix from the second BiLSTM network and feed it into the Bayesian network; perform feature fusion on the second N*L matrix output by the Bayesian network and the first N*L matrix, and feed the fusion result back to the second BiLSTM network;
S4: Define the loss function as the mean square error between the output of each time step of the BiLSTM network and the training-sample annotations, and repeat step S3 until the loss function converges.
Further, the entity attribute extraction model includes either of the following:
the first N*L matrix is obtained from the forward hidden layer of the second BiLSTM recurrent neural network and fed into the Bayesian network; feature fusion is performed on the second N*L matrix output by the Bayesian network and the first N*L matrix; and the fusion result serves as input to the backward hidden layer of the second BiLSTM network;
or,
the first N*L matrix is obtained from the output layer of the second BiLSTM recurrent neural network and fed into the Bayesian network; feature fusion is performed on the second N*L matrix output by the Bayesian network and the first N*L matrix; and the fusion result serves as input to the input layer of the second BiLSTM network.
Further, the attribute fusion unit performs entity attribute fusion on the target text using the following steps:
A. Select the base structure of an event entity's data as the base value according to its similarity to the standard template;
B. Traverse the candidate-set events, matching attributes pairwise in tree depth-first order;
C. When two events are compared, follow these rules:
if a node attribute value in the base structure is missing, supplement it directly;
if corresponding node attribute values in the base structure conflict, replace the base's non-null value when the quality evaluation function scores the candidate set's attribute value higher;
if the base attribute is in list format, append the candidate set's elements that are not already in the base's list;
D. Repeat steps B and C until the attributes can no longer be improved.
The beneficial effects of the invention are:
1. Knowledge acquisition from multi-source heterogeneous data is achieved while the algorithm model's dependence on labels is reduced.
2. Entity attribute extraction, multi-source attribute fusion, and the establishment of relationships between different entities are achieved.
3. The objectivity of expert classification of domain-specific entity-attribute knowledge is combined with the efficiency of machine-learning extraction from text and applied to the full Chinese corpus of enterprise data; all target attribute classes are recognized with a comparatively small amount of annotation.
4. After the attribute extraction model is trained on sample data, entity attribute extraction and knowledge graph construction over massive target text data are automated, improving efficiency and reducing labor cost.
5. The invention combines the strengths of Bayesian networks and LSTMs into a Bayesian recurrent neural network, in which the Bayesian network feeds back into the BiLSTM recurrent network: the BiLSTM laterally captures long-time, long-range temporal correlations between entities, while the Bayesian network performs correlation analysis and reasoning longitudinally. Meanwhile, the Bayesian network's inference results are fed back to update the BiLSTM, achieving end-to-end adaptive learning and relationship establishment.
Description of the drawings
Fig. 1 is a flowchart of the enterprise knowledge graph attribute extraction method of one embodiment of the invention.
Fig. 2 is a structural diagram of the enterprise knowledge graph attribute extraction system of one embodiment of the invention.
Fig. 3 is a schematic diagram of a prior-art long short-term memory network.
Fig. 4 is a schematic diagram of a prior-art BiLSTM neural network model.
Fig. 5 is a schematic diagram of the Bayesian recurrent neural network model of one embodiment of the invention.
Fig. 6 is a schematic diagram of the Bayesian network of one embodiment of the invention.
Fig. 7 is a schematic diagram of a prior-art LSTM memory cell.
Fig. 8 is a feature-fusion schematic diagram of one embodiment of the invention.
Fig. 9 is a feature-fusion schematic diagram of one embodiment of the invention.
Specific embodiments
One idea by which the present invention solves the problems described in the background is to use a Bayesian recurrent neural network as the entity attribute extraction model for enterprise knowledge graph attribute extraction. The Bayesian network is stacked as a network layer on top of the BiLSTM recurrent neural network, so that the BiLSTM laterally captures long-time, long-range temporal correlations between entities while the Bayesian network performs correlation analysis and reasoning longitudinally. Meanwhile, the Bayesian network's inference results are fed back to update the BiLSTM, achieving end-to-end adaptive learning and relationship establishment. This builds a precise and efficient entity attribute extraction model and automates entity attribute extraction.
As shown in Fig. 1, the enterprise knowledge graph attribute extraction method of the present invention includes the following steps:
Define the entity categories, event categories, and entity attribute structures of the training samples;
Prepare and annotate the training corpus;
Train the entity attribute extraction model;
Feed target text into the entity attribute extraction model to obtain the target text's entity attributes;
Perform entity attribute fusion on the target text.
In the step of defining entity categories and event categories:
The entity category may be enterprise or individual.
The event category may be judgment document, court announcement, hearing announcement, bidding, equity, strategy, personnel, finance, debt, product, marketing, brand, accident, etc.
For each entity category, a standardized attribute structure is defined. Taking the accident class as an example, the event attribute structure defined in an embodiment of the present invention is:
Taking equity as an example, the event attribute structure defined in an embodiment of the present invention is:
In the corpus preparation and annotation step, the word annotation specification and meanings in an embodiment of the present invention are as follows:
B-ORG marks the start of an entity;
I-ORG marks the continuation of an entity;
X marks placeholders such as punctuation;
O marks other words.
Once corpus annotation is complete, downstream programs can understand the meaning of the entities in the text, which facilitates machine processing.
In an embodiment of the present invention, every word of the training text is annotated according to the above specification.
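A minimal Python sketch of this annotation scheme follows; the sample sentence, labels, and helper function are illustrative assumptions, not part of the patent.

    # One (character, label) pair per token; B-ORG opens an entity span,
    # I-ORG continues it, X marks punctuation placeholders, O everything else.
    tagged = [
        ("重", "B-ORG"), ("庆", "I-ORG"), ("誉", "I-ORG"), ("存", "I-ORG"),
        ("发", "O"), ("生", "O"), ("事", "O"), ("故", "O"), ("。", "X"),
    ]

    def extract_entities(pairs):
        """Recover entity strings from a B/I/X/O tag sequence."""
        entities, current = [], []
        for token, label in pairs:
            if label == "B-ORG":
                if current:
                    entities.append("".join(current))
                current = [token]
            elif label == "I-ORG" and current:
                current.append(token)
            else:
                if current:
                    entities.append("".join(current))
                    current = []
        if current:
            entities.append("".join(current))
        return entities

    print(extract_entities(tagged))  # ['重庆誉存']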
The event label specification and meanings in an embodiment of the present invention are as follows:
JUDGE denotes judgment document;
NOTICE denotes court announcement;
COURT denotes hearing announcement;
BIDDING denotes bidding;
STOCK denotes equity;
STRATEGY denotes strategy;
HR denotes personnel;
FINANCE denotes finance;
DEBET denotes debt;
PROD denotes product;
MARKET denotes marketing;
BRAND denotes brand;
ACCIDENT denotes accident.
Note that the event labels and their specification can be chosen flexibly for a specific project and are not limited to those enumerated above.
Event labels use English identifiers to facilitate downstream processing of the text.
Following the above specification, every text in the training set is labeled.
In an embodiment of the present invention the training text is annotated manually, and the annotation results serve as the benchmark for model training in subsequent steps.
The entity attribute extraction model training step is illustrated with reference to an embodiment.
Given the series of problems that current mainstream methods have in entity attribute extraction (described in the background), the present invention addresses these difficulties with deep neural networks. For the enterprise-centered event-entity attribute extraction problem, it applies end-to-end semi-supervised and unsupervised methods, thereby acquiring knowledge from multi-source heterogeneous data and reducing the algorithm model's dependence on labels.
A long short-term memory network (LSTM) is a special recurrent neural network for learning long-term dependencies in time-series data. Since its introduction it has been widely applied to handwriting recognition, speech recognition, machine translation, and many other fields, with notable results. It can retain information over long spans and is markedly effective in text semantic analysis. Unfolded along the time dimension, it yields a chained LSTM network that can model relationships between entities of indeterminate length and thereby characterize their respective features. The LSTM memory cell is shown in Fig. 7.
The LSTM cell can be characterized by the following formulas:
i_t = g(W_xi · x_t + W_hi · h_(t-1) + b_i)
f_t = g(W_xf · x_t + W_hf · h_(t-1) + b_f)
o_t = g(W_xo · x_t + W_ho · h_(t-1) + b_o)
The input transformation can be characterized by:
c_in_t = tanh(W_xc · x_t + W_hc · h_(t-1) + b_c_in)
The state update can be characterized by:
c_t = f_t · c_(t-1) + i_t · c_in_t
h_t = o_t · tanh(c_t)
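For concreteness, a minimal numpy sketch of one LSTM time step matching the cell equations above follows; the weight shapes, the sigmoid choice for g, and the toy dimensions are standard assumptions rather than the patent's specification.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        """One LSTM time step; W holds the W_x* and W_h* matrices per gate."""
        i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + b["i"])   # input gate
        f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + b["f"])   # forget gate
        o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + b["o"])   # output gate
        c_in = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])  # input transform
        c_t = f_t * c_prev + i_t * c_in                            # state update
        h_t = o_t * np.tanh(c_t)                                   # hidden output
        return h_t, c_t

    K, H = 4, 3  # toy sizes: K-dim embedding in, H-dim hidden state out
    rng = np.random.default_rng(0)
    W = {k: rng.normal(size=(H, K if k.startswith("x") else H))
         for k in ["xi", "hi", "xf", "hf", "xo", "ho", "xc", "hc"]}
    b = {k: np.zeros(H) for k in ["i", "f", "o", "c"]}
    h, c = lstm_step(rng.normal(size=K), np.zeros(H), np.zeros(H), W, b)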
A bidirectional LSTM (BiLSTM) comprises two groups of hidden layers, forward and backward. It captures long-range, long-time contextual dependencies, extracts contextual entity features, obtains more temporal correlations between entities, and suppresses noise such as interfering entities from both directions, greatly aiding the mining of long-term dependencies and the extraction of the high-level semantic features vital to information extraction and entity-relationship recognition. Compared with Bayesian networks, the advantage of the LSTM and its variants is capturing long sequential relations between entities, but their reasoning ability and interpretability are poor. The BiLSTM neural network model is shown in Fig. 4.
A Bayesian network (BN), also called a belief network, is a probabilistic graphical model. It simulates causal uncertainty in human reasoning, enabling relationship establishment and inference, with good knowledge-representation ability and handling of uncertain knowledge. Bayesian networks can encode and explain knowledge from a probabilistic perspective and are widely applied in computer intelligence, medical diagnosis, information retrieval, and many other fields. Their strength is powerful inference; their weakness is poor modeling of long sequences, so they cannot capture indirect relations between entities well.
The present invention combines the strengths of the Bayesian network and the BiLSTM into a Bayesian recurrent neural network. The Bayesian network is stacked as a network layer on top of the BiLSTM recurrent network, so that the BiLSTM laterally captures long-time, long-range temporal correlations between entities while the Bayesian network performs correlation analysis and reasoning longitudinally. Meanwhile, the Bayesian network's inference results are fed back to update the BiLSTM, achieving end-to-end adaptive learning and relationship establishment.
The Bayesian recurrent neural network model of one embodiment of the invention is shown in Fig. 5.
One embodiment of the invention trains the entity attribute extraction model using the following steps:
S1: With word-level labels, the word-vector matrix (N*K) is fed into the BiLSTM to obtain each word's label-class probability distribution (an N*4 matrix), where N is the batch length, K is the embedding vector length, and 4 is the number of word-label classes; the position of the maximum value corresponds to the current word's label. The word embedding of each word is also obtained at this point.
An embedding can be regarded mathematically as a mapping between spaces that is injective and structure-preserving (an injective function maps distinct arguments to distinct values; more precisely, f is injective when each y in its codomain has at most one x in its domain with f(x) = y). In the word-embedding setting this means finding a function that generates a new representation: the information expressed by a word's one-hot vector in space X is mapped to a dense multi-dimensional vector in space Y.
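As a sketch of this mapping, the embedding lookup below is equivalent to multiplying a one-hot vector by an embedding table; the vocabulary and dimensions are toy assumptions.

    import numpy as np

    vocab = {"企业": 0, "事故": 1, "股权": 2}
    K = 5  # embedding vector length
    E = np.random.default_rng(1).normal(size=(len(vocab), K))  # embedding table

    def embed(word):
        """Row lookup, equivalent to one_hot(word) @ E."""
        return E[vocab[word]]

    word_vectors = np.stack([embed(w) for w in ["企业", "事故"]])  # shape (2, K)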
Batch size: in an embodiment of the present invention there are three parameter-update strategies:
(1) Batch gradient descent: traverse the entire dataset, compute the loss function once, and perform one parameter update; the resulting direction points more accurately toward the extremum.
(2) Stochastic gradient descent: compute the loss and update the parameters once per sample; its advantage is speed.
(3) Mini-batch gradient descent: a compromise between the previous two; the sample data are divided into several batches, and the loss is computed and the parameters updated batch by batch, giving a more stable direction. A sketch is given below.
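A minimal mini-batch gradient-descent sketch follows; the linear least-squares model, learning rate, and batch size are toy assumptions used only to show the update pattern.

    import numpy as np

    rng = np.random.default_rng(4)
    X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
    w = np.zeros(3)
    lr, batch_size = 0.1, 16

    for epoch in range(10):
        idx = rng.permutation(len(X))              # reshuffle each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            residual = X[batch] @ w - y[batch]
            grad = 2 * X[batch].T @ residual / len(batch)
            w -= lr * grad                         # one update per mini-batch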
S2: According to the sequence-labeling result, obtain the candidate subjects of the event from the text, and determine the subject by syntactic and part-of-speech analysis (dependency parsing, which is common knowledge to those skilled in the art and is not expanded here);
S3: The event vector is defined by the following formula, where eventEmbedding is the event vector, w_j denotes the vector of the j-th word in the sentence, and n denotes that the words of sentences within distance n of the subject are considered;
Through the above steps, the label-class probability distribution of each word and the event-vector matrix of the text can be obtained from the training text or the target text.
With event-level labels, the event-vector matrix (N*K) is fed into the BiLSTM to obtain the label-class probability distribution of each event in the training sample (an N*L matrix), where N is the batch length, K is the embedding vector length, and L is the number of event-label classes (not repeated below); the position of the maximum value corresponds to the current event's label.
The position of the maximum value corresponds to the current event's label; that is, the event with the highest probability in the distribution is taken as the entity attribute extraction result.
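The following sketch shows one way to build an event vector from the word vectors; the patent's formula appears only as an image in the original filing, so the mean over the words within distance n of the subject used here is an assumption consistent with the symbol definitions above, not the patent's exact formula.

    import numpy as np

    def event_embedding(word_vectors, subject_idx, n):
        """Pool the vectors w_j of words within distance n of the subject."""
        lo, hi = max(0, subject_idx - n), subject_idx + n + 1
        return word_vectors[lo:hi].mean(axis=0)

    word_vectors = np.random.default_rng(2).normal(size=(10, 8))  # toy sentence
    ev = event_embedding(word_vectors, subject_idx=4, n=3)        # shape (8,)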
In an embodiment of the present invention, event-level labeling refers to the set of texts annotated with the same event type in the training sample.
In an embodiment of the present invention, the Bayesian network is defined according to the actual dependencies, as shown in Fig. 6; the DAG (directed acyclic graph) of the joint probability that a text describes a certain class of event is:
P(A, B, C, D) = P(D|A, B) · P(C|A) · P(B|A) · P(A)
where A is the probability that the text describes a certain class of event,
B is the probability that event extraction succeeds,
C is the probability that the text contains time information,
D is the probability that the text contains domain-specific vocabulary.
For B (the probability that extraction succeeds): over all events in the corpus, the computed label is compared with the training-sample annotation; B is assigned 1 if they match and 0 otherwise.
If the event label output by the second BiLSTM recurrent neural network is identical to the manually annotated label, event extraction succeeded; otherwise event extraction failed.
In an embodiment of the present invention, a training sample is fed into the BiLSTM to obtain its event-category distribution; if the accident class has the highest probability, the sample is classified as an accident event. If the sample is annotated as an accident, event extraction succeeded and B = 1; if it is not annotated as an accident, event extraction failed and B = 0.
In an embodiment of the present invention, the probability that an accident event contains domain-specific vocabulary is the number of manually annotated accident samples containing domain-specific vocabulary divided by the total number of samples manually annotated as accidents.
In an embodiment of the present invention, the probability that an accident event contains time information is the number of manually annotated accident samples containing time information divided by the total number of samples manually annotated as accidents.
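These conditional probabilities are simple corpus ratios; the sketch below estimates them and evaluates the joint-probability factorization, with hypothetical counts and prior values.

    n_accident = 200           # samples manually annotated "accident"
    n_with_domain_vocab = 150  # of those, samples containing domain vocabulary
    n_with_time_info = 180     # of those, samples containing time information

    p_A = 0.3                          # assumed prior: text describes an accident
    p_B_given_A = 0.9                  # assumed extraction-success rate
    p_C_given_A = n_with_time_info / n_accident       # P(C | A)
    p_D_given_AB = n_with_domain_vocab / n_accident   # simplified P(D | A, B)

    # Joint probability per P(A, B, C, D) = P(D|A,B) * P(C|A) * P(B|A) * P(A)
    p_joint = p_D_given_AB * p_C_given_A * p_B_given_A * p_A
    print(p_joint)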
The matrix output by the Bayesian network is the probability distribution matrix of whether the text describes a certain event.
A first N*L matrix is obtained from the second BiLSTM recurrent neural network and fed into the Bayesian network; feature fusion is performed on the second N*L matrix output by the Bayesian network and the first N*L matrix, and the fusion result is fed back to the second BiLSTM network.
Specifically, the above process can include two embodiments.
First embodiment: the first N*L matrix is obtained from the forward hidden layer of the second BiLSTM recurrent neural network and fed into the Bayesian network; feature fusion is performed on the second N*L matrix output by the Bayesian network and the first N*L matrix; and the fusion result serves as input to the backward hidden layer of the second BiLSTM network.
Specifically, in the first embodiment, as shown in Fig. 8, the first N*L matrix is obtained from the forward hidden layer at time step t and fed into the Bayesian network; the second N*L matrix output by the Bayesian network is fused with the first N*L matrix; and the fusion result serves as the input to the backward hidden layer at time step t.
Those skilled in the art will understand that time step t refers to position t of the input sequence; a recurrent neural network has one input x_t at each time step.
In other embodiments, the first N*L matrix is obtained from the forward hidden layer at time step t1 and fed into the Bayesian network; the second N*L matrix output by the Bayesian network is fused with the first N*L matrix; and the fusion result serves as the input to the backward hidden layer at time step t2, where t1 and t2 are different positions of the input sequence.
Second embodiment: as shown in Fig. 9, the first N*L matrix is obtained from the output layer of the second BiLSTM recurrent neural network and fed into the Bayesian network; feature fusion is performed on the second N*L matrix output by the Bayesian network and the first N*L matrix; and the fusion result serves as input to the input layer of the second BiLSTM network.
In the present invention, the Bayesian network is stacked as a network layer on top of the BiLSTM recurrent neural network, so that the BiLSTM laterally captures long-time, long-range temporal correlations between entities while the Bayesian network performs correlation analysis and reasoning longitudinally. Meanwhile, the Bayesian network's inference results are fed back to update the BiLSTM, achieving end-to-end adaptive learning and relationship establishment.
It should be noted that taking the arithmetic mean of the BiLSTM output matrix and the Bayesian network output matrix is only one mode of matrix feature fusion, and the invention is not limited to it; feature fusion may also use the geometric mean, the root mean square, the harmonic mean, a weighted mean, and so on, as sketched below.
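A short numpy sketch of these fusion modes over two toy N*L matrices follows; the 0.7/0.3 weights in the weighted mean are assumptions.

    import numpy as np

    rng = np.random.default_rng(3)
    bilstm_out = rng.uniform(0.1, 1.0, size=(2, 5))  # first N*L matrix
    bayes_out = rng.uniform(0.1, 1.0, size=(2, 5))   # second N*L matrix

    fused_arithmetic = (bilstm_out + bayes_out) / 2
    fused_geometric = np.sqrt(bilstm_out * bayes_out)
    fused_rms = np.sqrt((bilstm_out ** 2 + bayes_out ** 2) / 2)
    fused_harmonic = 2 / (1 / bilstm_out + 1 / bayes_out)
    fused_weighted = 0.7 * bilstm_out + 0.3 * bayes_out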
S4: The loss function is defined as the mean square error between the output of each BiLSTM time step and the labels; the model is iterated, repeating step S3, until the loss function converges.
The entity attribute fusion step on the target text is illustrated with reference to an embodiment.
Feeding target text into the entity attribute extraction model yields the target text's entity attributes: the subject and attribute structure of every target text, together with the distribution over the event categories to which the target text belongs:
Distribution = [p1, p2, ..., pL]
However, events obtained from different data sources may describe the same event while their extraction results have missing or conflicting attributes. The present invention therefore introduces a fusion strategy that solves this problem on top of event extraction.
The present invention defines the category similarity of two events as the similarity of their event distributions (cosine similarity, etc.). When many events are extracted, traversing pairwise similarities incurs a large computational cost; therefore an event candidate set is built first, and the set of events to fuse is chosen from the candidate set.
The basic rules for choosing the candidate set are as follows (a sketch follows this list):
the event subjects are identical;
the similarity of the event-category distributions is high (cosine similarity);
the event times are close.
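A sketch of these candidate rules follows; the similarity threshold and the time window are assumptions.

    import numpy as np

    def cosine_similarity(p, q):
        return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

    def is_fusion_candidate(e1, e2, sim_threshold=0.9, max_day_gap=3):
        same_subject = e1["subject"] == e2["subject"]
        similar = cosine_similarity(e1["dist"], e2["dist"]) >= sim_threshold
        close_in_time = abs(e1["day"] - e2["day"]) <= max_day_gap
        return same_subject and similar and close_in_time

    e1 = {"subject": "某公司", "dist": np.array([0.8, 0.1, 0.1]), "day": 100}
    e2 = {"subject": "某公司", "dist": np.array([0.75, 0.15, 0.1]), "day": 101}
    print(is_fusion_candidate(e1, e2))  # True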
For the event candidate set, the attributes must also be mutually fused; this step relies on the matching degree of attributes such as time, subject, and category to align the entities of similar events. The attribute fusion steps are as follows (a sketch follows this list):
A. Select the base structure of an event entity's data as the base value according to its similarity to the standard template.
B. Traverse the candidate-set events, matching attributes pairwise in tree depth-first order.
C. When two events are compared, follow these rules:
if a node attribute value in the base structure is missing, supplement it directly;
if corresponding node attribute values in the base structure conflict, replace the base's non-null value when the quality evaluation function scores the candidate set's attribute value higher;
if the base attribute is in list format, append the candidate set's elements that are not already in the base's list.
D. Repeat steps B and C until the attributes can no longer be improved.
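The sketch below applies rules A to C to flat attribute dictionaries; quality() is a hypothetical stand-in for the patent's quality evaluation function, and the sample events mirror the conflict example given later.

    def quality(value):
        """Hypothetical scorer; here, prefer longer (more specific) values."""
        return len(str(value))

    def fuse(base, candidate):
        for key, cand_val in candidate.items():
            base_val = base.get(key)
            if base_val is None:                 # missing value: supplement
                base[key] = cand_val
            elif isinstance(base_val, list):     # list attribute: merge uniques
                base[key] = base_val + [v for v in cand_val if v not in base_val]
            elif base_val != cand_val and quality(cand_val) > quality(base_val):
                base[key] = cand_val             # conflict: keep better value
        return base

    event4 = {"subject": "某公司", "time": "2017-05-08", "tags": ["事故"]}
    event5 = {"subject": "某公司", "time": "2017-05-08 00:00:00", "tags": ["火灾"]}
    print(fuse(event4, event5))  # time replaced; tags merged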
In an embodiment of the present invention, two events are extracted from two target texts.
The standard template in this embodiment is:
The base structure in this embodiment is:
The attribute values in this embodiment are eventType, tags, subject, and time;
In another embodiment of the present invention, two events are obtained by passing multiple target texts through the attribute extraction model:
Event 1:
Event 2:
Since the two events above have the same subject and the same time, the two events have the same structure template; fusing event 1 and event 2 yields event 3.
In another embodiment of the present invention, two events are obtained by passing multiple target texts through the attribute extraction model:
Event 4:
Event 5:
In this embodiment the two events have the same underlying structure, but their time attributes conflict; the quality evaluation function scores event 5's time attribute value higher, so event 4's time attribute is replaced with time: 2017-05-08 00:00:00.
Finally, it should be noted that the above embodiments merely illustrate, rather than limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not depart the essence of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present invention, and should all be covered by the scope of the claims and specification of the present invention.

Claims (10)

  1. An enterprise knowledge graph attribute extraction method, characterized by comprising the following steps:
    defining the entity categories, event categories, and entity attribute structures of the training samples;
    preparing and annotating the training corpus;
    training an entity attribute extraction model;
    feeding target text into the entity attribute extraction model to obtain the target text's entity attributes;
    performing entity attribute fusion on the target text.
  2. The enterprise knowledge graph attribute extraction method according to claim 1, characterized in that
    defining the entity categories, event categories, and entity attribute structures of the training samples includes:
    defining the entity category as enterprise or/and individual;
    defining the event category as one or more of judgment document, court announcement, hearing announcement, bidding, equity, strategy, personnel, finance, debt, product, marketing, brand, and accident;
    defining the attribute fields as one or more of a type field, time field, tag field, and body field;
    and training corpus preparation and annotation includes annotating the event category and entity attribute structure of each text in the training sample database.
  3. The enterprise knowledge graph attribute extraction method according to claim 1, characterized in that
    training the entity attribute extraction model includes the following steps:
    S1: with word-level labels, feeding the N*K word-vector matrix into a first bidirectional long short-term memory (BiLSTM) recurrent neural network to obtain an N*T label-class probability distribution matrix for each word, where N is the batch size, K is the word-embedding vector length, and T is the number of word label classes; the position of the maximum value corresponds to the current word's label, and the word embedding of each word is also obtained;
    S2: determining the subject information of the training sample;
    S3: defining the event vector by the following formula, where eventEmbedding is the event vector, w_j denotes the vector of the j-th word in the sentence, and n denotes that sentences within distance n of the subject are considered;
    with event-level labels, feeding the N*K event-vector matrix into a second BiLSTM recurrent neural network as its initial input, where N is the batch size, K is the word-embedding vector length, and L is the number of event label classes; the position of the maximum value corresponds to the current event's label;
    defining the Bayesian network as:
    P(A, B, C, D) = P(D|A, B) · P(C|A) · P(B|A) · P(A),
    where A is the probability that the text describes a certain class of event,
    B is the probability that event extraction succeeds,
    C is the probability that the text contains time information,
    D is the probability that the text contains domain-specific vocabulary,
    and the value of B is determined by whether the label output by the N*L label-class probability distribution matrix matches the training-sample annotation: B is assigned 1 if they match and 0 otherwise;
    obtaining a first N*L matrix from the second BiLSTM network, feeding it into the Bayesian network, performing feature fusion on the second N*L matrix output by the Bayesian network and the first N*L matrix, and feeding the fusion result back to the second BiLSTM network;
    S4: defining the loss function as the mean square error between the output of each time step of the BiLSTM network and the training-sample annotations, and repeating step S3 until the loss function converges.
  4. The enterprise knowledge graph attribute extraction method according to any one of claims 1 to 3, characterized in that
    the entity attribute extraction model includes either:
    obtaining the first N*L matrix from the forward hidden layer of the second BiLSTM recurrent neural network, feeding it into the Bayesian network, performing feature fusion on the second N*L matrix output by the Bayesian network and the first N*L matrix, and using the fusion result as input to the backward hidden layer of the second BiLSTM network;
    or,
    obtaining the first N*L matrix from the output layer of the second BiLSTM recurrent neural network, feeding it into the Bayesian network, performing feature fusion on the second N*L matrix output by the Bayesian network and the first N*L matrix, and using the fusion result as input to the input layer of the second BiLSTM network.
  5. The enterprise knowledge graph attribute extraction method according to claim 1, characterized in that
    performing entity attribute fusion on the target text includes the following steps:
    A. selecting the base structure of an event entity's data as the base value according to its similarity to the standard template;
    B. traversing the candidate-set events and matching attributes pairwise in tree depth-first order;
    C. when two events are compared, following these rules:
    if a node attribute value in the base structure is missing, supplementing it directly;
    if corresponding node attribute values in the base structure conflict, replacing the base's non-null value when the quality evaluation function scores the candidate set's attribute value higher;
    if the base attribute is in list format, appending the candidate set's elements that are not already in the base's list;
    D. repeating steps B and C until the attributes can no longer be improved.
  6. An enterprise knowledge graph attribute extraction system, characterized by comprising the following units:
    a definition unit for defining the entity categories, event categories, and entity attribute structures of the training samples;
    an annotation unit for preparing and annotating the training corpus;
    a training unit for training the entity attribute extraction model;
    an entity attribute extraction unit for feeding target text into the entity attribute extraction model to obtain the target text's entity attributes;
    an attribute fusion unit for performing entity attribute fusion on the target text.
  7. The enterprise knowledge graph attribute extraction system according to claim 6, characterized in that
    the definition unit defining the entity categories, event categories, and entity attribute structures of the training samples includes:
    defining the entity category as enterprise or/and individual;
    defining the event category as one or more of judgment document, court announcement, hearing announcement, bidding, equity, strategy, personnel, finance, debt, product, marketing, brand, and accident;
    defining the attribute fields as one or more of a type field, time field, tag field, and body field;
    and the training corpus preparation and annotation includes annotating the event category and entity attribute structure of each text in the training sample database.
  8. The enterprise knowledge graph attribute extraction system according to claim 6, characterized in that
    the training unit trains the entity attribute extraction model using the following steps:
    S1: with word-level labels, feeding the N*K word-vector matrix into a first bidirectional long short-term memory (BiLSTM) recurrent neural network to obtain an N*T label-class probability distribution matrix for each word, where N is the batch size, K is the word-embedding vector length, and T is the number of word label classes; the position of the maximum value corresponds to the current word's label, and the word embedding of each word is also obtained;
    S2: determining the subject information of the training sample;
    S3: defining the event vector by the following formula, where eventEmbedding is the event vector, w_j denotes the vector of the j-th word in the sentence, and n denotes that sentences within distance n of the subject are considered;
    with event-level labels, feeding the N*K event-vector matrix into a second BiLSTM recurrent neural network as its initial input, where N is the batch size, K is the word-embedding vector length, and L is the number of event label classes; the position of the maximum value corresponds to the current event's label;
    defining the Bayesian network as:
    P(A, B, C, D) = P(D|A, B) · P(C|A) · P(B|A) · P(A),
    where A is the probability that the text describes a certain class of event,
    B is the probability that event extraction succeeds,
    C is the probability that the text contains time information,
    D is the probability that the text contains domain-specific vocabulary,
    and the value of B is determined by whether the label output by the N*L label-class probability distribution matrix matches the training-sample annotation: B is assigned 1 if they match and 0 otherwise;
    obtaining a first N*L matrix from the second BiLSTM network, feeding it into the Bayesian network, performing feature fusion on the second N*L matrix output by the Bayesian network and the first N*L matrix, and feeding the fusion result back to the second BiLSTM network;
    S4: defining the loss function as the mean square error between the output of each time step of the BiLSTM network and the training-sample annotations, and repeating step S3 until the loss function converges.
  9. The enterprise knowledge graph attribute extraction system according to any one of claims 6 to 8, characterized in that
    the entity attribute extraction model includes either:
    obtaining the first N*L matrix from the forward hidden layer of the second BiLSTM recurrent neural network, feeding it into the Bayesian network, performing feature fusion on the second N*L matrix output by the Bayesian network and the first N*L matrix, and using the fusion result as input to the backward hidden layer of the second BiLSTM network;
    or,
    obtaining the first N*L matrix from the output layer of the second BiLSTM recurrent neural network, feeding it into the Bayesian network, performing feature fusion on the second N*L matrix output by the Bayesian network and the first N*L matrix, and using the fusion result as input to the input layer of the second BiLSTM network.
  10. The enterprise knowledge graph attribute extraction system according to claim 6, characterized in that
    the attribute fusion unit performs entity attribute fusion on the target text using the following steps:
    A. selecting the base structure of an event entity's data as the base value according to its similarity to the standard template;
    B. traversing the candidate-set events and matching attributes pairwise in tree depth-first order;
    C. when two events are compared, following these rules:
    if a node attribute value in the base structure is missing, supplementing it directly;
    if corresponding node attribute values in the base structure conflict, replacing the base's non-null value when the quality evaluation function scores the candidate set's attribute value higher;
    if the base attribute is in list format, appending the candidate set's elements that are not already in the base's list;
    D. repeating steps B and C until the attributes can no longer be improved.
CN201810136568.4A 2018-02-09 2018-02-09 Enterprise knowledge graph attribute extraction method and system Active CN108182295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810136568.4A CN108182295B (en) 2018-02-09 2018-02-09 Enterprise knowledge graph attribute extraction method and system


Publications (2)

Publication Number Publication Date
CN108182295A true CN108182295A (en) 2018-06-19
CN108182295B CN108182295B (en) 2021-09-10

Family

ID=62552761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810136568.4A Active CN108182295B (en) 2018-02-09 2018-02-09 Enterprise knowledge graph attribute extraction method and system

Country Status (1)

Country Link
CN (1) CN108182295B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440287A (en) * 2013-08-14 2013-12-11 广东工业大学 Web question-answering retrieval system based on product information structuring
CN105335378A (en) * 2014-06-25 2016-02-17 富士通株式会社 Multi-data source information processing device and method, and server
WO2017185887A1 (en) * 2016-04-29 2017-11-02 Boe Technology Group Co., Ltd. Apparatus and method for analyzing natural language medical text and generating medical knowledge graph representing natural language medical text
CN106250412A (en) * 2016-07-22 2016-12-21 浙江大学 Knowledge graph construction method based on multi-source entity fusion
CN106528528A (en) * 2016-10-18 2017-03-22 哈尔滨工业大学深圳研究生院 Text sentiment analysis method and device
CN107220237A (en) * 2017-05-24 2017-09-29 南京大学 Business entity relation extraction method based on convolutional neural networks
CN107665252A (en) * 2017-09-27 2018-02-06 深圳证券信息有限公司 Knowledge graph creation method and device
CN107633093A (en) * 2017-10-10 2018-01-26 南通大学 Construction and query method for a power supply decision knowledge graph

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TAO CHEN et al.: "Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN", Expert Systems with Applications *
ZENG Daojian et al.: "Open entity attribute extraction oriented to unstructured text", Journal of Jiangxi Normal University (Natural Science Edition) *
YUAN Kaiqi et al.: "Medical knowledge graph construction techniques and research progress", Application Research of Computers *
JIA Zhen et al.: "Attribute and attribute value extraction for Chinese online encyclopedias", Acta Scientiarum Naturalium Universitatis Pekinensis *

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920556A (en) * 2018-06-20 2018-11-30 华东师范大学 Recommendation expert method based on subject knowledge map
CN108920556B (en) * 2018-06-20 2021-11-19 华东师范大学 Expert recommending method based on discipline knowledge graph
CN108920656A (en) * 2018-07-03 2018-11-30 龙马智芯(珠海横琴)科技有限公司 Document properties description content extracting method and device
CN110019841A (en) * 2018-07-24 2019-07-16 南京涌亿思信息技术有限公司 Construct data analysing method, the apparatus and system of debtor's knowledge mapping
CN110858353A (en) * 2018-08-17 2020-03-03 阿里巴巴集团控股有限公司 Method and system for obtaining case referee result
CN110858353B (en) * 2018-08-17 2023-05-05 阿里巴巴集团控股有限公司 Method and system for obtaining case judge result
CN109189943A (en) * 2018-09-19 2019-01-11 中国电子科技集团公司信息科学研究院 A kind of capability knowledge extracts and the method for capability knowledge map construction
CN109446337A (en) * 2018-09-19 2019-03-08 中国信息通信研究院 A kind of knowledge mapping construction method and device
CN109446337B (en) * 2018-09-19 2020-10-13 中国信息通信研究院 Knowledge graph construction method and device
CN109446523A (en) * 2018-10-23 2019-03-08 重庆誉存大数据科技有限公司 Entity attribute extraction model based on BiLSTM and condition random field
CN109446523B (en) * 2018-10-23 2023-04-25 重庆誉存大数据科技有限公司 Entity attribute extraction model based on BiLSTM and conditional random field
CN109471929B (en) * 2018-11-06 2021-08-17 湖南云智迅联科技发展有限公司 Method for semantic search of equipment maintenance records based on map matching
CN109508385A (en) * 2018-11-06 2019-03-22 云南大学 A kind of character relation analysis method in web page news data based on Bayesian network
CN109471929A (en) * 2018-11-06 2019-03-15 湖南云智迅联科技发展有限公司 A method of it is matched based on map and carries out equipment maintenance record semantic search
CN109657918B (en) * 2018-11-19 2023-07-18 平安科技(深圳)有限公司 Risk early warning method and device for associated evaluation object and computer equipment
CN109657918A (en) * 2018-11-19 2019-04-19 平安科技(深圳)有限公司 Method for prewarning risk, device and the computer equipment of association assessment object
CN109767758A (en) * 2019-01-11 2019-05-17 中山大学 Vehicle-mounted voice analysis method, system, storage medium and equipment
CN109767758B (en) * 2019-01-11 2021-06-08 中山大学 Vehicle-mounted voice analysis method, system, storage medium and device
CN111523315A (en) * 2019-01-16 2020-08-11 阿里巴巴集团控股有限公司 Data processing method, text recognition device and computer equipment
CN111523315B (en) * 2019-01-16 2023-04-18 阿里巴巴集团控股有限公司 Data processing method, text recognition device and computer equipment
CN110210840A (en) * 2019-06-14 2019-09-06 言图科技有限公司 A kind of method and system for realizing business administration based on instant chat
CN110297904A (en) * 2019-06-17 2019-10-01 北京百度网讯科技有限公司 Generation method, device, electronic equipment and the storage medium of event name
CN110297904B (en) * 2019-06-17 2022-10-04 北京百度网讯科技有限公司 Event name generation method and device, electronic equipment and storage medium
CN110245244A (en) * 2019-06-20 2019-09-17 贵州电网有限责任公司 A kind of organizational affiliation knowledge mapping construction method based on mass text data
CN110399487B (en) * 2019-07-01 2021-09-28 广州多益网络股份有限公司 Text classification method and device, electronic equipment and storage medium
CN110399487A (en) * 2019-07-01 2019-11-01 广州多益网络股份有限公司 A kind of file classification method, device, electronic equipment and storage medium
CN110516077A (en) * 2019-08-20 2019-11-29 北京中亦安图科技股份有限公司 Knowledge mapping construction method and device towards enterprise's market conditions
CN111475641A (en) * 2019-08-26 2020-07-31 北京国双科技有限公司 Data extraction method and device, storage medium and equipment
CN110516120A (en) * 2019-08-27 2019-11-29 北京明略软件***有限公司 Information processing method and device, storage medium, electronic device
CN111105041A (en) * 2019-12-02 2020-05-05 成都四方伟业软件股份有限公司 Machine learning method and device for intelligent data collision
CN111105041B (en) * 2019-12-02 2022-12-23 成都四方伟业软件股份有限公司 Machine learning method and device for intelligent data collision
CN111382843B (en) * 2020-03-06 2023-10-20 浙江网商银行股份有限公司 Method and device for establishing enterprise upstream and downstream relationship identification model and mining relationship
CN111382843A (en) * 2020-03-06 2020-07-07 浙江网商银行股份有限公司 Method and device for establishing upstream and downstream relation recognition model of enterprise and relation mining
CN111400504A (en) * 2020-03-12 2020-07-10 支付宝(杭州)信息技术有限公司 Method and device for identifying enterprise key people
CN111400504B (en) * 2020-03-12 2023-04-07 支付宝(杭州)信息技术有限公司 Method and device for identifying enterprise key people
CN111967761B (en) * 2020-08-14 2024-04-02 国网数字科技控股有限公司 Knowledge graph-based monitoring and early warning method and device and electronic equipment
CN111967761A (en) * 2020-08-14 2020-11-20 国网电子商务有限公司 Monitoring and early warning method and device based on knowledge graph and electronic equipment
CN112101034A (en) * 2020-09-09 2020-12-18 沈阳东软智能医疗科技研究院有限公司 Method and device for distinguishing attribute of medical entity and related product
CN112101034B (en) * 2020-09-09 2024-02-27 沈阳东软智能医疗科技研究院有限公司 Method and device for judging attribute of medical entity and related product
WO2022051996A1 (en) * 2020-09-10 2022-03-17 西门子(中国)有限公司 Method and apparatus for constructing knowledge graph
CN112000718A (en) * 2020-10-28 2020-11-27 成都数联铭品科技有限公司 Attribute layout-based knowledge graph display method, system, medium and equipment
CN112417104B (en) * 2020-12-04 2022-11-11 山西大学 Machine reading understanding multi-hop inference model and method with enhanced syntactic relation
CN112417104A (en) * 2020-12-04 2021-02-26 山西大学 Machine reading understanding multi-hop inference model and method with enhanced syntactic relation
CN112199961A (en) * 2020-12-07 2021-01-08 浙江万维空间信息技术有限公司 Knowledge graph acquisition method based on deep learning
CN112383575B (en) * 2021-01-18 2021-05-04 北京晶未科技有限公司 Method, electronic device and electronic equipment for information security
CN112383575A (en) * 2021-01-18 2021-02-19 北京晶未科技有限公司 Method, electronic device and electronic equipment for information security
CN113326371B (en) * 2021-04-30 2023-12-29 南京大学 Event extraction method integrating pre-training language model and anti-noise interference remote supervision information
CN113326371A (en) * 2021-04-30 2021-08-31 南京大学 Event extraction method fusing pre-training language model and anti-noise interference remote monitoring information
CN113468342A (en) * 2021-07-22 2021-10-01 北京京东振世信息技术有限公司 Data model construction method, device, equipment and medium based on knowledge graph
CN113468342B (en) * 2021-07-22 2023-12-05 北京京东振世信息技术有限公司 Knowledge graph-based data model construction method, device, equipment and medium
CN114741569A (en) * 2022-06-09 2022-07-12 杭州欧若数网科技有限公司 Method and device for supporting composite data types in graph database

Also Published As

Publication number Publication date
CN108182295B (en) 2021-09-10

Similar Documents

Publication Title
CN108182295A Enterprise knowledge graph attribute extraction method and system
CN106776581B Subjective text emotion analysis method based on deep learning
CN109857990B Financial bulletin information extraction method based on document structure and deep learning
CN110059188B Chinese emotion analysis method based on bidirectional temporal convolutional network
CN107239444B Word vector training method and system fusing part of speech and position information
CN109753660B LSTM-based winning bid web page named entity extraction method
CN110298037A Text matching recognition method based on convolutional neural networks with enhanced attention mechanism
CN108920544A Personalized job recommendation method based on knowledge graph
CN110287323B Target-oriented emotion classification method
CN111914558A Course knowledge relation extraction method and system based on sentence-bag attention and distant supervision
CN110472042B Fine-grained emotion classification method
CN109697285A Hierarchical BiLSTM disease code annotation method for Chinese electronic health records with enhanced semantic representation
CN110083700A Enterprise public opinion sentiment classification method and system based on convolutional neural networks
CN109558492A Listed-company knowledge graph construction method and device suitable for event attribution
CN110442720A Multi-label text classification method based on LSTM convolutional neural networks
CN110321563A Text sentiment analysis method based on a hybrid supervision model
CN111783394A Training method of event extraction model, event extraction method, system and equipment
CN105740382A Aspect classification method for short comment texts
CN113360582B Relation classification method and system based on BERT model fusing multi-entity information
CN115599899B Intelligent question-answering method, system, equipment and medium based on aircraft knowledge graph
CN113128233B Construction method and system of mental disease knowledge graph
CN113255321A Financial field chapter-level event extraction method based on article entity word dependency relationship
CN111460830B Method and system for extracting economic events in judicial texts
CN111914556A Emotion guidance method and system based on emotion semantic transfer map
CN111710428A Biomedical text representation method for modeling global and local context interaction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20191111

Address after: No. 51, Daping Main Street, Yuzhong District, Chongqing 400042

Applicant after: CHONGQING TELECOMMUNICATION SYSTEM INTEGRATION CO.,LTD.

Applicant after: CHONGQING SOCIALCREDITS BIG DATA TECHNOLOGY CO.,LTD.

Address before: 18th Floor, Block C, Qilin Building, No. 2, No. 53 Huangshan Avenue, Yubei District, Chongqing 401121

Applicant before: CHONGQING SOCIALCREDITS BIG DATA TECHNOLOGY CO.,LTD.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Sun Shitong

Inventor after: Liu Debin

Inventor after: Yan Kai

Inventor after: Chen Wei

Inventor after: Yang Chen

Inventor before: Sun Shitong

Inventor before: Liu Debin

Inventor before: Yan Kai

Inventor before: Chen Wei

CB03 Change of inventor or designer information
CP03 Change of name, title or address

Address after: No.51, Daping Main Street, Yuzhong District, Chongqing 400042

Patentee after: Zhongdian Zhi'an Technology Co.,Ltd.

Country or region after: China

Patentee after: Chongqing Yucun Technology Co.,Ltd.

Address before: No.51, Daping Main Street, Yuzhong District, Chongqing 400042

Patentee before: CHONGQING TELECOMMUNICATION SYSTEM INTEGRATION CO.,LTD.

Country or region before: China

Patentee before: CHONGQING SOCIALCREDITS BIG DATA TECHNOLOGY CO.,LTD.

CP03 Change of name, title or address
TR01 Transfer of patent right

Effective date of registration: 20240409

Address after: Tower B, No. 10 Datagu West Road, Xiantao Street, Yubei District, Chongqing 401120

Patentee after: China Telecom Yijin Technology Co.,Ltd.

Country or region after: China

Patentee after: Chongqing Yucun Technology Co.,Ltd.

Address before: No.51, Daping Main Street, Yuzhong District, Chongqing 400042

Patentee before: Zhongdian Zhi'an Technology Co.,Ltd.

Country or region before: China

Patentee before: Chongqing Yucun Technology Co.,Ltd.

TR01 Transfer of patent right