CN110377759B - Method and device for constructing event relation graph - Google Patents

Method and device for constructing event relation graph

Info

Publication number
CN110377759B
Authority
CN
China
Prior art keywords
event
events
sentences
sentence
relation
Prior art date
Legal status
Active
Application number
CN201910659446.8A
Other languages
Chinese (zh)
Other versions
CN110377759A (en)
Inventor
张梦迪
贾玉红
郑凡奇
熊俊杰
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN201910659446.8A
Publication of CN110377759A
Application granted
Publication of CN110377759B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention discloses a method and a device for constructing an event relation graph. The method comprises: acquiring event-related data; identifying event sentences in the event-related data with a trained event sentence recognition model, which is trained on event-related data labeled with event sentences; identifying events and/or relations in the event sentences with a trained event relation recognition model and generating event relation pairs from the events and/or relations, the event relation recognition model being trained on event sentences labeled with events and/or relations; and generating an event relation graph from the event relation pairs. The invention addresses the technical problem that, because the prior art lacks mining and description of event-logic knowledge, event logic cannot be used to further improve the reliability of credit or marketing decisions.

Description

Method and device for constructing event relation graph
Technical Field
The invention relates to the technical field of knowledge graphs, in particular to a method and a device for constructing an event relation graph.
Background
Corporate loans generally account for a major share of a bank's loan portfolio, and in particular, as national support for inclusive finance increases, bank lending to small and micro enterprises keeps growing. While vigorously developing inclusive finance, banks should strengthen the control of enterprise risk in various respects. Current risk-control systems mainly perform evaluation by combining a rule engine with a model engine, and the evaluation results rely mainly on manually defined rules or on black-box models trained on shallow features. These methods hit a bottleneck when processing complex associations: hidden association features within large data networks are difficult to take into account, even though such information is often of great value.
A knowledge graph is a graph structure made up of nodes and relations. It allows problems to be analyzed from the perspective of relations and has clear advantages in mining complex associated information. Knowledge-graph technology has gradually been applied in the financial field, for example in anti-money laundering, anti-fraud and enterprise risk identification. However, the graphs currently in use are centered mainly on concepts and the relationships between concepts, and they lack mining of event-logic knowledge. Event logic is crucial for reasoning about how real-world behavior unfolds and can further support the reliability of credit or marketing decisions. The prior art therefore lacks a method for describing event logic in the financial field.
In order to solve at least one of the above technical problems, the present invention provides a method for constructing an event relationship graph.
Disclosure of Invention
The invention mainly aims to provide a method and a device for constructing an event relation graph, which are used for solving the technical problem that the reliability of credit or marketing decision cannot be further improved through the event logic because the prior art lacks mining and description of the knowledge of the event logic.
In order to achieve the above object, according to an aspect of the present invention, there is provided an event relationship graph construction method, including:
obtaining event-related data, wherein the event-related data comprises at least one of: legal documents, enterprise announcements, public-opinion information about enterprises, and news data;
identifying an event sentence in the event-related data according to a trained event sentence identification model, wherein the event sentence identification model is obtained by training by taking event-related data marked with the event sentence as a training sample;
identifying events and/or relations in the event sentences according to a trained event relation identification model, and generating event relation pairs according to the events and/or relations, wherein the event relation identification model is obtained by training by taking the event sentences marked with the events and/or relations as training samples;
and generating an event relation graph according to the event relation pairs.
Optionally, the identifying an event sentence in the event-related data according to the trained event sentence identification model specifically includes:
identifying the event sentences in the event-related data and the type of each event sentence according to the trained event sentence recognition model, wherein the types of event sentences comprise a cause-effect succession type, a cause succession type and an effect succession type, and the training samples of the event sentence recognition model are also labeled with the type of each event sentence.
Optionally, the identifying an event and/or a relationship in the event sentence according to the trained event relationship identification model, and generating an event relationship pair according to the event and/or the relationship specifically includes:
identifying the events and relations in cause-effect succession event sentences according to the event relation recognition model, and generating event relation pairs from the identified events and relations;
and identifying the events in cause succession event sentences and in effect succession event sentences according to the event relation recognition model, and matching the events in the cause succession event sentences with the events in the effect succession event sentences to generate event relation pairs, wherein the training samples of the event relation recognition model are event sentences of the different types labeled with events and/or relations: events and relations are labeled for cause-effect succession event sentences, and only events are labeled for cause succession and effect succession event sentences.
Optionally, the generating an event relationship graph according to the event relationship pair specifically includes:
acquiring standardized names of the various events and standardized names of the various relations, wherein the standardized event names are obtained by clustering the full set of events with a clustering model, and the standardized relation names are obtained by clustering the full set of relations with the clustering model;
respectively converting the events and the relations in each event relation pair into the corresponding standardized names;
and generating the event relation graph according to the event relation pairs after conversion to standardized names.
Optionally, the method further includes:
labeling the event sentences and their types in the event-related data used for model training;
and training the event sentence recognition model according to the labeled event related data.
Optionally, the method further includes:
labeling events and/or relations in the event sentences used for model training, wherein events and relations are labeled in cause-effect succession event sentences, and only events are labeled in cause succession event sentences and effect succession event sentences;
and training the event relation recognition model according to the labeled event sentence.
In order to achieve the above object, according to another aspect of the present invention, there is provided an event relation graph construction apparatus, including:
an event-related data acquisition unit configured to acquire event-related data, wherein the event-related data includes at least one of: legal documents, enterprise announcements, public-opinion information about enterprises, and news data;
an event sentence recognition unit configured to recognize event sentences in the event-related data according to a trained event sentence recognition model, wherein the event sentence recognition model is trained by taking event-related data labeled with event sentences as training samples;
an event and relation recognition unit configured to recognize events and/or relations in the event sentences according to a trained event relation recognition model and to generate event relation pairs according to the events and/or relations, wherein the event relation recognition model is trained by taking event sentences labeled with events and/or relations as training samples;
and a relation graph construction unit configured to generate an event relation graph according to the event relation pairs.
Optionally, the event sentence recognition unit is specifically configured to identify the event sentences in the event-related data and the type of each event sentence according to the trained event sentence recognition model, wherein the types of event sentences comprise a cause-effect succession type, a cause succession type and an effect succession type, and the training samples of the event sentence recognition model are also labeled with the type of each event sentence.
Optionally, the event and relation recognition unit is specifically configured to: identify the events and relations in cause-effect succession event sentences according to the event relation recognition model and generate event relation pairs from the identified events and relations; and identify the events in cause succession event sentences and effect succession event sentences according to the event relation recognition model and match the events in the cause succession event sentences with the events in the effect succession event sentences to generate event relation pairs, wherein the training samples of the event relation recognition model are event sentences of the different types labeled with events and/or relations: events and relations are labeled for cause-effect succession event sentences, and only events are labeled for cause succession and effect succession event sentences.
Optionally, the relationship graph constructing unit includes:
the standardized name acquisition module is used for acquiring standardized names of various events and standardized names of various relations, wherein the standardized names of various events are obtained by clustering the full-scale events through a clustering model, and the standardized names of various relations are obtained by clustering the full-scale relations through the clustering model;
the standardized name conversion module is used for respectively converting the events and the relations in the event relation pair into corresponding standardized names;
and the relation graph generation module is used for generating the event relation graph according to the event relation pairs after conversion to standardized names.
Optionally, the apparatus further comprises:
the first text labeling unit is used for labeling the event sentences and the types of the event sentences in the event related data used for model training;
and the event sentence recognition model training unit is used for training the event sentence recognition model according to the labeled event related data.
Optionally, the apparatus further comprises:
the second text labeling unit is used for labeling events and/or relations in the event sentences used for model training, wherein events and relations are labeled for cause-effect succession event sentences, and only events are labeled for cause succession event sentences and effect succession event sentences;
and the event relation recognition model training unit is used for training the event relation recognition model according to the labeled event sentence.
In order to achieve the above object, according to another aspect of the present invention, there is also provided a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above event relation graph construction method when executing the computer program.
In order to achieve the above object, according to another aspect of the present invention, there is also provided a computer-readable storage medium storing a computer program which, when executed by a computer processor, implements the steps of the above event relation graph construction method.
The invention has the following beneficial effects: in the embodiments of the invention, event sentences in event-related text are identified by a trained event sentence recognition model, the events and relations in the event sentences are identified by a trained event relation recognition model, and an event relation graph is then generated from these events and relations. By identifying domain-relevant events and relations in various texts, the embodiments form a graph of logical relationship links with events as the unit, thereby solving the technical problem that, because the prior art lacks mining and description of event-logic knowledge, event logic cannot be used to further improve the reliability of credit or marketing decisions.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts. In the drawings:
FIG. 1 is a flowchart of a method for constructing an event relationship graph according to an embodiment of the present invention;
FIG. 2 is a flow diagram of an embodiment of the present invention for identifying events and relationships in different types of event sentences;
FIG. 3 is a flow diagram of an embodiment of the present invention for converting events and relationships to standard names;
FIG. 4 is a flow chart of an embodiment of the invention for training an event sentence recognition model;
FIG. 5 is a flow diagram of training an event relationship recognition model according to an embodiment of the present invention;
FIG. 6 is a first structural block diagram of an event relation graph construction apparatus according to an embodiment of the present invention;
FIG. 7 is a structural block diagram of a relation graph construction unit according to an embodiment of the present invention;
FIG. 8 is a second structural block diagram of the event relation graph construction apparatus according to an embodiment of the present invention;
FIG. 9 is a third structural block diagram of the event relation graph construction apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, where no conflict arises, the embodiments and the features of the embodiments may be combined with each other. The present invention is described in detail below by way of embodiments with reference to the attached drawings.
Fig. 1 is a flowchart of an event relationship graph construction method according to an embodiment of the present invention, and as shown in fig. 1, the event relationship graph construction method according to the embodiment includes steps S101 to S104.
Step S101, obtaining event related data, wherein the event related data comprises: at least one of legal documents, enterprise announcements, enterprise public opinions and news data.
In an embodiment of the present invention, the event-related data may be dynamic company data crawled from the web, such as news, public-opinion reports, announcements and the like.
In an optional embodiment of the present invention, after the event-related data is obtained, it is further preprocessed. The preprocessing includes integrating and denoising the text data, for example removing formatting errors and text data of low relevance, followed by splitting the text into sentences at commonly used sentence-ending symbols.
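The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the set of sentence-ending symbols, the HTML-tag stripping used as a stand-in for removing formatting errors, and the length filter `min_len` used as a crude proxy for discarding low-relevance text are all assumptions.

```python
import re
from typing import Iterable, List

# Assumed delimiter set: split after the Chinese full stop and Chinese/ASCII "!" and "?",
# or at whitespace following an ASCII period (so decimals such as "3.5" stay intact).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[。！？!?])|(?<=\.)\s+")


def preprocess(raw_docs: Iterable[str], min_len: int = 5) -> List[str]:
    """Integrate, denoise and sentence-split crawled documents (illustrative sketch)."""
    sentences: List[str] = []
    for doc in raw_docs:
        # Denoising stand-in: drop HTML tags left over from crawling, collapse whitespace.
        text = re.sub(r"<[^>]+>", " ", doc)
        text = re.sub(r"\s+", " ", text).strip()
        for sentence in _SENTENCE_BOUNDARY.split(text):
            sentence = sentence.strip()
            # Hypothetical relevance filter: very short fragments rarely carry an event.
            if len(sentence) >= min_len:
                sentences.append(sentence)
    return sentences
```

For example, `preprocess(["<p>Company A's funding was cut off. It may go bankrupt!</p>"])` yields the two sentences "Company A's funding was cut off." and "It may go bankrupt!". Zero-width splitting with `re.split` requires Python 3.7 or later.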
Step S102, identifying event sentences in the event-related data according to a trained event sentence recognition model, wherein the event sentence recognition model is trained by taking event-related data labeled with event sentences as training samples.
In the embodiment of the present invention, an event sentence is a sentence in an article that contains an event. For example, the sentence "Company A's funding source has been cut off, which is very likely to lead to bankruptcy" is an event sentence to be found: it contains the events "funding source cut off" and "bankruptcy" as well as the relation between the two events.
In the embodiment of the invention, the event sentence recognition model may be a rule-based model or a machine-learning-based model. For a rule-based classification model, the classification rules may be defined by domain experts or learned automatically by a computer. For a machine-learning-based model, a CNN, RNN, Transformer or the like may be used. In a preferred embodiment of the present invention, the classic text classification model fastText may be used as the event sentence recognition model.
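As an illustration of the rule-based option mentioned above, the following sketch assigns each sentence one of the three event sentence types using hand-written cue patterns. The cue lists, the label names (`cause_effect`, `cause`, `effect`, `none`) and the rule that a cue-free sentence followed by a consequence opener is a cause succession sentence are hypothetical expert rules written for English examples; a deployed system would use domain rules for Chinese text, or the fastText classifier of the preferred embodiment.

```python
import re
from typing import List

# Hypothetical relation cues; the word boundary keeps "cause" from matching inside "because".
RELATION_CUE = re.compile(
    r"\b(?:cause[sd]?|prevent(?:s|ed)?|(?:lead|leads|led) to|(?:result|results|resulted) in)\b"
)
# Hypothetical openers marking a sentence that states only a consequence.
EFFECT_OPENER = re.compile(r"^(?:as a result|consequently|therefore|subsequently)\b")


def classify_sentences(sentences: List[str]) -> List[str]:
    """Label each sentence as cause_effect, cause, effect or none (rule-based sketch)."""
    labels = []
    for i, sentence in enumerate(sentences):
        text = sentence.lower()
        if EFFECT_OPENER.search(text):
            labels.append("effect")          # effect succession: terminating event only
        elif RELATION_CUE.search(text):
            labels.append("cause_effect")    # both events and the relation in one sentence
        elif i + 1 < len(sentences) and EFFECT_OPENER.search(sentences[i + 1].lower()):
            labels.append("cause")           # cause succession: starting event only
        else:
            labels.append("none")            # not an event sentence
    return labels
```

The context rule for the cause succession type reflects the fact that a sentence stating only a starting event usually carries no cue of its own; its role only becomes visible from the sentence that follows.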
A specific training method of the event sentence recognition model may refer to the following embodiments of step S401 and step S402.
Step S103, recognizing events and/or relations in the event sentences according to the trained event relation recognition model, and generating event relation pairs according to the events and/or relations, wherein the event relation recognition model is trained by taking event sentences labeled with events and/or relations as training samples.
In the embodiment of the invention, each event relation pair comprises three elements: a starting event, a terminating event, and the relation between them. Event sentences are of three types: the cause-effect succession type, the cause succession type and the effect succession type. In a cause-effect succession event sentence, a causal pair of events or a pair of successive events appears within one sentence, for example "Company A's IPO succeeded, and its valuation rose sharply"; after such a sentence is extracted, the starting event, the relation and the terminating event all lie in the same sentence, so the extraction result directly forms an event relation pair. A cause succession event sentence may contain only the starting event, and likewise an effect succession event sentence may contain only the terminating event. When event relation pairs are generated, these two types of sentences must therefore be handled separately: the starting event and the terminating event are spliced together, and the relation between them is set to a preset relation, producing a complete event relation pair.
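The three-element event relation pair, and its direct construction from a cause-effect succession sentence, can be represented as in the following minimal sketch. The class name `EventRelationPair` and the dictionary keys `start`, `relation` and `end` used for the model output are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class EventRelationPair:
    """Starting event, relation and terminating event (hypothetical representation)."""
    start_event: str
    relation: str
    end_event: str


def pair_from_sentence(extracted: Dict[str, str]) -> Optional[EventRelationPair]:
    """Build a pair directly from one cause-effect succession sentence.

    `extracted` is assumed to hold the spans returned by the event relation
    recognition model under the hypothetical keys "start", "relation" and "end".
    Returns None when a span is missing; such a sentence has to go through the
    splicing step used for cause succession and effect succession sentences.
    """
    if all(extracted.get(key) for key in ("start", "relation", "end")):
        return EventRelationPair(extracted["start"], extracted["relation"], extracted["end"])
    return None
```

Making the dataclass frozen keeps pairs hashable, so duplicate pairs extracted from different documents can be removed with a plain `set` before loading.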
In a preferred embodiment of the present invention, the training samples used for the event relation recognition model may be the same samples as those used to train the event sentence recognition model; that is, the events and relations of the event sentences are labeled in the event-related data already labeled with event sentences, and the event sentences containing the labeled events and relations are used to train the event relation recognition model.
In the embodiment of the invention, the event relation recognition model is an entity-recognition (sequence labeling) model for extracting events and relations; in essence, event and relation recognition is also a classification task, performed on each character. In an optional embodiment of the invention, the classic BiLSTM + CRF model is selected as the event relation recognition model.
The specific training method of the event relation recognition model can refer to the following embodiments of step S501 and step S502.
Step S104, generating an event relation graph according to the event relation pairs.
In the embodiment of the invention, after the event relation pairs are generated, the events and relations are loaded into a graph database to form an event relation network, that is, the event relation graph, and event relations can then be queried through the interfaces provided by the graph database.
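The source does not name a particular graph database. As a stand-in, the sketch below loads event relation pairs into an in-memory adjacency list and answers a typical event-logic query, namely which events can follow from a given event within a few hops. In a deployment, the same nodes and edges would instead be written to a graph database and queried through its own interface; the function names here are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (starting event, relation, terminating event)


def build_graph(pairs: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list: starting event -> [(relation, terminating event), ...]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for start, relation, end in pairs:
        graph[start].append((relation, end))
    return graph


def downstream_events(graph: Dict[str, List[Tuple[str, str]]], event: str,
                      max_hops: int = 3) -> List[Triple]:
    """Breadth-first search returning the edge that first reaches each event within max_hops."""
    seen = {event}
    edges: List[Triple] = []
    queue = deque([(event, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                edges.append((node, relation, nxt))
                queue.append((nxt, depth + 1))
    return edges
```

Querying "funding source cut off" in a graph that also contains "bankruptcy" causes "loan default" returns the two-step chain, which is the kind of event-logic link the graph is meant to expose for credit decisions.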
As can be seen from the above description, in the embodiment of the present invention, event sentences in event-related text are identified by the trained event sentence recognition model, the events and relations in the event sentences are then identified by the trained event relation recognition model, and an event relation graph is generated from these events and relations. By identifying domain-relevant events and relations in various texts, the embodiment forms a graph of logical relationship links with events as the unit, thereby solving the technical problem that, because the prior art lacks mining and description of event-logic knowledge, event logic cannot be used to further improve the reliability of credit or marketing decisions.
In an embodiment of the present invention, step S102 may specifically include: identifying the event sentences in the event-related data and the type of each event sentence according to the trained event sentence recognition model, wherein the types of event sentences comprise the cause-effect succession type, the cause succession type and the effect succession type, and the training samples of the event sentence recognition model are also labeled with the type of each event sentence.
Fig. 2 is a flowchart illustrating an embodiment of identifying events and relationships in event sentences of different types according to the present invention, and as shown in fig. 2, the step S103 specifically includes a step S201 and a step S202.
Step S201, identifying the events and relations in cause-effect succession event sentences according to the event relation recognition model, and generating event relation pairs from the identified events and relations.
In the embodiment of the invention, a cause-effect succession event sentence is one in which a causal pair of events or a pair of successive events appears within a single event sentence, for example "Company A's IPO succeeded, and its valuation rose sharply". After such an event sentence is extracted, the starting event, the relation and the terminating event lie in the same sentence, so the extraction result directly forms an event relation pair.
Step S202, recognizing the events in cause succession event sentences and effect succession event sentences according to the event relation recognition model, and matching the events in the cause succession event sentences with the events in the effect succession event sentences to generate event relation pairs.
In the embodiment of the present invention, because a cause succession event sentence may contain only the starting event and an effect succession event sentence may contain only the terminating event, these two types of event sentences must be handled separately when event relation pairs are generated: the starting event and the terminating event are spliced together, and the relation between them is set to a preset relation, producing a complete event relation pair.
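The source does not specify how events from cause succession and effect succession sentences are matched. One simple, assumed strategy is sketched below: within a document, each starting event is spliced with the first terminating event that follows it within a small sentence window, and the pair receives the preset relation. The window size and the preset relation value "lead to" are hypothetical.

```python
from typing import List, Tuple

PRESET_RELATION = "lead to"  # hypothetical preset relation for spliced pairs


def splice_pairs(
    sentences: List[Tuple[str, str]], window: int = 2, preset: str = PRESET_RELATION
) -> List[Tuple[str, str, str]]:
    """Splice starting and terminating events into (start, relation, end) triples.

    `sentences` lists (sentence_type, event) in document order, where sentence_type
    is "cause" for a cause succession sentence (starting event only) and "effect"
    for an effect succession sentence (terminating event only).
    """
    pairs = []
    for i, (kind, start_event) in enumerate(sentences):
        if kind != "cause":
            continue
        # Look ahead at most `window` sentences for the nearest terminating event.
        for next_kind, end_event in sentences[i + 1 : i + 1 + window]:
            if next_kind == "effect":
                pairs.append((start_event, preset, end_event))
                break
    return pairs
```

A small window trades recall for precision: a starting event with no terminating event nearby is simply left unpaired rather than linked to an unrelated consequence further down the document.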
In this embodiment, the training samples of the event relation recognition model are event sentences of the different types labeled with events and/or relations. For cause-effect succession event sentences, the starting event, the terminating event and the relation between them are labeled; for cause succession and effect succession event sentences, only the starting event or the terminating event is labeled.
Fig. 4 is a flowchart of training an event sentence recognition model according to an embodiment of the present invention, and as shown in fig. 4, the flowchart of training the event sentence recognition model according to the embodiment of the present invention includes step S401 and step S402.
Step S401, annotating the event sentences and the types of the event sentences in the event-related data used for model training.
In the embodiment of the present invention, after the event-related data used for model training is obtained, it may be preprocessed. The preprocessing includes integrating and denoising the text data, for example removing formatting errors and text data of low relevance, followed by splitting the text into sentences at commonly used sentence-ending symbols.
In the embodiment of the invention, when the event-related data used for training is labeled, each event sentence and its type are labeled at the same time. For example, the sentence "Company A's funding source has been cut off, which is very likely to lead to bankruptcy" in the event-related data is an event sentence to be labeled; because it contains the event "funding source cut off", the event "bankruptcy" and the relation "cause" between them, it is labeled as the cause-effect succession type.
Step S402, training the event sentence recognition model according to the labeled event-related data.
In the embodiment of the invention, the classic text classification model fastText may be adopted as the event sentence recognition model.
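fastText's supervised mode reads one training example per line, with the label written as `__label__<name>` followed by space-separated tokens. The sketch below converts labeled sentences into that format. Splitting Chinese text into single characters (while keeping runs of Latin letters and digits as tokens) and the label names are assumptions; word segmentation would be an equally plausible choice. A file of such lines could then be passed to the official Python binding, for example `fasttext.train_supervised(input="train.txt")`.

```python
import re
from typing import Iterable, List, Tuple

# One CJK character, or a run of ASCII letters/digits, per token; punctuation is dropped.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def to_fasttext_line(label: str, sentence: str) -> str:
    """Format one labeled sentence as a fastText supervised training line."""
    tokens = _TOKEN.findall(sentence)
    return f"__label__{label} " + " ".join(tokens)


def to_fasttext_lines(samples: Iterable[Tuple[str, str]]) -> List[str]:
    """samples: (event sentence type label, sentence) pairs produced by step S401."""
    return [to_fasttext_line(label, sentence) for label, sentence in samples]
```

Non-event sentences should also be included with their own label (here a hypothetical `none`), so that the classifier learns to reject sentences that contain no event.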
Fig. 5 is a flowchart of training an event relationship recognition model according to an embodiment of the present invention, and as shown in fig. 5, the flowchart of training the event relationship recognition model according to the embodiment of the present invention includes step S501 and step S502.
Step S501, labeling events and/or relations in the event sentences used for model training, wherein events and relations are labeled in cause-effect succession event sentences, and only events are labeled in cause succession and effect succession event sentences.
Step S502, the event relation recognition model is trained according to the labeled event sentence.
In the embodiment of the invention, training samples are generated by labeling the events and relations in the event sentences used for model training. A cause-effect succession event sentence contains the starting event, the terminating event and the relation between them, so all three must be labeled. For example, in the sentence "The recovery of interbank certificate-of-deposit issuance can prevent small and medium-sized banks from shrinking their balance sheets", the events are "interbank certificate-of-deposit issuance recovers" and "small and medium-sized banks shrink their balance sheets", the relation is "prevent", and the events and the relation must be labeled together. Since a cause succession event sentence may contain only the starting event and an effect succession event sentence may contain only the terminating event, only the event is labeled for these sentences.
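Labeling as described above is usually done on spans, which are then expanded into the character-level tags required by BiLSTM + CRF. The sketch below performs that expansion using the tag names BE (starting event), R (relation) and EE (terminating event) from this embodiment; the span-offset input format is an assumption. For a cause succession or effect succession sentence, only a BE or an EE span would be supplied, and every other character stays O.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset exclusive, label)


def spans_to_bio(sentence: str, spans: List[Span]) -> List[str]:
    """Expand labeled character spans into one BIO tag per character."""
    tags = ["O"] * len(sentence)
    for start, end, label in spans:
        if not 0 <= start < end <= len(sentence):
            raise ValueError(f"invalid span {(start, end, label)}")
        tags[start] = f"B-{label}"   # first character of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"   # remaining characters of the span
    return tags
```

For the Chinese sentence "资金链断裂导致破产" ("the funding chain breaks, leading to bankruptcy"), the spans (0, 5, "BE"), (5, 7, "R") and (7, 9, "EE") produce nine tags, one per character.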
In the embodiment of the invention, the event relation recognition model is essentially a sequence labeling model, that is, it classifies each character of the sentence, and the classic BiLSTM+CRF model is selected. The input to BiLSTM+CRF is the sequence of character vectors of a single sentence. Each vector passes through a forward LSTM layer and a backward LSTM layer, the outputs then enter the CRF layer, and the model outputs the label of each individual character.
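The CRF layer chooses the best label sequence for the whole sentence jointly, rather than the best label for each character independently. The following is a minimal sketch of the Viterbi decoding that a linear-chain CRF performs at inference time. It assumes the emission scores (one row per character, for example from the BiLSTM) and the tag-transition matrix are already available as plain lists. Many CRF implementations also learn start and stop transition scores; these are omitted here for brevity. The function name and inputs are illustrative and are not the patent's implementation.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. a BiLSTM output)
    transitions[i][j]: score of moving from tag i to tag j (learned by the CRF)
    """
    n_tags = len(tags)
    # best[j]: best score of any path that ends in tag j at the current position
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # choose the predecessor tag i that maximises the path score into j
            scores = [best[i] + transitions[i][j] for i in range(n_tags)]
            i_max = max(range(n_tags), key=lambda i: scores[i])
            new_best.append(scores[i_max] + emissions[t][j])
            pointers.append(i_max)
        best = new_best
        backpointers.append(pointers)
    # trace back from the best final tag
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return [tags[i] for i in path]
```

A strongly negative transition score, such as O followed directly by I-R, lets the CRF rule out tag sequences that per-character classification alone could produce.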
In this embodiment, the event relation recognition model uses BiLSTM+CRF, so each event sentence must be labeled character by character in the format expected by that model. For example, the event sentence "A recovery in the issuance volume of interbank certificates of deposit can prevent small and medium-sized banks from shrinking their balance sheets" is labeled, per character of the original Chinese sentence, as "B-BE I-BE I-BE I-BE I-BE I-BE I-BE I-BE O B-R I-R B-EE I-EE I-EE I-EE I-EE I-EE". In this scheme, B- marks the first character of a span, I- marks a following character of the same span, and O marks a character outside any span. BE, R and EE denote the start event, the relation and the end event, respectively.
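To turn the model's per-character tags back into an event relation pair, consecutive B-/I- tags of the same type are grouped into spans. The following is a minimal sketch that assumes the BE/R/EE tag scheme above. `extract_spans` and `to_relation_pair` are hypothetical helper names. Tokens are joined with `sep`, which is a space for the English illustration and would be an empty string for Chinese characters.

```python
def extract_spans(tokens, tags, sep=" "):
    """Group BIO-tagged tokens into (type, text) spans, e.g. ("BE", "...")."""
    spans, cur_type, cur_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            # a B- tag, or an I- tag that does not continue the open span,
            # closes the current span and starts a new one
            if cur_type:
                spans.append((cur_type, sep.join(cur_tokens)))
            cur_type, cur_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            cur_tokens.append(token)
        else:  # "O" closes any open span
            if cur_type:
                spans.append((cur_type, sep.join(cur_tokens)))
            cur_type, cur_tokens = None, []
    if cur_type:
        spans.append((cur_type, sep.join(cur_tokens)))
    return spans


def to_relation_pair(spans):
    """Return (start event, relation, end event) if all three are present, else None."""
    found = {}
    for span_type, text in spans:
        found.setdefault(span_type, text)  # keep the first span of each type
    if all(k in found for k in ("BE", "R", "EE")):
        return (found["BE"], found["R"], found["EE"])
    return None
```

A sentence that yields only a BE span (cause-only) or only an EE span (effect-only) returns `None` here. Such events would instead go through the separate cause/effect matching step described for the recognition unit.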
In the embodiment of the present invention, when the event relation graph is generated from the event relation pairs in step S104, the events and relations need to be converted into standardized names. Fig. 3 is a flowchart of converting events and relations into standardized names according to an embodiment of the present invention. As shown in Fig. 3, the process includes steps S301 to S303.
Step S301: obtain the standardized names of the various events and relations. The standardized name of each event class is the cluster center obtained by clustering the full set of events with a clustering model. The standardized name of each relation class is the cluster center obtained by clustering the full set of relations with the clustering model.
Step S302: convert the events and relations in each event relation pair into their corresponding standardized names.
Step S303: generate the event relation graph from the event relation pairs whose names have been standardized.
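Steps S302 and S303 amount to a name lookup followed by edge insertion. The following is a minimal sketch that assumes step S301 has already produced dictionaries mapping each raw event or relation expression to the name of its cluster center. The function names and the adjacency-set representation of the graph are illustrative assumptions, not the patent's data model.

```python
from collections import defaultdict


def standardize_pairs(pairs, event_names, relation_names):
    """Map each (start, relation, end) triple to standardized names.

    event_names / relation_names: dicts from a raw expression to the
    cluster-center name of its cluster (hypothetical output of step S301).
    Expressions without an entry are kept unchanged.
    """
    return [
        (event_names.get(s, s), relation_names.get(r, r), event_names.get(e, e))
        for s, r, e in pairs
    ]


def build_graph(pairs):
    """Adjacency map: start event -> {(relation, end event), ...}.

    Pairs that become identical after standardization merge into one edge.
    """
    graph = defaultdict(set)
    for start, relation, end in pairs:
        graph[start].add((relation, end))
    return dict(graph)
```

Because standardization runs before graph construction, differently worded pairs such as "leads to" and "may cause" collapse into a single edge once both map to the same relation name.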
In the embodiment of the present invention, after the events and relations in an event sentence have been identified by the event relation recognition model and the corresponding event relation pairs have been generated, the events and relations need to be converted into standardized names. Here, standardization means merging events or relations that have the same meaning but different surface forms into one unified form. For example, "continuous decline of the stock price", "oscillating decline of the stock price" and "decline of the stock price" express the same meaning in more than one form, so their names need to be normalized. The event standardization idea of the invention is to cluster all existing events (or relations) and to select the cluster center of each class as the standardized name of that class. The invention uses the affinity propagation (AP) clustering model to cluster both events and relations. AP clustering does not require the number of clusters to be specified in advance, so prior experience is not a prerequisite. AP clustering of the full set of events may proceed as follows:
Step 1: vectorize all events as the input to the AP clustering model.
Step 2: initialize the responsibility (attraction) matrix R and the availability (attribution) matrix A as zero matrices. r(i,k) describes how well point k is suited to serve as the cluster center of data point i. a(i,k) describes how appropriate it is for point i to choose point k as its cluster center. Here i and k index the vectorized events (or relations).
Step 3: update the responsibility matrix R. s(i,k) denotes the similarity between points i and k, i.e., how well point k would serve as the cluster center of point i. It is typically derived from the Euclidean distance, for example as the negative squared Euclidean distance.
$$\tilde r_{t+1}(i,k) = s(i,k) - \max_{k' \neq k}\{a_t(i,k') + s(i,k')\}$$
Step 4: update the availability matrix A.
$$\tilde a_{t+1}(i,k) = \min\Big\{0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\}\Big\}, \quad i \neq k$$
$$\tilde a_{t+1}(k,k) = \sum_{i' \neq k} \max\{0, r(i',k)\}$$
Step 5: damp both updates with the damping coefficient λ (default 0.5). The damping mainly serves to make the model converge. Here $\tilde r_{t+1}$ and $\tilde a_{t+1}$ are the undamped values just computed in steps 3 and 4.
$$r_{t+1}(i,k) = \lambda\, r_t(i,k) + (1-\lambda)\, \tilde r_{t+1}(i,k)$$
$$a_{t+1}(i,k) = \lambda\, a_t(i,k) + (1-\lambda)\, \tilde a_{t+1}(i,k)$$
Step 6: repeat steps 3 to 5 until the matrices stabilize or the maximum number of iterations is reached, at which point the algorithm ends.
Finally, output the cluster centers, the class labels and related information. Each point i is assigned to the cluster center k that maximizes a(i,k) + r(i,k).
The process of clustering the total relationship by the AP clustering model is similar to the above process of clustering events, and is not described here again.
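The following is a minimal pure-Python sketch of the AP procedure above. It rests on several assumptions. Similarities are negative squared Euclidean distances. The diagonal "preference" s(k,k), which the steps above do not specify, is set to the median similarity. Each matrix is damped right after it is updated, which is the order used by common implementations. Convergence is declared once the exemplar set stays unchanged for `convergence_iter` rounds. The function names are illustrative. In practice, scikit-learn's `sklearn.cluster.AffinityPropagation` (default `damping=0.5`, median preference) offers the same algorithm.

```python
def similarity_matrix(vectors):
    """s(i, k) = negative squared Euclidean distance; the diagonal s(k, k)
    (the "preference") is set to the median off-diagonal similarity."""
    n = len(vectors)
    S = [[-sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[k]))
          for k in range(n)] for i in range(n)]
    off = sorted(S[i][k] for i in range(n) for k in range(n) if i != k)
    m = len(off)
    median = off[m // 2] if m % 2 else (off[m // 2 - 1] + off[m // 2]) / 2
    for k in range(n):
        S[k][k] = median
    return S


def affinity_propagation(S, damping=0.5, max_iter=200, convergence_iter=15):
    """Return (exemplar indices, exemplar index assigned to each point); needs n >= 2."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibility r(i, k), step 2
    A = [[0.0] * n for _ in range(n)]  # availability a(i, k), step 2
    exemplars, stable = [], 0
    for _ in range(max_iter):
        # Step 3 (+ damping): r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Step 4 (+ damping): availabilities from the current responsibilities
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i' != k of max(0, r(i', k))
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # Point k is a cluster center when r(k,k) + a(k,k) > 0
        current = [k for k in range(n) if R[k][k] + A[k][k] > 0]
        stable = stable + 1 if current == exemplars else 0
        exemplars = current
        if exemplars and stable >= convergence_iter:
            break  # step 6: exemplar set unchanged long enough
    if not exemplars:
        return [], [-1] * n  # no cluster center emerged
    # Non-centers join the most similar cluster center
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels
```

The preference controls how many clusters emerge: raising s(k,k) produces more, finer-grained standardized names, and lowering it merges more expressions under a single name.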
In an optional embodiment of the present invention, the generated event relation pairs are also reviewed. This happens after the events and relations in the event sentences have been identified by the event relation recognition model and the corresponding pairs have been generated, and before the events and relations are standardized. Specifically, the generated event relation pairs are sent to a user for review through a preset interface, and the user can manually review the start event, the relation and the end event of each pair. During review, the user's operations include, but are not limited to, querying all events and relations, deleting obviously erroneous event relations, and modifying event or relation names that do not follow the manually defined naming conventions.
As the above embodiments show, the invention provides a knowledge graph construction method based on event relations. It identifies the events and relations of the relevant domain in various texts and forms a logical link graph whose units are events, which achieves at least the following beneficial effects:
1. The event relation graph generated by the invention supplements the knowledge-description layer of existing knowledge graphs, which describe static enterprise information. Combining a static knowledge graph with the logical rules of dynamic events, i.e., adding a strongly logical link network on top of an existing knowledge base, further extends the knowledge about events and enables further identification and assessment of the dynamic risks of the enterprises involved in those events;
2. The invention removes the requirement that experts be deeply involved in constructing an event ontology when an event relation graph is built; instead, machine learning algorithms automatically extract events and relations to form the ontology. This greatly reduces human intervention in building the event relation graph and limits the influence of experts' limited knowledge and experience on the graph;
3. The embodiment of the invention standardizes and merges the extracted event relations and combines this with an event review step in which the merged events and relations are manually reviewed, thereby meeting the requirement of screening high-quality event relations;
4. The event-relation-based graph construction method fully considers the hidden associative features in large data networks. It overcomes the limitations of current risk assessment approaches that combine a rule engine with a model engine, namely rule definitions constrained by limited expert experience and the poor interpretability of black-box models trained on shallow features;
5. The event relation graph supplements the static enterprise-information knowledge description of existing knowledge graphs. It remedies their shortcoming of considering only static enterprise information while ignoring the logical relations between dynamic events.
It should be noted that the steps shown in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one presented here.
Based on the same inventive concept, an embodiment of the present invention further provides an event relation graph construction apparatus, which can be used to implement the event relation graph construction method described in the above embodiments, as described in the following embodiments. The apparatus solves the problem on a principle similar to that of the method, so the apparatus embodiments can refer to the method embodiments, and repeated content is not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 6 is a first structural block diagram of an event relation graph construction apparatus according to an embodiment of the present invention. As shown in Fig. 6, the apparatus includes an event-related data acquisition unit 1, an event sentence recognition unit 2, an event and relation recognition unit 3, and a relation graph construction unit 4.
The event-related data acquisition unit 1 is configured to acquire event-related data, which includes at least one of legal documents, enterprise announcements, enterprise public opinion and news data.
The event sentence recognition unit 2 is configured to recognize event sentences in the event-related data with a trained event sentence recognition model. That model is trained using event-related data labeled with event sentences as training samples.
The event and relation recognition unit 3 is configured to recognize events and/or relations in the event sentences with a trained event relation recognition model and to generate event relation pairs from them. That model is trained using event sentences labeled with events and/or relations as training samples.
The relation graph construction unit 4 is configured to generate an event relation graph from the event relation pairs.
In an embodiment of the present invention, the event sentence recognition unit 2 is specifically configured to identify the event sentences and their types in the event-related data with the trained event sentence recognition model. The types of event sentences include cause-effect event sentences, cause-only event sentences and effect-only event sentences, and the training samples of the event sentence recognition model are also labeled with the type of each event sentence.
In an embodiment of the present invention, the event and relation recognition unit 3 is specifically configured as follows. It identifies the events and relations in cause-effect event sentences with the event relation recognition model and generates event relation pairs from the identified events and relations. It also identifies the events in cause-only and effect-only event sentences with the event relation recognition model, and matches the events in the cause-only event sentences with those in the effect-only event sentences to generate event relation pairs. The training samples of the event relation recognition model are event sentences of the different types labeled with events and/or relations: events and relations are labeled in cause-effect event sentences, and only events are labeled in cause-only and effect-only event sentences.
Fig. 7 is a structural diagram of a relationship graph constructing unit according to an embodiment of the present invention, and as shown in fig. 7, the relationship graph constructing unit 4 according to the embodiment of the present invention includes: a standardized name acquisition module 401, a standardized name conversion module 402 and a relationship graph generation module 403.
The standardized name acquisition module 401 is configured to acquire standardized names of various events and standardized names of various relationships, where the standardized names of various events are obtained by clustering the full-scale events through a clustering model, and the standardized names of various relationships are obtained by clustering the full-scale relationships through the clustering model;
a standardized name conversion module 402, configured to convert the events and the relationships in the event relationship pair into corresponding standardized names, respectively;
a relationship map generation module 403, configured to generate an event relationship map according to the event relationship pair after the standardized name is converted.
Fig. 8 is a second structural block diagram of the event relationship map building apparatus according to the embodiment of the present invention, and as shown in fig. 8, the event relationship map building apparatus according to the embodiment of the present invention further includes: a first text labeling unit 5 and an event sentence recognition model training unit 6.
A first text labeling unit 5, configured to label event sentences and types of the event sentences in event-related data used for model training;
and the event sentence recognition model training unit 6 is used for training the event sentence recognition model according to the labeled event related data.
Fig. 9 is a third structural block diagram of the event relationship map building apparatus according to the embodiment of the present invention, and as shown in fig. 9, the event relationship map building apparatus according to the embodiment of the present invention further includes: a second text labeling unit 7 and an event relation recognition model training unit 8.
The second text labeling unit 7 is configured to label the events and/or relations in the event sentences used for model training. Events and relations are labeled in cause-effect event sentences, and only events are labeled in cause-only and effect-only event sentences.
And the event relation recognition model training unit 8 is used for training the event relation recognition model according to the labeled event sentence.
To achieve the above object, according to another aspect of the present application, there is also provided a computer apparatus. As shown in fig. 10, the computer device comprises a memory, a processor, a communication interface and a communication bus, wherein a computer program that can be run on the processor is stored in the memory, and the steps of the method of the embodiment are realized when the processor executes the computer program.
The processor may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs and units, such as the program units corresponding to the method embodiments of the present invention described above. The processor runs the non-transitory software programs, instructions and modules stored in the memory to execute its various functional applications and data processing, thereby implementing the method of the above method embodiments.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more units are stored in the memory and when executed by the processor perform the method of the above embodiments.
The specific details of the computer device may be understood by referring to the corresponding related descriptions and effects in the above embodiments, and are not described herein again.
To achieve the above object, according to another aspect of the present application, a computer-readable storage medium is also provided. It stores a computer program that, when executed by a computer processor, implements the steps of the above event relation graph construction method. Those skilled in the art will understand that all or part of the processes of the above method embodiments can be implemented by a computer program. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD) or the like. The storage medium may also comprise a combination of the above kinds of memory.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An event relation graph construction method is characterized by comprising the following steps:
obtaining event related data, wherein the event related data comprises: at least one of legal documents, enterprise announcements, enterprise public opinions and news data;
identifying an event sentence in the event-related data according to a trained event sentence identification model, wherein the event sentence identification model is obtained by training by taking event-related data marked with the event sentence as a training sample;
identifying events and/or relations in the event sentences according to a trained event relation identification model, and generating event relation pairs according to the events and/or relations, wherein the event relation identification model is obtained by training by taking the event sentences marked with the events and/or relations as training samples;
generating an event relation map according to the event relation pair;
the identifying the event sentence in the event-related data according to the trained event sentence identification model specifically includes:
identifying the event sentences and the types of the event sentences in the event-related data according to the trained event sentence identification model, wherein the types of the event sentences comprise cause-effect event sentences, cause-only event sentences and effect-only event sentences, and the training samples of the event sentence identification model are also labeled with the type of each event sentence;
the identifying events and/or relations in the event sentences according to the trained event relation identification model and generating event relation pairs according to the events and/or relations specifically includes:
identifying the events and relations in the cause-effect event sentences according to the event relation identification model, and generating event relation pairs from the identified events and relations;
and identifying the events in the cause-only event sentences and the effect-only event sentences according to the event relation identification model, and matching the events in the cause-only event sentences with the events in the effect-only event sentences to generate event relation pairs, wherein the training samples of the event relation identification model are event sentences of the different types labeled with events and/or relations, events and relations being labeled in the cause-effect event sentences, and only events being labeled in the cause-only and effect-only event sentences.
2. The method for constructing an event relationship graph according to claim 1, wherein the generating an event relationship graph according to the event relationship pair specifically includes:
acquiring standardized names of various events and standardized names of various relations, wherein the standardized names of various events are obtained by clustering the full events through a clustering model, and the standardized names of various relations are obtained by clustering the full relations through the clustering model;
respectively converting the events and the relations in the event relation pair into corresponding standardized names;
and generating an event relation map according to the event relation pairs after the standardized names are converted.
3. The event relationship graph construction method according to claim 1, further comprising:
marking out event sentences and the types of the event sentences in event related data used for model training;
and training the event sentence recognition model according to the labeled event related data.
4. The event relationship graph construction method according to claim 1, further comprising:
labeling events and/or relations in the event sentences used for model training, wherein events and relations are labeled in the cause-effect event sentences, and only events are labeled in the cause-only and effect-only event sentences;
and training the event relation recognition model according to the labeled event sentence.
5. An event relationship map construction apparatus, comprising:
an event related data acquiring unit configured to acquire event related data, wherein the event related data includes: at least one of legal documents, enterprise announcements, enterprise public opinions and news data;
the event sentence recognition unit is used for recognizing an event sentence in the event related data according to a trained event sentence recognition model, wherein the event sentence recognition model is obtained by training by taking the event related data marked with the event sentence as a training sample;
the event and relationship recognition unit is used for recognizing events and/or relationships in the event sentences according to a trained event relationship recognition model and generating event relationship pairs according to the events and/or relationships, wherein the event relationship recognition model is obtained by training by taking the event sentences marked with the events and/or relationships as training samples;
the relation map construction unit is used for generating an event relation map according to the event relation pair;
the event sentence identification unit is specifically configured to identify the event sentences and the types of the event sentences in the event-related data according to the trained event sentence identification model, wherein the types of the event sentences comprise cause-effect event sentences, cause-only event sentences and effect-only event sentences, and the training samples of the event sentence identification model are also labeled with the type of each event sentence;
the event and relation identification unit is specifically configured to identify the events and relations in the cause-effect event sentences according to the event relation identification model and generate event relation pairs from the identified events and relations; and to identify the events in the cause-only event sentences and the effect-only event sentences according to the event relation identification model and match the events in the cause-only event sentences with the events in the effect-only event sentences to generate event relation pairs, wherein the training samples of the event relation identification model are event sentences of the different types labeled with events and/or relations, events and relations being labeled in the cause-effect event sentences, and only events being labeled in the cause-only and effect-only event sentences.
6. The event relationship map construction apparatus according to claim 5, wherein the relationship map construction unit comprises:
the standardized name acquisition module is used for acquiring standardized names of various events and standardized names of various relations, wherein the standardized names of various events are obtained by clustering the full-scale events through a clustering model, and the standardized names of various relations are obtained by clustering the full-scale relations through the clustering model;
the standardized name conversion module is used for respectively converting the events and the relations in the event relation pair into corresponding standardized names;
and the relation map generation module is used for generating an event relation map according to the event relation pair converted from the standardized name.
7. The event relationship map construction apparatus according to claim 5, further comprising:
the first text labeling unit is used for labeling the event sentences and the types of the event sentences in the event related data used for model training;
and the event sentence recognition model training unit is used for training the event sentence recognition model according to the labeled event related data.
8. The event relationship map construction apparatus according to claim 5, further comprising:
the second text labeling unit is configured to label events and/or relations in the event sentences used for model training, wherein events and relations are labeled in the cause-effect event sentences, and only events are labeled in the cause-only and effect-only event sentences;
and the event relation recognition model training unit is used for training the event relation recognition model according to the labeled event sentence.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 4 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed in a computer processor, carries out the steps of the method according to any one of claims 1 to 4.
CN201910659446.8A 2019-07-22 2019-07-22 Method and device for constructing event relation graph Active CN110377759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910659446.8A CN110377759B (en) 2019-07-22 2019-07-22 Method and device for constructing event relation graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910659446.8A CN110377759B (en) 2019-07-22 2019-07-22 Method and device for constructing event relation graph

Publications (2)

Publication Number Publication Date
CN110377759A CN110377759A (en) 2019-10-25
CN110377759B true CN110377759B (en) 2022-02-11

Family

ID=68254534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910659446.8A Active CN110377759B (en) 2019-07-22 2019-07-22 Method and device for constructing event relation graph

Country Status (1)

Country Link
CN (1) CN110377759B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781317B (en) * 2019-10-29 2022-03-01 北京明略软件***有限公司 Method and device for constructing event map and electronic equipment
CN110807104B (en) * 2019-11-08 2023-04-14 上海明胜品智人工智能科技有限公司 Method and device for determining abnormal information, storage medium and electronic device
CN111709244B (en) * 2019-11-20 2023-09-26 中共南通市委政法委员会 Deep learning method for identifying cause and effect relationship of contradictory dispute
CN110968702B (en) * 2019-11-29 2023-05-09 北京明略软件***有限公司 Method and device for extracting rational relation
CN111400456B (en) * 2020-03-20 2023-09-26 北京百度网讯科技有限公司 Information recommendation method and device
CN111522906B (en) * 2020-04-22 2023-03-28 电子科技大学 Financial event main body extraction method based on question-answering mode
CN111797230B (en) * 2020-06-11 2021-07-13 南京擎盾信息科技有限公司 Legal three-layer theory automatic reasoning method and device and electronic equipment
CN111753102A (en) * 2020-07-02 2020-10-09 武汉卓尔数字传媒科技有限公司 Public opinion analysis method and device based on affair map and electronic equipment
CN112015871B (en) * 2020-10-30 2021-01-01 中南大学 Automatic character relation labeling method based on event set remote supervision
CN112837148B (en) * 2021-03-03 2023-06-23 中央财经大学 Risk logic relationship quantitative analysis method integrating domain knowledge
CN113344405B (en) * 2021-06-18 2024-05-17 北京百度网讯科技有限公司 Method, device, equipment, medium and product for generating information based on knowledge graph
CN113590824A (en) * 2021-07-30 2021-11-02 平安科技(深圳)有限公司 Method and device for constructing causal graph and related equipment
CN113342943B (en) * 2021-08-05 2021-12-07 北京明略软件***有限公司 Training method and device for classification model
CN114064937A (en) * 2022-01-14 2022-02-18 云孚科技(北京)有限公司 Method and system for automatically constructing case map
CN116662577B (en) * 2023-08-02 2023-11-03 北京网智天元大数据科技有限公司 Knowledge graph-based large language model training method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN106997399A (en) * 2017-05-24 2017-08-01 海南大学 A kind of classification question answering system design method that framework is associated based on data collection of illustrative plates, Information Atlas, knowledge mapping and wisdom collection of illustrative plates
CN108304386A (en) * 2018-03-05 2018-07-20 上海思贤信息技术股份有限公司 A kind of logic-based rule infers the method and device of legal documents court verdict
CN109308323A (en) * 2018-12-07 2019-02-05 中国科学院长春光学精密机械与物理研究所 A kind of construction method, device and the equipment of causality knowledge base
CN109508473A (en) * 2018-09-29 2019-03-22 北京国双科技有限公司 Method, oil well fault reason map method for building up and relevant apparatus are determined based on the oil well fault of reason map

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20150161549A1 (en) * 2013-12-05 2015-06-11 Adobe Systems Incorporated Predicting outcomes of a modeled system using dynamic features adjustment
US10929100B2 (en) * 2017-09-08 2021-02-23 EMC IP Holding Company LLC Mitigating causality discrepancies caused by stale versioning
CN108052576B (en) * 2017-12-08 2021-04-23 国家计算机网络与信息安全管理中心 Method and system for constructing affair knowledge graph
CN108875051B (en) * 2018-06-28 2020-04-28 中译语通科技股份有限公司 Automatic knowledge graph construction method and system for massive unstructured texts

Also Published As

Publication number Publication date
CN110377759A (en) 2019-10-25

Similar Documents

Publication Publication Date Title
CN110377759B (en) Method and device for constructing event relation graph
CN111859960B (en) Semantic matching method, device, computer equipment and medium based on knowledge distillation
CN109815336B (en) Text aggregation method and system
CN111444340A (en) Text classification and recommendation method, device, equipment and storage medium
CN108228704A (en) Method, device and equipment for identifying risky content
CN112070138B (en) Construction method of multi-label mixed classification model, news classification method and system
CN109684476B (en) Text classification method, text classification device and terminal equipment
CN108960574A (en) Method, device, server and storage medium for determining question-and-answer quality
CN111126067B (en) Entity relationship extraction method and device
CN111723569A (en) Event extraction method and device and computer readable storage medium
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN111309910A (en) Text information mining method and device
CN111709225B (en) Event causal relationship discriminating method, device and computer readable storage medium
CN110209772B (en) Text processing method, device and equipment and readable storage medium
CN115600605A (en) Method, system, equipment and storage medium for jointly extracting Chinese entity relationship
CN116680386A (en) Answer prediction method and device based on multi-round dialogue, equipment and storage medium
CN115545558A (en) Method, device, machine readable medium and equipment for obtaining risk identification model
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN111382243A (en) Text category matching method, text category matching device and terminal
CN113869049B (en) Fact extraction method and device with legal attribute based on legal consultation problem
CN114491076B (en) Data enhancement method, device, equipment and medium based on domain knowledge graph
CN107729509B (en) Discourse similarity determination method based on implicit high-dimensional distributed feature representation
CN111401069A (en) Intention recognition method and intention recognition device for conversation text and terminal
CN115048523A (en) Text classification method, device, equipment and storage medium
US20220318230A1 (en) Text to question-answer model system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant