CN111414747B

CN111414747B - Time knowledge fuzzy measurement method and system based on weak supervised learning

Info

Publication number: CN111414747B
Application number: CN202010118531.6A
Authority: CN
Inventors: 彭德光; 孙健
Original assignee: Chongqing Zhaoguang Technology Co ltd
Current assignee: Chongqing Zhaoguang Technology Co ltd
Priority date: 2020-02-26
Filing date: 2020-02-26
Publication date: 2023-08-18
Anticipated expiration: 2040-02-26
Also published as: CN111414747A

Abstract

The application provides a time knowledge fuzzy measurement method and system based on weak supervised learning. The method comprises: generating samples containing time knowledge based on custom narrative recognition rules under weak supervised learning; performing deep learning training on a neural network according to the samples containing time knowledge; and obtaining, through the trained neural network, the time knowledge in the sentences of a plot event and calculating the ambiguity of that time knowledge. The application establishes a weakly supervised method that can identify and extract the time knowledge in narrative paragraphs from a large text corpus and thereby compute and order that time knowledge. The time knowledge can improve time-space relationship classification and thus performance on descriptive interactive service tasks.

Description

Time knowledge fuzzy measurement method and system based on weak supervised learning

Technical Field

The application relates to the technical field of natural language, in particular to a time knowledge fuzzy measurement method and a system based on weak supervision learning.

Background

Narrative text has dual temporality, so rich time knowledge can be extracted from narrative paragraphs. Dual temporality means that a narrative story usually describes a series of events in chronological order. Current natural language processing techniques cannot acquire rich temporal "before/after" event knowledge between the sentences of a narrative story. We therefore propose a time knowledge fuzzy measurement method and system based on weak supervised learning to acquire this knowledge.

Disclosure of Invention

In view of the above drawbacks of the prior art, an object of the present application is to provide a method and a system for fuzzy measurement of time knowledge based on weak supervised learning, which are used for solving the technical problems existing in the prior art.

To achieve the above and other related objects, the present application provides a time knowledge fuzzy measurement method based on weak supervised learning, including:

based on the weak supervision learning self-defined narrative recognition rules, generating a sample containing time knowledge;

performing deep learning training on the neural network according to the sample containing the time knowledge;

and obtaining time knowledge in sentences in the plot event through the neural network after deep learning training, and calculating the ambiguity of the time knowledge in the sentences in the plot event.

Optionally, the custom narrative recognition rules are R_n and the sentences in the plot event are S_m, wherein n and m are positive integers;

under a given custom narrative recognition rule, a comparison relation R between the time knowledge in one sentence S_i and the time knowledge in another sentence S_j is obtained according to syntactic and statistical rules;

if R(S_i, S_j) = 1, the physical time at which sentence S_i occurs is earlier than the physical time at which sentence S_j occurs, i.e. T(S_i) > T(S_j);

if R(S_i, S_j) = 0, the physical time at which sentence S_i occurs is the same as the physical time at which sentence S_j occurs, i.e. T(S_i) = T(S_j);

if R(S_i, S_j) = -1, the physical time at which sentence S_i occurs is later than the physical time at which sentence S_j occurs, i.e. T(S_i) < T(S_j);

the samples constructed from R(S_i, S_j) are fed into a deep neural network for learning and training to obtain the trained R(S_i, S_j);

the ambiguity is calculated according to the trained R(S_i, S_j) as follows: [formula not reproduced in the source text] wherein k is a positive integer, i < m, and j < m.
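The three-valued relation R described above can be sketched as follows. This is a minimal illustration, not the patented procedure: the function names `pairwise_relation` and `build_samples` and the `time_rank` input (an integer rank per sentence, smaller meaning earlier) are assumptions introduced for the example, and the ambiguity formula itself is not sketched because it is not reproduced in the text.

```python
from itertools import combinations

def pairwise_relation(rank_i: int, rank_j: int) -> int:
    """Return R(S_i, S_j): 1 if S_i occurs earlier, 0 if simultaneous, -1 if later."""
    if rank_i < rank_j:
        return 1
    if rank_i > rank_j:
        return -1
    return 0

def build_samples(sentences, time_rank):
    """Build (S_i, S_j, R) training triples for every sentence pair with i < j.

    time_rank[k] is a hypothetical temporal rank of sentences[k];
    in a narrative it could simply follow the order of narration.
    """
    samples = []
    for i, j in combinations(range(len(sentences)), 2):
        label = pairwise_relation(time_rank[i], time_rank[j])
        samples.append((sentences[i], sentences[j], label))
    return samples

story = ["She woke up.", "She made coffee.", "The kettle whistled."]
ranks = [0, 1, 1]  # assumed: the last two events happen at the same time
samples = build_samples(story, ranks)
labels = [label for _, _, label in samples]
```

Triples of this shape are one plausible form for the samples that are then fed into the neural network.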

Optionally, the weak supervision includes:

acquiring a seed narrative text, and acquiring a new narrative from the seed narrative text through a pre-trained statistical classifier;

and supplementing the new narrative into the seed narrative text, and guiding the iteration of the statistical classifier learning process until no new narrative appears.

Optionally, the pre-training of the statistical classifier includes:

determining narrative paragraphs and non-narrative paragraphs in the seed narrative text;

and training the statistical classifier with the narrative paragraphs as positive examples and the non-narrative paragraphs as negative examples.

Optionally, if the custom narrative recognition rule is a text rule:

acquiring POS tags, parse trees, named entities and coreference chains from the seed narrative text, the seed narrative text including news, novels and blogs;

training the statistical classifier from a first confidence score to a second confidence score according to the POS tags, the parse trees, the named entities and the coreference chains.

Optionally, if the custom narrative identification rule is a grammar rule, the grammar rule at least includes a grammar structure, a title sentence pattern, a text sentence pattern, and sentence characters;

the grammar structure comprises grammar structures derived from the basic structures of conjunctions, adverbial phrases and prepositional phrases.

Optionally, if the custom narrative identification rule is a role rule, dividing the number of references to the event chain by the number of sentences in the narrative paragraph, and calculating the normalized length of the event chain.
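The normalized length described above is a simple ratio, sketched here under stated assumptions. The function names are hypothetical, and the default threshold of 0.4 is the value given in the detailed description of the embodiment.

```python
def normalized_chain_length(num_references: int, num_sentences: int) -> float:
    """Number of references in the chain divided by the number of sentences in the paragraph."""
    if num_sentences <= 0:
        raise ValueError("a paragraph must contain at least one sentence")
    return num_references / num_sentences

def satisfies_role_rule(num_references: int, num_sentences: int, threshold: float = 0.4) -> bool:
    """True when the chain is long enough relative to the paragraph."""
    return normalized_chain_length(num_references, num_sentences) >= threshold

# A 5-sentence paragraph whose longest chain has 3 references passes;
# the same paragraph with a single reference does not.
passes = satisfies_role_rule(3, 5)
fails = satisfies_role_rule(1, 5)
```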

Optionally, the top n event chain lengths in the event are obtained, candidate events are ranked based on the deep-learned narrative recognition rules and the event chain lengths, and the probability of occurrence at a specific time is obtained.

The application also provides a time knowledge fuzzy measurement system based on weak supervised learning, comprising:

the sample module is used for generating a sample containing time knowledge based on the weak supervision learning custom narrative recognition rule;

the learning training module is used for performing deep learning training on the neural network according to the sample containing the time knowledge;

and the calculation module is used for acquiring the time knowledge in the sentences in the plot event through the neural network after the deep learning training and calculating the ambiguity of the time knowledge in the sentences in the plot event.

The ambiguity is calculated according to the trained R(S_i, S_j) as follows: [formula not reproduced in the source text] wherein k is a positive integer, i < m, and j < m.

As described above, the application provides a time knowledge fuzzy measurement method and a system based on weak supervised learning, which have the following beneficial effects: based on the weak supervision learning self-defined narrative recognition rules, generating a sample containing time knowledge; performing deep learning training on the neural network according to the sample containing the time knowledge; and obtaining time knowledge in sentences in the plot event through the neural network after deep learning training, and calculating the ambiguity of the time knowledge in the sentences in the plot event. The application establishes a weak supervision method, which can identify and extract the time knowledge in the narrative paragraphs from a large text corpus, and realizes the calculation and sequencing of the time knowledge in the narrative paragraphs by means of the method, and the time-space relationship classification can be improved by the time knowledge, so that the performance in the aspect of descriptive interactive service tasks is improved.

Drawings

FIG. 1 is a schematic flow chart of a time knowledge fuzzy measurement method based on weak supervised learning;

FIG. 2 is a schematic diagram of the hardware structure of a time knowledge fuzzy measurement system based on weak supervised learning.

Detailed Description

Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the present application with reference to specific examples. The application may also be practiced or applied in other embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the present application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other without conflict.

Please refer to FIG. 1 and FIG. 2. It should be noted that the drawings provided in this embodiment only illustrate the basic concept of the present application schematically. They show only the components related to the application and are not drawn according to the number, shape and size of the components in an actual implementation; in an actual implementation the form, number and proportion of the components may be changed arbitrarily, and their layout may be more complex. The structures, proportions and sizes shown in the drawings are for illustration only and are not intended to limit the scope of the application, which is defined by the claims. Likewise, terms such as "upper," "lower," "left," "right," "middle," and "a" used in this specification are for descriptive purposes only and are not intended to limit the scope of the application; changes or adjustments to the relative relationships they describe, without material change to the technical content, are also regarded as within the scope in which the application may be practiced.

Referring to fig. 1, the present embodiment provides a time knowledge fuzzy measurement method based on weak supervised learning, which includes the following steps:

S100, generating samples containing time knowledge based on custom narrative recognition rules under weak supervised learning;

S200, performing deep learning training on the neural network according to the samples containing time knowledge;

S300, obtaining the time knowledge in the sentences of the plot event through the neural network after deep learning training, and calculating the ambiguity of the time knowledge in the sentences of the plot event.

The application establishes a weak supervision method, which can identify and extract the time knowledge in the narrative paragraphs from a large text corpus, and realizes the calculation and sequencing of the time knowledge in the narrative paragraphs by means of the method, and the time-space relationship classification can be improved by the time knowledge, so that the performance in the aspect of descriptive interactive service tasks is improved.

Specifically, the custom narrative recognition rules are R_n and the sentences in the plot event are S_m, where n and m are positive integers.

In an exemplary embodiment, the weak supervision includes: acquiring a seed narrative text, and acquiring new narratives from the seed narrative text through a pre-trained statistical classifier; and supplementing the seed narrative text with the new narratives, guiding the learning process of the statistical classifier to iterate until no new narrative appears. Specifically, the weakly supervised method is designed to capture the key elements of a narrative in two stages:

In the first stage, we identify the initial narrative paragraphs that satisfy strict rules and the key principles of narrative, i.e. the initial narrative paragraphs that satisfy the text rules, grammar rules and role rules.

In the second stage, we train a statistical classifier using the initially identified seed narrative text and a set of soft features that capture the same key principles and other textual devices of narrative. The classifier is then applied to the original text to identify new narratives. The newly discovered narratives supplement the seed narratives, and the learning process iterates until too few new narratives are discovered.
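The two-stage iteration can be sketched as a generic bootstrapping loop. This is a minimal sketch under stated assumptions: `bootstrap_narratives` and `toy_train` are hypothetical names, and the word-overlap scorer only stands in for the statistical classifier so that the loop can be run end to end.

```python
from typing import Callable, List, Set

Scorer = Callable[[str], float]

def bootstrap_narratives(seeds: Set[str], corpus: List[str],
                         train: Callable[[Set[str]], Scorer],
                         threshold: float = 0.5, max_rounds: int = 10) -> Set[str]:
    """Grow the narrative set until a round discovers nothing new."""
    narratives = set(seeds)
    for _ in range(max_rounds):
        score = train(narratives)  # retrain on the current narrative set
        new = {p for p in corpus
               if p not in narratives and score(p) >= threshold}
        if not new:                # no new narrative appears: stop iterating
            break
        narratives |= new          # supplement the seeds with the new narratives
    return narratives

def toy_train(narratives: Set[str]) -> Scorer:
    """Stand-in classifier: fraction of a paragraph's words seen in known narratives."""
    vocab = {w for p in narratives for w in p.lower().split()}
    def score(paragraph: str) -> float:
        words = paragraph.lower().split()
        return sum(w in vocab for w in words) / len(words) if words else 0.0
    return score

seeds = {"he walked home"}
corpus = ["he walked away", "she walked away quickly", "stock prices index report"]
found = bootstrap_narratives(seeds, corpus, toy_train)
```

In this toy run the second paragraph is only accepted in the second round, after the first round has enlarged the vocabulary, which is the behaviour the iteration is meant to produce.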

In an exemplary embodiment, the pre-training of the statistical classifier includes: determining narrative paragraphs and non-narrative paragraphs in the seed narrative text; and training the statistical classifier with the narrative paragraphs as positive examples and the non-narrative paragraphs as negative examples. Specifically, using the seed narrative paragraphs determined in the first stage as positive examples, we train a statistical classifier to identify further narrative paragraphs that may not satisfy the specific rules. Negative examples must also be prepared. The negative examples are paragraphs that are unlikely to be narratives, having neither a plot nor a protagonist, but that are otherwise similar to the seed narratives. We choose a maximum entropy model as the classifier.
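For two classes, a maximum entropy classifier is equivalent to logistic regression, so the training step can be sketched in plain Python as below. The two features (fraction of past-tense sentences and normalized chain length), the toy data, and the function names are assumptions made for illustration; the text does not specify the feature set at this level of detail.

```python
import math

def train_maxent(features, labels, epochs=200, lr=0.5):
    """Fit a binary maximum entropy (logistic regression) model by stochastic gradient ascent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted P(narrative | x)
            err = y - p                      # gradient of the log-likelihood
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias

def predict_proba(model, x):
    """Return the model's probability that the paragraph is a narrative."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical features: [fraction of past-tense sentences, normalized chain length]
X = [[0.9, 0.6], [0.8, 0.5],   # seed narrative paragraphs (positive examples)
     [0.2, 0.1], [0.1, 0.0]]   # non-narrative paragraphs (negative examples)
y = [1, 1, 0, 0]
model = train_maxent(X, y)
p_narrative = predict_proba(model, [0.85, 0.55])
p_other = predict_proba(model, [0.15, 0.05])
```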

In an exemplary embodiment, if the custom narrative recognition rule is a text rule, POS tags, parse trees, named entities and coreference chains are acquired from the seed narrative text; the seed narrative text includes news, novels and blogs; and the statistical classifier is trained from a first confidence score to a second confidence score according to the POS tags, the parse trees, the named entities and the coreference chains. Specifically, the weakly supervised method can be applied to different text sources to identify narratives, based on principles shared by all narratives. We consider three types of text: news articles, novels and blogs. The Stanford CoreNLP tools are applied to these three text corpora to obtain POS tags, parse trees, named entities, coreference chains, etc. To counter semantic drift in bootstrapped learning, the confidence score threshold for selecting new narratives from the output of the statistical classifier is initially set to 0.5 and raised by 0.05 after each iteration.
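The rising confidence threshold can be written as a small schedule. The sketch below assumes that iterations are counted from 0 and that the threshold is capped at 1.0; both choices, and the function names, are illustrative rather than taken from the text.

```python
def confidence_threshold(iteration: int, start: float = 0.5, step: float = 0.05) -> float:
    """Threshold for accepting new narratives in a given iteration (0-based)."""
    # round() removes floating-point drift such as 0.6000000000000001
    return min(1.0, round(start + step * iteration, 2))

def select_confident(scored: dict, iteration: int) -> list:
    """Keep paragraphs whose classifier score reaches the current threshold."""
    limit = confidence_threshold(iteration)
    return sorted(p for p, s in scored.items() if s >= limit)

scores = {"p1": 0.52, "p2": 0.58, "p3": 0.71}  # hypothetical classifier scores
first_round = select_confident(scores, 0)      # threshold 0.5
third_round = select_confident(scores, 2)      # threshold 0.6
```

Because the bar rises each round, paragraphs admitted late must be scored more confidently than those admitted early, which limits how far the learned notion of narrative can drift from the seeds.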

In an exemplary embodiment, if the custom narrative recognition rule is a grammar rule, the grammar rule includes at least a grammar structure, a heading sentence pattern, a text sentence pattern, and sentence characters. The grammar structure comprises grammar structures derived from the basic structures of conjunctions, adverbial phrases and prepositional phrases. Specifically, the grammar rules identify plot events and are guided by previous narrative studies: context-free grammar production rules are used to identify sentences whose grammatical structure describes an actual event. Three sets of grammar rules specify the overall grammatical structure of a sentence, as follows: (1) the sentence has a basic active-voice structure "S-!" or a more complex sentence structure derived from the basic structure by a conjunction (CC), an adverbial phrase (ADVP) or a prepositional phrase (PP); (2) the head word of the verb phrase must be in the past tense; (3) the subject of the sentence represents a character.
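Rules (2) and (3) can be approximated on POS-tagged input, as sketched below. This is a deliberate simplification and an assumption: the rules as described operate on parse trees, whereas this check only looks at Penn Treebank POS tags (VBD marks a past-tense verb), treats the first verb as the head verb, and treats nouns or pronouns before it as the subject. The function name and the `characters` set are hypothetical.

```python
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"}
SUBJECT_TAGS = {"NN", "NNS", "NNP", "NNPS", "PRP"}

def is_plot_event_sentence(tagged, characters):
    """Approximate check: first verb is past tense and the subject names a character.

    tagged     -- list of (token, Penn Treebank POS tag) pairs
    characters -- lowercase mentions known to refer to characters (assumed input)
    """
    verb_index = next(
        (k for k, (_, tag) in enumerate(tagged) if tag in VERB_TAGS), None)
    if verb_index is None or tagged[verb_index][1] != "VBD":
        return False  # no verb, or the first verb is not in the past tense
    subject = {tok.lower() for tok, tag in tagged[:verb_index]
               if tag in SUBJECT_TAGS}
    return bool(subject & characters)

characters = {"mary", "she"}
event = is_plot_event_sentence(
    [("Mary", "NNP"), ("opened", "VBD"), ("the", "DT"), ("door", "NN")], characters)
present = is_plot_event_sentence(
    [("Prices", "NNS"), ("are", "VBP"), ("rising", "VBG")], characters)
no_character = is_plot_event_sentence(
    [("It", "PRP"), ("rained", "VBD")], characters)
```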

In an exemplary embodiment, if the custom narrative recognition rule is a role rule, the number of references in an event chain is divided by the number of sentences in the narrative paragraph to calculate the normalized length of the event chain. Specifically, we specify a rule that requires a narrative paragraph to have a protagonist. The normalized length of the event chain is then calculated by dividing the number of event chain references by the number of sentences in the paragraph. In the embodiment of the present application, we require the normalized length of the longest entity chain to be at least 0.4, which means that 40% or more of the sentences in the narrative refer to one character.

More specifically, the top n event chain lengths in the event are obtained, candidate events are ranked based on the deep-learned narrative recognition rules and the event chain lengths, and the probability of occurrence at a specific time is obtained. In particular, we apply a statistical indicator based on pointwise mutual information (PMI) to measure the strength of the temporal relations between events, in order to identify common-sense knowledge that is not specific to any particular story. By learning event pairs and longer event chains, events are fully ordered in time by the "before/after" relation. Exploiting the dual temporality of narratives, we consider only event pairs and longer chains of n events that occur as a segment in at least one event sequence extracted from a narrative paragraph. The candidate event pairs are then ranked on two factors: the degree of association between the two events, and how frequently they occur in a particular temporal order. The temporal order of the related events is then calculated by the time knowledge fuzzy measurement method.
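The two ranking factors can be illustrated with a small scoring function. The text does not give the exact formula, so the combination used here (PMI plus the log ratio of add-one-smoothed order counts) is an assumption, as are the function name and the choice to count every ordered pair in a sequence rather than only contiguous segments.

```python
import math
from collections import Counter
from itertools import combinations

def rank_event_pairs(sequences):
    """Rank ordered event pairs (a before b) by association and order preference."""
    event_counts = Counter()
    ordered = Counter()
    for seq in sequences:
        event_counts.update(seq)
        for a, b in combinations(seq, 2):  # a precedes b in this sequence
            ordered[(a, b)] += 1
    total_events = sum(event_counts.values())
    total_pairs = sum(ordered.values())
    scores = {}
    for (a, b), n_ab in ordered.items():
        n_ba = ordered.get((b, a), 0)
        p_pair = (n_ab + n_ba) / total_pairs            # co-occurrence in either order
        p_a = event_counts[a] / total_events
        p_b = event_counts[b] / total_events
        pmi = math.log(p_pair / (p_a * p_b))            # factor 1: association
        order = math.log((n_ab + 1) / (n_ba + 1))       # factor 2: order preference
        scores[(a, b)] = pmi + order
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

sequences = [["wake", "eat", "leave"], ["wake", "eat"], ["eat", "wake", "leave"]]
ranked = rank_event_pairs(sequences)
score = dict(ranked)
```

In this toy data "wake" precedes "eat" twice and follows it once, so the pair ("wake", "eat") scores higher than ("eat", "wake") even though their PMI is identical.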

The method is based on a weak supervision learning self-defined narrative recognition rule, and a sample containing time knowledge is generated; performing deep learning training on the neural network according to the sample containing the time knowledge; and obtaining time knowledge in sentences in the plot event through the neural network after deep learning training, and calculating the ambiguity of the time knowledge in the sentences in the plot event. The application establishes a weak supervision method, which can identify and extract the time knowledge in the narrative paragraphs from a large text corpus, and realizes the calculation and sequencing of the time knowledge in the narrative paragraphs by means of the method, and the time-space relationship classification can be improved by the time knowledge, so that the performance in the aspect of descriptive interactive service tasks is improved.

As shown in fig. 2, the present application further provides a time knowledge fuzzy measurement system based on weak supervised learning, including:

the sample module M10 is used for generating a sample containing time knowledge based on the weak supervision learning custom narrative recognition rule;

the learning training module M20 is used for performing deep learning training on the neural network according to the sample containing the time knowledge;

the computing module M30 is used for acquiring the time knowledge in the sentences in the plot event through the neural network after the deep learning training and computing the ambiguity of the time knowledge in the sentences in the plot event.
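The relationship between the three modules can be sketched as a simple pipeline. The class name, the callable signatures, and the stand-in lambdas are assumptions made so the wiring can be run; they do not describe the actual internals of M10, M20 or M30.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class TemporalKnowledgeSystem:
    sample_module: Callable[[List[str]], List[Any]]    # M10: corpus -> samples
    training_module: Callable[[List[Any]], Any]        # M20: samples -> trained model
    computing_module: Callable[[Any, List[str]], Any]  # M30: model, sentences -> result

    def run(self, corpus: List[str], plot_sentences: List[str]) -> Any:
        samples = self.sample_module(corpus)
        model = self.training_module(samples)
        return self.computing_module(model, plot_sentences)

# Stand-in modules that only demonstrate the data flow between M10, M20 and M30.
system = TemporalKnowledgeSystem(
    sample_module=lambda corpus: [(paragraph, 1) for paragraph in corpus],
    training_module=lambda samples: len(samples),
    computing_module=lambda model, sentences: {
        "model_size": model, "n_sentences": len(sentences)},
)
result = system.run(["paragraph one", "paragraph two"], ["s1", "s2", "s3"])
```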

In this system, the sample module generates samples containing time knowledge based on the custom narrative recognition rules under weak supervised learning; the learning training module performs deep learning training on the neural network according to the samples containing time knowledge; and the computing module obtains the time knowledge in the sentences of the plot event through the neural network after deep learning training and calculates the ambiguity of that time knowledge. The application establishes a weakly supervised method that can identify and extract the time knowledge in narrative paragraphs from a large text corpus and thereby compute and order that time knowledge. The time knowledge can improve time-space relationship classification and thus performance on descriptive interactive service tasks.

In summary, the present application effectively overcomes the disadvantages of the prior art and has high industrial utility value.

The above embodiments merely illustrate the principles and effects of the present application and are not intended to limit it. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the application.

Claims

1. The time knowledge fuzzy measurement method based on weak supervision learning is characterized by comprising the following steps of:

acquiring time knowledge in sentences in a plot event through a neural network after deep learning training, and calculating the ambiguity of the time knowledge in the sentences in the plot event; comprising: the custom narrative recognition rules are R_n and the sentences in the plot event are S_m, wherein n and m are positive integers;

if R(S_i, S_j) = -1, the physical time at which sentence S_i occurs is later than the physical time at which sentence S_j occurs, i.e. T(S_i) < T(S_j);

the ambiguity is calculated according to the trained R(S_i, S_j) as follows: [formula not reproduced in the source text] wherein k is a positive integer, i < m, and j < m.

2. The weak supervised learning based temporal knowledge fuzzy metric method of claim 1, wherein the weak supervision comprises:

3. The weak supervised learning based temporal knowledge fuzzy metric method of claim 2, wherein: the pre-training of the statistical classifier includes:

4. A time knowledge fuzzy measurement method based on weak supervised learning as set forth in claim 2 or 3, characterized in that: if the custom narrative identification rule is a text rule;

5. The weak supervised learning based temporal knowledge fuzzy metric method of claim 1, wherein: if the custom narrative recognition rule is a grammar rule, the grammar rule at least comprises a grammar structure, a title sentence pattern, a text sentence pattern and sentence characters;

6. The weak supervised learning based temporal knowledge fuzzy metric method of claim 1, wherein: if the custom narrative identification rule is a role rule, dividing the number of the event chain references by the number of sentences in the narrative paragraph, and calculating the standardized length of the event chain.

7. The weak supervised learning based temporal knowledge fuzzy metric method of claim 6, wherein: and acquiring the top n event chain lengths in the event, and ranking the candidate events based on the deep-learning narrative recognition rule and the event chain lengths to acquire the probability of occurrence at a specific time.

8. A weak supervised learning-based time knowledge fuzzy measure system, comprising:

the calculation module is used for acquiring time knowledge in sentences in the plot event through the neural network after deep learning training and calculating the ambiguity of the time knowledge in the sentences in the plot event; comprising: the custom narrative recognition rules are R_n and the sentences in the plot event are S_m, wherein n and m are positive integers;

the samples constructed from R(S_i, S_j) are fed into a deep neural network for learning and training to obtain the trained R(S_i, S_j);