CN111414747B - Time knowledge fuzzy measurement method and system based on weak supervised learning - Google Patents

Time knowledge fuzzy measurement method and system based on weak supervised learning

Info

Publication number
CN111414747B
Authority
CN
China
Prior art keywords
narrative
time
sentence
knowledge
occurrence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010118531.6A
Other languages
Chinese (zh)
Other versions
CN111414747A (en
Inventor
彭德光
孙健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Zhaoguang Technology Co ltd
Original Assignee
Chongqing Zhaoguang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Zhaoguang Technology Co ltd filed Critical Chongqing Zhaoguang Technology Co ltd
Priority to CN202010118531.6A priority Critical patent/CN111414747B/en
Publication of CN111414747A publication Critical patent/CN111414747A/en
Application granted granted Critical
Publication of CN111414747B publication Critical patent/CN111414747B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a time knowledge fuzzy measurement method and system based on weak supervised learning. The method comprises the following steps: generating samples containing time knowledge based on custom narrative recognition rules under weak supervised learning; performing deep learning training on a neural network with the samples containing the time knowledge; and obtaining, through the trained neural network, the time knowledge in the sentences of a plot event and calculating the ambiguity of that time knowledge. The application establishes a weakly supervised method that can identify and extract the time knowledge in narrative paragraphs from a large text corpus and, by means of this method, calculate and order the time knowledge in the narrative paragraphs. The time knowledge can improve time-space relationship classification and thereby improve performance on descriptive interactive service tasks.

Description

Time knowledge fuzzy measurement method and system based on weak supervised learning
Technical Field
The application relates to the technical field of natural language, in particular to a time knowledge fuzzy measurement method and a system based on weak supervision learning.
Background
Narrative text has dual timeliness, which makes it possible to extract rich time knowledge from narrative paragraphs: a narrative story usually describes a series of events in chronological order. Current natural language processing techniques cannot acquire the rich "before/after" temporal event knowledge that holds between the sentences of a narrative story. A time knowledge fuzzy measurement method and system based on weak supervised learning is therefore proposed to acquire this knowledge.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present application is to provide a method and a system for fuzzy measurement of time knowledge based on weak supervised learning, which are used for solving the technical problems existing in the prior art.
To achieve the above and other related objects, the present application provides a time knowledge fuzzy measurement method based on weak supervised learning, including:
based on the weak supervision learning self-defined narrative recognition rules, generating a sample containing time knowledge;
performing deep learning training on the neural network according to the sample containing the time knowledge;
and obtaining time knowledge in sentences in the plot event through the neural network after deep learning training, and calculating the ambiguity of the time knowledge in the sentences in the plot event.
Optionally, the custom narrative recognition rules are denoted R_n and the sentences in the plot event are denoted S_m, wherein n and m are positive integers;
under a given custom narrative recognition rule, a comparison relation R between the time knowledge of a sentence S_i and the time knowledge of another sentence S_j is obtained according to syntactic and statistical rules;
if R(S_i, S_j) = 1, the physical time at which the sentence S_i occurs is earlier than the physical time at which the other sentence S_j occurs, i.e. T(S_i) > T(S_j);
if R(S_i, S_j) = 0, the physical time at which the sentence S_i occurs is the same as the physical time at which the other sentence S_j occurs, i.e. T(S_i) = T(S_j);
if R(S_i, S_j) = -1, the physical time at which the sentence S_i occurs is later than the physical time at which the other sentence S_j occurs, i.e. T(S_i) < T(S_j);
the samples constructed from R(S_i, S_j) are fed into a deep neural network for learning and training, yielding the trained R(S_i, S_j);
the ambiguity is calculated from the trained R(S_i, S_j) as follows: [formula missing from source], wherein k is a positive integer, i < m, j < m.
Optionally, the weak supervision includes:
acquiring a seed narrative text, and acquiring a new narrative from the seed narrative text through a pre-trained statistical classifier;
and supplementing the new narrative into the seed narrative text, and guiding the iteration of the statistical classifier learning process until no new narrative appears.
Optionally, the pre-training of the statistical classifier includes:
determining narrative paragraphs and non-narrative paragraphs in the seed narrative text;
and training the statistical classifier with the narrative paragraphs as positive examples and the non-narrative paragraphs as negative examples.
Optionally, if the custom narrative recognition rule is a text rule:
acquiring POS tags, parse trees, named entities and coreference chains from the seed narrative text, the seed narrative text including news, novels and blogs;
training the statistical classifier from a first confidence score to a second confidence score according to the POS tags, the parse trees, the named entities and the coreference chains.
Optionally, if the custom narrative identification rule is a grammar rule, the grammar rule at least includes a grammar structure, a title sentence pattern, a text sentence pattern, and sentence characters;
the grammar structure comprises grammar structures formed by deriving basic structures of conjunctions, adverb phrases and preposition phrases.
Optionally, if the custom narrative identification rule is a role rule, dividing the number of references to the event chain by the number of sentences in the narrative paragraph, and calculating the normalized length of the event chain.
Optionally, the top n event chain lengths in the event are obtained, candidate events are ranked based on the deep-learned narrative recognition rules and the event chain lengths, and the probability of occurrence at a specific time is obtained.
The application also provides a time knowledge fuzzy measurement system based on weak supervised learning, comprising:
the sample module is used for generating a sample containing time knowledge based on the weak supervision learning custom narrative recognition rule;
the learning training module is used for performing deep learning training on the neural network according to the sample containing the time knowledge;
and the calculation module is used for acquiring the time knowledge in the sentences in the plot event through the neural network after the deep learning training and calculating the ambiguity of the time knowledge in the sentences in the plot event.
Optionally, the custom narrative recognition rules are denoted R_n and the sentences in the plot event are denoted S_m, wherein n and m are positive integers;
under a given custom narrative recognition rule, a comparison relation R between the time knowledge of a sentence S_i and the time knowledge of another sentence S_j is obtained according to syntactic and statistical rules;
if R(S_i, S_j) = 1, the physical time at which the sentence S_i occurs is earlier than the physical time at which the other sentence S_j occurs, i.e. T(S_i) > T(S_j);
if R(S_i, S_j) = 0, the physical time at which the sentence S_i occurs is the same as the physical time at which the other sentence S_j occurs, i.e. T(S_i) = T(S_j);
if R(S_i, S_j) = -1, the physical time at which the sentence S_i occurs is later than the physical time at which the other sentence S_j occurs, i.e. T(S_i) < T(S_j);
the samples constructed from R(S_i, S_j) are fed into a deep neural network for learning and training, yielding the trained R(S_i, S_j);
the ambiguity is calculated from the trained R(S_i, S_j) as follows: [formula missing from source], wherein k is a positive integer, i < m, j < m.
As described above, the application provides a time knowledge fuzzy measurement method and a system based on weak supervised learning, which have the following beneficial effects: based on the weak supervision learning self-defined narrative recognition rules, generating a sample containing time knowledge; performing deep learning training on the neural network according to the sample containing the time knowledge; and obtaining time knowledge in sentences in the plot event through the neural network after deep learning training, and calculating the ambiguity of the time knowledge in the sentences in the plot event. The application establishes a weak supervision method, which can identify and extract the time knowledge in the narrative paragraphs from a large text corpus, and realizes the calculation and sequencing of the time knowledge in the narrative paragraphs by means of the method, and the time-space relationship classification can be improved by the time knowledge, so that the performance in the aspect of descriptive interactive service tasks is improved.
Drawings
FIG. 1 is a schematic flow chart of a time knowledge fuzzy measurement method based on weak supervised learning;
FIG. 2 is a schematic diagram of a hardware structure of a time knowledge fuzzy measurement system based on weak supervised learning.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The application may also be practiced or carried out in other, different embodiments, and the details of the present description may be modified or varied without departing from the spirit of the present application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other without conflict.
Please refer to FIG. 1 and FIG. 2. It should be noted that the illustrations provided in this embodiment only illustrate the basic concept of the application schematically. The drawings show only the components related to the application and are not drawn according to the number, shape and size of the components in an actual implementation; in an actual implementation the form, number and proportion of the components may be changed arbitrarily, and their layout may be more complex. The structures, proportions, sizes, etc. shown in the accompanying drawings are for illustration only and are not intended to limit the scope of the application, which is defined by the claims. Likewise, terms such as "upper," "lower," "left," "right," "middle," and "a" used in this specification are for descriptive purposes only and are not intended to limit the scope of the application; changes or adjustments of their relative relationships, without material alteration of the technical content, are also regarded as within the scope in which the application may be practiced.
Referring to fig. 1, the present embodiment provides a time knowledge fuzzy measurement method based on weak supervised learning, which includes the following steps:
S100, generating a sample containing time knowledge based on a weak supervision learning self-defined narrative recognition rule;
S200, performing deep learning training on the neural network according to the sample containing the time knowledge;
S300, obtaining time knowledge in sentences in the plot event through the neural network after deep learning training, and calculating the ambiguity of the time knowledge in the sentences in the plot event.
The application establishes a weak supervision method, which can identify and extract the time knowledge in the narrative paragraphs from a large text corpus, and realizes the calculation and sequencing of the time knowledge in the narrative paragraphs by means of the method, and the time-space relationship classification can be improved by the time knowledge, so that the performance in the aspect of descriptive interactive service tasks is improved.
Specifically, the custom narrative recognition rules are denoted R_n and the sentences in the plot event are denoted S_m, wherein n and m are positive integers;
under a given custom narrative recognition rule, a comparison relation R between the time knowledge of a sentence S_i and the time knowledge of another sentence S_j is obtained according to syntactic and statistical rules;
if R(S_i, S_j) = 1, the physical time at which the sentence S_i occurs is earlier than the physical time at which the other sentence S_j occurs, i.e. T(S_i) > T(S_j);
if R(S_i, S_j) = 0, the physical time at which the sentence S_i occurs is the same as the physical time at which the other sentence S_j occurs, i.e. T(S_i) = T(S_j);
if R(S_i, S_j) = -1, the physical time at which the sentence S_i occurs is later than the physical time at which the other sentence S_j occurs, i.e. T(S_i) < T(S_j);
the samples constructed from R(S_i, S_j) are fed into a deep neural network for learning and training, yielding the trained R(S_i, S_j);
the ambiguity is calculated from the trained R(S_i, S_j) as follows: [formula missing from source], wherein k is a positive integer, i < m, j < m.
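The construction of pairwise training samples can be sketched as follows. This is a minimal illustration and not the claimed implementation: it assumes that the position of a sentence in the narrative stands in for its physical time of occurrence (following the observation in the Background that narratives usually describe events in chronological order), and the function names order_label and build_samples are hypothetical.

```python
from itertools import product


def order_label(i: int, j: int) -> int:
    """Return R(S_i, S_j): 1 if S_i is taken to occur earlier, 0 if same, -1 if later.

    Assumption: narrative position is used as a proxy for physical time.
    """
    if i < j:
        return 1
    if i == j:
        return 0
    return -1


def build_samples(sentences: list[str]) -> list[tuple[str, str, int]]:
    """Build (S_i, S_j, R) triples for every ordered pair of sentences."""
    indexed = list(enumerate(sentences))
    return [
        (s_i, s_j, order_label(i, j))
        for (i, s_i), (j, s_j) in product(indexed, repeat=2)
    ]


samples = build_samples(["She woke up.", "She ate breakfast.", "She left."])
for s_i, s_j, r in samples:
    print(r, "|", s_i, "|", s_j)
```

Triples of this form could serve as the samples that are fed into the neural network; the labels are antisymmetric, so R(S_i, S_j) = -R(S_j, S_i) for every pair.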
In an exemplary embodiment, the weak supervision includes: acquiring a seed narrative text, and acquiring new narratives from the seed narrative text through a pre-trained statistical classifier; and supplementing the seed narrative text with the new narratives, guiding the learning process of the statistical classifier to iterate until no new narrative appears. Specifically, the weakly supervised method is designed to capture the key elements of a narrative in two stages:
In the first stage, we identify the initial narrative paragraphs that satisfy strict rules and the key principles of narrative, i.e. the paragraphs that satisfy the text rules, grammar rules and role rules.
In the second stage, we train a statistical classifier using the initially identified seed narrative text and a set of soft features, so as to capture the same key principles as well as other textual devices of narrative. The classifier is then applied again to identify new narratives in the original text. The newly discovered narratives supplement the seed narratives, and the learning process iterates until too few new narratives are discovered.
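The two-stage iteration described above can be sketched as a simple bootstrapping loop. This is only an illustration under stated assumptions: the function bootstrap, its max_rounds safety cap, and the toy trainer toy_train (which accepts a paragraph sharing at least two words with the known narratives) are hypothetical and stand in for the statistical classifier of the embodiment.

```python
from typing import Callable, Iterable


def bootstrap(
    corpus: Iterable[str],
    seeds: set[str],
    train: Callable[[set[str]], Callable[[str], bool]],
    max_rounds: int = 10,
) -> set[str]:
    """Grow the seed set until a round discovers no new narrative paragraph."""
    pool = list(corpus)
    narratives = set(seeds)
    for _ in range(max_rounds):
        is_narrative = train(narratives)  # retrain on the current seed set
        found = {p for p in pool if p not in narratives and is_narrative(p)}
        if not found:  # stop: nothing new was discovered in this round
            break
        narratives |= found  # supplement the seeds with the new narratives
    return narratives


def toy_train(known: set[str]) -> Callable[[str], bool]:
    """Hypothetical stand-in classifier based on word overlap with known narratives."""
    vocab = {w for p in known for w in p.lower().split()}
    return lambda p: len(vocab & set(p.lower().split())) >= 2


corpus = [
    "anna walked home",
    "anna walked to school",
    "to school she ran",
    "stock prices fell",
]
found_all = bootstrap(corpus, {"anna walked home"}, toy_train)
print(sorted(found_all))
```

In this toy run the third paragraph is only reachable after the second has been added to the seeds, which is the effect the iteration is meant to achieve.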
In an exemplary embodiment, the pre-training of the statistical classifier includes: determining narrative paragraphs and non-narrative paragraphs in the seed narrative text; and training the statistical classifier with the narrative paragraphs as positive examples and the non-narrative paragraphs as negative examples. Specifically, using the seed narrative paragraphs determined in the first stage as positive examples, we train a statistical classifier to go on identifying further narrative paragraphs that may not satisfy the specific rules. We also have to prepare negative examples. The negative examples are paragraphs that are unlikely to be narratives, having neither a plot nor a protagonist, but that are otherwise similar to the seed narratives. We choose a maximum entropy model as the classifier.
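A minimal sketch of such a classifier follows. For two classes, a maximum entropy model is equivalent to logistic regression, which is what the code trains by stochastic gradient ascent. The bag-of-words features, the hyperparameters and the example paragraphs are assumptions made for illustration; the embodiment does not specify them.

```python
import math
from collections import defaultdict


def features(paragraph: str) -> set[str]:
    """Assumed feature set: lower-cased whitespace tokens of the paragraph."""
    return set(paragraph.lower().split())


def train_maxent(positives, negatives, epochs=200, lr=0.5):
    """Fit binary maximum entropy (logistic regression) weights by SGD."""
    weights = defaultdict(float)
    data = [(features(p), 1.0) for p in positives]
    data += [(features(n), 0.0) for n in negatives]
    for _ in range(epochs):
        for feats, label in data:
            score = sum(weights[f] for f in feats)
            prob = 1.0 / (1.0 + math.exp(-score))
            for f in feats:  # gradient of the log-likelihood for binary features
                weights[f] += lr * (label - prob)
    return weights


def narrative_probability(weights, paragraph: str) -> float:
    """Probability that the paragraph is a narrative under the trained model."""
    score = sum(weights.get(f, 0.0) for f in features(paragraph))
    return 1.0 / (1.0 + math.exp(-score))


positives = ["she opened the door and walked in", "he ran home and told his mother"]
negatives = ["the index rose two percent", "the report lists quarterly revenue"]
model = train_maxent(positives, negatives)
print(narrative_probability(model, "she walked home"))
```

The returned probability plays the role of the confidence score that decides whether a newly found paragraph is added to the seed narratives.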
In an exemplary embodiment, if the custom narrative recognition rule is a text rule, POS tags, parse trees, named entities and coreference chains are acquired from the seed narrative text, the seed narrative text including news, novels and blogs; and the statistical classifier is trained from a first confidence score to a second confidence score according to the POS tags, parse trees, named entities and coreference chains. Specifically, the weakly supervised method can be applied to different text sources to identify narratives, based on principles shared by all narratives. We consider three types of text: news articles, novels and blogs. Stanford CoreNLP tools are applied to these three text corpora to obtain POS tags, parse trees, named entities, coreference chains, etc. To overcome semantic drift in autonomous learning, the initial selection confidence score generated by the statistical classifier is set to 0.5 and raised by 0.05 after each iteration.
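The rising confidence threshold can be illustrated with a short sketch. The starting value 0.5 and the step 0.05 come from the text; the cap at 1.0, the 0-based iteration index and the function names are assumptions added for the example.

```python
def confidence_threshold(iteration: int, start: float = 0.5, step: float = 0.05) -> float:
    """Selection threshold at a 0-based iteration, capped at 1.0 (assumed cap)."""
    return min(1.0, round(start + step * iteration, 2))


def select_confident(scored, iteration: int):
    """Keep only paragraphs whose classifier score reaches the current threshold."""
    threshold = confidence_threshold(iteration)
    return [paragraph for paragraph, score in scored if score >= threshold]


scored = [("a", 0.52), ("b", 0.58), ("c", 0.90)]
for it in range(3):
    print(it, confidence_threshold(it), select_confident(scored, it))
```

Raising the threshold makes later iterations more selective, so that paragraphs admitted on weak evidence in one round are less likely to pull the classifier away from genuine narratives in the next.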
In an exemplary embodiment, if the custom narrative recognition rule is a grammar rule, the grammar rule includes at least a grammar structure, a heading sentence pattern, a text sentence pattern, and sentence characters. The grammar structure comprises grammar structures derived from the basic structure through conjunctions, adverb phrases and preposition phrases. Specifically, the grammar rules for identifying plot events are guided by previous narrative studies; context-free grammar production rules are used to identify sentences that describe events in a real-state grammar structure. Three sets of grammar rules are used to specify the overall grammar structure of a sentence, as follows: (1) the sentence has the basic active-voice structure "S → NP VP", or a more complex sentence structure derived from the basic structure through a coordinating conjunction (CC), an adverb phrase (ADVP) or a preposition phrase (PP); (2) the head word of the verb phrase must be in the past tense; (3) the subject of the sentence is intended to represent a character.
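The three grammar rules can be sketched as a check over an already parsed sentence. This is a simplified illustration, assuming that the basic structure is a noun phrase followed by a verb phrase, that CC, ADVP and PP attachments appear as siblings at the top level of the clause, and that punctuation has been removed. The ParsedSentence container and is_plot_event function are hypothetical; the labels are Penn Treebank tags, where VBD marks a past-tense verb.

```python
from dataclasses import dataclass


@dataclass
class ParsedSentence:
    top_level: list[str]        # labels of the root clause's children, e.g. ["NP", "VP"]
    head_verb_tag: str          # POS tag of the verb phrase head, e.g. "VBD"
    subject_is_character: bool  # whether the subject refers to a character


# Attachments that may extend the basic structure (rule 1).
ATTACHMENTS = {"CC", "ADVP", "PP"}


def is_plot_event(sentence: ParsedSentence) -> bool:
    """Apply the three grammar rules to one parsed sentence."""
    core = [label for label in sentence.top_level if label not in ATTACHMENTS]
    has_basic_structure = core == ["NP", "VP"]         # rule 1
    in_past_tense = sentence.head_verb_tag == "VBD"    # rule 2
    return has_basic_structure and in_past_tense and sentence.subject_is_character  # rule 3


print(is_plot_event(ParsedSentence(["ADVP", "NP", "VP"], "VBD", True)))
print(is_plot_event(ParsedSentence(["NP", "VP"], "VBZ", True)))
```

In practice the three fields would be filled from the parse trees, POS tags and named entities produced by the parsing tools; the flat top-level list is a deliberate simplification of a full tree match.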
In an exemplary embodiment, if the custom narrative recognition rule is a role rule, the number of references in an event chain is divided by the number of sentences in the narrative paragraph to calculate the normalized length of the event chain. Specifically, we specify a rule that requires a narrative paragraph to have a protagonist. The normalized length of the event chain is calculated by dividing the number of references in the event chain by the number of sentences in the paragraph. In the embodiment of the present application, we require the normalized length of the longest entity chain to be at least 0.4, which means that 40% or more of the sentences in the narrative refer to one character. More specifically, the top n event chain lengths in the event are obtained, candidate events are ranked based on the deep-learned narrative recognition rules and the event chain lengths, and the probability of occurrence at a specific time is obtained. We apply statistical indicators based on pointwise mutual information (PMI) to measure the strength of the temporal relations between events, in order to identify common-sense knowledge that is not specific to any particular story. By learning event pairs and longer event chains, events are fully ordered in time by their "before/after" relations. By exploiting the dual timeliness of narratives, we consider only event pairs and longer chains of n events that have occurred as a segment in at least one event sequence extracted from a narrative paragraph; the candidate event pairs are then ranked by two factors: the degree of association between the two events, and the frequency with which they occur in a particular temporal order. The temporal order of the related events is then calculated with the time knowledge fuzzy measurement method.
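Both quantities in this paragraph can be illustrated with a short sketch. The first part computes the normalized chain length and applies the 0.4 requirement. The second part estimates PMI for ordered pairs of adjacent events, which is one possible reading of "pairs that occur as a segment of an extracted sequence". The probability estimates, the function names and the example sequences are assumptions, not the scoring formula of the embodiment.

```python
import math
from collections import Counter


def normalized_chain_length(mention_count: int, sentence_count: int) -> float:
    """Number of references in a chain divided by the number of sentences."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return mention_count / sentence_count


def has_protagonist(chain_mention_counts, sentence_count: int, minimum: float = 0.4) -> bool:
    """True if the longest chain reaches the required normalized length."""
    longest = max(chain_mention_counts, default=0)
    return normalized_chain_length(longest, sentence_count) >= minimum


def ordered_pair_pmi(sequences):
    """PMI of each ordered pair (a, b) where a is directly followed by b."""
    event_counts = Counter(e for seq in sequences for e in seq)
    pair_counts = Counter(pair for seq in sequences for pair in zip(seq, seq[1:]))
    total_events = sum(event_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), n in pair_counts.items():
        p_pair = n / total_pairs
        p_a = event_counts[a] / total_events
        p_b = event_counts[b] / total_events
        scores[(a, b)] = math.log(p_pair / (p_a * p_b))
    return scores


sequences = [["wake", "eat", "leave"], ["wake", "eat", "work"], ["sleep", "wake"]]
pmi = ordered_pair_pmi(sequences)
ranked = sorted(pmi.items(), key=lambda item: item[1], reverse=True)
print(has_protagonist([4, 1], sentence_count=10), ranked)
```

Because only observed orderings are counted, a pair such as ("eat", "wake") receives no score at all, which reflects the restriction to pairs that actually occur in an extracted event sequence.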
The method is based on a weak supervision learning self-defined narrative recognition rule, and a sample containing time knowledge is generated; performing deep learning training on the neural network according to the sample containing the time knowledge; and obtaining time knowledge in sentences in the plot event through the neural network after deep learning training, and calculating the ambiguity of the time knowledge in the sentences in the plot event. The application establishes a weak supervision method, which can identify and extract the time knowledge in the narrative paragraphs from a large text corpus, and realizes the calculation and sequencing of the time knowledge in the narrative paragraphs by means of the method, and the time-space relationship classification can be improved by the time knowledge, so that the performance in the aspect of descriptive interactive service tasks is improved.
As shown in fig. 2, the present application further provides a time knowledge fuzzy measurement system based on weak supervised learning, including:
the sample module M10 is used for generating a sample containing time knowledge based on the weak supervision learning custom narrative recognition rule;
the learning training module M20 is used for performing deep learning training on the neural network according to the sample containing the time knowledge;
the computing module M30 is used for acquiring the time knowledge in the sentences in the plot event through the neural network after the deep learning training and computing the ambiguity of the time knowledge in the sentences in the plot event.
Optionally, the custom narrative recognition rules are denoted R_n and the sentences in the plot event are denoted S_m, wherein n and m are positive integers;
under a given custom narrative recognition rule, a comparison relation R between the time knowledge of a sentence S_i and the time knowledge of another sentence S_j is obtained according to syntactic and statistical rules;
if R(S_i, S_j) = 1, the physical time at which the sentence S_i occurs is earlier than the physical time at which the other sentence S_j occurs, i.e. T(S_i) > T(S_j);
if R(S_i, S_j) = 0, the physical time at which the sentence S_i occurs is the same as the physical time at which the other sentence S_j occurs, i.e. T(S_i) = T(S_j);
if R(S_i, S_j) = -1, the physical time at which the sentence S_i occurs is later than the physical time at which the other sentence S_j occurs, i.e. T(S_i) < T(S_j);
the samples constructed from R(S_i, S_j) are fed into a deep neural network for learning and training, yielding the trained R(S_i, S_j);
the ambiguity is calculated from the trained R(S_i, S_j) as follows: [formula missing from source], wherein k is a positive integer, i < m, j < m.
In an exemplary embodiment, the weak supervision includes: acquiring a seed narrative text, and acquiring new narratives from the seed narrative text through a pre-trained statistical classifier; and supplementing the seed narrative text with the new narratives, guiding the learning process of the statistical classifier to iterate until no new narrative appears. Specifically, the weakly supervised method is designed to capture the key elements of a narrative in two stages:
In the first stage, we identify the initial narrative paragraphs that satisfy strict rules and the key principles of narrative, i.e. the paragraphs that satisfy the text rules, grammar rules and role rules.
In the second stage, we train a statistical classifier using the initially identified seed narrative text and a set of soft features, so as to capture the same key principles as well as other textual devices of narrative. The classifier is then applied again to identify new narratives in the original text. The newly discovered narratives supplement the seed narratives, and the learning process iterates until too few new narratives are discovered.
In an exemplary embodiment, the pre-training of the statistical classifier includes: determining narrative paragraphs and non-narrative paragraphs in the seed narrative text; and training the statistical classifier with the narrative paragraphs as positive examples and the non-narrative paragraphs as negative examples. Specifically, using the seed narrative paragraphs determined in the first stage as positive examples, we train a statistical classifier to go on identifying further narrative paragraphs that may not satisfy the specific rules. We also have to prepare negative examples. The negative examples are paragraphs that are unlikely to be narratives, having neither a plot nor a protagonist, but that are otherwise similar to the seed narratives. We choose a maximum entropy model as the classifier.
In an exemplary embodiment, if the custom narrative recognition rule is a text rule, POS tags, parse trees, named entities and coreference chains are acquired from the seed narrative text, the seed narrative text including news, novels and blogs; and the statistical classifier is trained from a first confidence score to a second confidence score according to the POS tags, parse trees, named entities and coreference chains. Specifically, the weakly supervised method can be applied to different text sources to identify narratives, based on principles shared by all narratives. We consider three types of text: news articles, novels and blogs. Stanford CoreNLP tools are applied to these three text corpora to obtain POS tags, parse trees, named entities, coreference chains, etc. To overcome semantic drift in autonomous learning, the initial selection confidence score generated by the statistical classifier is set to 0.5 and raised by 0.05 after each iteration.
In an exemplary embodiment, if the custom narrative recognition rule is a grammar rule, the grammar rule includes at least a grammar structure, a heading sentence pattern, a text sentence pattern, and sentence characters. The grammar structure comprises grammar structures derived from the basic structure through conjunctions, adverb phrases and preposition phrases. Specifically, the grammar rules for identifying plot events are guided by previous narrative studies; context-free grammar production rules are used to identify sentences that describe events in a real-state grammar structure. Three sets of grammar rules are used to specify the overall grammar structure of a sentence, as follows: (1) the sentence has the basic active-voice structure "S → NP VP", or a more complex sentence structure derived from the basic structure through a coordinating conjunction (CC), an adverb phrase (ADVP) or a preposition phrase (PP); (2) the head word of the verb phrase must be in the past tense; (3) the subject of the sentence is intended to represent a character.
In an exemplary embodiment, if the custom narrative recognition rule is a role rule, the number of references in an event chain is divided by the number of sentences in the narrative paragraph to calculate the normalized length of the event chain. Specifically, we specify a rule that requires a narrative paragraph to have a protagonist. The normalized length of the event chain is calculated by dividing the number of references in the event chain by the number of sentences in the paragraph. In the embodiment of the present application, we require the normalized length of the longest entity chain to be at least 0.4, which means that 40% or more of the sentences in the narrative refer to one character. More specifically, the top n event chain lengths in the event are obtained, candidate events are ranked based on the deep-learned narrative recognition rules and the event chain lengths, and the probability of occurrence at a specific time is obtained. We apply statistical indicators based on pointwise mutual information (PMI) to measure the strength of the temporal relations between events, in order to identify common-sense knowledge that is not specific to any particular story. By learning event pairs and longer event chains, events are fully ordered in time by their "before/after" relations. By exploiting the dual timeliness of narratives, we consider only event pairs and longer chains of n events that have occurred as a segment in at least one event sequence extracted from a narrative paragraph; the candidate event pairs are then ranked by two factors: the degree of association between the two events, and the frequency with which they occur in a particular temporal order. The temporal order of the related events is then calculated with the time knowledge fuzzy measurement method.
The system comprises a sample module, a learning training module and a calculation module, wherein the sample module is used for generating a sample containing time knowledge based on a weak supervision learning self-defined narrative recognition rule; the learning training module is used for performing deep learning training on the neural network according to the sample containing the time knowledge; and the calculation module is used for acquiring the time knowledge in the sentences in the plot event through the neural network after the deep learning training and calculating the ambiguity of the time knowledge in the sentences in the plot event. The application establishes a weak supervision method, which can identify and extract the time knowledge in the narrative paragraphs from a large text corpus, and realizes the calculation and sequencing of the time knowledge in the narrative paragraphs by means of the method, and the time-space relationship classification can be improved by the time knowledge, so that the performance in the aspect of descriptive interactive service tasks is improved.
In summary, the present application effectively overcomes the disadvantages of the prior art and has high industrial utility value.
The above embodiments merely illustrate the principles of the present application and its effects, and are not intended to limit the application. Those skilled in the art may modify or vary the above-described embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed in the present application shall be covered by the claims of the application.

Claims (8)

1. A time knowledge fuzzy measurement method based on weak supervised learning, characterized by comprising the following steps:
generating samples containing time knowledge based on custom narrative recognition rules under weakly supervised learning;
performing deep learning training on the neural network according to the samples containing the time knowledge;
acquiring time knowledge in sentences in a plot event through the neural network after deep learning training, and calculating the ambiguity of the time knowledge in the sentences in the plot event; comprising the following steps: the custom narrative recognition rules are $R_n$ and the sentences in the plot event are $S_m$, wherein n and m are positive integers;
under a given custom narrative recognition rule, obtaining, according to syntactic and statistical rules, a comparison relation $R$ of temporal order between the time knowledge in a sentence $S_i$ and the time knowledge in another sentence $S_j$;
if $R(S_i, S_j) = 1$, the physical time of occurrence of the sentence $S_i$ is earlier than the physical time of occurrence of the other sentence $S_j$, i.e. $T(S_i) > T(S_j)$;
if $R(S_i, S_j) = 0$, the physical time of occurrence of the sentence $S_i$ is the same as the physical time of occurrence of the other sentence $S_j$, i.e. $T(S_i) = T(S_j)$;
if $R(S_i, S_j) = -1$, the physical time of occurrence of the sentence $S_i$ is later than the physical time of occurrence of the other sentence $S_j$, i.e. $T(S_i) < T(S_j)$;
feeding the samples constructed from $R(S_i, S_j)$ into a deep neural network for learning training, to obtain the trained $R(S_i, S_j)$;
calculating the ambiguity according to the trained $R(S_i, S_j)$, the ambiguity being: [formula missing from the extracted text], wherein k is a positive integer, $i < m$, $j < m$.
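The three-valued relation in claim 1 can be illustrated with a small sample-construction sketch. It assumes each sentence comes with an ordinary numeric timestamp in which a smaller value means earlier; that convention, the function names `relation_label` and `build_pair_samples`, and the tuple layout of a sample are assumptions of this example and are not taken from the claim. The ambiguity formula itself is not reproduced in the text, so it is not implemented here.

```python
from itertools import combinations


def relation_label(t_i, t_j):
    """Return 1 if i occurs earlier than j, 0 if simultaneous, -1 if later.

    Assumes numeric timestamps where a smaller value means earlier.
    """
    if t_i < t_j:
        return 1
    if t_i > t_j:
        return -1
    return 0


def build_pair_samples(sentences, times):
    """Build (S_i, S_j, R) training triples for every sentence pair with i < j."""
    if len(sentences) != len(times):
        raise ValueError("sentences and times must have the same length")
    samples = []
    for i, j in combinations(range(len(sentences)), 2):
        label = relation_label(times[i], times[j])
        samples.append((sentences[i], sentences[j], label))
    return samples
```

Triples of this shape are one plausible input format for the pairwise classifier that the claim trains; the actual sample encoding used by the applicant is not stated.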
2. The weak supervised learning based temporal knowledge fuzzy metric method of claim 1, wherein the weak supervision comprises:
acquiring a seed narrative text, and acquiring a new narrative from the seed narrative text through a pre-trained statistical classifier;
adding the new narrative to the seed narrative text, and iterating the learning process of the statistical classifier in a bootstrapping manner until no new narrative appears.
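The iteration described in claim 2 can be sketched as a bootstrapping loop. This is an illustration under stated assumptions: `bootstrap_narratives`, the unlabeled `candidates` pool, the `max_rounds` cap and the word-overlap `toy_train` classifier are all hypothetical stand-ins, and the statistical classifier actually used is not specified at this level of detail.

```python
def bootstrap_narratives(seed, candidates, train, max_rounds=10):
    """Grow a narrative set from seed texts until no new narrative is found.

    `train` is a hypothetical callable: it takes the current narratives and
    returns a classifier function paragraph -> bool.
    """
    narratives = list(seed)
    remaining = [c for c in candidates if c not in narratives]
    for _ in range(max_rounds):
        classify = train(narratives)          # retrain on the enlarged set
        new = [c for c in remaining if classify(c)]
        if not new:                           # stop: no new narrative appears
            break
        narratives.extend(new)
        remaining = [c for c in remaining if c not in new]
    return narratives


def toy_train(narratives):
    """Toy stand-in classifier: accept paragraphs sharing >= 2 known words."""
    vocab = set()
    for text in narratives:
        vocab.update(text.lower().split())

    def classify(paragraph):
        return len(vocab & set(paragraph.lower().split())) >= 2

    return classify


found = bootstrap_narratives(
    ["she walked home"],
    ["she walked away slowly", "he ran away slowly", "stock prices table"],
    toy_train,
)
```

In this toy run the second candidate is only accepted in the second round, after the first candidate has enlarged the vocabulary, which is the behaviour the iterative scheme relies on.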
3. The weak supervised learning based temporal knowledge fuzzy metric method of claim 2, wherein: the pre-training of the statistical classifier includes:
determining narrative paragraphs and non-narrative paragraphs in the seed narrative text;
training the statistical classifier with the narrative paragraphs as positive examples and the non-narrative paragraphs as negative examples.
4. A time knowledge fuzzy measurement method based on weak supervised learning as set forth in claim 2 or 3, characterized in that, if the custom narrative recognition rule is a text rule:
acquiring POS tags, parse trees, named entities and coreference chains from the seed narrative text, the seed narrative text including news, novels and blogs;
training the statistical classifier from a first confidence score to a second confidence score according to the POS tags, the parse trees, the named entities and the coreference chains.
5. The weak supervised learning based temporal knowledge fuzzy metric method of claim 1, wherein: if the custom narrative recognition rule is a grammar rule, the grammar rule at least comprises a grammar structure, a title sentence pattern, a text sentence pattern and sentence characters;
the grammar structure comprises grammar structures formed by deriving basic structures of conjunctions, adverb phrases and preposition phrases.
6. The weak supervised learning based temporal knowledge fuzzy metric method of claim 1, wherein: if the custom narrative recognition rule is a role rule, the number of references in the event chain is divided by the number of sentences in the narrative paragraph to calculate the normalized length of the event chain.
7. The weak supervised learning based temporal knowledge fuzzy metric method of claim 6, wherein: the top n event chain lengths in the event are acquired, and the candidate events are ranked based on the deep-learned narrative recognition rule and the event chain lengths to obtain the probability of occurrence at a specific time.
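The normalized chain length of claims 6 and 7 is a simple ratio, sketched below. The function names, the representation of chains as a list of per-chain reference counts, and the default threshold of 0.4 (taken from the embodiment in the description, not from the claims) are assumptions of this example.

```python
def normalized_chain_length(num_references, num_sentences):
    """Number of references in a chain divided by sentences in the paragraph."""
    if num_sentences <= 0:
        raise ValueError("paragraph must contain at least one sentence")
    return num_references / num_sentences


def has_protagonist(chain_reference_counts, num_sentences, threshold=0.4):
    """True if the longest chain reaches the threshold normalized length."""
    if not chain_reference_counts:
        return False
    longest = max(chain_reference_counts)
    return normalized_chain_length(longest, num_sentences) >= threshold


def top_n_normalized(chain_reference_counts, num_sentences, n):
    """Return the n largest normalized chain lengths, longest first."""
    lengths = [
        normalized_chain_length(count, num_sentences)
        for count in chain_reference_counts
    ]
    return sorted(lengths, reverse=True)[:n]
```

With a ten-sentence paragraph whose chains contain 1, 4 and 2 references, `has_protagonist([1, 4, 2], 10)` is true because the longest chain has normalized length 0.4.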
8. A weak supervised learning-based time knowledge fuzzy measure system, comprising:
the sample module is used for generating samples containing time knowledge based on custom narrative recognition rules under weakly supervised learning;
the learning training module is used for performing deep learning training on the neural network according to the samples containing the time knowledge;
the calculation module is used for acquiring time knowledge in sentences in a plot event through the neural network after deep learning training and calculating the ambiguity of the time knowledge in the sentences in the plot event; comprising the following steps: the custom narrative recognition rules are $R_n$ and the sentences in the plot event are $S_m$, wherein n and m are positive integers;
under a given custom narrative recognition rule, obtaining, according to syntactic and statistical rules, a comparison relation $R$ of temporal order between the time knowledge in a sentence $S_i$ and the time knowledge in another sentence $S_j$;
if $R(S_i, S_j) = 1$, the physical time of occurrence of the sentence $S_i$ is earlier than the physical time of occurrence of the other sentence $S_j$, i.e. $T(S_i) > T(S_j)$;
if $R(S_i, S_j) = 0$, the physical time of occurrence of the sentence $S_i$ is the same as the physical time of occurrence of the other sentence $S_j$, i.e. $T(S_i) = T(S_j)$;
if $R(S_i, S_j) = -1$, the physical time of occurrence of the sentence $S_i$ is later than the physical time of occurrence of the other sentence $S_j$, i.e. $T(S_i) < T(S_j)$;
feeding the samples constructed from $R(S_i, S_j)$ into a deep neural network for learning training, to obtain the trained $R(S_i, S_j)$;
calculating the ambiguity according to the trained $R(S_i, S_j)$, the ambiguity being: [formula missing from the extracted text], wherein k is a positive integer, $i < m$, $j < m$.
CN202010118531.6A 2020-02-26 2020-02-26 Time knowledge fuzzy measurement method and system based on weak supervised learning Active CN111414747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010118531.6A CN111414747B (en) 2020-02-26 2020-02-26 Time knowledge fuzzy measurement method and system based on weak supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010118531.6A CN111414747B (en) 2020-02-26 2020-02-26 Time knowledge fuzzy measurement method and system based on weak supervised learning

Publications (2)

Publication Number Publication Date
CN111414747A CN111414747A (en) 2020-07-14
CN111414747B true CN111414747B (en) 2023-08-18

Family

ID=71492831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010118531.6A Active CN111414747B (en) 2020-02-26 2020-02-26 Time knowledge fuzzy measurement method and system based on weak supervised learning

Country Status (1)

Country Link
CN (1) CN111414747B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214992A (en) * 2020-10-14 2021-01-12 哈尔滨福涛科技有限责任公司 Deep learning and rule combination based narrative structure analysis method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011100862A1 (en) * 2010-02-22 2011-08-25 Yahoo! Inc. Bootstrapping text classifiers by language adaptation
CN105938495A (en) * 2016-04-29 2016-09-14 乐视控股(北京)有限公司 Entity relationship recognition method and apparatus
CN108959305A (en) * 2017-05-22 2018-12-07 北京国信宏数科技有限公司 A kind of event extraction method and system based on internet big data
EP3447663A1 (en) * 2017-08-23 2019-02-27 Tata Consultancy Services Limited System and method for event profiling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dongfeng Cai. A semi-supervised learning method for names of traditional Chinese prescriptions and drugs recognition. IEEE Xplore. 2012, full text. *

Also Published As

Publication number Publication date
CN111414747A (en) 2020-07-14

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 400000 6-1, 6-2, 6-3, 6-4, building 7, No. 50, Shuangxing Avenue, Biquan street, Bishan District, Chongqing

Applicant after: CHONGQING ZHAOGUANG TECHNOLOGY CO.,LTD.

Address before: 400000 2-2-1, 109 Fengtian Avenue, tianxingqiao, Shapingba District, Chongqing

Applicant before: CHONGQING ZHAOGUANG TECHNOLOGY CO.,LTD.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant