CN113743125A - Text continuity analysis method and device - Google Patents

Info

Publication number
CN113743125A
Authority
CN
China
Prior art keywords
text
semantic
analysis
target analysis
consistency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111042313.XA
Other languages
Chinese (zh)
Inventor
彼得·布尔贡耶
黄惠燕
约翰内斯·阿德里亚努斯·玛丽亚·范·萨斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaoyang Intelligent Technology Co ltd
Original Assignee
Guangzhou Xiaoyang Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaoyang Intelligent Technology Co ltd
Priority to CN202111042313.XA
Publication of CN113743125A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text continuity analysis method and device. The method comprises: acquiring a target analysis text, the target analysis text comprising a plurality of text segments; determining the semantic relationship between at least one pair of adjacent text segments in the target analysis text; and determining, according to the semantic relationship, a consistency parameter corresponding to the target analysis text, the consistency parameter indicating the degree of narrative consistency of the target analysis text. The method and device can thus determine the consistency of a text from the semantic relationships between its adjacent text segments, which helps determine the degree of consistency of the text content at the chapter (discourse) level and achieves accurate and effective text analysis.

Description

Text continuity analysis method and device
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text continuity analysis method and device.
Background
With the development of natural language processing technology, semantic analysis of text has begun to move from the analysis of local indicators to the analysis of global indicators. More and more global indicators, such as the continuity of the sentences in a text or the correlation between a text and a theme, are being introduced into research and are gradually gaining attention. However, prior-art analysis of text continuity mainly considers local factors such as word combinations and keyword types in the text, and does not consider the global factor of the semantic relationships among text segments. The prior art is therefore deficient, and a solution is needed.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a text continuity analysis method and device, which is helpful to determine the continuity degree of text content at chapter level, thereby achieving an accurate and effective text analysis effect.
In order to solve the technical problem, a first aspect of the present invention discloses a text continuity analysis method, including:
acquiring a target analysis text; the target analysis text comprises a plurality of text segments;
determining corresponding semantic relations between at least one pair of adjacent text segments in the target analysis text;
determining a consistency parameter corresponding to the target analysis text according to the semantic relation; the consistency parameter is used for indicating the narrative consistency degree of the target analysis text.
As an optional implementation manner, in the first aspect of the present invention, the determining, according to the semantic relationship, a coherence parameter corresponding to the target analysis text includes:
determining a text consistency requirement corresponding to the target analysis text; the text consistency requirement is used for indicating a semantic relation parameter requirement corresponding to at least one part of text of the target analysis text;
judging whether the corresponding semantic relation between at least one pair of adjacent text segments in at least one part of text in the target analysis text meets the text consistency requirement or not;
and determining the consistency parameters corresponding to the target analysis texts according to the judgment result.
As an optional implementation manner, in the first aspect of the present invention, the semantic relationship includes an explicit semantic relationship and/or an implicit semantic relationship.
As an optional implementation manner, in the first aspect of the present invention, the determining the corresponding semantic relationship between at least one pair of adjacent text segments in the target analysis text includes:
inputting at least one pair of adjacent text segments in the target analysis text into a semantic analysis algorithm model to output corresponding semantic relations between the adjacent text segments; the semantic analysis algorithm model comprises an explicit semantic analysis model and/or an implicit semantic analysis model.
As an optional implementation manner, in the first aspect of the present invention, the inputting at least one pair of adjacent text segments in the target analysis text into a semantic analysis algorithm model to output corresponding semantic relationships between the adjacent text segments includes:
judging whether an explicit connective exists between at least one pair of adjacent text segments in the target analysis text;
when the judgment result is yes, inputting the adjacent text segments into an explicit semantic analysis model for analysis so as to obtain the corresponding explicit semantic relationship between the adjacent text segments;
and when the judgment result is no, inputting the adjacent text segments into an implicit semantic analysis model for analysis so as to obtain the corresponding implicit semantic relationship between the adjacent text segments.
As an optional implementation manner, in the first aspect of the present invention, the implicit semantic relationship is a probability that a subsequent text segment is a next semantic consistency segment of a previous text segment in the adjacent text segments, and/or the text consistency requirement is specifically used to indicate an implicit consistency requirement corresponding to at least a part of text of the target analysis text;
and/or, the determining whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of text in the target analysis text satisfies the text consistency requirement includes:
for a text fragment pair consisting of any two adjacent text fragments in any part of the text of the target analysis text, determining the implicit coherence of the text fragment pair according to the probability that the next text fragment in the text fragment pair is the next semantic coherence fragment of the previous text fragment;
and judging whether the implicit consistency of at least one text segment pair in at least one part of text in the target analysis text meets the text consistency requirement.
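The implicit-coherence check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `min_prob` and `min_fraction` thresholds, and the callable `next_segment_prob` (standing in for whatever model estimates the probability that the later segment is a coherent continuation of the earlier one) are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple


def implicit_coherence(
    pairs: Sequence[Tuple[str, str]],
    next_segment_prob: Callable[[str, str], float],
) -> List[float]:
    """Score each adjacent pair with the probability that the later
    segment is the next semantically coherent segment of the earlier one.
    `next_segment_prob` is a hypothetical stand-in for a trained model."""
    return [next_segment_prob(prev, nxt) for prev, nxt in pairs]


def meets_implicit_requirement(
    coherences: Sequence[float],
    min_prob: float = 0.5,      # assumed per-pair threshold
    min_fraction: float = 1.0,  # assumed share of pairs that must pass
) -> bool:
    """Return True when enough pairs reach the per-pair threshold."""
    if not coherences:
        return False
    passing = sum(1 for c in coherences if c >= min_prob)
    return passing / len(coherences) >= min_fraction
```

Setting `min_fraction` below 1.0 expresses a requirement on only part of the text segment pairs, which matches the "at least one text segment pair" wording above; the concrete values would come from the text consistency requirement.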
As an optional implementation manner, in the first aspect of the present invention, the semantic relationship includes a plurality of types of statement join relationships; and/or the text consistency requirement is specifically used for indicating the type and/or the number of sentence connection relations which at least one part of text of the target analysis text should include;
and/or, the determining whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of text in the target analysis text satisfies the text consistency requirement includes:
for any part of text of the target analysis text, determining all sentence connection relations in the part of text according to corresponding sentence connection relations between any pair of adjacent text segments in the part of text;
and judging whether all sentence connection relations in at least one part of texts in the target analysis texts meet the text consistency requirement or not.
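The check on the type and number of sentence connection relations can be sketched as follows, assuming the relations of a text part have already been determined and are given as string labels. The label names and the shape of `required_counts` are hypothetical; the source does not fix a concrete representation.

```python
from collections import Counter
from typing import Dict, Iterable


def meets_relation_requirement(
    relations: Iterable[str],
    required_counts: Dict[str, int],
) -> bool:
    """Check that each required relation type occurs at least the
    required number of times among the relations found in a text part."""
    counts = Counter(relations)
    # Counter returns 0 for labels that never occurred.
    return all(counts[label] >= n for label, n in required_counts.items())


# Example: a text part that should contain at least two causal
# relations and one contrast relation (labels are illustrative).
found = ["cause", "contrast", "cause"]
print(meets_relation_requirement(found, {"cause": 2, "contrast": 1}))
```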
The second aspect of the embodiments of the present invention discloses a text continuity analysis device, including:
the text acquisition module is used for acquiring a target analysis text; the target analysis text comprises a plurality of text segments;
the semantic determining module is used for determining the corresponding semantic relation between at least one pair of adjacent text segments in the target analysis text;
the consistency analysis module is used for determining consistency parameters corresponding to the target analysis texts according to the semantic relation; the consistency parameter is used for indicating the narrative consistency degree of the target analysis text.
As an optional implementation, in the second aspect of the present invention, the coherence analysis module includes:
the first determining unit is used for determining the text consistency requirement corresponding to the target analysis text; the text consistency requirement is used for indicating a semantic relation parameter requirement corresponding to at least one part of text of the target analysis text;
the judging unit is used for judging whether the corresponding semantic relation between at least one pair of adjacent text segments in at least one part of text in the target analysis text meets the text consistency requirement or not;
and the second determining unit is used for determining the consistency parameters corresponding to the target analysis text according to the judgment result.
As an optional implementation manner, in the second aspect of the present invention, the semantic relationship includes an explicit semantic relationship and/or an implicit semantic relationship.
As an optional implementation manner, in the second aspect of the present invention, a specific manner of determining a corresponding semantic relationship between at least one pair of adjacent text segments in the target analysis text by the semantic determination module includes:
inputting at least one pair of adjacent text segments in the target analysis text into a semantic analysis algorithm model to output corresponding semantic relations between the adjacent text segments; the semantic analysis algorithm model comprises an explicit semantic analysis model and/or an implicit semantic analysis model.
As an optional implementation manner, in the second aspect of the present invention, the specific manner in which the semantic determining module inputs at least one pair of adjacent text segments in the target analysis text into a semantic analysis algorithm model to output corresponding semantic relationships between the adjacent text segments includes:
judging whether an explicit connective exists between at least one pair of adjacent text segments in the target analysis text;
when the judgment result is yes, inputting the adjacent text segments into an explicit semantic analysis model for analysis so as to obtain the corresponding explicit semantic relationship between the adjacent text segments;
and when the judgment result is no, inputting the adjacent text segments into an implicit semantic analysis model for analysis so as to obtain the corresponding implicit semantic relationship between the adjacent text segments.
As an optional implementation manner, in the second aspect of the present invention, the implicit semantic relationship is a probability that a subsequent text segment is a next semantic consistency segment of a previous text segment in the adjacent text segments, and/or the text consistency requirement is specifically used to indicate an implicit consistency requirement corresponding to at least a part of text of the target analysis text;
and/or the specific manner of judging whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of text in the target analysis text meets the text consistency requirement by the judging unit comprises the following steps:
for a text fragment pair consisting of any two adjacent text fragments in any part of the text of the target analysis text, determining the implicit coherence of the text fragment pair according to the probability that the next text fragment in the text fragment pair is the next semantic coherence fragment of the previous text fragment;
and judging whether the implicit consistency of at least one text segment pair in at least one part of text in the target analysis text meets the text consistency requirement.
As an optional implementation manner, in the second aspect of the present invention, the semantic relationship includes a plurality of types of statement connecting relationships; and/or the text consistency requirement is specifically used for indicating the type and/or the number of sentence connection relations which at least one part of text of the target analysis text should include;
and/or the specific manner of judging whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of text in the target analysis text meets the text consistency requirement by the judging unit comprises the following steps:
for any part of text of the target analysis text, determining all sentence connection relations in the part of text according to corresponding sentence connection relations between any pair of adjacent text segments in the part of text;
and judging whether all sentence connection relations in at least one part of texts in the target analysis texts meet the text consistency requirement or not.
The third aspect of the present invention discloses another text consistency analysis apparatus, which includes:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute part or all of the steps in the text consistency analysis method disclosed by the first aspect of the invention.
In a fourth aspect of the present invention, a computer storage medium is disclosed, which stores computer instructions for performing some or all of the steps of the method for text coherence analysis disclosed in the first aspect of the present invention when the computer instructions are invoked.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the embodiment of the invention discloses a text continuity analysis method and a text continuity analysis device, wherein the method comprises the following steps: acquiring a target analysis text; the target analysis text comprises a plurality of text segments; determining corresponding semantic relations between at least one pair of adjacent text segments in the target analysis text; determining a consistency parameter corresponding to the target analysis text according to the semantic relation; the consistency parameter is used for indicating the narrative consistency degree of the target analysis text. Therefore, the method and the device can determine the consistency of the text according to the semantic relation between adjacent text fragments in the text, and are beneficial to determining the consistency degree of the text content on the chapter level, so that the accurate and effective text analysis effect is achieved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a text consistency analysis method according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of another text consistency analysis method according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a text consistency analysis apparatus according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of another text consistency analysis apparatus disclosed in the embodiment of the present invention.
Fig. 5 is a schematic structural diagram of another text consistency analysis apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, product, or apparatus that comprises a list of steps or elements is not limited to those listed but may optionally include other steps or elements not listed or inherent to such process, method, system, product, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The invention discloses a text continuity analysis method and a text continuity analysis device, which can determine the continuity of a text according to the semantic relationships between adjacent text segments in the text and help determine the degree of continuity of the text content at the chapter level, thereby achieving accurate and effective text analysis. Details are given below.
Embodiment One
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a text consistency analysis method according to an embodiment of the present invention. The text coherence analysis method described in fig. 1 is applied to a text data analysis system/a text data analysis device/a text data analysis server (where the text data analysis server includes a local server or a cloud server). As shown in fig. 1, the text coherence analysis method may include the following operations:
101. Acquiring a target analysis text.
Optionally, the target analysis text may be a written text submitted by a user, for example a text the user composes in software according to theme or type requirements, or it may be an existing text from a network, the media, or a book, such as the text of a book, a news item, or an Internet blog article. Optionally, the target analysis text may be in English, Chinese, German, or any other language amenable to semantic analysis, such as a decipherable ancient language, or a language that becomes amenable to semantic analysis in the future through advances in language technology and the accumulation of corpora; the present invention is not limited in this respect.
Optionally, the target text may be obtained in a manner including, but not limited to: receiving input computer information directly, recognizing text in images, recognizing text in speech, and receiving information input by handwriting equipment.
Optionally, the target analysis text may include a plurality of text segments. Optionally, a text segment in the present invention may be a segment with complete semantics, such as a clause, a sentence, or a paragraph, or a segment with incomplete semantics composed of any run of consecutive words or of a single word; the present invention is not limited in this respect.
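As a minimal sketch of this step, assuming sentence-level text segments, the target analysis text can be split at sentence-final punctuation and the adjacent pairs enumerated. The function names and the punctuation-based splitting rule are illustrative assumptions; the source also allows clauses, paragraphs, or arbitrary word runs as segments, and a naive splitter like this one will mis-split abbreviations such as "e.g.".

```python
import re
from typing import List, Tuple


def split_into_segments(text: str) -> List[str]:
    """Split text into sentence-like segments after ., !, ? or their
    Chinese full-width equivalents (a naive, assumed segmentation rule)."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def adjacent_pairs(segments: List[str]) -> List[Tuple[str, str]]:
    """Pair every segment with the segment that directly follows it."""
    return list(zip(segments, segments[1:]))


segments = split_into_segments(
    "The weather was awesome. We decided to go for lunch outside."
)
for previous, following in adjacent_pairs(segments):
    print(previous, "->", following)
```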
102. Determining the corresponding semantic relationship between at least one pair of adjacent text segments in the target analysis text.
Optionally, the semantic relationships may include explicit semantic relationships and/or implicit semantic relationships. An explicit semantic relationship is one that can be determined directly from a semantic connective. An implicit semantic relationship is one that, because no explicit semantic connective is present, must be inferred from the semantics of the two text segments themselves. Consider the following two examples:
(1) The weather was awesome. We decided to go for lunch outside.
Or: the weather is really good, and we decide to go outside to have lunch.
(2) The weather was awesome. However, we had to eat our food inside.
Or: the weather is really good; however, we have to eat our food inside.
Both examples apply equally to Chinese and English, and similar concepts exist in German and French, so they are well suited to explaining the technical concept of the present invention. In the first example, the semantic relationship between the two sentences is causal: the speakers want to have lunch outside because the weather is good. Importantly, this relationship must be inferred from the natural semantics of the two adjacent sentences. In the second example, by contrast, the semantic connective "However" indicates a contrast relationship between the two adjacent sentences. The presence of such a semantic connective makes the expressed relationship explicit, so the second example is an explicit semantic relationship; the absence of such a connective leaves the relationship implicit, so the first example is an implicit semantic relationship.
Optionally, the semantic relationship may include a plurality of types of sentence connection relationships. These may be implicit or explicit semantic relationships; the two classifications do not conflict, because both the implicit and the explicit semantic analysis algorithms can analyze the sentence connection relationship between adjacent text segments. Optionally, the types of sentence connection relationships are not limited to a fixed list; semantic connection relations can be added or removed according to actual conditions. Optionally, the types of sentence connection relationships may follow the results of existing semantic tree classification theories. For example, when the method is applied to languages of the Germanic family, such as English or German, or to other similar languages, the types may follow the classification proposed by the PDTB (Penn Discourse Treebank) theory. When the method is applied to a language such as Chinese, the types may follow the Chinese chapter-level sentence semantic relationship system and labels proposed in Chinese semantic research by, for example, Zhangmu, Qin Bing, or Liu, or the Chinese chapter structure representation system based on CDT (connective-driven dependency tree) proposed by the Suzhou natural language processing laboratory. By drawing on the semantic classification theories developed in research on individual languages, the semantic relationship can thus be applied to text analysis in a variety of languages.
103. Determining the consistency parameter corresponding to the target analysis text according to the semantic relationship.
Specifically, the consistency parameter is used to indicate the narrative consistency of the target analysis text. Optionally, the coherence parameter may be a coherence score corresponding to the target analysis text, and may be obtained according to a semantic relationship between text segments and a preset scoring rule. Optionally, the coherence parameter may also be a corresponding coherence prompt or modification suggestion for the target analysis text, for example, when the target analysis text is not coherent, a corresponding warning prompt may be displayed at the user terminal, or a corresponding modification suggestion may be displayed.
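One way to picture a coherence score derived from a preset scoring rule is sketched below. The rule itself is an assumption made for illustration: each relation label found between adjacent segments is mapped to a weight, `None` marks a pair for which no relation was found, and the score is the mean weight. The labels, weights, threshold, and prompt text are all hypothetical; the source does not specify them.

```python
from typing import Dict, Optional, Sequence

# Hypothetical scoring rule: one weight per relation label.
RELATION_WEIGHTS: Dict[Optional[str], float] = {
    "cause": 1.0,
    "contrast": 1.0,
    "expansion": 0.8,
    None: 0.0,  # no relation could be determined for the pair
}


def coherence_score(
    relations: Sequence[Optional[str]],
    weights: Dict[Optional[str], float] = RELATION_WEIGHTS,
    default: float = 0.5,  # assumed weight for labels missing from the rule
) -> float:
    """Average the per-pair weights; an empty text scores 0.0."""
    if not relations:
        return 0.0
    return sum(weights.get(r, default) for r in relations) / len(relations)


def coherence_prompt(score: float, threshold: float = 0.6) -> Optional[str]:
    """Return a warning for the user terminal when the score is low."""
    if score < threshold:
        return "The text may not be coherent; consider revising the transitions."
    return None
```

The same two functions cover both forms of the parameter mentioned above: a numeric score, and a prompt or suggestion shown when the text is judged not coherent.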
Therefore, the embodiment of the invention can determine the consistency of the text according to the semantic relation between adjacent text fragments in the text, and is beneficial to determining the consistency degree of the text content on the chapter level, thereby achieving the accurate and effective text analysis effect.
As an optional implementation manner, in the step 102, determining the corresponding semantic relationship between at least one pair of adjacent text segments in the target analysis text may include:
and inputting at least one pair of adjacent text segments in the target analysis text into a semantic analysis algorithm model so as to output corresponding semantic relations between the adjacent text segments.
Optionally, the semantic analysis algorithm model may be obtained in advance by training on a set of text segments labeled with semantic connection relationship types. Corresponding to the classification in the above embodiment, the semantic analysis algorithm model in the embodiment of the present invention may also include an explicit semantic analysis model and/or an implicit semantic analysis model. The explicit semantic analysis model is specifically used for obtaining the explicit semantic relationship between adjacent text segments, and the implicit semantic analysis model is specifically used for obtaining the implicit semantic relationship between adjacent text segments.
Therefore, through the optional implementation mode, at least one pair of adjacent text segments in the target analysis text can be input into the semantic analysis algorithm model to output the corresponding semantic relationship between the adjacent text segments, so that the semantic analysis algorithm model can be utilized to obtain the accurate semantic relationship, and an accurate data basis is provided for the subsequent coherence analysis.
As an optional implementation manner, in the above step, inputting at least one pair of adjacent text segments in the target analysis text into the semantic analysis algorithm model to output corresponding semantic relationships between the adjacent text segments, includes:
judging whether an explicit connective exists between at least one pair of adjacent text segments in the target analysis text;
when the judgment result is yes, inputting the adjacent text segments into the explicit semantic analysis model for analysis so as to obtain the corresponding explicit semantic relationship between the adjacent text segments;
and when the judgment result is no, inputting the adjacent text segments into the implicit semantic analysis model for analysis so as to obtain the corresponding implicit semantic relationship between the adjacent text segments.
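The routing in these steps can be sketched as follows. This is an assumption-laden illustration: the tiny `EXPLICIT_CONNECTIVES` tuple and the first-word lookup stand in for the connective judgment step, and `explicit_model` and `implicit_model` are hypothetical callables standing in for the two trained models.

```python
import re
from typing import Callable, Tuple

# Toy lexicon used only for this sketch; a real system would rely on a
# trained classifier and a full connective lexicon.
EXPLICIT_CONNECTIVES = ("however", "because", "therefore", "but")


def has_explicit_connective(second_segment: str) -> bool:
    """Assumed check: does the later segment start with a known connective?"""
    words = re.findall(r"[a-z]+", second_segment.lower())
    return bool(words) and words[0] in EXPLICIT_CONNECTIVES


def analyze_pair(
    first: str,
    second: str,
    explicit_model: Callable[[str, str], str],
    implicit_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Send the pair to the explicit or the implicit model and return
    (kind of relationship, relation produced by the chosen model)."""
    if has_explicit_connective(second):
        return "explicit", explicit_model(first, second)
    return "implicit", implicit_model(first, second)
```

Passing the two models in as callables keeps the routing logic independent of how either model is implemented.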
Optionally, an explicit connective in the embodiments of the present invention is a connective between text segments that links two propositions or two viewpoints. Optionally, at least one pair of adjacent text segments in the target analysis text may be input into a pre-trained connective determination model to determine whether an explicit connective exists between them. The connective determination model may use a neural network classification model, such as a BERT model combined with a multilayer perceptron classifier, and may be trained on a preset connective lexicon and training texts.
Further, when explicit connectives are determined using the connective determination model, attention must be paid to the ambiguity of connectives, as in the following two examples:
(3) He is now changing the place he sleeps every night, sometimes more than once a night.
(4) Once it gets there, a company can do with it as it wishes.
In example (3), "once" is part of the phrase "once a night"; it does not connect any propositions and is therefore not an explicit connective. In example (4), by contrast, "once" connects the two propositions in the sentence, which are separated by a comma. The same word can thus function differently in different contexts. To resolve this ambiguity, in this embodiment preset specific-connective ambiguity information is additionally introduced into the training or the judgment of the connective determination model to avoid determination errors. This information may specify the conditions under which a specific connective should not be treated as an explicit connective in a particular context. Taking the two sentences above as an example, the preset ambiguity information may be introduced as training data, so that the connective determination model is trained not to judge "once" in the context "once a night" as a connective. Alternatively, the ambiguity information may be used in a post-correction model applied after the connective determination model, so that when the model judges "once" in the context "once a night" as a connective, the result is corrected.
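The post-correction variant can be sketched as follows. The pattern table `NON_CONNECTIVE_CONTEXTS` is a hypothetical encoding of the ambiguity information, invented for this illustration; the source does not say how that information is represented. The sketch works at sentence level, so if the same word occurred in a sentence both as a connective and as a non-connective, both predictions would be dropped.

```python
import re
from typing import Dict, List

# Hypothetical ambiguity information: contexts in which a candidate word
# should NOT be treated as an explicit connective.
NON_CONNECTIVE_CONTEXTS: Dict[str, List[str]] = {
    "once": [r"\bonce\s+an?\s+(night|day|week|month|year)\b"],
}


def correct_predictions(sentence: str, predicted: List[str]) -> List[str]:
    """Drop predicted connectives whose context matches a listed
    non-connective pattern; keep all other predictions unchanged."""
    lowered = sentence.lower()
    kept = []
    for word in predicted:
        patterns = NON_CONNECTIVE_CONTEXTS.get(word.lower(), [])
        if any(re.search(p, lowered) for p in patterns):
            continue  # the model's prediction is overridden
        kept.append(word)
    return kept


print(correct_predictions("He moves sometimes more than once a night.", ["once"]))
print(correct_predictions("Once it arrives, the company decides.", ["Once"]))
```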
For example, in one practical embodiment, the connective lexicon DiMLex is used to train a connective determination model for German: all connectives listed in DiMLex serve as candidates to be recognized, and a neural network classification model is trained using the Potsdam Commentary Corpus (PCC) and Wikipedia web texts as training text, which yields good connective determination results. Furthermore, the connective-specific ambiguity information in DiMLex is introduced during training so that incorrect predictions of the connective determination model can be overridden, thereby disambiguating explicit connectives. In another practical embodiment, the connective lexicon EN-DiMLex is used to train the connective determination model for English, also with good results.
Therefore, with this optional implementation, whether an explicit connective exists can be determined before the semantic relationship between adjacent text segments is analyzed. Explicit and implicit text segment pairs can thus be distinguished, so that the semantic analysis algorithm model can subsequently produce an accurate semantic relationship, providing an accurate data basis for the coherence analysis.
As an optional embodiment, the explicit semantic analysis model includes two parallel vector parsing modules and a multi-layer perceptron classifier module, where the multi-layer perceptron classifier module is connected to the outputs of the two vector parsing modules and classifies the text vectors they output. Correspondingly, in the above step, inputting the adjacent text segments into the explicit semantic analysis model for analysis to obtain the corresponding explicit semantic relationship between the adjacent text segments may include:
inputting the two adjacent text segments into the two parallel vector parsing modules respectively, to obtain two corresponding text vectors;
and inputting the two text vectors into the multi-layer perceptron classifier module for classification, to obtain the corresponding explicit semantic relationship between the adjacent text segments.
Optionally, the vector parsing module may be a BERT model or an ERNIE model. Optionally, the multi-layer perceptron classifier module may include a fully connected layer module and a softmax layer module.
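The data flow of this architecture (two parallel encoders, concatenation, a fully connected layer and softmax) can be sketched in plain Python. This is only a structural illustration: encode is a toy stand-in for a BERT or ERNIE vector parsing module, the label set LABELS is assumed, and the weights would come from training, which is not shown.

```python
import math

# Assumed label set for illustration; the source does not fix the classes.
LABELS = ["Expansion", "Comparison", "Temporal", "Contingency"]
DIM = 8  # size of the toy text vectors


def encode(text):
    """Toy stand-in for a vector parsing module (BERT/ERNIE in the source)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[sum(map(ord, token)) % DIM] += 1.0
    return vec


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(seg_a, seg_b, weights, bias):
    """Encode both segments in parallel, then apply one dense layer + softmax."""
    features = encode(seg_a) + encode(seg_b)  # concatenated text vectors
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs
```

Here weights has one row of length 2 * DIM per label and bias has one entry per label; in a real system both segments would be encoded by pre-trained language models and the classifier would be trained on labeled segment pairs.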
Further, the explicit semantic analysis model may also be an ensemble learning model, obtained by using an ensemble learning algorithm to train the explicit semantic analysis model that includes the two parallel vector parsing modules and the multi-layer perceptron classifier module. Optionally, the ensemble learning algorithm may be a random forest algorithm, and the ensemble learning model may be a random forest model. Further optionally, the random forest algorithm trains the explicit semantic analysis model using syntactic features as decision tree parameters, where the syntactic features may be based on POS tags (part-of-speech tags) or on the parent and child nodes of the phrase structure tree.
Therefore, with this optional implementation, the explicit semantic relationship of adjacent text segments can be analyzed using an explicit semantic analysis model that includes two parallel vector parsing modules and a multi-layer perceptron classifier module, so that an accurate explicit semantic relationship can be computed, providing an accurate data basis for the coherence analysis.
As an optional embodiment, the implicit semantic relationship may be the probability that, in a pair of adjacent text segments, the following segment is the next semantically coherent segment of the preceding segment. Accordingly, the implicit semantic analysis model may be the NSP (Next Sentence Prediction) module of the BERT model. Optionally, the adjacent text segments may be input into a BERT model pre-trained with NSP to obtain this probability, which in some cases takes the form of a prediction score. In that case, determining the coherence parameter corresponding to the target analysis text according to the semantic relationship in step 103 may be: judging the coherence of the adjacent text segments according to a preset score threshold and their prediction score. For example, when the prediction score is lower than the score threshold, the adjacent text segments are judged to be incoherent, and a corresponding warning prompt is displayed on the user terminal.
Therefore, by the optional implementation mode, the NSP module in the BERT model can be adopted to analyze the implicit semantic relation of the adjacent text segments, so that the accurate implicit semantic relation can be calculated, and an accurate data basis is provided for the consistency analysis.
When the scheme of the invention is applied to writing evaluation, the NSP prediction scores for implicit semantic relations are generally low. With a generic score threshold, the algorithm would therefore detect only severely incoherent semantic relations in a text and miss slightly incoherent ones. Therefore, as an optional embodiment, the score threshold may be determined by:
determining all implicit text segment pairs in the target analysis text, where an implicit text segment pair is a pair of adjacent text segments with no explicit connective between them;
inputting all implicit text segment pairs into the implicit semantic analysis model to obtain a prediction score for each implicit text segment pair;
and determining the average of the prediction scores of all implicit text segment pairs as the score threshold.
Optionally, for details of determining all implicit text segment pairs in the target analysis text, refer to the above description of explicit connective determination using the connective determination model. Similarly, for details of obtaining the prediction score of each implicit text segment pair, refer to the above description of inputting adjacent text segments into the BERT model pre-trained with NSP to obtain the probability that the following segment is the next semantically coherent segment of the preceding segment. These details are not repeated here.
Therefore, with this optional implementation, the average prediction score of all implicit text segment pairs of the target analysis text can be used as the score threshold, so that the coherence of the target analysis text can subsequently be determined more accurately.
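The text-specific threshold described above can be sketched as follows. The function names and the (pair, score) input format are assumptions for illustration, and the scores are taken as already computed by an NSP-style model, which is not shown.

```python
from statistics import mean


def adaptive_threshold(scores):
    """Score threshold = average prediction score over all implicit pairs."""
    if not scores:
        raise ValueError("at least one implicit text segment pair is required")
    return mean(scores)


def flag_incoherent(scored_pairs):
    """Return the pairs whose score falls below the text-specific threshold.

    scored_pairs: list of (pair_id, prediction_score) tuples (assumed format).
    """
    threshold = adaptive_threshold([score for _, score in scored_pairs])
    return [pair for pair, score in scored_pairs if score < threshold]
```

Because the threshold is relative to the text itself, a pair is flagged only when it is less coherent than the average implicit pair of the same text, which is how slightly incoherent transitions become detectable even when all scores are low.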
Embodiment Two
Referring to fig. 2, fig. 2 is a schematic flow chart of another text consistency analysis method according to an embodiment of the present invention. The text coherence analysis method described in fig. 2 is applied to a text data analysis system/a text data analysis device/a text data analysis server (where the text data analysis server includes a local server or a cloud server). As shown in fig. 2, the text coherence analysis method may include the following operations:
201. Acquiring a target analysis text.
202. Determining the corresponding semantic relations between at least one pair of adjacent text segments in the target analysis text.
In the embodiment of the present invention, for the description of steps 201-202, refer to the detailed description of steps 101-102 in the first embodiment, which is not repeated here.
203. Determining the text consistency requirement corresponding to the target analysis text.
Optionally, the text consistency requirement is used to indicate a semantic relationship parameter requirement corresponding to at least a part of the text of the target analysis text. Optionally, the text continuity requirement may be specifically used to indicate an implicit continuity requirement corresponding to at least a part of text of the target analysis text. Optionally, the text continuity requirement may also be specifically used to indicate a type and/or a number of sentence connection relationships that at least a part of the text of the target analysis text should include.
204. Judging whether the corresponding semantic relation between at least one pair of adjacent text segments in at least one part of the target analysis text meets the text consistency requirement.
205. Determining the consistency parameter corresponding to the target analysis text according to the judgment result.
Therefore, the embodiment of the invention can judge whether the corresponding semantic relation between the text segments in the target analysis text meets the text consistency requirement corresponding to the target analysis text, and determine the consistency parameter corresponding to the target analysis text according to the judgment result, so that whether the semantic relation of the target analysis text is consistent can be judged by combining the preset consistency requirement index, and the implementation of more macroscopic semantic consistency analysis is facilitated.
As an optional implementation manner, in step 203, determining the text consistency requirement corresponding to the target analysis text may include:
determining a target text architecture corresponding to a target analysis text;
and determining the text consistency requirement corresponding to the target analysis text according to the target text architecture.
The target text architecture may be the discourse structure that the text genre of the target analysis text should follow. For example, when the genre of the target analysis text is an English argumentative essay, the target text architecture generally includes three parts: introduction, body and conclusion. In a good English argumentative essay, the introduction should consist mainly of statements of fact; the body should discuss several advantages and disadvantages of the topic from the author's standpoint, arguing from both sides and using several contrasting examples to support the author's point of view; and the conclusion should summarize the key points.
Optionally, the text coherence requirement corresponding to the target analysis text is determined according to the target text architecture, and the text coherence requirement of different parts of texts of the target analysis text may be determined according to the expected semantic relationship of the different parts of texts in the target text architecture. The expected semantic relationship may be an expectation of a statement connection relationship or an expectation of an implicit semantic relationship.
Continuing the above example, when the genre of the target analysis text is an English argumentative essay, the different parts of the target text architecture (introduction, body and conclusion) correspond to different expected semantic relationships and hence to different semantic relationship requirements. The introduction should consist mainly of statements of fact, so its expected semantic relationships are expansive or declarative, and its semantic relationship requirement is a statement connection relationship of the expansive or declarative kind, such as Expansion. The body should discuss several advantages and disadvantages of the topic from the author's standpoint, arguing from both sides and supporting the author's view with several contrasting examples, so its expected semantic relationships are comparative, and its semantic relationship requirement is a comparative statement connection relationship, such as the sub-level relations under the Comparison level in the PDTB; in addition, the number of such relationships in the body should be higher than in the introduction or the conclusion. The conclusion is expected to summarize the key points, so its expected semantic relationships are expansive or detailing, and its requirement is a statement connection relationship of the expansive or detailing kind, such as Expansion.
In addition, with respect to implicit semantic relationships and again continuing the above example: because the author should discuss various viewpoints and examples in the body, whereas the introduction and conclusion only need to state facts or summarize views, the implicit semantic coherence of the body should be lower than that of the introduction and the conclusion. Here the implicit semantic coherence may be the probability, disclosed in the first embodiment, that the following text segment of a pair of adjacent text segments is the next semantically coherent segment of the preceding one. Therefore, in the text coherence requirement corresponding to the English argumentative essay genre of the above example, the implicit coherence requirement for the body should be lower than that for the introduction and the conclusion.
Another example is an instructional or procedural text genre, such as recipes. A recipe generally first lists the required ingredients and then gives the steps needed to turn them into a breakfast or dinner. Semantically, this text type is characterized by many expressions of temporal relationship, usually in the imperative, for example "let the milk simmer for 5 minutes, then add the flour". The semantic relationships throughout such a text are therefore mainly temporal, so the expected semantic relationships are temporal and the corresponding semantic relationship requirement is a temporal statement connection relationship, such as Temporal.
Therefore, by the above example, the text coherence requirements corresponding to different text structures can be obtained by analyzing the text structures of different genres, and then the text coherence requirements can be determined for the target analysis text for coherence analysis of the target analysis text.
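One way to picture such genre-dependent requirements is a lookup table keyed by genre and text part. The sketch below is illustrative only: the dictionary layout, the key names and all numeric thresholds are hypothetical, and only the qualitative pattern (Comparison in the essay body, lower implicit coherence for the body, Temporal for recipes) follows the description.

```python
# All names and numbers below are hypothetical illustrations, not values
# taken from the source.
REQUIREMENTS = {
    "argumentative_essay": {
        "introduction": {"expected_relation": "Expansion", "min_implicit_coherence": 0.6},
        "body": {"expected_relation": "Comparison", "min_implicit_coherence": 0.4},
        "conclusion": {"expected_relation": "Expansion", "min_implicit_coherence": 0.6},
    },
    "recipe": {
        "whole_text": {"expected_relation": "Temporal", "min_implicit_coherence": 0.5},
    },
}


def requirement_for(genre, part):
    """Look up the coherence requirement for one part of a text of a genre.

    Falls back to a genre-wide "whole_text" entry when the part is not listed;
    returns None when neither exists.
    """
    parts = REQUIREMENTS[genre]
    return parts.get(part, parts.get("whole_text"))
```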
As an optional implementation manner, the determining whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of text in the target analysis text satisfies the text continuity requirement in step 204 may include:
for a text fragment pair consisting of any two adjacent text fragments in any part of the text of the target analysis text, determining the implicit consistency of the text fragment pair according to the probability that the next text fragment in the text fragment pair is the next semantic consistent fragment of the previous text fragment;
and judging whether the implicit consistency of at least one text segment pair in at least one part of the target analysis text meets the text consistency requirement.
Optionally, the NSP prediction score corresponding to the probability may be determined as the implicit consistency of the text segment pair. Furthermore, the average value of the implicit coherence of all text segment pairs in at least one part of the target analysis text can be determined as the implicit coherence corresponding to the part of the target analysis text, the implicit coherence corresponding to the part of the target analysis text is compared with the implicit coherence threshold in the text coherence requirement corresponding to the part of the target analysis text, and when the implicit coherence corresponding to the part of the target analysis text is lower than the corresponding implicit coherence threshold, the part of the target analysis text is judged not to conform to the text coherence requirement.
Further, when it is determined that the portion of text does not meet the text consistency requirement, a corresponding warning prompt or a corresponding modification suggestion may be displayed at the user terminal.
Therefore, through the optional implementation mode, the consistency of the text can be analyzed by judging whether the implicit consistency of at least one text segment pair in at least one part of the target analysis text meets the text consistency requirement or not, the consistency degree of the text content can be determined through the implicit semantic relation, and therefore the accurate and effective text analysis effect is achieved.
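The part-level check described above (average the implicit coherence of all segment pairs in a part, then compare with that part's threshold) might look like the following sketch. The function name, the dictionary inputs and the example thresholds are assumptions; the per-pair scores are taken as given.

```python
from statistics import mean


def check_parts(part_scores, part_thresholds):
    """Compare each part's mean implicit coherence with its threshold.

    part_scores:     {part_name: [score of each adjacent segment pair]}
    part_thresholds: {part_name: minimum acceptable mean score}
    Returns {part_name: True if the requirement is met, else False}.
    """
    results = {}
    for name, scores in part_scores.items():
        coherence = mean(scores)  # implicit coherence of the whole part
        results[name] = coherence >= part_thresholds[name]
    return results
```

For example, with scores {"introduction": [0.9, 0.8], "body": [0.5, 0.3]} and thresholds {"introduction": 0.8, "body": 0.5}, the introduction meets its requirement and the body does not, so a warning or modification suggestion would be shown for the body.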
As an optional implementation manner, in the step 204, determining whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of text in the target analysis text satisfies the text continuity requirement includes:
for any part of text of the target analysis text, determining all sentence connection relations in the part of text according to corresponding sentence connection relations between any pair of adjacent text segments in the part of text;
and judging whether all sentence connection relations in at least one part of texts in the target analysis texts meet the text consistency requirement or not.
Optionally, the type and/or number of the sentence connection relations in the part of text may be counted, and the counted information is compared with the type requirement and/or number requirement of the sentence connection relations in the text continuity requirement corresponding to the part of text, so as to determine whether the type and/or number of the sentence connection relations in the part of text meet the type requirement and/or number requirement of the sentence connection relations in the corresponding text continuity requirement.
Further, when it is determined that the sentence connection relations of the part of text do not meet the text consistency requirement, a corresponding warning prompt or modification suggestion may be displayed at the user terminal. Optionally, the modification suggestion may be generated according to the respect in which the requirement is not met. For example, when none of the sentence connection relations in the part of text is of a type required by the text consistency requirement, a suggestion is generated advising the user to add text segments with that type of sentence connection relation to the part of text. As another example, when the number of sentence connection relations of a specific or unspecified type in the part of text does not meet the number requirement, a suggestion is generated advising the user to add more text segments with such relations. For instance, when the body of an English argumentative essay is found to contain few text segments with a comparison relation, a suggestion may be generated advising the student to add more comparative or example-based discussion to the body to support their own view.
Therefore, through the optional implementation mode, the consistency of the text can be analyzed by judging whether the connection relations of all sentences in at least a part of text in the target analysis text meet the text consistency requirement, the consistency degree of the text content can be determined through the connection relations of the sentences, and the accurate and effective text analysis effect is achieved.
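The counting and suggestion logic described above can be sketched briefly. The function name, its parameters and the wording of the suggestions are hypothetical; the relation labels are assumed to have been produced by the semantic analysis step.

```python
from collections import Counter


def suggest_for_part(relations, required_type, min_count):
    """Check the type and number of sentence connection relations in one part.

    relations: list of relation labels found between adjacent segment pairs.
    Returns a suggestion string, or None when the requirement is met.
    """
    found = Counter(relations)[required_type]
    if found == 0:
        return f"Add a text segment with a {required_type} relation to this part."
    if found < min_count:
        return f"Add more text segments with a {required_type} relation to this part."
    return None
```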
Embodiment Three
Referring to fig. 3, fig. 3 is a schematic structural diagram of a text continuity analysis device according to an embodiment of the present invention. The text continuity analyzing apparatus described in fig. 3 is applied to a text data analyzing system/a text data analyzing device/a text data analyzing server (where the text data analyzing server includes a local server or a cloud server). As shown in fig. 3, the text continuity analyzing apparatus may include:
a text obtaining module 301, configured to obtain a target analysis text.
Optionally, the target analysis text may be written text submitted by a user, for example, text created by the user on software according to theme requirements or type requirements, or may also be text on an existing network or media or book, such as text of a book, news of a media, or an internet podcast article. Alternatively, the target analysis text may be english, chinese, german or other languages with semantic analysis possibility, such as interpretable ancient languages, or other languages with semantic analysis possibility in the future due to the development of language technology and the accumulation of corpus, which is not limited by the present invention.
Optionally, the target text may be obtained in a manner including, but not limited to: receiving input computer information directly, recognizing text in images, recognizing text in speech, and receiving information input by handwriting equipment.
Alternatively, the target analysis text may include a plurality of text segments. Optionally, the text segment in the present invention may be a segment with complete semantics, such as a clause, a sentence, or a paragraph, or may also be a segment with incomplete semantics, which is composed of any continuous words or single words, and the present invention is not limited.
A semantic determining module 302, configured to determine a corresponding semantic relationship between at least one pair of adjacent text segments in the target analysis text.
Optionally, the semantic relationships may include explicit semantic relationships and/or implicit semantic relationships, where an explicit semantic relationship is used to refer to a semantic relationship that can be determined explicitly according to semantic links, and an implicit semantic relationship is used to refer to a semantic relationship that needs to be inferred according to the semantics of the two text segments themselves because there is no explicit semantic link, and is implicit.
Optionally, the semantic relationship may include a plurality of types of statement connection relationships, which may be implicit or explicit semantic relationships. These two classifications do not conflict, because both the implicit and the explicit semantic analysis algorithms can analyze the statement connection relationships between adjacent text segments. Optionally, the set of statement connection relationship types is not fixed and can be extended or reduced according to actual conditions. Optionally, the types of statement connection relationship may follow existing research on discourse relation classification. For example, when the method is applied to Germanic languages such as English or German, or to similar languages, the types may follow the classification proposed by the PDTB (Penn Discourse Treebank) theory. When the method is applied to languages such as Chinese, the types may follow the Chinese discourse-level sentence semantic relationship systems and labels proposed in Chinese semantic research, for example by Zhangmu, Qin Bing or Liu, or the Chinese discourse structure representation system based on the CDT (connective-driven dependency tree) proposed by the Suzhou natural language processing laboratory. Through the classification theories developed in language-specific research, the semantic relationships can therefore be applied to text analysis in a variety of languages.
a coherence analysis module 303, configured to determine a coherence parameter corresponding to the target analysis text according to the semantic relationship.
Specifically, the consistency parameter is used to indicate the narrative consistency of the target analysis text. Optionally, the coherence parameter may be a coherence score corresponding to the target analysis text, and may be obtained according to a semantic relationship between text segments and a preset scoring rule. Optionally, the coherence parameter may also be a corresponding coherence prompt or modification suggestion for the target analysis text, for example, when the target analysis text is not coherent, a corresponding warning prompt may be displayed at the user terminal, or a corresponding modification suggestion may be displayed.
Therefore, the embodiment of the invention can determine the consistency of the text according to the semantic relation between adjacent text fragments in the text, and is beneficial to determining the consistency degree of the text content on the chapter level, thereby achieving the accurate and effective text analysis effect.
As an optional implementation, the semantic determining module 302 determines a specific manner of the corresponding semantic relationship between at least one pair of adjacent text segments in the target analysis text, including:
and inputting at least one pair of adjacent text segments in the target analysis text into a semantic analysis algorithm model so as to output corresponding semantic relations between the adjacent text segments.
Optionally, the semantic analysis algorithm model may be obtained in advance by training on a training set of text segments labeled with semantic connection relationship types. Corresponding to the classification in the above embodiment, the semantic analysis algorithm model in the embodiment of the present invention may also include an explicit semantic analysis model and/or an implicit semantic analysis model. The explicit semantic analysis model is used to obtain the explicit semantic relationship between adjacent text segments, and the implicit semantic analysis model is used to obtain the implicit semantic relationship between adjacent text segments.
Therefore, through the optional implementation mode, at least one pair of adjacent text segments in the target analysis text can be input into the semantic analysis algorithm model to output the corresponding semantic relationship between the adjacent text segments, so that the semantic analysis algorithm model can be utilized to obtain the accurate semantic relationship, and an accurate data basis is provided for the subsequent coherence analysis.
As an optional implementation, the specific way in which the semantic determining module 302 inputs at least one pair of adjacent text segments in the target analysis text into the semantic analysis algorithm model to output the corresponding semantic relationship between the adjacent text segments includes:
judging whether an explicit connective exists between at least one pair of adjacent text segments in the target analysis text;
when the judgment result is yes, inputting the adjacent text segments into the explicit semantic analysis model for analysis so as to obtain the corresponding explicit semantic relationship between the adjacent text segments;
and when the judgment result is negative, inputting the adjacent text segments into the implicit semantic analysis model for analysis so as to obtain the corresponding implicit semantic relationship between the adjacent text segments.
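The routing in the three steps above can be sketched as a small dispatcher. All callables here are hypothetical stand-ins passed in by the caller: has_connective plays the role of the connective determination model, and explicit_model and implicit_model play the roles of the two semantic analysis models.

```python
def analyze_pair(seg_a, seg_b, has_connective, explicit_model, implicit_model):
    """Route one pair of adjacent segments to the explicit or implicit model.

    Returns a (kind, result) tuple, where kind is "explicit" or "implicit".
    """
    if has_connective(seg_a, seg_b):
        return "explicit", explicit_model(seg_a, seg_b)
    return "implicit", implicit_model(seg_a, seg_b)


def analyze_text(segments, has_connective, explicit_model, implicit_model):
    """Apply analyze_pair to every pair of adjacent segments in order."""
    return [
        analyze_pair(a, b, has_connective, explicit_model, implicit_model)
        for a, b in zip(segments, segments[1:])
    ]
```

In this sketch the explicit model would return a relation label and the implicit model a prediction score, matching the two kinds of semantic relationship in the description.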
Optionally, an explicit connective in the embodiments of the present invention is a connective that links two propositions or two viewpoints between text segments. Optionally, at least one pair of adjacent text segments in the target analysis text may be input into a pre-trained connective determination model to determine whether an explicit connective exists between them. The connective determination model may be a neural network classification model, such as a BERT model combined with a multi-layer perceptron classifier, and may be trained on a preset connective dictionary and a training text. To resolve connective ambiguity, in the present embodiment preset connective-specific ambiguity information is additionally introduced during training or inference of the connective determination model, so as to avoid misjudgments. This ambiguity information may specify the contexts in which a particular connective should not be treated as an explicit connective.
Therefore, with this optional implementation, whether an explicit connective exists can be determined before the semantic relationship between adjacent text segments is analyzed. Explicit and implicit text segment pairs can thus be distinguished, so that the semantic analysis algorithm model can subsequently produce an accurate semantic relationship, providing an accurate data basis for the coherence analysis.
As an optional embodiment, the explicit semantic analysis model includes two parallel vector parsing modules and a multilayer perceptron classifier module, where the classifier module is connected to the outputs of the two vector parsing modules and classifies the text vectors they output. Correspondingly, the specific manner in which the semantic determining module 302 inputs the adjacent text segments into the explicit semantic analysis model for analysis, so as to obtain the corresponding explicit semantic relationship between the adjacent text segments, may include:
respectively inputting the two adjacent text segments into the two parallel vector parsing modules to obtain two corresponding text vectors;
and inputting the two corresponding text vectors into the multilayer perceptron classifier module for classification, so as to obtain the corresponding explicit semantic relationship between the adjacent text segments.
Optionally, the vector parsing module may be a BERT model or an ERNIE model. Optionally, the multilayer perceptron classifier module may include a fully connected layer module and a softmax layer module.
Further, the explicit semantic analysis model may also be an ensemble learning model, obtained by using an ensemble learning algorithm to train the explicit semantic analysis model that includes the two parallel vector parsing modules and the multilayer perceptron classifier module. Optionally, the ensemble learning algorithm may be a random forest algorithm, in which case the ensemble learning model is a random forest model. Further optionally, the random forest algorithm trains the explicit semantic analysis model using syntactic features as decision tree parameters to obtain the random forest model, where the syntactic features may be based on POS tags (part-of-speech tags) or on the parent and child nodes of the phrase structure tree.
Thus, in this optional implementation, the explicit semantic relationship of adjacent text segments can be analyzed by an explicit semantic analysis model comprising two parallel vector parsing modules and a multilayer perceptron classifier module, so that an accurate explicit semantic relationship can be computed and an accurate data basis is provided for the coherence analysis.
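The data flow of this two-encoder arrangement can be sketched in plain Python as follows. The sketch rests on several assumptions: the toy `encode` function only stands in for a BERT or ERNIE encoder, the label set `RELATIONS` and the vector size `DIM` are hypothetical, and the classifier weights are random and untrained, so the predicted labels carry no meaning. It shows only how two text vectors are produced in parallel, concatenated, and passed through a fully connected layer and a softmax layer.

```python
import math
import random
from typing import List, Sequence

# Hypothetical relation labels and vector size, chosen only for illustration.
RELATIONS = ["comparison", "contingency", "expansion", "temporal"]
DIM = 16


def encode(segment: str, dim: int = DIM) -> List[float]:
    """Toy stand-in for one vector parsing module (BERT or ERNIE in the text)."""
    vec = [0.0] * dim
    for token in segment.lower().split():
        # Deterministic toy hashing of each token into one bucket.
        bucket = sum(ord(ch) for ch in token) % dim
        vec[bucket] += 1.0
    return vec


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PairClassifier:
    """Fully connected layer plus softmax over the two concatenated vectors."""

    def __init__(self, dim: int = DIM, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Untrained random weights: one row of 2 * dim weights per label.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in RELATIONS
        ]
        self.bias = [0.0] * len(RELATIONS)

    def predict_proba(self, first: str, second: str) -> List[float]:
        # Each segment is encoded separately, then the vectors are concatenated.
        features = encode(first) + encode(second)
        logits = [
            sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(logits)

    def predict(self, first: str, second: str) -> str:
        probs = self.predict_proba(first, second)
        return RELATIONS[probs.index(max(probs))]
```

In a real system the encoders and the classifier would be trained on text segment pairs labeled with their explicit semantic relationship.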
As an optional embodiment, the implicit semantic relationship may be the probability that, in a pair of adjacent text segments, the later text segment is the semantically coherent continuation of the earlier text segment. Accordingly, the implicit semantic analysis model may be the NSP (Next Sentence Prediction) module of the BERT model. Optionally, the adjacent text segments may be input into a BERT model pre-trained with NSP to obtain this probability, which in some cases takes the form of a prediction score. In that case, the specific manner in which the coherence analysis module 303 determines the coherence parameter corresponding to the target analysis text according to the semantic relationship may be: judging the coherence of the adjacent text segments according to a preset score threshold and the prediction score of the adjacent text segments. For example, when the prediction score is lower than the score threshold, the adjacent text segments are judged to be incoherent, and a corresponding warning prompt is displayed on the user terminal.
Thus, in this optional implementation, the NSP module of the BERT model can be used to analyze the implicit semantic relationship of adjacent text segments, so that an accurate implicit semantic relationship can be computed and an accurate data basis is provided for the coherence analysis.
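The threshold comparison described above can be sketched as follows. The scoring function is passed in as a parameter, so a real NSP model could be plugged in; the `toy_score` function used here is a hypothetical word-overlap stand-in and is not an NSP probability. All names in the sketch are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def find_incoherent_pairs(
    pairs: Sequence[Pair],
    score_fn: Callable[[str, str], float],
    threshold: float,
) -> List[int]:
    """Return the indices of pairs whose prediction score is below the threshold."""
    return [
        index
        for index, (first, second) in enumerate(pairs)
        if score_fn(first, second) < threshold
    ]


def toy_score(first: str, second: str) -> float:
    """Hypothetical stand-in scorer: word-overlap ratio, not a real NSP score."""
    a = set(first.lower().split())
    b = set(second.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Each returned index identifies a pair for which a warning prompt could be shown on the user terminal.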
When the scheme of the invention is applied to writing evaluation, the prediction scores that the NSP module gives for implicit semantic relationships are generally low. If a general-purpose score threshold were used, the algorithm could detect only severely incoherent semantic relationships in the text and would fail to identify slightly incoherent ones. Therefore, as an optional embodiment, the score threshold may be determined by:
determining all implicit text segment pairs in the target analysis text, where an implicit text segment pair is a pair of adjacent text segments with no explicit connective between them;
inputting all implicit text segment pairs into the implicit semantic analysis model to obtain a prediction score for each implicit text segment pair;
and determining the average of the prediction scores of all implicit text segment pairs as the score threshold.
Optionally, the details of determining all implicit text segment pairs in the target analysis text may follow the technical details given above for judging explicit connectives with the connective judgment model. Similarly, the details of inputting all implicit text segment pairs into the implicit semantic analysis model to obtain the prediction score of each pair may follow the approach described above, in which adjacent text segments are input into the BERT model pre-trained with NSP to obtain the probability that the later text segment is the semantically coherent continuation of the earlier text segment. These details are not repeated here.
Thus, in this optional implementation, the average of the prediction scores of all implicit text segment pairs in the target analysis text is used as the score threshold, so that the coherence of the target analysis text can subsequently be judged more accurately.
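A minimal sketch of this text-specific threshold follows. The function names are hypothetical, and the prediction scores are assumed to have been computed already by the implicit semantic analysis model.

```python
from statistics import fmean
from typing import List, Sequence


def score_threshold(implicit_scores: Sequence[float]) -> float:
    """Average prediction score of all implicit text segment pairs."""
    if not implicit_scores:
        raise ValueError("at least one implicit text segment pair is required")
    return fmean(implicit_scores)


def pairs_below_threshold(implicit_scores: Sequence[float]) -> List[int]:
    """Indices of implicit pairs scoring below the text's own average."""
    threshold = score_threshold(implicit_scores)
    return [i for i, score in enumerate(implicit_scores) if score < threshold]
```

Because the threshold is derived from the text itself, a pair is flagged relative to the other implicit pairs in the same text rather than against a fixed general-purpose value.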
As an optional embodiment, as shown in Fig. 4, the coherence analysis module 303 includes:
a first determining unit 3031, configured to determine the text coherence requirement corresponding to the target analysis text.
Optionally, the text coherence requirement indicates a semantic relationship parameter requirement corresponding to at least a part of the target analysis text. Optionally, the text coherence requirement may specifically indicate an implicit coherence requirement corresponding to at least a part of the target analysis text. Optionally, the text coherence requirement may also specifically indicate the type and/or number of sentence connection relationships that at least a part of the target analysis text should include.
a determining unit 3032, configured to determine whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of the target analysis text satisfies the text coherence requirement.
a second determining unit 3033, configured to determine, according to the determination result, the coherence parameter corresponding to the target analysis text.
Thus, in this optional implementation, it can be judged whether the semantic relationships of the text segments in the target analysis text satisfy the text coherence requirement corresponding to that text, and the coherence parameter corresponding to the target analysis text can be determined from the judgment result. Coherence is therefore assessed against a preset coherence requirement, which facilitates semantic coherence analysis at a more macroscopic level.
As an optional implementation, the specific manner in which the first determining unit 3031 determines the text coherence requirement corresponding to the target analysis text includes:
determining a target text architecture corresponding to the target analysis text;
and determining the text coherence requirement corresponding to the target analysis text according to the target text architecture.
The target text architecture may be the discourse structure that the genre of the target analysis text should follow. For example, when the genre of the target analysis text is an English argumentative essay, the target text architecture generally includes three parts: introduction, body and conclusion. In general, in a good English argumentative essay, the introduction mainly states the facts; the body discusses several arguments for and against the topic from the author's standpoint and uses multiple contrasting examples to support the author's point of view; and the conclusion summarizes the key points.
Optionally, determining the text coherence requirement corresponding to the target analysis text according to the target text architecture may consist of determining the text coherence requirements of the different parts of the target analysis text according to the semantic relationships expected of those parts in the target text architecture. The expected semantic relationship may be an expected sentence connection relationship or an expected implicit semantic relationship.
As an optional implementation, the specific manner in which the determining unit 3032 determines whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of the target analysis text satisfies the text coherence requirement includes:
for a text segment pair consisting of any two adjacent text segments in any part of the target analysis text, determining the implicit coherence of the text segment pair according to the probability that the later text segment in the pair is the semantically coherent continuation of the earlier text segment;
and judging whether the implicit coherence of at least one text segment pair in at least one part of the target analysis text satisfies the text coherence requirement.
Optionally, the NSP prediction score corresponding to the probability may be taken as the implicit coherence of the text segment pair. Furthermore, the average implicit coherence of all text segment pairs in a given part of the target analysis text may be taken as the implicit coherence of that part and compared with the implicit coherence threshold specified in the text coherence requirement for that part. When the implicit coherence of the part is lower than the corresponding implicit coherence threshold, the part is judged not to satisfy the text coherence requirement.
Further, when it is determined that the part does not satisfy the text coherence requirement, a corresponding warning prompt or a corresponding modification suggestion may be displayed on the user terminal.
Thus, in this optional implementation, the coherence of the text can be analyzed by judging whether the implicit coherence of at least one text segment pair in at least one part of the target analysis text satisfies the text coherence requirement. The degree of coherence of the text content is determined from the implicit semantic relationships, which achieves an accurate and effective text analysis.
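The part-level check described above can be sketched as follows. The part names, the per-part thresholds and the function names are assumptions made for illustration, and the pair scores are assumed to be NSP prediction scores computed elsewhere.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def part_coherence(pair_scores: Sequence[float]) -> float:
    """Implicit coherence of one part: the mean score of its segment pairs."""
    return fmean(pair_scores)


def failing_parts(
    scores_by_part: Dict[str, Sequence[float]],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return the parts whose implicit coherence is below their threshold."""
    failing: List[str] = []
    for part, scores in scores_by_part.items():
        if not scores:
            # A part with no adjacent segment pairs cannot be judged.
            continue
        if part_coherence(scores) < thresholds[part]:
            failing.append(part)
    return failing
```

For an argumentative essay, the keys might be the introduction, body and conclusion, each with its own implicit coherence threshold taken from the text coherence requirement.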
As an optional implementation, the specific manner in which the determining unit 3032 determines whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of the target analysis text satisfies the text coherence requirement includes:
for any part of the target analysis text, determining all sentence connection relationships in that part according to the sentence connection relationship between each pair of adjacent text segments in that part;
and judging whether all the sentence connection relationships in at least one part of the target analysis text satisfy the text coherence requirement.
Optionally, the types and/or number of sentence connection relationships in the part may be counted, and the counts compared with the type requirement and/or number requirement for sentence connection relationships in the text coherence requirement corresponding to that part, so as to determine whether the part satisfies those requirements.
Further, when it is determined that the sentence connection relationships of the part do not satisfy the text coherence requirement, a corresponding warning prompt or a corresponding modification suggestion may be displayed on the user terminal. Optionally, the modification suggestion may be generated according to the way in which the requirement is not satisfied. For example, when none of the sentence connection relationships in the part is of a type required by the text coherence requirement, a modification suggestion is generated advising the user to add a text segment with that type of sentence connection relationship to the part. As another example, when the number of sentence connection relationships of a specific or unspecific type in the part does not satisfy the number requirement in the text coherence requirement, a modification suggestion is generated advising the user to add more text segments with sentence connection relationships of that type. For instance, when the body of an English argumentative essay is found to contain few text segments with a comparison relationship, a modification suggestion may be generated advising the student to add more comparative or exemplifying discussion to the body to support their own point of view.
Thus, in this optional implementation, the coherence of the text can be analyzed by judging whether all the sentence connection relationships in at least one part of the target analysis text satisfy the text coherence requirement. The degree of coherence of the text content is determined from the sentence connection relationships, which achieves an accurate and effective text analysis.
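The counting and suggestion logic described above can be sketched as follows. The relation labels, the form of the requirement (a minimum count per relation type) and the wording of the suggestions are all assumptions made for illustration; the description does not prescribe a particular data structure.

```python
from collections import Counter
from typing import Dict, List, Sequence


def check_connections(
    relations: Sequence[str],
    min_counts: Dict[str, int],
) -> List[str]:
    """Compare the relation counts of one part with its minimum requirements."""
    counts = Counter(relations)
    suggestions: List[str] = []
    for relation, minimum in min_counts.items():
        found = counts[relation]
        if minimum > 0 and found == 0:
            # A required relation type is missing entirely.
            suggestions.append(
                f"Add a text segment expressing the '{relation}' relation."
            )
        elif found < minimum:
            # The relation type is present but too rare.
            suggestions.append(
                f"Add more '{relation}' relations "
                f"({found} found, at least {minimum} expected)."
            )
    return suggestions
```

An empty result means that the part satisfies the type and number requirements; otherwise each string is a candidate modification suggestion for the user terminal.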
It should be noted that the text continuity analysis apparatus in this embodiment of the present invention is the functional module that performs the steps of the text continuity analysis method disclosed in the first embodiment and the second embodiment. For the specific technical implementation details and other optional implementation details of each module, refer to the technical details disclosed in the first embodiment and the second embodiment, which are not repeated here.
Example four
Referring to fig. 5, fig. 5 is a schematic diagram illustrating another apparatus for text consistency analysis according to an embodiment of the present invention. The text continuity analyzing apparatus depicted in fig. 5 is applied to a text data analyzing system/a text data analyzing device/a text data analyzing server (where the text data analyzing server includes a local server or a cloud server). As shown in fig. 5, the text continuity analyzing means may include:
a memory 401 storing executable program code;
a processor 402 coupled with the memory 401;
the processor 402 calls the executable program code stored in the memory 401 to execute the steps of the text continuity analysis method described in the first embodiment or the second embodiment.
Example five
The embodiment of the invention discloses a computer-readable storage medium which stores a computer program for electronic data exchange, wherein the computer program enables a computer to execute the steps of the text continuity analysis method described in the first embodiment or the second embodiment.
Example six
An embodiment of the present invention discloses a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform the steps of the text coherence analysis method described in the first embodiment or the second embodiment.
The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above detailed description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, and may also be implemented by hardware. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium. The storage medium includes a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that the text continuity analysis method and apparatus disclosed in the embodiments of the present invention are only preferred embodiments of the present invention and are used only to illustrate its technical solutions, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A text continuity analysis method, characterized in that the method comprises:
acquiring a target analysis text; the target analysis text comprises a plurality of text segments;
determining corresponding semantic relations between at least one pair of adjacent text segments in the target analysis text;
determining a consistency parameter corresponding to the target analysis text according to the semantic relation; the consistency parameter is used for indicating the narrative consistency degree of the target analysis text.
2. The method according to claim 1, wherein the determining the coherence parameter corresponding to the target analysis text according to the semantic relationship comprises:
determining a text consistency requirement corresponding to the target analysis text; the text consistency requirement is used for indicating a semantic relation parameter requirement corresponding to at least one part of text of the target analysis text;
judging whether the corresponding semantic relation between at least one pair of adjacent text segments in at least one part of text in the target analysis text meets the text consistency requirement or not;
and determining the consistency parameters corresponding to the target analysis texts according to the judgment result.
3. The method of text coherence analysis according to claim 2, wherein said semantic relationships comprise explicit semantic relationships and/or implicit semantic relationships.
4. The method according to any one of claims 1-3, wherein said determining corresponding semantic relationships between at least one pair of adjacent text segments in the target analysis text comprises:
inputting at least one pair of adjacent text segments in the target analysis text into a semantic analysis algorithm model to output corresponding semantic relations between the adjacent text segments; the semantic analysis algorithm model comprises an explicit semantic analysis model and/or an implicit semantic analysis model.
5. The method according to claim 4, wherein said inputting at least one pair of adjacent text segments in the target analysis text into a semantic analysis algorithm model to output corresponding semantic relationships between the adjacent text segments comprises:
judging whether an explicit connective exists between at least one pair of adjacent text segments in the target analysis text;
when the judgment result is yes, inputting the adjacent text segments into the explicit semantic analysis model for analysis so as to obtain the corresponding explicit semantic relationship between the adjacent text segments;
and when the judgment result is no, inputting the adjacent text segments into the implicit semantic analysis model for analysis so as to obtain the corresponding implicit semantic relationship between the adjacent text segments.
6. The method according to claim 3, wherein the implicit semantic relationship is a probability that, in the adjacent text segments, the later text segment is the semantically coherent continuation of the earlier text segment, and/or the text consistency requirement is specifically used to indicate an implicit coherence requirement corresponding to at least a part of text of the target analysis text;
and/or, the determining whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of text in the target analysis text satisfies the text consistency requirement includes:
for a text segment pair consisting of any two adjacent text segments in any part of the text of the target analysis text, determining the implicit coherence of the text segment pair according to the probability that the later text segment in the text segment pair is the semantically coherent continuation of the earlier text segment;
and judging whether the implicit coherence of at least one text segment pair in at least one part of text in the target analysis text meets the text consistency requirement.
7. The method of text coherence analysis according to claim 2, wherein the semantic relationships include multiple types of sentence connection relationships; and/or the text consistency requirement is specifically used for indicating the type and/or the number of sentence connection relations which at least one part of text of the target analysis text should include;
and/or, the determining whether the semantic relationship corresponding to at least one pair of adjacent text segments in at least one part of text in the target analysis text satisfies the text consistency requirement includes:
for any part of text of the target analysis text, determining all sentence connection relations in the part of text according to corresponding sentence connection relations between any pair of adjacent text segments in the part of text;
and judging whether all sentence connection relations in at least one part of texts in the target analysis texts meet the text consistency requirement or not.
8. A text continuity analysis apparatus, characterized in that the apparatus comprises:
the text acquisition module is used for acquiring a target analysis text; the target analysis text comprises a plurality of text segments;
the semantic determining module is used for determining the corresponding semantic relation between at least one pair of adjacent text segments in the target analysis text;
the consistency analysis module is used for determining consistency parameters corresponding to the target analysis texts according to the semantic relation; the consistency parameter is used for indicating the narrative consistency degree of the target analysis text.
9. A text continuity analysis apparatus, characterized in that the apparatus comprises:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to perform the text coherence analysis method of any one of claims 1-7.
10. A computer storage medium having stored thereon computer instructions which, when invoked, perform a method of text coherence analysis according to any one of claims 1 to 7.
CN202111042313.XA 2021-09-07 2021-09-07 Text continuity analysis method and device Pending CN113743125A (en)


Publications (1)

Publication Number Publication Date
CN113743125A true CN113743125A (en) 2021-12-03

Family

ID=78736367




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination