CN113656576A - Article summary generation method and device, computing device and storage medium - Google Patents


Info

Publication number
CN113656576A
Authority
CN
China
Prior art keywords
chapter
role
interaction
information
article
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110994697.9A
Other languages
Chinese (zh)
Other versions
CN113656576B (en)
Inventor
郑元辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Digital Media Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Digital Media Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Digital Media Co Ltd, MIGU Culture Technology Co Ltd
Priority to CN202110994697.9A
Publication of CN113656576A
Application granted
Publication of CN113656576B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an article summary generation method, an article summary generation device, a computing device and a storage medium. According to the technical solution provided by the invention, interaction frequency data describing how a target role in an article to be processed interacts with other roles in each of at least two chapter contents is obtained; corresponding chapter interaction information is determined for each of the at least two chapter contents according to the interaction frequency data, the chapter interaction information comprising chapter types and role interaction relationship labels; key sentences are extracted from the at least two chapter contents according to the interaction frequency data and the chapter interaction information; and the key sentences are combined to generate article summary information of the article to be processed. With this method and device, summaries of articles such as novels can be generated quickly, which makes them convenient for users to consult, improves the user experience, and facilitates generating scripts from novels.

Description

Article summary generation method and device, computing device and storage medium
Technical Field
The invention relates to the technical field of text mining, in particular to an article summary generation method, an article summary generation device, computing equipment and a computer storage medium.
Background
With the development of text content extraction technology, users reading various articles need to obtain summary information of the article content as quickly as possible, so as to shorten the reading time and improve the efficiency of acquiring article information.
However, current methods for summarizing article content are mainly directed at news of various kinds, such as politics, sports and entertainment, and at articles such as notifications and reports; there is no method for generating summaries of literary works such as novels. Existing summary generation methods for news, notifications and reports can only extract key sentences and cannot capture the associations between those key sentences from the article, so the key sentences are independent of one another. Such methods are therefore not suitable for generating summaries of literary works such as novels.
Disclosure of Invention
In view of the above, the present invention has been made to provide an article summary generation method and a corresponding article summary generation apparatus, a computing device and a computer storage medium that overcome or at least partially solve the above-mentioned problems.
According to an aspect of the present invention, there is provided an article summary generation method, the method including:
acquiring interaction frequency data of a target role in the article to be processed, which interacts with other roles in at least two chapter contents respectively;
respectively determining corresponding chapter interaction information for the at least two chapter contents according to the interaction frequency data; the chapter interaction information comprises chapter types and role interaction relationship labels;
extracting key sentences from the at least two chapter contents according to the interaction frequency data and the chapter interaction information;
and combining the key sentences to generate article summary information of the article to be processed.
In the foregoing solution, the obtaining of interaction frequency data that a target character in an article to be processed interacts with other characters in at least two chapter contents further includes:
extracting role information of each role from the at least two chapter contents of the article to be processed, and dividing the role information of the same role into the same role set to obtain a plurality of role sets;
and determining paragraph serial numbers of the roles in the at least two chapter contents according to the role information in the plurality of role sets, and obtaining the interaction frequency data according to the paragraph serial numbers of the roles in the at least two chapter contents.
In the foregoing solution, the determining, according to the role information in the plurality of role sets, the paragraph numbers of the roles appearing in the at least two chapter contents, and obtaining the interaction frequency data according to the paragraph numbers of the roles appearing in the at least two chapter contents further includes:
counting the occurrence frequency of the role information in the role set corresponding to each role in the at least two chapter contents, and recording the paragraph serial number of the role information in each chapter content to obtain the chapter role occurrence set of the role;
and according to the chapter role appearance set of the role and the chapter role appearance sets of other roles, obtaining the interaction frequency data by counting the interaction frequency and non-interaction frequency of the role with other roles in the contents of the at least two chapters.
In the foregoing solution, the determining, according to the interaction frequency data, corresponding chapter interaction information for the at least two chapter contents further includes:
determining chapter types corresponding to the at least two chapter contents according to the interaction frequency and the non-interaction frequency in the interaction frequency data, and setting role interaction relationship labels for the at least two chapter contents; the chapter types include: an interactive section type and a non-interactive section type.
In the foregoing solution, the determining the chapter types corresponding to the at least two chapter contents according to the interaction frequency and the non-interaction frequency in the interaction frequency data, and setting a role interaction relationship label for the at least two chapter contents further includes:
for each chapter content, if the ratio of the interaction frequency and the non-interaction frequency of the target role to other roles in the chapter content is greater than a first preset threshold, determining the chapter type corresponding to the chapter content as an interaction chapter type, and setting corresponding role interaction relationship labels according to the target role and other roles interacted by the target role; if the ratio of the interaction frequency to the non-interaction frequency of the target character to other characters in the chapter content is smaller than or equal to a first preset threshold value, determining the chapter type corresponding to the chapter content as a non-interaction chapter type, and setting a corresponding non-interaction label according to the target character.
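The threshold rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify_chapter`, the label format `"A-B"`, the choice of the most frequent partner for the label, the use of the total interaction frequency in the ratio, and the handling of a zero non-interaction frequency are all assumptions that the text does not specify. A non-negative threshold is assumed.

```python
def classify_chapter(target, interactions, non_interaction, threshold):
    """Return (chapter_type, label) for one chapter content.

    interactions: dict mapping each other role to the target role's
                  interaction frequency with it in this chapter.
    non_interaction: the target role's non-interaction frequency.
    threshold: the first preset threshold (assumed to be >= 0).
    """
    total = sum(interactions.values())
    # Assumption: with no non-interaction occurrences, any interaction
    # counts as an arbitrarily large ratio; the source does not cover this.
    if non_interaction == 0:
        ratio = float("inf") if total > 0 else 0.0
    else:
        ratio = total / non_interaction
    if ratio > threshold:
        # Assumption: label the chapter with the most frequent partner.
        partner = max(interactions, key=interactions.get)
        return "interactive", f"{target}-{partner}"
    # Non-interactive chapters carry a label built from the target role only.
    return "non-interactive", target


busy = classify_chapter("A", {"B": 6, "C": 1}, non_interaction=2, threshold=1.0)
quiet = classify_chapter("A", {"B": 1, "C": 0}, non_interaction=4, threshold=1.0)
```

Here `busy` has a ratio of 7/2 and is classified as an interactive chapter, while `quiet` has a ratio of 1/4 and falls at or below the threshold.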
In the foregoing solution, the extracting a key sentence from the at least two chapter contents according to the interaction frequency data and the chapter interaction information further includes:
dividing the at least two chapter contents according to the chapter sequences of the at least two chapter contents and the role interaction relationship labels in the chapter interaction information to obtain a plurality of extraction intervals, and determining the types of the plurality of extraction intervals; merging a plurality of chapter contents which are adjacent and have the same role interaction relationship label into an extraction interval;
calculating a role interaction ratio corresponding to each extraction interval according to the interaction frequency data, and determining the extraction quantity of the key sentences of each extraction interval according to the role interaction ratio;
and for each extraction interval, extracting from the extraction interval, according to a preset extraction strategy corresponding to the type of the extraction interval, a number of key sentences equal to the key-sentence extraction quantity of that interval.
In the foregoing solution, the combining the key sentence to generate the article summary information of the article to be processed further includes:
and connecting the key sentences in series according to the sequence of the key sentences from front to back in the article to be processed to generate article summary information of the article to be processed.
According to another aspect of the present invention, there is provided an article summary generation apparatus including: an acquisition module, a determination module, an extraction module and a generation module; wherein,
the acquisition module is configured to acquire interaction frequency data of a target role in the article to be processed interacting with other roles in at least two chapter contents respectively;
the determination module is configured to determine corresponding chapter interaction information for the at least two chapter contents respectively according to the interaction frequency data; the chapter interaction information comprises chapter types and role interaction relationship labels;
the extraction module is configured to extract key sentences from the at least two chapter contents according to the interaction frequency data and the chapter interaction information;
the generation module is configured to combine the key sentences to generate article summary information of the article to be processed.
According to yet another aspect of the present invention, there is provided a computing device comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the article summary generating method.
According to yet another aspect of the present invention, a computer storage medium is provided, in which at least one executable instruction is stored, and the executable instruction causes a processor to perform operations corresponding to the article summary generating method as described above.
According to the technical solution provided by the invention, interaction frequency data of a target role in the article to be processed interacting with other roles in at least two chapter contents is obtained; corresponding chapter interaction information is determined for the at least two chapter contents respectively according to the interaction frequency data; the chapter interaction information comprises chapter types and role interaction relationship labels; key sentences are extracted from the at least two chapter contents according to the interaction frequency data and the chapter interaction information; and the key sentences are combined to generate article summary information of the article to be processed. This overcomes a defect of traditional key information extraction, in which all sentences of an article such as a novel are taken as extraction targets and the order of the key information in the generated summary text therefore cannot be determined. As a result, the summary text reflects the order of the plot and highlights the key plot points around a designated role, while the plot content remains sufficient and effective.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a flow diagram of an article summary generation method according to one embodiment of the invention;
FIG. 2 is a flow chart illustrating a method for setting a role interaction relationship label for at least two chapter contents according to another embodiment of the present invention;
FIG. 3 shows a block diagram of an article summary generation apparatus according to an embodiment of the invention;
FIG. 4 shows a schematic structural diagram of a computing device according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a flow diagram of an article summary generation method according to an embodiment of the invention. As shown in Fig. 1, the method comprises the following steps:
Step S101, acquiring interaction frequency data of a target role in the article to be processed interacting with other roles in at least two chapter contents respectively.
The articles to be processed may include literary works such as novels, online articles, and the like.
Specifically, before the interaction frequency data between the roles in the at least two chapter contents of the article to be processed is calculated, this embodiment further includes:
extracting role information of each role from at least two chapter contents of the article to be processed, and dividing the role information of the same role into the same role set to obtain a plurality of role sets.
Specifically, a role set may be obtained by extracting the indicators (the words that refer to a given role) in a chapter content as role information and combining them; the result is an information set representing the role information of that role in the chapter content, that is, the role set of the role.
For example:
Role A = {wushibrother, wukuang, wutangtang}
Role B = {amander, tianyan}
Here wushibrother, wukuang and wutangtang are all indicators of role A and constitute the role information of role A, so combining them yields the role set of role A, Role A = {wushibrother, wukuang, wutangtang};
similarly, amander and tianyan are both indicators of role B and constitute the role information of role B, so the role set of role B is Role B = {amander, tianyan}.
In this way, after the character information of each character in the chapter content is extracted, the character information of the same character is divided into the same character set, and the character set of each character is generated for the character information of different characters.
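The grouping of indicators into role sets can be sketched as below. The alias table `ALIAS_TO_ROLE` and the function `build_role_sets` are hypothetical names introduced only for illustration; the text does not say how an indicator is resolved to its role, so a fixed lookup table is assumed here.

```python
from collections import defaultdict

# Hypothetical alias table: indicator word -> role identifier.
# How indicators are resolved to roles is not specified in the source.
ALIAS_TO_ROLE = {
    "wushibrother": "A",
    "wukuang": "A",
    "wutangtang": "A",
    "amander": "B",
    "tianyan": "B",
}


def build_role_sets(indicators, alias_to_role):
    """Group the indicators found in a chapter by the role they refer to."""
    role_sets = defaultdict(set)
    for word in indicators:
        role = alias_to_role.get(word)
        if role is not None:  # skip words that are not role indicators
            role_sets[role].add(word)
    return dict(role_sets)


role_sets = build_role_sets(
    ["wukuang", "amander", "wushibrother", "tianyan", "wutangtang", "rain"],
    ALIAS_TO_ROLE,
)
```

With this input, `role_sets["A"]` holds the three indicators of role A and `role_sets["B"]` the two indicators of role B, matching the example above.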
Specifically, according to the role information in the plurality of role sets, the paragraph numbers of the roles appearing in the at least two chapter contents are determined, and the interaction frequency data is obtained according to the paragraph numbers of the roles appearing in the at least two chapter contents.
For each role among the roles of each chapter content, the frequency with which the role appears in each chapter content is determined. Specifically, for each role, the occurrence frequency of the role information in the role set corresponding to the role is counted in the at least two chapter contents, and the paragraph numbers at which the role information appears in each chapter content are recorded, yielding the chapter role occurrence set of the role;
further, the same procedure is applied to every role, so that the occurrence frequency of each role in each chapter content is determined and the chapter role occurrence set of each role is obtained.
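Counting occurrences and recording paragraph numbers for one role in one chapter might look like the following sketch. The function name `chapter_role_occurrences`, the 1-based paragraph numbering and the plain substring matching are assumptions made for illustration; a real system would match on segmented words.

```python
def chapter_role_occurrences(paragraphs, role_set):
    """Return (frequency, paragraph_numbers) for one role in one chapter.

    paragraphs: list of paragraph strings, numbered from 1 (assumed).
    role_set: the indicators that refer to the role.
    """
    frequency = 0
    paragraph_numbers = []
    for number, text in enumerate(paragraphs, start=1):
        # Assumption: simple substring counting stands in for word matching.
        hits = sum(text.count(word) for word in role_set)
        if hits:
            frequency += hits
            paragraph_numbers.append(number)
    return frequency, paragraph_numbers


chapter = [
    "wukuang met amander.",
    "The rain fell.",
    "amander thanked wukuang, and wukuang smiled.",
]
freq_a, paras_a = chapter_role_occurrences(chapter, {"wushibrother", "wukuang", "wutangtang"})
freq_b, paras_b = chapter_role_occurrences(chapter, {"amander", "tianyan"})
```

Collecting `paragraph_numbers` per chapter for each role gives the chapter role occurrence set used in the following steps.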
And for each role, counting the interaction frequency and non-interaction frequency of the role with other roles in at least two chapter contents according to the chapter role appearance set of the role and the chapter role appearance sets of other roles, and obtaining interaction frequency data.
Optionally, since the amount of content in a chapter is not fixed, a chapter may contain little content; a chapter may also be purely a scene description that is independent of any role. The interaction frequency between each role and the other roles may therefore be counted as follows: if another role appears within n adjacent paragraphs of a role, one interaction between the two roles is recorded.
For example: there are three roles in the article: role a, role B, role C. Counting the occurrence frequency of the role A, the role B and the role C in the chapters 1 to 9 respectively, and recording the paragraph numbers of the role A, the role B and the role C in the content of each chapter to obtain the occurrence frequency of each role in the content of different chapters, as shown in table 1:
Figure BDA0003233460700000071
TABLE 1
By summarizing the occurrences of each role in the table across the different chapter contents, the chapter role occurrence sets $A = \{a_1, a_2, \ldots, a_n\}$, $B = \{b_1, b_2, \ldots, b_n\}$ and $C = \{c_1, c_2, \ldots, c_n\}$ are obtained. The elements $a_n$, $b_n$ and $c_n$ denote the paragraph numbers at which the indicators of role A, role B and role C, respectively, appear in the chapter contents of the article.
Optionally, the interaction frequency between the role a and the role B may be defined as:
Figure BDA0003233460700000072
where $\mathrm{interval}(a_i, b_i) < n$ means that the paragraphs in which the role indicators appear are fewer than n paragraphs apart, which is recorded as one role interaction;
the interaction frequencies among the other roles are obtained according to the same definition;
in addition, the frequency with which role A has no interaction relationship (i.e., the non-interaction frequency of role A) is denoted as A.
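Because the defining formula is only available as an image, the following is a sketch of one plausible reading rather than the formula itself: an occurrence of role A counts as an interaction with role B when some occurrence of B lies fewer than n paragraphs away. The function names and the way non-interaction is counted (an occurrence with no other role nearby) are assumptions.

```python
def count_interactions(a_paragraphs, b_paragraphs, n):
    """Count occurrences of role A that have role B within n paragraphs."""
    return sum(
        1
        for a in a_paragraphs
        if any(abs(a - b) < n for b in b_paragraphs)
    )


def count_non_interactions(a_paragraphs, other_roles, n):
    """Count occurrences of role A with no other role within n paragraphs.

    other_roles: list of paragraph-number lists, one per other role.
    """
    return sum(
        1
        for a in a_paragraphs
        if not any(abs(a - b) < n for other in other_roles for b in other)
    )


A = [1, 4, 9]   # paragraph numbers where role A appears (illustrative data)
B = [2, 12]
C = [9]
ab = count_interactions(A, B, n=2)
ac = count_interactions(A, C, n=2)
a_alone = count_non_interactions(A, [B, C], n=2)
```

In this illustrative data, paragraph 1 is close to B, paragraph 9 coincides with C, and paragraph 4 has no other role nearby.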
Based on the scheme, the interaction frequency and the non-interaction frequency of each character with other characters in different chapter contents are counted, wherein the counting result of the interaction frequency is shown in table 2:
Figure BDA0003233460700000073
TABLE 2
Specifically, according to the chapter role appearance set of the role and the chapter role appearance sets of other roles, the interaction frequency and the non-interaction frequency of the role with other roles in at least two chapter contents are counted to obtain interaction frequency data.
Step S102, according to the interaction frequency data, corresponding chapter interaction information is respectively determined for the at least two chapter contents; the chapter interaction information comprises chapter types and role interaction relationship labels.
For example, with character a as the target character, the appearance frequency and interaction frequency data of character a are obtained from the contents of table 1 and table 2, as shown in table 3:
Figure BDA0003233460700000081
TABLE 3
According to the data in the table 3, chapter interaction information for different chapter contents of the role a (i.e., the target role) is determined.
Specifically, the chapter interaction information includes: chapter type and role interaction relationship labels;
the determining chapter interaction information of at least two chapter contents according to the interaction frequency data further comprises:
determining chapter types corresponding to the contents of at least two chapters according to the interaction frequency and the non-interaction frequency in the interaction frequency data of the target role, and setting role interaction relationship labels for the contents of at least two chapters; the chapter types include: an interactive section type and a non-interactive section type.
And step S103, extracting key sentences from the at least two chapter contents according to the interaction frequency data and the chapter interaction information.
Specifically, the extracting the key sentence from the at least two chapter contents according to the interaction frequency data of the target character and the chapter interaction information of the at least two chapter contents further includes:
dividing the at least two chapter contents according to the chapter sequences of the at least two chapter contents and the role interaction relationship labels in the chapter interaction information of the at least two chapter contents to obtain a plurality of extraction intervals, and determining the types of the plurality of extraction intervals; merging a plurality of chapter contents which are adjacent and have the same role interaction relationship label into an extraction interval;
calculating a role interaction ratio corresponding to each extraction interval according to the interaction frequency data of the target role, and determining the extraction quantity of the key sentences of each extraction interval according to the role interaction ratio;
and for each extraction interval, extracting from the extraction interval, according to a preset extraction strategy corresponding to the type of the extraction interval, a number of key sentences equal to the key-sentence extraction quantity of that interval.
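Merging adjacent chapters that share a role interaction relationship label into extraction intervals can be sketched with the standard library's `itertools.groupby`, which groups consecutive items with equal keys. The function name and the `(chapter_number, label)` input format are assumptions made for illustration.

```python
from itertools import groupby


def build_extraction_intervals(chapter_labels):
    """Merge adjacent chapters that share a label into one interval.

    chapter_labels: list of (chapter_number, label) in chapter order.
    Returns a list of (label, [chapter_numbers]) in article order.
    """
    return [
        (label, [chapter for chapter, _ in group])
        for label, group in groupby(chapter_labels, key=lambda item: item[1])
    ]


intervals = build_extraction_intervals(
    [(1, "A"), (2, "A-B"), (3, "A-B"), (4, "A-C"), (5, "A-B")]
)
```

Chapters 2 and 3 are merged because they are adjacent and share the label `"A-B"`; chapter 5 has the same label but is not adjacent, so it forms its own interval.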
Specifically, the character interaction ratio may be a ratio of the interaction frequency of the target character in the content of one chapter and other characters to the occurrence frequency of the target character in the content of the chapter.
The types of the extraction intervals include: single role information interval, role interaction interval and role irrelevant interval. When the target character appears frequently and interacts less frequently in a certain chapter of content, meaning that the target character appears many times but does not interact in the chapter of content, the chapter of content may be an introduction or a description of the target character, and the chapter of content may be considered as a single character information interval; when the appearance frequency of the target role is high and the interaction frequency is high, which means that the target role appears for many times and interacts with other roles for many times in the content of the section, the content of the section can be regarded as a role interaction interval; when the appearance frequency of the target role is less and the interaction frequency is less, it means that the association between the chapter content and the target role is less, and the chapter content can be considered as a role-independent interval.
In a plurality of role interaction intervals, the larger the value of the role interaction ratio is, the larger the proportion of the target role participating in the interaction in the role interaction interval is, and the content of the chapter can be considered as a main core plot for the target role; otherwise, it means that the smaller the proportion of the target role participating in the interaction in the chapter content is, the chapter content can be considered as a secondary core plot for the target role.
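The role interaction ratio and the three interval types can be sketched as follows. The ratio follows the definition given above (interaction frequency divided by occurrence frequency). The cut-off values `min_occurrence` and `min_interaction` are purely hypothetical: the text only speaks of "high" and "low" frequencies and gives no numbers.

```python
def interaction_ratio(interaction_freq, occurrence_freq):
    """Interaction frequency of the target role divided by its occurrence frequency."""
    return interaction_freq / occurrence_freq if occurrence_freq else 0.0


def interval_type(occurrence_freq, interaction_freq,
                  min_occurrence=5, min_interaction=3):
    """Classify an extraction interval for the target role.

    min_occurrence and min_interaction are assumed cut-offs for
    "appears frequently" and "interacts frequently".
    """
    if occurrence_freq < min_occurrence:
        return "role-independent"          # target role barely appears
    if interaction_freq < min_interaction:
        return "single role information"   # appears often, rarely interacts
    return "role interaction"              # appears often and interacts often


types = [interval_type(12, 1), interval_type(12, 8), interval_type(1, 0)]
ratio = interaction_ratio(8, 12)
```

Among role interaction intervals, a larger `ratio` would mark the interval as closer to a main core plot for the target role.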
Optionally, the preset extraction strategy includes: extracting more key sentences from extraction intervals with a high role interaction ratio; for single role information intervals and role interaction intervals, treating noun keywords as core elements that must be extracted as key information; and, for role-independent intervals, optionally extracting no keywords.
A larger number of key sentences is extracted from core plots so that the article summary information covers more of the core plot; fewer key sentences may be extracted from non-core plots. Provided that the corresponding content is still covered, the number of key sentences thus reflects the dominant position of the core plot in the finally generated article summary information.
Optionally, when key sentences are to be extracted, keywords may be extracted first and key sentences extracted afterwards, according to the technical solution provided by the present invention.
When extracting keywords, nouns and proper nouns (such as character names and event locations) in the chapter contents are treated as key descriptive information. All chapter contents to be extracted are segmented into words with ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), and the part of speech of each word is tagged. The most informative keywords are then extracted with an algorithm such as TextRank, such that 20% of the words in the whole article are extracted and nouns account for more than 60% of the extracted words. For example, the keyword ranking produced by the TextRank algorithm can be split into a ranking of nouns and a ranking of other words, with more than 60% of the extracted words taken from the noun ranking.
When extracting key sentences, the keywords of all sentences in the chapter contents are selected as the dimensions of a feature vector, and the feature vector is expressed using TF-IDF (Term Frequency-Inverse Document Frequency). Key sentences are extracted from each extraction interval according to the type of the extraction interval, the order of the intervals and the preset extraction strategy; for example, the TextRank algorithm is used to rank the sentences of the text, with the similarity between sentences computed as the cosine similarity of their feature vectors. After the key-sentence ranking of each interval is obtained, the weight of sentences containing the role interaction relationship label is increased, which yields the key sentences of each interval.
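The key-sentence step can be sketched in plain Python as below: TF-IDF vectors over the keywords, cosine similarity between sentences, a TextRank-style iteration over the similarity graph, and a boost for sentences that contain the label's roles. All function names, the smoothed IDF formula, the damping factor, the iteration count and the `boost` value are assumptions; the text names the techniques but gives no parameters.

```python
import math
from collections import Counter


def tfidf_vectors(sentences, keywords):
    """sentences: list of token lists. Returns one TF-IDF vector per sentence."""
    n = len(sentences)
    df = {k: sum(1 for s in sentences if k in s) for k in keywords}
    vectors = []
    for tokens in sentences:
        counts = Counter(tokens)
        vectors.append([
            # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
            (counts[k] / len(tokens)) * (math.log((1 + n) / (1 + df[k])) + 1)
            if tokens else 0.0
            for k in keywords
        ])
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank(similarity, damping=0.85, iterations=50):
    """Weighted PageRank-style scores over a sentence similarity matrix."""
    n = len(similarity)
    if n == 0:
        return []
    # Total outgoing weight of each sentence, excluding self-similarity.
    out = [sum(row[k] for k in range(n) if k != j)
           for j, row in enumerate(similarity)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                similarity[j][i] / out[j] * scores[j]
                for j in range(n) if j != i and out[j] > 0
            )
            for i in range(n)
        ]
    return scores


def rank_key_sentences(sentences, keywords, label_words, top_k, boost=1.5):
    """Return indices of the top_k sentences, in their original order."""
    vectors = tfidf_vectors(sentences, keywords)
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    scores = textrank(sim)
    for i, tokens in enumerate(sentences):
        # Assumption: a sentence "contains the label" when it mentions
        # every role indicator listed in label_words.
        if all(word in tokens for word in label_words):
            scores[i] *= boost
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(best[:top_k])


sentences = [
    ["wukuang", "trained", "sword"],
    ["wukuang", "met", "amander", "temple"],
    ["rain", "fell", "temple"],
    ["amander", "left", "temple"],
]
keywords = ["wukuang", "amander", "sword", "temple"]
picked = rank_key_sentences(sentences, keywords, {"wukuang", "amander"}, top_k=2)
```

Returning the chosen indices in original order keeps the extracted sentences ready for in-order concatenation in the final step.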
And step S104, combining the key sentences to generate article summary information of the article to be processed.
Specifically, after the extraction of the key sentences is completed, the key sentences are combined to generate article summary information of the article to be processed. Determining the number of key sentences needed to be used in each extraction interval according to the calculated character interaction ratio, wherein the number of key sentences used in intervals with higher character interaction is larger; in the interval with lower role interaction, the number of key sentences is less; preferably, the number of key sentences used in each extraction interval is ensured to be greater than 1.
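One way to turn role interaction ratios into per-interval sentence counts is a proportional split with a floor, sketched below. The proportional rule, the `budget` parameter and the function name are assumptions; the text only states that higher-interaction intervals use more key sentences and that each interval preferably uses more than one.

```python
import math


def allocate_key_sentences(ratios, budget, minimum=2):
    """Split a sentence budget across intervals in proportion to their ratios.

    ratios: role interaction ratio of each extraction interval.
    budget: target total number of key sentences (hypothetical parameter).
    minimum: floor per interval; 2 reflects "greater than 1" in the text.
    """
    total = sum(ratios)
    if total == 0:
        return [minimum] * len(ratios)
    return [
        # Round half up, then apply the per-interval floor.
        max(minimum, math.floor(budget * r / total + 0.5))
        for r in ratios
    ]


counts = allocate_key_sentences([0.75, 0.25, 0.0], budget=8)
```

Because of the floor, the counts can add up to more than `budget`; a stricter variant would rebalance after applying the minimum.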
And connecting the key sentences used in each extraction interval in series according to the sequence of the key sentences from front to back in the article to be processed to generate article summary information of the article to be processed.
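The final concatenation step can be sketched as a sort by position followed by a join. The `(position, text)` pair format and the function name are assumptions introduced for illustration.

```python
def assemble_summary(key_sentences, separator=" "):
    """Join key sentences in the order they appear in the article.

    key_sentences: iterable of (position_in_article, sentence_text).
    """
    ordered = sorted(key_sentences, key=lambda item: item[0])
    return separator.join(text for _, text in ordered)


summary = assemble_summary([
    (42, "They parted at the temple."),
    (3, "wukuang began his training."),
    (17, "He met amander on the road."),
])
```

Sorting by article position is what preserves the plot order in the generated summary, regardless of the order in which intervals were processed.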
By summarizing core chapters with as many sentences as possible and unimportant chapters with fewer sentences, an article summary is finally obtained that describes the article content accurately, in proper order, and with the core highlighted.
According to the article summary generation method provided by the embodiment, interaction frequency data of a target role in an article to be processed interacting with other roles in at least two chapter contents are obtained; respectively determining corresponding chapter interaction information for the at least two chapter contents according to the interaction frequency data; the chapter interaction information comprises chapter types and role interaction relationship labels; extracting key sentences from the at least two chapter contents according to the interaction frequency data and the chapter interaction information; and combining the key sentences to generate article summary information of the article to be processed. By utilizing the technical scheme provided by the invention, the extracted keywords and key sentences in the article can have correlation by specifying the role information and the chapter content and combining the role information and the chapter content into a plurality of extraction intervals according to the sequence of the chapter content and the role interaction relationship labels thereof, so that the finally generated summary information content is ensured to be sufficient and effective, the description of the whole article is accurate, the sequence is proper, and the core is outstanding. The scheme effectively solves the problems that in the prior art, all sentences of the article are taken as extraction objects, the front-back sequence of the acquired key information (key words and key sentences) cannot be confirmed, and the content of the summarized information cannot be ensured to be in accordance with the sequence of the article.
Fig. 2 is a flowchart illustrating a method for setting a role interaction relationship label for at least two chapter contents according to another embodiment of the present invention, where the method includes the following steps, as shown in fig. 2:
Step S201, counting the interaction frequency and the non-interaction frequency in the interaction frequency data of the target role.
Specifically, the occurrence frequency and interaction frequency data for the target role are obtained from the chapter role occurrence sets and the interaction frequency data of the target role.
For example, taking character A as the target role, the occurrence frequency and interaction frequency data of character A are obtained from the contents of Table 1 and Table 2.
Step S202, determining whether a ratio of the interaction frequency and the non-interaction frequency of the target character with other characters in the chapter content is greater than a first preset threshold.
And determining the chapter type corresponding to the chapter content by judging whether the ratio of the interaction frequency to the non-interaction frequency of the target character to other characters in the chapter content is greater than a first preset threshold value.
Specifically, in the article, compared with the description of a single character or the content irrelevant to the character, the interaction between characters obviously has richer information, so that the chapter content with a small occurrence frequency of the target character can be labeled as irrelevant chapter content.
Optionally, chapter contents with the target role occurrence frequency less than 10% of the occurrence frequency of all the roles can be regarded as irrelevant chapter contents, and the type of the chapter contents is a non-interactive chapter type.
Alternatively, the chapter type of the chapter contents may be determined according to the following definition:
Figure BDA0003233460700000121
where X = {B, C} and N = 2. X denotes the set of characters other than character A, and N denotes the total number of other characters appearing in the chapter content besides character A; N = 2 indicates that two other characters, namely character B and character C, appear in the chapter content. The first preset threshold is set to 0.6, i.e., 60%.
Step S203, determine the chapter type corresponding to the chapter content as the interactive chapter type.
Specifically, according to the definition of the chapter type in the above scheme, the chapter content of which the ratio of the interaction frequency to the non-interaction frequency of the target character to the other characters is greater than a first preset threshold is determined as the interaction chapter type.
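The decision in steps S202 and S203 can be sketched as below. The exact formula in the source is given only as an image, so this sketch assumes the "ratio" is the share of interactions among the target role's interaction and non-interaction counts. The 0.6 threshold and the 10% occurrence rule come from the text; the function and parameter names are hypothetical.

```python
def chapter_type(interaction, non_interaction, target_share=1.0,
                 threshold=0.6, min_share=0.10):
    """Classify one chapter for a target role.

    interaction / non_interaction: counts for the target role in this chapter.
    target_share: target role's occurrences divided by all roles' occurrences.
    """
    # Chapters where the target role rarely appears are treated as irrelevant.
    if target_share < min_share:
        return "non-interactive"
    total = interaction + non_interaction
    if total == 0:
        return "non-interactive"
    # Assumed reading of the ratio: interactions as a share of the total.
    return "interactive" if interaction / total > threshold else "non-interactive"
```

Under this reading, a chapter with 7 interactions and 3 non-interactions (70%) is interactive, while one at exactly 60% is not, because the comparison is strict.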
Step S204, setting corresponding role interaction relationship labels according to the target role and the other roles that interact with the target role.
Specifically, since A·X = A·B + A·C, it is necessary to determine whether the chapter content mainly describes the interaction of character A with character B or the interaction of character A with character C, using the following definition:
first, the average of the interaction frequencies with character A is computed over all characters that interact with character A in the chapter content:
Figure BDA0003233460700000122
If the interaction frequency between a given character and character A is greater than this average, a corresponding role interaction relationship label is determined for the chapter content. For example, when the interaction frequency between character B and character A is greater than the average, i.e., when
Figure BDA0003233460700000123
the role interaction relationship label for the chapter content can be determined to be A·B.
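The averaging rule can be sketched as follows. The source formulas are available only as images, so the plain arithmetic mean and the strict "greater than" comparison are taken from the surrounding prose; the function name and the dictionary input format are assumptions.

```python
def interaction_labels(target, freq):
    """Return labels such as 'A·B' for every role whose interaction
    frequency with the target exceeds the mean over all interacting roles.

    freq: mapping of other role -> interaction frequency with the target
          in one chapter.
    """
    if not freq:
        return []
    mean = sum(freq.values()) / len(freq)
    return [f"{target}·{other}" for other, count in freq.items() if count > mean]
```

With `{"B": 8, "C": 2}` the mean is 5, so only `A·B` is produced. When all frequencies are equal, no role exceeds the mean and the list is empty; the source does not say how that case is handled.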
For example, according to the data shown in table 3 in the above scheme, it can be determined that the character interaction relationship labels of the contents of different chapters in the article are shown in table 4:
Figure BDA0003233460700000131
TABLE 4
In Table 4, the role interaction relationship label for chapter content 8 and chapter content 9 is "/", which indicates that they are irrelevant chapter contents in which the target role appears infrequently, as described in step S202.
Step S205, determining the chapter type corresponding to the chapter content as the non-interactive chapter type.
Specifically, according to the definition of the chapter type in the above scheme, chapter content in which the ratio of the interaction frequency to the non-interaction frequency of the target role with other roles is less than or equal to the first preset threshold is determined to be of the non-interactive chapter type.
Step S206, setting a corresponding non-interactive label according to the target role.
Specifically, according to the definition of the chapter type in the above scheme, a non-interactive label is determined and set for chapter content of the non-interactive chapter type. For example, according to step S202 and Table 3, chapter content 1 or chapter content 4 in Table 4 is chapter content with a high non-interaction frequency for character A, and "a ×" is the corresponding non-interactive label; chapter content 8 and chapter content 9 are irrelevant chapter contents in which character A appears infrequently, and "/" is the corresponding non-interactive label.
Fig. 3 is a block diagram showing the configuration of an article summary generation apparatus according to an embodiment of the present invention. As shown in Fig. 3, the apparatus includes an acquisition module 301, a determination module 302, an extraction module 303 and a generation module 304, wherein:
The acquisition module 301 is configured to acquire interaction frequency data of the target role in the article to be processed interacting with other roles in each of at least two chapter contents.
Specifically, the acquisition module 301 extracts role information of each role from the at least two chapter contents of the article to be processed, and divides the role information of the same role into the same role set to obtain a plurality of role sets; it then determines the paragraph serial numbers at which the roles appear in the at least two chapter contents according to the role information in the plurality of role sets, and calculates the interaction frequency data among the roles according to those paragraph serial numbers.
Further, for each role, the occurrence frequency of the role information in the role set corresponding to the role is counted in the at least two chapter contents, and the paragraph serial numbers of the role information in each chapter content are recorded, to obtain the chapter role occurrence set of the role;
according to the chapter role occurrence set of the role and the chapter role occurrence sets of the other roles, the interaction frequency data is obtained by counting the interaction frequency and the non-interaction frequency of the role with the other roles in the at least two chapter contents.
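One way to derive these counts from the recorded paragraph serial numbers is sketched below. The criterion used here, that two roles interact when they appear in the same paragraph and that a paragraph containing only the target role counts as non-interaction, is an assumption; this passage of the source does not define it. The function name and input format are likewise hypothetical.

```python
def interaction_counts(appearances, target):
    """Count interactions of a target role within one chapter.

    appearances: mapping of role -> set of paragraph serial numbers in which
                 that role's information appears (the chapter role occurrence set).
    Returns (per-role interaction counts, non-interaction count).
    """
    target_paras = appearances.get(target, set())
    # Assumed criterion: co-occurrence in the same paragraph is an interaction.
    interactions = {role: len(target_paras & paras)
                    for role, paras in appearances.items() if role != target}
    others = set().union(*(paras for role, paras in appearances.items()
                           if role != target))
    # Paragraphs where the target appears without any other role.
    non_interaction = len(target_paras - others)
    return interactions, non_interaction
```

For a chapter where A appears in paragraphs 1, 2, 3 and 5, B in 2 and 3, and C in 3 and 7, this yields two A·B interactions, one A·C interaction, and two non-interaction paragraphs for A.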
The determination module 302 is configured to determine corresponding chapter interaction information for each of the at least two chapter contents according to the interaction frequency data; the chapter interaction information comprises chapter types and role interaction relationship labels.
Specifically, the determining module 302 determines chapter types corresponding to at least two chapter contents according to the interaction frequency and the non-interaction frequency in the interaction frequency data, and sets a role interaction relationship label for the at least two chapter contents; the chapter types include: an interactive section type and a non-interactive section type.
Further, for each chapter content, if the ratio of the interaction frequency to the non-interaction frequency of the target character to other characters in the chapter content is greater than a first preset threshold, the determining module 302 determines the chapter type corresponding to the chapter content as an interaction chapter type, and sets a corresponding character interaction relationship tag according to the target character and other characters interacted with the target character; if the ratio of the interaction frequency to the non-interaction frequency of the target character to other characters in the chapter content is less than or equal to a first preset threshold, the determining module 302 determines the chapter type corresponding to the chapter content as a non-interaction chapter type, and sets a corresponding non-interaction tag according to the target character.
The extraction module 303 is configured to extract key sentences from the at least two chapter contents according to the interaction frequency data and the chapter interaction information.
Specifically, the at least two chapter contents are divided according to the chapter sequence of the at least two chapter contents and the role interaction relationship labels in the chapter interaction information to obtain a plurality of extraction intervals, and the types of the plurality of extraction intervals are determined; merging a plurality of chapter contents which are adjacent and have the same role interaction relationship label into an extraction interval; calculating a role interaction ratio corresponding to each extraction interval according to the interaction frequency data, and determining the extraction quantity of the key sentences of each extraction interval according to the role interaction ratio; extracting key sentences the number of which is consistent with the extraction number of the key sentences in the extraction interval from the extraction interval according to a preset extraction strategy corresponding to the type of the extraction interval for each extraction interval; wherein the types of the extraction intervals include: single role information interval, role interaction interval and role irrelevant interval.
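The merging of adjacent chapters into extraction intervals can be sketched with `itertools.groupby`. The label strings used here ("A·B" for an interaction label, "A×" for a non-interactive label, "/" for an irrelevant chapter) and the mapping from label to interval type are assumptions made for illustration.

```python
from itertools import groupby


def build_intervals(chapter_labels):
    """Merge adjacent chapters that share a label into extraction intervals.

    chapter_labels: list of (chapter_number, label) pairs in chapter order.
    Returns a list of (label, [chapter_numbers]) pairs in article order.
    """
    intervals = []
    for label, group in groupby(chapter_labels, key=lambda c: c[1]):
        intervals.append((label, [number for number, _ in group]))
    return intervals


def interval_type(label):
    """Map a label to one of the three interval types (assumed encoding)."""
    if label == "/":
        return "role-irrelevant"
    if "·" in label:
        return "role-interaction"
    return "single-role"
```

Because `groupby` only merges consecutive items, two chapters with the same label separated by a differently labeled chapter remain in separate intervals, which preserves the article's order.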
The generation module 304 is configured to combine the key sentences to generate article summary information of the article to be processed.
Specifically, the generation module 304 is further configured to concatenate the key sentences in the order in which they appear in the article to be processed, generating the article summary information of the article to be processed.
According to the article summary generation device provided by the embodiment, interaction frequency data of a target character in an article to be processed, which interacts with other characters in at least two chapter contents, is acquired; respectively determining corresponding chapter interaction information for the at least two chapter contents according to the interaction frequency data; the chapter interaction information comprises chapter types and role interaction relationship labels; extracting key sentences from the at least two chapter contents according to the interaction frequency data and the chapter interaction information; and combining the key sentences to generate article summary information of the article to be processed. By utilizing the technical scheme provided by the invention, the extracted keywords and key sentences in the article can have correlation by specifying the role information and the chapter content and combining the role information and the chapter content into a plurality of extraction intervals according to the sequence of the chapter content and the role interaction relationship labels thereof, so that the finally generated summary information content is ensured to be sufficient and effective, the description of the whole article is accurate, the sequence is proper, and the core is outstanding. The scheme effectively solves the problems that in the prior art, all sentences of the article are taken as extraction objects, the front-back sequence of the acquired key information (key words and key sentences) cannot be confirmed, and the content of the summarized information cannot be ensured to be in accordance with the sequence of the article.
The invention also provides a nonvolatile computer storage medium, which stores at least one executable instruction, and the executable instruction can execute the article summary generation method in any method embodiment.
Fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.
As shown in Fig. 4, the computing device may include: a processor 402, a communication interface 404, a memory 406, and a communication bus 408.
Wherein:
the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408.
The communication interface 404 is used for communicating with network elements of other devices, such as clients or other servers.
The processor 402, configured to execute the program 410, may specifically perform relevant steps in the above-described article summary generation method embodiment.
In particular, program 410 may include program code comprising computer operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used for storing the program 410. The memory 406 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The program 410 may specifically be configured to cause the processor 402 to execute the article summary generation method in any of the method embodiments described above. For specific implementation of each step in the program 410, reference may be made to corresponding steps and corresponding descriptions in units in the above-mentioned article summary generation embodiment, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. A method of article summary generation, the method comprising:
acquiring interaction frequency data of a target role in the article to be processed, which are interacted with other roles in at least two chapter contents respectively;
respectively determining corresponding chapter interaction information for the at least two chapter contents according to the interaction frequency data; the chapter interaction information comprises chapter types and role interaction relationship labels;
extracting key sentences from the at least two chapter contents according to the interaction frequency data and the chapter interaction information;
and combining the key sentences to generate article summary information of the article to be processed.
2. The method of claim 1, wherein the obtaining of interaction frequency data of a target character in the article to be processed interacting with other characters in at least two chapter contents further comprises:
extracting role information of each role from the at least two chapter contents of the article to be processed, and dividing the role information of the same role into the same role set to obtain a plurality of role sets;
and determining paragraph serial numbers of the roles in the at least two chapter contents according to the role information in the plurality of role sets, and obtaining the interaction frequency data according to the paragraph serial numbers of the roles in the at least two chapter contents.
3. The method of claim 2, wherein the determining the paragraph number of each character appearing in the at least two chapter contents according to the character information in the plurality of character sets, and obtaining the interaction frequency data according to the paragraph number of each character appearing in the at least two chapter contents further comprises:
counting the occurrence frequency of the role information in the role set corresponding to each role in the at least two chapter contents, and recording the paragraph serial number of the role information in each chapter content to obtain the chapter role occurrence set of the role;
and according to the chapter role appearance set of the role and the chapter role appearance sets of other roles, obtaining the interaction frequency data by counting the interaction frequency and non-interaction frequency of the role with other roles in the contents of the at least two chapters.
4. The method according to claim 1, wherein the determining corresponding chapter interaction information for the at least two chapter contents according to the interaction frequency data further comprises:
determining chapter types corresponding to the at least two chapter contents according to the interaction frequency and the non-interaction frequency in the interaction frequency data, and setting role interaction relationship labels for the at least two chapter contents; the chapter types include: an interactive section type and a non-interactive section type.
5. The method of claim 4, wherein the determining the chapter types corresponding to the at least two chapter contents according to the interaction frequency and the non-interaction frequency in the interaction frequency data, and setting the role interaction relationship labels for the at least two chapter contents further comprises:
for each chapter content, if the ratio of the interaction frequency and the non-interaction frequency of the target role to other roles in the chapter content is greater than a first preset threshold, determining the chapter type corresponding to the chapter content as an interaction chapter type, and setting corresponding role interaction relationship labels according to the target role and other roles interacted by the target role; if the ratio of the interaction frequency to the non-interaction frequency of the target character to other characters in the chapter content is smaller than or equal to a first preset threshold value, determining the chapter type corresponding to the chapter content as a non-interaction chapter type, and setting a corresponding non-interaction label according to the target character.
6. The method of claim 1, wherein the extracting key sentences from the at least two chapter contents according to the interaction frequency data and the chapter interaction information further comprises:
dividing the at least two chapter contents according to the chapter sequences of the at least two chapter contents and the role interaction relationship labels in the chapter interaction information to obtain a plurality of extraction intervals, and determining the types of the plurality of extraction intervals; merging a plurality of chapter contents which are adjacent and have the same role interaction relationship label into an extraction interval;
calculating a role interaction ratio corresponding to each extraction interval according to the interaction frequency data, and determining the extraction quantity of the key sentences of each extraction interval according to the role interaction ratio;
and aiming at each extraction interval, extracting key sentences the number of which is consistent with the extraction number of the key sentences in the extraction interval from the extraction interval according to a preset extraction strategy corresponding to the type of the extraction interval.
7. The method of any of claims 1-6, wherein the combining the key sentences to generate article summary information for the article to be processed further comprises:
and connecting the key sentences in series according to the sequence of the key sentences from front to back in the article to be processed to generate article summary information of the article to be processed.
8. An article summary generation apparatus comprising: an acquisition module, a determination module, an extraction module and a generation module; wherein:
the acquisition module is configured to: acquiring interaction frequency data of a target role in the article to be processed, which are interacted with other roles in at least two chapter contents respectively;
the determination module is configured to: respectively determining corresponding chapter interaction information for the at least two chapter contents according to the interaction frequency data; the chapter interaction information comprises chapter types and role interaction relationship labels;
the extraction module is configured to: extracting key sentences from the at least two chapter contents according to the interaction frequency data and the chapter interaction information;
the generation module is configured to: and combining the key sentences to generate article summary information of the article to be processed.
9. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another via the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the article summary generating method of any one of claims 1-7.
10. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the article summary generation method as recited in any one of claims 1-7.
CN202110994697.9A 2021-08-27 2021-08-27 Article summary generation method, apparatus, computing device and storage medium Active CN113656576B (en)

Publications (2)

Publication Number Publication Date
CN113656576A true CN113656576A (en) 2021-11-16
CN113656576B CN113656576B (en) 2024-05-24

Family

ID=78493081

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014096776A (en) * 2012-11-12 2014-05-22 Samsung R&D Institute Japan Co Ltd Chapter setting device
CN105718573A (en) * 2016-01-20 2016-06-29 电子科技大学 Attention relationship extracting and annotating method in view of user interests
US20170039275A1 (en) * 2015-08-03 2017-02-09 International Business Machines Corporation Automated Article Summarization, Visualization and Analysis Using Cognitive Services
CN106656861A (en) * 2016-12-15 2017-05-10 咪咕数字传媒有限公司 Electronic book pushing method and device
US20180336193A1 (en) * 2017-05-18 2018-11-22 Beijing Baidu Netcom Science And Technology Co., Ltd. Artificial Intelligence Based Method and Apparatus for Generating Article
CN109325223A (en) * 2018-07-24 2019-02-12 广州神马移动信息科技有限公司 Article recommended method, device and electronic equipment
CN110516012A (en) * 2019-08-30 2019-11-29 广东工业大学 A kind of character relation map construction method
CN111433767A (en) * 2017-11-20 2020-07-17 乐威指南公司 System and method for filtering supplemental content of an electronic book
CN112329453A (en) * 2020-10-27 2021-02-05 北京百度网讯科技有限公司 Sample chapter generation method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
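Yang Fengjiao; Yao Shuai; Cao Huiyi: "Interactive Narrative: Writing Profile Features in the Context of Converged Media", News and Writing (新闻与写作), no. 05, 5 May 2018 (2018-05-05) *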

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116069936A (en) * 2023-02-28 2023-05-05 北京朗知网络传媒科技股份有限公司 Method and device for generating digital media article
CN116069936B (en) * 2023-02-28 2023-08-01 北京朗知网络传媒科技股份有限公司 Method and device for generating digital media article

Also Published As

Publication number Publication date
CN113656576B (en) 2024-05-24

Similar Documents

Publication Publication Date Title
CN108009293B (en) Video tag generation method and device, computer equipment and storage medium
CN109657054B (en) Abstract generation method, device, server and storage medium
CN110362370B (en) Webpage language switching method and device and terminal equipment
Sun et al. DOM based content extraction via text density
CN110263248A (en) A kind of information-pushing method, device, storage medium and server
CN109388801B (en) Method and device for determining similar word set and electronic equipment
CN110162750A (en) Text similarity detection method, electronic equipment and computer readable storage medium
CN107193892B (en) A kind of document subject matter determines method and device
US20150154537A1 (en) Categorizing a use scenario of a product
CN109325146A (en) A kind of video recommendation method, device, storage medium and server
CN109857853B (en) Searching method based on electronic book, electronic equipment and computer storage medium
CN112036187A (en) Context-based video barrage text auditing method and system
CN113656576A (en) Article summary generation method and device, computing device and storage medium
US11803796B2 (en) System, method, electronic device, and storage medium for identifying risk event based on social information
CN108628875B (en) Text label extraction method and device and server
CN110909247B (en) Text information pushing method, electronic equipment and computer storage medium
CN106919603B (en) Method and device for calculating word segmentation weight in query word mode
CN109542299B (en) Gold sentence display method for electronic book, electronic equipment and computer storage medium
KR102309870B1 (en) Method and apparatus for text summary in display ad
CN108415959B (en) Text classification method and device
CN106919649B (en) Entry weight calculation method and device
CN111144122A (en) Evaluation processing method, evaluation processing device, computer system, and medium
CN107590163B (en) The methods, devices and systems of text feature selection
CN115879442A (en) Method and system for dynamically calculating weight of keyword
GB2608112A (en) System and method for providing media content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant