CN113918708A - Abstract extraction method - Google Patents
Abstract extraction method
- Publication number
- CN113918708A (application CN202111532196.5A)
- Authority
- CN
- China
- Prior art keywords
- words
- word
- level
- semantic
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Abstract
The invention relates to the technical field of natural language processing, and in particular to an abstract extraction method comprising the following steps: S1, preprocessing: generalizing the numeric and time-typed data in the bulletin text; S2, constructing a first vocabulary; S3, constructing a word co-occurrence matrix for the first vocabulary; S4, reducing the dimensionality of the word co-occurrence matrix and extracting semantic representations of all words in the first vocabulary; S5, repeating S2 to S4 to extract semantic representations of all words in the bulletin text; S6, accumulating the word semantic representations sentence by sentence to form sentence-context semantic representations; S7, receiving a key phrase input by the user and extracting the semantic representation of the key phrase; S8, judging the similarity between the key-phrase semantic representation and each sentence-context semantic representation, and if the similarity exceeds a set value, extracting the bulletin text sentence containing the key phrase into the bulletin text abstract. The extracted abstract content is highly associated with the keywords input by the user.
Description
Technical Field
The invention relates to the technical field of natural language processing, and in particular to an abstract extraction method.
Background
At present, the number of listed companies grows daily, and their bulletins, i.e., the periodic or ad-hoc disclosures of their financial and business conditions, contain a large amount of information. However, bulletin texts lack standard writing conventions and are lengthy, which hinders reading and data analysis; analysts and auditors must manually extract key sentences and other information from them, so working efficiency is low. It is therefore necessary to provide a method for extracting abstracts from listed-company bulletin texts that compresses the text, removes the "redundant" information analysts and auditors do not care about, and improves their working efficiency.
Related abstract extraction methods exist, but they mainly search the full text for sentences containing the keywords, or words semantically similar to the keywords, and assemble the extracted sentences into an abstract, relying chiefly on word-vector similarity calculation. When applied to listed-company bulletin texts, however, these methods have a shortcoming: they consider only the semantic associations among keywords, not the associations between keywords and paragraphs or chapters, even though some keywords run through the whole bulletin text. The extracted abstract content is therefore not accurate enough.
Disclosure of Invention
The invention provides an abstract extraction method to solve the problem that the abstract content extracted by existing methods is not accurate enough, so that the extracted content is highly associated with the keywords input by the user.
An abstract extraction method comprises the following steps:
S1, preprocessing: generalizing the numeric and time-typed data in the bulletin text;
S2, constructing a first vocabulary;
S3, constructing a word co-occurrence matrix for the first vocabulary;
S4, reducing the dimensionality of the word co-occurrence matrix and extracting semantic representations of all words in the first vocabulary;
S5, repeating S2 to S4 to extract semantic representations of all words in the bulletin text;
S6, accumulating the word semantic representations sentence by sentence to form sentence-context semantic representations;
S7, receiving a key phrase input by the user and extracting the semantic representation of the key phrase;
S8, judging the similarity between the key-phrase semantic representation and each sentence-context semantic representation, and if the similarity exceeds a set value, extracting the bulletin text sentence containing the key phrase into the bulletin text abstract.
In this method, the semantic representations of words are extracted, the similarity between each sentence-context semantic representation and the key-phrase semantic representation is judged, and the sentences whose similarity exceeds a set value are extracted to form the bulletin text abstract, so the abstract content is highly associated with the keywords input by the user.
Further, the numeric values in the bulletin text Text are replaced with Chinese-character numerals, and the times in the bulletin text Text are replaced with Chinese-character times;
the marks among the punctuation, together with the enumeration comma and the colon among the stops, are eliminated, and the bulletin text is split into sentences using the remaining stops as separators; the bulletin text Text is segmented into Chinese words with the jieba segmenter, stop words are removed, the words are weighted with TF-IDF, and the words are sorted by weight from largest to smallest;
further, constructing the first vocabulary in S2 comprises taking the top 2000 words by weight to construct the first vocabulary Words,
where w_i denotes the i-th word, w_j denotes the j-th word, and n is the number of words.
Further, S3 comprises:
for any two words w_i and w_j appearing in the same sentence, the same paragraph, or the same chapter, establishing an association and constructing the word co-occurrence matrix, i.e., the sentence-level, paragraph-level, and chapter-level co-occurrence matrices;
the matrix row index i and column index j are the indexes of the two co-occurring words w_i and w_j, and each matrix element is the joint probability of the two words indicated by its row and column indexes.
Further, S4 comprises reducing the sentence-level, paragraph-level, and chapter-level word co-occurrence matrices respectively by principal component analysis; the dimension after reduction is 2000 x 100, where 2000 is the number of words and 100 is the dimension of each word semantic vector; after dimension reduction, the three levels of vectors of the word co-occurrence matrices form the three-level semantic representation, i.e., the sentence-level, paragraph-level, and chapter-level semantic representations of the words; the three-level semantic representations of all the words in the first vocabulary are extracted.
Further, the dimension reduction calculation formula is as follows:
Further, on each repetition of S5, S2 constructs a new vocabulary of the next 2000 words by weight, until all words in the bulletin text are covered;
further, the sentence-context three-level semantic representation is the accumulation of the three-level semantic representations of the sentence's words, where w_t is the t-th word in the sentence.
S7 comprises: the user inputs a key phrase; the three-level semantic representations of all keywords of the key phrase are extracted and accumulated to form the key-phrase three-level semantic representation, where w_t is the t-th word in the key phrase.
Further, a semantic similarity calculation model based on a twin neural network is constructed; it comprises two groups of isomorphic feedback neural networks, its inputs are the sentence-context three-level semantic representations and the user key-phrase three-level semantic representation, and its output is the similarity;
the sentence-context three-level semantic representations and the user key-phrase three-level semantic representation are input, and when the similarity exceeds the set value, the sentence corresponding to the input sentence-context representation is extracted;
the above is repeated, inputting all the sentence-context three-level semantic representations in the bulletin text in turn, until every sentence whose similarity to the user key-phrase three-level semantic representation exceeds the set value has been extracted to form the bulletin text abstract.
Beneficial effects: by extracting the semantic representations of words, judging the similarity between the sentence-context semantic representations and the key-phrase semantic representation, and extracting the sentences whose similarity exceeds a set value to form the bulletin text abstract, the abstract content is highly associated with the keywords input by the user; the "redundant" information the user does not care about is effectively removed, and the user's working efficiency is improved.
Drawings
The invention is described in further detail below with reference to the figures and specific embodiments.
Fig. 1 is a flowchart of this embodiment.
Fig. 2 is an architecture diagram of the twin neural network of this embodiment.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art. In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, this embodiment provides an abstract extraction method, taking a listed-company bulletin text as an example, comprising the following steps.
S1, preprocessing: generalizing the numeric and time-typed data in the bulletin text, comprising:
replacing the numeric values in the bulletin text Text with Chinese-character numerals, and replacing the times in the bulletin text Text with Chinese-character times;
eliminating the marks among the punctuation, together with the enumeration comma and the colon among the stops, and splitting the bulletin text into sentences using the remaining stops as separators; segmenting the bulletin text Text into Chinese words with the jieba segmenter and, after removing stop words, obtaining the words of the bulletin text Text;
weighting the words with TF-IDF and sorting them by weight from largest to smallest.
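The preprocessing and weighting steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: jieba segmentation and the Chinese-character replacement are stood in for by whitespace tokenization and placeholder tokens, and the TF-IDF formula is the common tf * log(N/df) variant, which the patent does not specify.

```python
import math
import re
from collections import Counter

def generalize(text):
    # S1 (sketch): generalize numeric and time-typed data. The patent replaces
    # them with Chinese-character forms; placeholder tokens stand in here.
    text = re.sub(r"\d{4}-\d{2}-\d{2}", "<TIME>", text)
    return re.sub(r"\d+(?:\.\d+)?", "<NUM>", text)

def rank_by_tfidf(sentences):
    # Tokenize (jieba would segment real Chinese text; whitespace suffices
    # for the sketch), weight each word by TF-IDF, and sort by weight,
    # largest first, as required for building the vocabularies.
    docs = [s.split() for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))   # document frequency
    tf = Counter(w for d in docs for w in d)        # corpus term frequency
    score = {w: tf[w] * math.log(n / df[w]) for w in tf}
    return sorted(score, key=score.get, reverse=True)
```

A word that occurs in every sentence gets IDF zero and sinks to the bottom of the ranking, which is the behavior the stop-word removal and weighting aim at.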
S2, constructing a first vocabulary, comprising:
taking the top 2000 words by weight to construct the first vocabulary Words,
where w_i denotes the i-th word, w_j denotes the j-th word, and n is the number of words.
S3, constructing a word co-occurrence matrix for the first vocabulary:
for any two words w_i and w_j appearing in the same sentence, the same paragraph, or the same chapter, an association is established and the word co-occurrence matrix is constructed;
the matrix row index i and column index j are the indexes of the two co-occurring words w_i and w_j, and each matrix element is the joint probability of the two words indicated by its row and column indexes.
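A minimal sketch of the co-occurrence construction. Passing sentences, paragraphs, or chapters as the co-occurrence units yields the three matrix levels in turn; the count-then-normalize estimate of the joint probabilities is an assumption, since the patent does not spell out its estimator.

```python
import numpy as np

def cooccurrence(units, vocab):
    # units: list of token lists (sentences, paragraphs, or chapters).
    # Two vocabulary words co-occur when they appear in the same unit.
    idx = {w: k for k, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(vocab)))
    for unit in units:
        present = [idx[w] for w in set(unit) if w in idx]
        for i in present:
            for j in present:
                if i != j:
                    m[i, j] += 1.0
    # Normalize counts so each element approximates the joint probability
    # of the two words indexed by its row and column.
    total = m.sum()
    return m / total if total else m
```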
S4, reducing the dimensionality of the word co-occurrence matrices and extracting semantic representations of all words in the first vocabulary, comprising:
reducing the sentence-level, paragraph-level, and chapter-level word co-occurrence matrices respectively by principal component analysis; the dimension after reduction is 2000 x 100, where 2000 is the number of words and 100 is the dimension of each word semantic vector; after dimension reduction, the three levels of vectors of the word co-occurrence matrices form the three-level semantic representation, i.e., the sentence-level, paragraph-level, and chapter-level semantic representations of the words.
The dimension-reduction calculation yields, for the k-th word in the word co-occurrence matrix, its three-level semantic representation.
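Since the patent's own dimension-reduction formula is not reproduced in this text, the following sketch shows one standard PCA realization (SVD on the column-centered matrix) producing the vocab x dim layout described above, with the sizes scaled down from 2000 x 100 for illustration.

```python
import numpy as np

def pca_reduce(cooc, dim=100):
    # Reduce a (vocab x vocab) co-occurrence matrix to (vocab x dim) word
    # semantic vectors: center the columns, take the SVD, and keep the
    # leading `dim` principal components scaled by their singular values.
    centered = cooc - cooc.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dim] * s[:dim]
```

Applied once per co-occurrence level, this produces the sentence-, paragraph-, and chapter-level semantic vectors of each word.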
S2 to S4 constitute a three-level semantic coding method, by which the three-level semantic representations of all words in the bulletin text are extracted. The text is then split sentence by sentence, and the three-level semantic representations of each sentence's context words are accumulated to form the sentence-context three-level semantic representation.
S5, repeating S2 to S4 to extract semantic representations of all words in the bulletin text; on each repetition, S2 constructs a new vocabulary of the next 2000 words by weight, until all words in the bulletin text are covered.
S6, accumulating the word semantic representations sentence by sentence to form sentence-context semantic representations;
the sentence-context three-level semantic representation is the accumulation of the three-level semantic representations of the sentence's words, where w_t is the t-th word in the sentence.
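The accumulation in S6 can be sketched as a plain vector sum over the sentence's words; skipping out-of-vocabulary words is an assumption, as the patent does not say how they are handled.

```python
import numpy as np

def sentence_repr(tokens, word_vecs):
    # S6 (sketch): sum the semantic vectors of a sentence's words into the
    # sentence-context representation. Applied once per level (sentence,
    # paragraph, chapter) it yields the three-level representation.
    dim = len(next(iter(word_vecs.values())))
    acc = np.zeros(dim)
    for t in tokens:
        if t in word_vecs:   # out-of-vocabulary words contribute nothing
            acc += word_vecs[t]
    return acc
```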
S7, receiving a key phrase input by the user and extracting its semantic representation, comprising:
the user inputs a key phrase; the three-level semantic representations of all keywords of the key phrase are extracted and accumulated to form the key-phrase three-level semantic representation, where w_t is the t-th word in the key phrase.
S8, judging the similarity between the sentence-context semantic representations and the key-phrase semantic representation, and extracting the sentences whose similarity exceeds a set value to form the bulletin text abstract, comprising:
constructing a semantic similarity calculation model based on a twin neural network, as follows:
as shown in fig. 2, the model comprises two groups of isomorphic feedback neural networks; its inputs are the sentence-context three-level semantic representation and the user key-phrase three-level semantic representation, and its output is the similarity. Specifically, the model comprises two independent parallel input layers, two independent parallel hidden layers, and one output layer; the input-layer dimension is 1 x 100 and the hidden-layer dimension is 1 x 10; each of the two input layers is connected to its hidden layer through a Sigmoid activation function, and the two hidden layers are jointly connected to the output layer through a Sigmoid activation function; the output layer uses a cross-entropy loss function, and its output is the similarity;
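A forward-pass sketch of the described architecture. Whether the two branches share weights is not stated in the patent, so sharing them is assumed here, and the cross-entropy training loop is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SiameseSimilarity:
    """Sketch of the twin network: two parallel 1x100 input layers, two
    parallel 1x10 sigmoid hidden layers (weights assumed shared), and a
    sigmoid output layer yielding the similarity in (0, 1)."""

    def __init__(self, in_dim=100, hidden=10, seed=0):
        rng = np.random.default_rng(seed)
        self.w_h = rng.normal(scale=0.1, size=(in_dim, hidden))  # branch weights
        self.w_o = rng.normal(scale=0.1, size=(2 * hidden, 1))   # output weights

    def similarity(self, sent_repr, phrase_repr):
        h1 = sigmoid(sent_repr @ self.w_h)    # sentence-context branch
        h2 = sigmoid(phrase_repr @ self.w_h)  # key-phrase branch
        return float(sigmoid(np.concatenate([h1, h2]) @ self.w_o))
```

Because the final activation is a sigmoid, the output is always strictly between 0 and 1, which makes the 0.7 threshold used later directly applicable.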
the sentence-context three-level semantic representations and the user key-phrase three-level semantic representation are used as the inputs to train the model, which then calculates the similarity Similarity(Text, keywords) between a sentence-context three-level semantic representation and the user key-phrase three-level semantic representation;
specifically, the sentence-context three-level semantic representation is fed into one of the two independent parallel input layers and the key-phrase three-level semantic representation into the other, each connected through a Sigmoid activation function;
the similarity between the key-phrase semantic representation and each sentence-context semantic representation is compared with the set value of 0.7; when the similarity exceeds 0.7, the bulletin text sentences containing the key phrase are extracted into the bulletin text abstract.
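The thresholded extraction can be sketched as a simple filter over the bulletin's sentences; `sim_fn` stands in for the trained similarity model, and the 0.7 default matches the set value above.

```python
def extract_summary(sentences, sent_reprs, phrase_repr, sim_fn, threshold=0.7):
    # S8 (sketch): keep every bulletin sentence whose context representation
    # the similarity model scores above the set value; the kept sentences,
    # in document order, form the bulletin text abstract.
    return [s for s, r in zip(sentences, sent_reprs)
            if sim_fn(r, phrase_repr) > threshold]
```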
S6 to S8 constitute an abstract extraction method based on context semantic similarity calculation, which extracts the sentences containing key information from the bulletin text to form the abstract.
The abstract extraction method of this embodiment extracts the sentence-level, paragraph-level, and chapter-level semantic representations of words by the three-level semantic coding method, judges the similarity between the sentence-context and key-phrase semantic representations by the context-semantic-similarity abstract extraction method, and extracts the sentences whose similarity exceeds the set value to form the bulletin text abstract.
The method considers the relevance of the keywords input by the user to sentences, paragraphs, and chapters, and accurately extracts the sentences highly relevant to those keywords to form the abstract text; the "redundant" information the user does not care about is effectively removed, and the user's working efficiency is improved.
It should be understood that the above examples are given only for clarity of illustration and do not limit the embodiments; other variations and modifications will be apparent to persons skilled in the art in light of the above description, and it is neither necessary nor possible to exhaust all embodiments here. Obvious variations or modifications remaining within the spirit of the invention fall within its protection scope.
Claims (10)
1. An abstract extraction method, characterized by comprising the following steps:
S1, preprocessing: generalizing the numeric and time-typed data in the bulletin text;
S2, constructing a first vocabulary;
S3, constructing a word co-occurrence matrix for the first vocabulary;
S4, reducing the dimensionality of the word co-occurrence matrix and extracting semantic representations of all words in the first vocabulary;
S5, repeating S2 to S4 to extract semantic representations of all words in the bulletin text;
S6, accumulating the word semantic representations sentence by sentence to form sentence-context semantic representations;
S7, receiving a key phrase input by the user and extracting the semantic representation of the key phrase;
S8, judging the similarity between the key-phrase semantic representation and each sentence-context semantic representation, and if the similarity exceeds a set value, extracting the bulletin text sentence containing the key phrase into the bulletin text abstract.
2. The abstract extraction method according to claim 1, wherein S1 comprises:
replacing the numeric values in the bulletin text Text with Chinese-character numerals, and replacing the times in the bulletin text Text with Chinese-character times;
eliminating the marks among the punctuation, together with the enumeration comma and the colon among the stops, and splitting the bulletin text into sentences using the remaining stops as separators; segmenting the bulletin text Text into Chinese words with the jieba segmenter, removing stop words, weighting the words with TF-IDF, and sorting the words by weight from largest to smallest.
3. The abstract extraction method according to claim 2, wherein constructing the first vocabulary in S2 comprises taking the top 2000 words by weight to construct the first vocabulary Words.
4. The abstract extraction method according to claim 3, wherein S3 comprises:
for any two words w_i and w_j appearing in the same sentence, the same paragraph, or the same chapter, establishing an association and constructing the word co-occurrence matrix.
5. The abstract extraction method according to claim 4, wherein S4 comprises reducing the sentence-level, paragraph-level, and chapter-level word co-occurrence matrices respectively by principal component analysis; the dimension after reduction is 2000 x 100, where 2000 is the number of words and 100 is the dimension of each word semantic vector; after dimension reduction, the three levels of vectors of the word co-occurrence matrices form the three-level semantic representation, i.e., the sentence-level, paragraph-level, and chapter-level semantic representations of the words; and the three-level semantic representations of all the words in the first vocabulary are extracted.
6. The abstract extraction method as claimed in claim 5, wherein the dimension reduction calculation formula is as follows:
7. The abstract extraction method according to claim 6, wherein in S5, on each repetition, S2 constructs a new vocabulary of the next 2000 words by weight, until all words in the bulletin text are covered.
9. The abstract extraction method according to claim 8, wherein S7 comprises: the user inputs a key phrase; the three-level semantic representations of all keywords of the key phrase are extracted and accumulated to form the key-phrase three-level semantic representation,
where w_t is the t-th word in the key phrase.
10. The abstract extraction method according to claim 9, wherein S8 comprises:
constructing a semantic similarity calculation model based on a twin neural network, which comprises two groups of isomorphic feedback neural networks, takes the sentence-context three-level semantic representations and the user key-phrase three-level semantic representation as inputs, and outputs the similarity;
inputting the sentence-context three-level semantic representations and the user key-phrase three-level semantic representation, and when the similarity exceeds the set value, extracting the sentence corresponding to the input sentence-context three-level semantic representation;
repeating the above, inputting all the sentence-context three-level semantic representations in the bulletin text in turn, until every sentence whose similarity to the user key-phrase three-level semantic representation exceeds the set value has been extracted to form the bulletin text abstract.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111532196.5A CN113918708B (en) | 2021-12-15 | 2021-12-15 | Abstract extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113918708A true CN113918708A (en) | 2022-01-11 |
CN113918708B CN113918708B (en) | 2022-03-22 |
Family
ID=79248937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111532196.5A Active CN113918708B (en) | 2021-12-15 | 2021-12-15 | Abstract extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113918708B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12008332B1 (en) | 2023-08-18 | 2024-06-11 | Anzer, Inc. | Systems for controllable summarization of content |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102646114A (en) * | 2012-02-17 | 2012-08-22 | 清华大学 | News topic timeline abstract generating method based on breakthrough point |
CN104679730A (en) * | 2015-02-13 | 2015-06-03 | 刘秀磊 | Webpage summarization extraction method and device thereof |
CN110069622A (en) * | 2017-08-01 | 2019-07-30 | 武汉楚鼎信息技术有限公司 | A kind of personal share bulletin abstract intelligent extract method |
CN110188349A (en) * | 2019-05-21 | 2019-08-30 | 清华大学深圳研究生院 | A kind of automation writing method based on extraction-type multiple file summarization method |
CN110851598A (en) * | 2019-10-30 | 2020-02-28 | 深圳价值在线信息科技股份有限公司 | Text classification method and device, terminal equipment and storage medium |
CN111259136A (en) * | 2020-01-09 | 2020-06-09 | 信阳师范学院 | Method for automatically generating theme evaluation abstract based on user preference |
WO2021164231A1 (en) * | 2020-02-18 | 2021-08-26 | 平安科技(深圳)有限公司 | Official document abstract extraction method and apparatus, and device and computer readable storage medium |
US20210342552A1 (en) * | 2020-05-01 | 2021-11-04 | International Business Machines Corporation | Natural language text generation from a set of keywords using machine learning and templates |
Non-Patent Citations (2)
Title |
---|
Li Feng et al.: "Research on a Domain-Corpus-Driven Sentence Relevance Calculation Method", Computer Science (《计算机科学》) * |
Huang Yaming et al.: "A Development Attempt at an SKR/MetaMap Output Concept Co-occurrence Analysis *** for Web Text Semantic Mining", New Technology of Library and Information Service (《现代图书情报技术》) * |
Also Published As
Publication number | Publication date |
---|---|
CN113918708B (en) | 2022-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Daud et al. | Urdu language processing: a survey | |
Oh et al. | Why-question answering using intra-and inter-sentential causal relations | |
CN113704451B (en) | Power user appeal screening method and system, electronic device and storage medium | |
US20080027893A1 (en) | Reference resolution for text enrichment and normalization in mining mixed data | |
Murthy et al. | Language identification from small text samples | |
CN108319583B (en) | Method and system for extracting knowledge from Chinese language material library | |
Petersen et al. | Natural Language Processing Tools for Reading Level Assessment and Text Simplification for Bilingual Education | |
Golpar-Rabooki et al. | Feature extraction in opinion mining through Persian reviews | |
CN113918708B (en) | Abstract extraction method | |
Yan et al. | Chemical name extraction based on automatic training data generation and rich feature set | |
Melero et al. | Holaaa!! writin like u talk is kewl but kinda hard 4 NLP | |
Saleh et al. | TxLASM: A novel language agnostic summarization model for text documents | |
JP6168057B2 (en) | Failure occurrence cause extraction device, failure occurrence cause extraction method, and failure occurrence cause extraction program | |
CN115952794A (en) | Chinese-Tai cross-language sensitive information recognition method fusing bilingual sensitive dictionary and heterogeneous graph | |
Liu et al. | Keyword extraction using PageRank on synonym networks | |
Ali et al. | Word embedding based new corpus for low-resourced language: Sindhi | |
Cui | Converting taxonomic descriptions to new digital formats | |
Das et al. | An improvement of Bengali factoid question answering system using unsupervised statistical methods | |
Hamza et al. | Text mining: A survey of Arabic root extraction algorithms | |
Saneifar et al. | From terminology extraction to terminology validation: an approach adapted to log files | |
Worke | INFORMATION EXTRACTION MODEL FROM GE’EZ TEXTS | |
Temesgen | Afaan Oromo News Text Summarization Using Sentence Scoring Method | |
Modrzejewski | Improvement of the Translation of Named Entities in Neural Machine Translation | |
Dias | Information digestion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | ||
Application publication date: 2022-01-11 | Assignee: Shenzhen Mingji Agricultural Development Co.,Ltd. | Assignor: SHENZHEN DIB ENTERPRISE RISK MANAGEMENT TECHNOLOGY CO.,LTD. | Contract record no.: X2023980049635 | Denomination of invention: A Summary Extraction Method | Granted publication date: 2022-03-22 | License type: Common License | Record date: 2023-12-04 |