CN110134781A - A kind of automatic abstracting method of finance text snippet - Google Patents
A kind of automatic abstracting method of finance text snippet
- Publication number
- CN110134781A (application CN201910281459.6A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- emotion
- score value
- financial
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an automatic extraction method for financial text summaries. The method first extracts the keyword attribute of each sentence using the TF-ISF method, then extracts the sentiment attribute of the sentence and computes its topic relevance. A weighted score evaluates how important each sentence is to the sentiment summary. Finally, the candidate set of summary sentences is filtered with a similarity measure to generate the final sentiment summary. The invention can automatically extract sentiment summaries of financial texts and has considerable application value in financial technology fields such as robo-advisory. For example, automatically extracting and summarizing the views of financial institution analysts contained in large volumes of research reports can provide important guidance for allocation across major asset classes.
Description
Technical field
The present invention relates to fields such as financial technology, data mining, and information retrieval, and more particularly to an automatic extraction method for financial text summaries.
Background art
With the rapid development of information technology and the arrival of the big data era, the automatic processing of financial information is a problem that urgently needs to be solved; the traditional approach of extracting information manually falls far short of investors' needs. Financial information comes mainly from unstructured text data such as corporate annual reports, announcements, news, policies and regulations, and market research reports. Mining this information effectively is of great value to the development of financial business. Against this background, text summarization technology has begun to attract widespread attention.
Automatic text summarization methods fall into two broad classes: summarization based on semantic understanding and summarization based on word-frequency statistics. The former relies on domain-specific corpora and natural-language semantic analysis to understand the original text and generate the summary; it has significant limitations and the technology is not yet mature. The latter relies mainly on the structural information of the text, but the text features it extracts are incomplete, and problems such as redundant and incoherent sentences are prominent.
Research on automatic summarization started earlier abroad and has so far produced more results. Earlier methods include: scoring based on high-frequency words; methods based on sentence position and cue-word features; methods that rank sentences and select summary sentences with integer linear programming, based on sentence information content, coherence, and similarity; methods that combine probability statistics with semantic relations; and methods that score sentences by combining information such as sentence length and similarity. More recent methods include the LexRank-based method, which represents sentence weights as a graph, computes the sentence similarity of adjacent nodes with a vector space model, and extracts the sentences most similar to their adjacent nodes as candidate sentences for the summary. There are also graph-based methods that represent sentences as nodes and sentence relations as edges, and build a complex network for automatic summarization.
Compared with work abroad, domestic research on automatic summarization started later. Typical current methods include methods based on topic-word weights and feature weights, a PageRank-based sentiment summarization method for multiple Chinese documents, and supervised learning methods based on the LDA model. Relatively mature systems at present include the OA Chinese literature automatic summarization system of Shanghai Jiao Tong University, which is based on a human-imitating algorithm, and the HIT-971 English automatic summarization system of Harbin Institute of Technology, which is based on semantic analysis and understanding.
Summary of the invention
The problem to be solved by the present invention is how to automatically extract sentiment summaries from financial texts. To solve this problem, the invention proposes an automatic extraction method for financial text summaries.
The object of the invention is achieved by the following technical solution: an automatic extraction method for financial text summaries, comprising the following steps:
(1) Data preprocessing, comprising the following sub-steps:
(1.1) read each text d_i of the financial text corpus in turn;
(1.2) read the stop-word dictionary and delete all stop words from text d_i;
(1.3) read the financial vocabulary ontology, segment each sentence of the body of d_i into words to produce segmented sentences, and segment the title of d_i to produce a segmented title;
(2) Sentiment key sentence extraction, comprising the following sub-steps:
(2.1) for each word w_i, count the number of sentences in text d_i that contain w_i;
(2.2) compute in turn the keyword attribute score key(s_i) of each sentence s_i in d_i;
(2.3) read the sentiment dictionary, match each sentiment word in sentence s_i in turn, obtain its sentiment polarity and sentiment intensity value, and compute the sentiment attribute score sent(s_i) of s_i;
(2.4) read the synonym dictionary, count in turn the number of identical words and the number of synonyms shared by sentence s_i and the title t, and compute the topic relevance score corr(s_i, t) of s_i;
(2.5) compute the sentiment score score(s_i) of s_i from its keyword attribute score key(s_i), sentiment attribute score sent(s_i), and topic relevance score corr(s_i, t);
(3) Automatic summary extraction, comprising the following sub-steps:
(3.1) sort all sentences of d_i by sentiment score from high to low and take the top K sentences as the candidate summary cand_abs;
(3.2) compute the similarity of every pair of sentences in cand_abs; if it exceeds a threshold, delete the sentence with the lower sentiment score from cand_abs;
(3.3) sort the remaining sentences of cand_abs in the order in which they appear in the original text d_i, generate the final summary cand, and output it.
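The preprocessing of step (1) can be illustrated with a minimal Python sketch. The text does not say which segmentation algorithm is used with the financial vocabulary, so the sketch assumes forward maximum matching, a common dictionary-based technique for Chinese. The function names, the `max_len` parameter, and the sample vocabulary are all hypothetical, and stop words are filtered after segmentation for simplicity, whereas the steps above delete them first.

```python
def segment(text, vocabulary, max_len=4):
    """Forward maximum matching: at each position take the longest vocabulary entry.

    Characters that start no vocabulary entry are emitted as single-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(sentences, vocabulary, stop_words):
    """Segment each raw sentence and drop stop words (simplified order, see lead-in)."""
    return [
        [w for w in segment(s, vocabulary) if w not in stop_words]
        for s in sentences
    ]


# Hypothetical financial vocabulary: company, net profit, year-on-year, growth
vocabulary = {"公司", "净利润", "同比", "增长"}
stop_words = {"的"}
tokens = preprocess(["公司的净利润同比增长"], vocabulary, stop_words)[0]
```

The same `segment` function can be applied to the title to obtain the segmented title t.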
Further, step 2.2 comprises the following sub-steps:
(2.2.1) count in turn the term frequency of each word w_i in s_i, compute the TF-ISF score of w_i, and compute the accumulated TF-ISF score TFISF(s_i) of sentence s_i;
(2.2.2) read the indicative word list, count the number ind(s_i) of indicative words in sentence s_i, and compute the keyword attribute score of s_i as key(s_i) = TFISF(s_i) · ind(s_i).
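A hedged sketch of step 2.2 follows. The TF-ISF formula is not reproduced in this text, so the code assumes the common definition W(w) = TF(w) · log(N / n_w), where N is the number of sentences and n_w the number of sentences containing w, and it sums W over the distinct words of a sentence. Both choices are assumptions, as are the function names and the sample word lists.

```python
import math
from collections import Counter


def tf_isf_scores(sentences):
    """Accumulated TF-ISF score per tokenized sentence (assumed textbook form)."""
    n_sentences = len(sentences)
    # n_w: number of sentences that contain each word (step 2.1)
    sent_freq = Counter(w for s in sentences for w in set(s))
    scores = []
    for s in sentences:
        tf = Counter(s)  # term frequency of each word inside this sentence
        scores.append(
            sum(tf[w] * math.log(n_sentences / sent_freq[w]) for w in tf)
        )
    return scores


def keyword_scores(sentences, indicative_words):
    """key(s) = TFISF(s) * ind(s), where ind(s) counts indicative words in s."""
    tfisf = tf_isf_scores(sentences)
    return [
        t * sum(1 for w in s if w in indicative_words)
        for t, s in zip(tfisf, sentences)
    ]


sentences = [
    ["profit", "rose", "but", "costs", "rose"],
    ["profit", "fell"],
    ["however", "outlook", "improved"],
]
keys = keyword_scores(sentences, indicative_words={"but", "however"})
```

Because key(s_i) is a product, a sentence that contains no indicative word receives a keyword score of zero under this formula, as the second sample sentence does.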
Further, in step 2.3, the sentiment attribute score sent(s_i) of s_i is computed from the sentiment words matched in the sentence, where ori(ew_{i,k}) is the sentiment polarity of the k-th sentiment word in sentence s_i, cont(ew_{i,k}) is the sentiment intensity value of the k-th sentiment word in sentence s_i, and n is the number of sentiment words in sentence s_i.
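The following sketch illustrates how ori, cont, and n could combine into a sentence-level score. The formula for sent(s_i) is not reproduced in this text, so the code assumes the mean of polarity times intensity over the n matched sentiment words. The lexicon layout (word mapped to a polarity of +1 or -1 and an intensity) and the function name are hypothetical.

```python
def sentiment_score(sentence, lexicon):
    """Assumed form: average of ori * cont over the sentiment words in the sentence."""
    hits = [lexicon[w] for w in sentence if w in lexicon]
    if not hits:
        return 0.0  # no sentiment word matched
    return sum(ori * cont for ori, cont in hits) / len(hits)


# Hypothetical sentiment dictionary: word -> (polarity, intensity)
lexicon = {"surge": (1, 0.8), "loss": (-1, 0.6)}
value = sentiment_score(["shares", "surge", "despite", "loss"], lexicon)
```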
Further, in step 2.4, the topic relevance score corr(s_i, t) of sentence s_i is computed from sam(s_i, t), the number of identical words shared by sentence s_i and the title t, and syn(s_i, t), the number of synonyms shared by sentence s_i and the title t.
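A minimal sketch of the two counts and one possible way to combine them is given below. The formula for corr(s_i, t) is not reproduced in this text; the weighted sum normalized by title length, the `syn_weight` parameter, and the synonym dictionary layout are assumptions made only for illustration.

```python
def overlap_counts(sentence, title, synonyms):
    """Return (sam, syn): identical words and synonym matches between sentence and title."""
    title_words = set(title)
    words = set(sentence)
    sam = sum(1 for w in words if w in title_words)
    syn = sum(
        1
        for w in words
        if w not in title_words and synonyms.get(w, set()) & title_words
    )
    return sam, syn


def topic_relevance(sentence, title, synonyms, syn_weight=0.5):
    """Assumed form: (sam + syn_weight * syn) / number of distinct title words."""
    if not title:
        return 0.0
    sam, syn = overlap_counts(sentence, title, synonyms)
    return (sam + syn_weight * syn) / len(set(title))


# Hypothetical synonym dictionary: word -> set of synonyms
synonyms = {"earnings": {"profit"}}
corr = topic_relevance(["earnings", "growth", "slowed"], ["profit", "growth"], synonyms)
```

Counting synonyms at a lower weight than identical words is one way to realize the weighting mentioned among the beneficial effects; the actual weights are not given in the text.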
Further, in step 2.5, the sentiment score of sentence s_i is score(s_i) = key(s_i) · sent(s_i) · corr(s_i, t).
Further, in step 3.2, the similarity sim(s_i, s_j) of every two sentences s_i and s_j is computed from sam(s_i, s_j), the number of identical words shared by sentence s_i and sentence s_j, and syn(s_i, s_j), the number of synonyms shared by sentence s_i and sentence s_j.
The beneficial effects of the present invention are:
1. The keyword attribute, sentiment attribute, and topic relevance are extracted as the scoring elements of the sentiment summary; covering three different aspects ensures the information content of the features and improves the accuracy of the sentiment scoring and the representativeness of the summary sentences.
2. Using TF-ISF as the measure for keywords effectively distinguishes how important different words are to a sentence, which plays an important role in filtering out noise words and improving the accuracy of the algorithm.
3. Weighting is used in several of the scoring functions, so the importance of different features can be adjusted flexibly, which ensures that the algorithm can be configured flexibly across practical application scenarios.
Brief description of the drawings
Fig. 1 is a flow chart of the automatic extraction method for financial text summaries.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, the present invention provides an automatic extraction method for financial text summaries, comprising the following steps:
(1) Data preprocessing, comprising the following sub-steps:
(1.1) read each text d_i = {s_1, s_2, ..., s_N} of the financial text corpus Corp in turn;
(1.2) read the stop-word dictionary and delete all stop words from text d_i;
(1.3) read the financial vocabulary ontology, segment each sentence s_i of the body of d_i into words to produce the segmented sentence s_i = <w_1, w_2, ..., w_m>, and segment the title t of d_i to produce the segmented title t = <wt_1, wt_2, ..., wt_m>;
(2) Sentiment key sentence extraction, comprising the following sub-steps:
(2.1) for each word w_i, count the number n_{w_i} of sentences in text d_i that contain w_i;
(2.2) compute in turn the keyword attribute score key(s_i) of each sentence s_i in d_i, specifically:
(2.2.1) count in turn the term frequency TF(w_i) of each word w_i in s_i, compute the TF-ISF score W(w_i) of w_i according to formula (1), and compute the accumulated TF-ISF score TFISF(s_i) of sentence s_i according to formula (2), where W(w_{i,k}) is the TF-ISF score of the k-th word in sentence s_i;
(2.2.2) read the indicative word list, count the number ind(s_i) of indicative words in sentence s_i, and compute the keyword attribute score key(s_i) of sentence s_i according to formula (3):
key(s_i) = TFISF(s_i) · ind(s_i)    (3)
The indicative word list specifically includes a list of adversative words, a list of conjunctions, and the like;
(2.3) read the sentiment dictionary, match each sentiment word ew_i in sentence s_i in turn, obtain its sentiment polarity ori(ew_i) and sentiment intensity value cont(ew_i), and compute the sentiment attribute score sent(s_i) of s_i according to formula (4), where ori(ew_{i,k}) is the sentiment polarity of the k-th sentiment word in sentence s_i, cont(ew_{i,k}) is the sentiment intensity value of the k-th sentiment word in sentence s_i, and n is the number of sentiment words in sentence s_i;
(2.4) read the synonym dictionary, count in turn the number sam(s_i, t) of identical words and the number syn(s_i, t) of synonyms shared by sentence s_i and the title t, and compute the topic relevance score corr(s_i, t) of sentence s_i according to formula (5);
(2.5) compute the sentiment score score(s_i) of sentence s_i according to formula (6):
score(s_i) = key(s_i) · sent(s_i) · corr(s_i, t)    (6)
(3) Automatic summary extraction, comprising the following sub-steps:
(3.1) sort all sentences s_1 to s_N of d_i by sentiment score score(s_i) from high to low and take the top K sentences as the candidate summary cand_abs = <s_1, s_2, ..., s_K>; the parameter K is determined by the needs of the application scenario, for example the top 5 or top 10 sentences may be selected as the candidate summary;
(3.2) compute the similarity sim(s_i, s_j) of every two sentences s_i and s_j in cand_abs according to formula (7); if it is greater than the threshold σ, delete the sentence with the lower sentiment score from cand_abs; the threshold σ is adjusted according to the application scenario and takes values in the range 0 to 1, and the larger the value of σ, the more dispersed the summary content;
(3.3) sort the remaining sentences s_1 to s_r of cand_abs in the order in which they appear in the original text d_i, generate the final summary cand, and output it.
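Step (3) can be sketched in Python as follows. This is one possible reading of steps 3.1 to 3.3, not the definitive procedure: candidates are visited from the highest score down, and a sentence is dropped when it is too similar to one already kept. Formula (7) is not reproduced in this text, so a Jaccard word overlap stands in for sim(s_i, s_j); the function names and sample data are hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity in [0, 1]; formula (7) uses identical-word and synonym counts."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def extract_summary(sentences, scores, k=5, sigma=0.5, similarity=jaccard):
    # Step 3.1: indices of the K highest-scoring sentences, best first
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Step 3.2: keep a candidate only if it is not too similar to a better one
    kept = []
    for i in ranked:
        if all(similarity(sentences[i], sentences[j]) <= sigma for j in kept):
            kept.append(i)
    # Step 3.3: restore the order of appearance in the original text
    return [sentences[i] for i in sorted(kept)]


sentences = [["a", "b", "c"], ["a", "b", "d"], ["x", "y"], ["p", "q"]]
scores = [0.9, 0.8, 0.95, 0.1]
summary = extract_summary(sentences, scores, k=3, sigma=0.4)
```

In this example the second sentence overlaps the first by 0.5, which exceeds σ = 0.4, so the lower-scored of the two is removed and the remaining sentences are returned in document order.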
The present invention addresses the task of automatically extracting sentiment summaries from financial texts and proposes an automatic extraction method for financial text summaries. The method can effectively improve the efficiency with which automated decision-making systems process financial text information and can play an important role in financial technology fields such as robo-advisory.
The above embodiment is intended to illustrate the present invention rather than to limit it. Any modification or change made to the present invention within its spirit and within the scope of protection of the claims falls within the scope of protection of the present invention.
Claims (6)
1. An automatic extraction method for financial text summaries, characterized by comprising the following steps:
(1) Data preprocessing, comprising the following sub-steps:
(1.1) read each text d_i of the financial text corpus in turn;
(1.2) read the stop-word dictionary and delete all stop words from text d_i;
(1.3) read the financial vocabulary ontology, segment each sentence of the body of d_i into words to produce segmented sentences, and segment the title of d_i to produce a segmented title;
(2) Sentiment key sentence extraction, comprising the following sub-steps:
(2.1) for each word w_i, count the number of sentences in text d_i that contain w_i;
(2.2) compute in turn the keyword attribute score key(s_i) of each sentence s_i in d_i;
(2.3) read the sentiment dictionary, match each sentiment word in sentence s_i in turn, obtain its sentiment polarity and sentiment intensity value, and compute the sentiment attribute score sent(s_i) of s_i;
(2.4) read the synonym dictionary, count in turn the number of identical words and the number of synonyms shared by sentence s_i and the title t, and compute the topic relevance score corr(s_i, t) of sentence s_i;
(2.5) compute the sentiment score score(s_i) of s_i from the keyword attribute score key(s_i), the sentiment attribute score sent(s_i), and the topic relevance score corr(s_i, t) of sentence s_i;
(3) Automatic summary extraction, comprising the following sub-steps:
(3.1) sort all sentences of d_i by sentiment score from high to low and take the top K sentences as the candidate summary cand_abs;
(3.2) compute the similarity of every two sentences in cand_abs; if it is greater than a threshold, delete the sentence with the lower sentiment score from cand_abs;
(3.3) sort the remaining sentences of cand_abs in the order in which they appear in the original text d_i, generate the final summary cand, and output it.
2. The automatic extraction method for financial text summaries according to claim 1, characterized in that step 2.2 comprises the following sub-steps:
(2.2.1) count in turn the term frequency of each word w_i in s_i, compute the TF-ISF score of w_i, and compute the accumulated TF-ISF score TFISF(s_i) of sentence s_i;
(2.2.2) read the indicative word list, count the number ind(s_i) of indicative words in sentence s_i, and compute the keyword attribute score of sentence s_i as key(s_i) = TFISF(s_i) · ind(s_i).
3. The automatic extraction method for financial text summaries according to claim 1, characterized in that, in step 2.3, the sentiment attribute score sent(s_i) of s_i is computed from the sentiment words matched in the sentence, where ori(ew_{i,k}) is the sentiment polarity of the k-th sentiment word in sentence s_i, cont(ew_{i,k}) is the sentiment intensity value of the k-th sentiment word in sentence s_i, and n is the number of sentiment words in sentence s_i.
4. The automatic extraction method for financial text summaries according to claim 1, characterized in that, in step 2.4, the topic relevance score corr(s_i, t) of sentence s_i is computed from sam(s_i, t), the number of identical words shared by sentence s_i and the title t, and syn(s_i, t), the number of synonyms shared by sentence s_i and the title t.
5. The automatic extraction method for financial text summaries according to claim 1, characterized in that, in step 2.5, the sentiment score of sentence s_i is score(s_i) = key(s_i) · sent(s_i) · corr(s_i, t).
6. The automatic extraction method for financial text summaries according to claim 1, characterized in that, in step 3.2, the similarity sim(s_i, s_j) of every two sentences s_i and s_j is computed from sam(s_i, s_j), the number of identical words shared by sentence s_i and sentence s_j, and syn(s_i, s_j), the number of synonyms shared by sentence s_i and sentence s_j.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910281459.6A CN110134781A (en) | 2019-04-09 | 2019-04-09 | A kind of automatic abstracting method of finance text snippet |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910281459.6A CN110134781A (en) | 2019-04-09 | 2019-04-09 | A kind of automatic abstracting method of finance text snippet |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110134781A true CN110134781A (en) | 2019-08-16 |
Family
ID=67569516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910281459.6A Pending CN110134781A (en) | 2019-04-09 | 2019-04-09 | A kind of automatic abstracting method of finance text snippet |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110134781A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130103623A1 (en) * | 2011-10-21 | 2013-04-25 | Educational Testing Service | Computer-Implemented Systems and Methods for Detection of Sentiment in Writing |
CN104281645A (en) * | 2014-08-27 | 2015-01-14 | 北京理工大学 | Method for identifying emotion key sentence on basis of lexical semantics and syntactic dependency |
CN105022725A (en) * | 2015-07-10 | 2015-11-04 | 河海大学 | Text emotional tendency analysis method applied to field of financial Web |
Non-Patent Citations (1)
Title |
---|
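Li Xianyi (李宪毅): "Research on multi-document sentiment summarization for review texts" (面向评论文本的多文档情感摘要研究), China Master's Theses Full-text Database, Information Science and Technology Series * |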
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401045A (en) * | 2020-03-16 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Text generation method and device, storage medium and electronic equipment |
CN111401045B (en) * | 2020-03-16 | 2022-05-10 | 腾讯科技(深圳)有限公司 | Text generation method and device, storage medium and electronic equipment |
CN112784585A (en) * | 2021-02-07 | 2021-05-11 | 新华智云科技有限公司 | Abstract extraction method and terminal for financial bulletin |
CN114417821A (en) * | 2022-03-29 | 2022-04-29 | 南昌华梦达航空科技发展有限公司 | Financial text checking and analyzing system based on cloud platform |
IT202200007820A1 (en) | 2022-04-20 | 2022-07-20 | Orma Lab Srl | SYSTEM AND METHOD FOR THE AUTOMATIC SUGGESTION OF FACILITATED FINANCE INSTRUMENTS WITH IMPROVEMENT OF REPUTATIONAL PERFORMANCE |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2020103654A4 (en) | Method for intelligent construction of place name annotated corpus based on interactive and iterative learning | |
CN110134781A (en) | A kind of automatic abstracting method of finance text snippet | |
Wang et al. | Integrating extractive and abstractive models for long text summarization | |
Han et al. | Lexical normalization for social media text | |
CN103049435B (en) | Text fine granularity sentiment analysis method and device | |
CN105243129A (en) | Commodity property characteristic word clustering method | |
CN104462378A (en) | Data processing method and device for text recognition | |
CN103914494A (en) | Method and system for identifying identity of microblog user | |
CN110807326B (en) | Short text keyword extraction method combining GPU-DMM and text features | |
CN110738033B (en) | Report template generation method, device and storage medium | |
CN110309400A (en) | A kind of method and system that intelligent Understanding user query are intended to | |
CN111626042B (en) | Reference digestion method and device | |
CN102955771A (en) | Technology and system for automatically recognizing Chinese new words in single-word-string mode and affix mode | |
CN103324626A (en) | Method for setting multi-granularity dictionary and segmenting words and device thereof | |
CN103514213A (en) | Term extraction method and device | |
CN109086355A (en) | Hot spot association relationship analysis method and system based on theme of news word | |
CN109815401A (en) | A kind of name disambiguation method applied to Web people search | |
CN111310467B (en) | Topic extraction method and system combining semantic inference in long text | |
Ao et al. | News keywords extraction algorithm based on TextRank and classified TF-IDF | |
Laboreiro et al. | Determining language variant in microblog messages | |
CN112528640A (en) | Automatic domain term extraction method based on abnormal subgraph detection | |
CN110705285A (en) | Government affair text subject word bank construction method, device, server and readable storage medium | |
CN115617965A (en) | Rapid retrieval method for language structure big data | |
CN108595434B (en) | Syntax dependence method based on conditional random field and rule adjustment | |
CN111767730A (en) | Event type identification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |