CN103473263B - News event development process-oriented visual display method - Google Patents

Info

Publication number
CN103473263B
Authority
CN
China
Prior art keywords
news
data subset
event
sentence
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310303085.6A
Other languages
Chinese (zh)
Other versions
CN103473263A (en)
Inventor
郭艳卿
赵锐
孔祥维
蒋金平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201310303085.6A
Publication of CN103473263A
Application granted
Publication of CN103473263B
Active legal status (current)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention belongs to the field of natural language processing and computer applications and relates to a visual display method for the development process of news events. The method comprises the following steps: acquiring news report web pages related to a specific news event from a news source and removing duplicates to obtain a duplicate-free news report data set for the event; dividing the data set into data subsets by news report time to obtain data subsets ordered by time; extracting the person and place elements of each report time from the corresponding data subset and generating an event summary for that report time; and visually displaying the event, taking as nodes the persons, places and event summary sentences that meet the importance requirement in the data subset of each report time, and taking the association relations among them as edges. The method helps display the development process of an entire news event, and the association relations among key elements such as persons and places, completely, concisely and intuitively, thereby reducing the cost for a reader to obtain news information.

Description

A visual display method for the development process of news events
Technical Field
The invention belongs to the technical field of natural language processing and computer applications, and in particular relates to a visual display method for the development process of news events.
Background
With the development and popularization of network and multimedia technology, the ways in which people obtain information about social news events have changed enormously. The central position of traditional news media such as newspapers, radio and television is gradually being taken over by Internet news media, and Internet news reports have become the public's main platform for obtaining information. However, Internet news reports are numerous and disorderly: reports with similar or even identical content waste readers' valuable time, and "fragmented" reports written from different perspectives make it difficult to grasp the whole course of a news event quickly. A comprehensive, concise and intuitive method of displaying news events is therefore urgently needed to reduce the time the public spends obtaining news information. Existing methods of displaying Internet news events fall into the following two categories:
First category: professional news editors manually edit, classify and organize the Internet news reports on a particular news event so that the related information is more orderly and easier for users to read. For major news events, news websites such as Sina, Phoenix and NetEase go further and compile special-topic coverage to provide more comprehensive information. The advantage of this approach is that it helps users grasp the development process and details of the whole news event more fully. Its disadvantages are that manual editing and organizing are costly, and that the special-topic coverage compiled for a major news event still contains so many reports that a reader must spend considerable time reading them to grasp how the event developed.
Second category: information processing techniques such as search and clustering are used to collect and organize information related to a news event. The information can be classified and displayed in finer detail by source and type (for example news, forums, blogs and videos) and can also be sorted by publication time. The advantage of this approach is that search and clustering automate the collection and organization of event-related information and the extraction of topics, greatly reducing the cost of manual editing and organizing. Its disadvantage is that it cannot display the development process of a news event concisely and intuitively, nor can it display the association relations among elements such as persons and places or how they evolve.
Summary of the Invention
To address the shortcomings of the two existing categories of Internet news display techniques, the invention provides a visual display method for the development process of news events, each step of which is carried out by a computer program written for the purpose. While reducing the cost of manual editing and organizing, the method displays the development process of an entire news event, and the association relations among elements such as persons and places, comprehensively, concisely and intuitively.
The technical solution adopted by the invention is a visual display method for the development process of news events, each step of which is carried out by a computer program written for the purpose. The method specifically includes the following:
A visual display method for the development process of news events, characterized in that: first, news report web pages related to a particular news event are obtained from a news source and de-duplicated to obtain a duplicate-free news report data set for the event; the news report data set is then divided into data subsets by news report time, yielding data subsets ordered by report time; the person and place news event elements of each report time are extracted from the corresponding data subset, and an event summary for that report time is generated on this basis; finally, in the data subset corresponding to each news report time, the person and place news elements and the event summary sentences that meet the importance requirement are taken as nodes and the association relations between them as edges, and the event at that time point is displayed visually.
The visual display method for the development process of news events is further characterized in that obtaining and de-duplicating the news report web pages related to a particular news event specifically includes: counting the occurrences of each character in the text of every news report web page, and taking the characters whose occurrence count exceeds a preset requirement as the feature characters of that web page; arranging all feature characters by occurrence count into a feature string, and using a hash method to generate a fixed-length feature code from it; and, if the feature codes of two news report web pages are identical, treating one of them as a duplicate page.
The visual display method for the development process of news events is further characterized in that the selection of data subsets specifically includes: if the number of sentences in a data subset that contain a person or place exceeds a preset threshold, the data subset is judged to be an important data subset, and the news report time corresponding to it is judged to be an important report time of the particular news event.
The visual display method for the development process of news events is further characterized in that generating the event summary specifically includes: sorting the sentences of a data subset in descending order of importance and selecting the sentences whose importance exceeds a preset requirement as the event summary of the data subset for that report time. The importance of a sentence is defined in terms of two factors, a word-frequency weight and a news-element weight: the word-frequency weight is computed with the TF-IDF weighting method, and the news-element weight is the relative frequency with which person and place news elements occur in the data subset. The specific process includes:
First, news event elements such as persons and places are extracted from each data subset to obtain sets of news element terms (for example, the set of place element terms $L = \{l_1, l_2, \ldots, l_n\}$), and the element term sets are weighted. Taking the set of place element terms as an example, the weight of each place element is computed as:
$$p(l_i) = \frac{n_{l_i}}{|L|} \tag{1}$$
In formula (1), $n_{l_i}$ is the number of occurrences of place element $l_i$ in the data subset and $|L|$ is the total number of place elements in the data subset; $p(l_i)$ is taken as the weight of element $l_i$;
Second, the weight of each word in each data subset is computed. The TF-IDF method is used to compute the score $Score(w_i)$ of each word $w_i$ in the data subset, which can be expressed as:
$$Score(w_i) = \frac{TF(w_i) \times IDF(w_i)}{|W|} \tag{2}$$
In formula (2), $TF(w_i)$ is the number of occurrences of word $w_i$ in the data subset, $IDF(w_i)$ is the inverse of the frequency of data subsets, within the whole data set, in which $w_i$ occurs, and $|W|$ is the total number of words in the data subset. $Score(w_i)$ is taken as the weight of word $w_i$. If $w_i$ is an element term, its weight is adjusted using the weight of the news element; otherwise its weight is unchanged. Taking the case in which $w_i$ is a place element as an example, the adjustment can be expressed as:
$$\pi(w_i) = \begin{cases} (1 + p(w_i)) \cdot Score(w_i), & w_i \in L \\ Score(w_i), & w_i \notin L \end{cases} \tag{3}$$
In formula (3), $\pi(w_i)$ denotes the weight of word $w_i$;
Third, each sentence $S_j$ is given a weight based on the weights of the words it contains, namely the average of the weights of those words:
$$\pi(S_j) = \frac{\sum_{i=1}^{n} w_i}{n}, \quad w_i \in S_j \tag{4}$$
In formula (4), $\pi(S_j)$ denotes the weight of sentence $S_j$, $n$ is the number of words in $S_j$, and the summand $w_i$ stands for the weight of the $i$-th word of $S_j$;
Finally, the sentences are sorted in descending order of weight, and the top-ranked sentences are chosen as the event summary of the time point.
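As a purely illustrative calculation, formulas (1) to (4) can be followed through with small numbers. Every number here is invented for the example and does not come from the patent: a data subset is assumed to contain 10 place mentions, 6 of them "Wenzhou"; the subset has 100 words; "Wenzhou" occurs in 2 of 4 data subsets, and the inverse frequency is taken as the plain ratio 4/2 (an assumption, since the text does not say whether a logarithm is applied); the last line assumes a three-word sentence whose other two words have weights 0.05 and 0.028.

```latex
\begin{align*}
p(\text{Wenzhou})     &= \frac{6}{10} = 0.6 \\
Score(\text{Wenzhou}) &= \frac{6 \times 2}{100} = 0.12 \\
\pi(\text{Wenzhou})   &= (1 + 0.6) \cdot 0.12 = 0.192 \\
\pi(S_j)              &= \frac{0.192 + 0.05 + 0.028}{3} = 0.09
\end{align*}
```

The element weight raises the word's weight from 0.12 to 0.192, so sentences that mention frequently occurring places or persons tend to rank higher in the summary.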
The visual display method for the development process of news events is further characterized in that, in the visual display of the development process of the news event, the importance of a person or place news element is computed as the relative frequency with which it occurs in the data subset; event summary sentences are selected in the same way as in the event summary generation process described above; centrality analysis is performed on the determined nodes and edges, so that the more important a node is, the larger it is drawn and the closer to the middle of the figure it is placed; and, if an event summary sentence node contains a person or place, that sentence node is connected by a line to the corresponding person or place node, and otherwise the nodes are not connected.
The beneficial effect of the invention is that it helps readers by displaying the development process of an entire news event, and the association relations among elements such as persons and places, comprehensively, concisely and intuitively.
Brief Description of the Drawings
Fig. 1 is a flow chart of the visual display method for the development process of news events according to the invention.
Fig. 2 is a preferred embodiment of the visual display figure of a news event at a single time point according to the invention.
Fig. 3 is a preferred embodiment of the visual display figure of a news event across several consecutive time points according to the invention.
Detailed Description
Specific embodiments of the invention are described in detail below with reference to the accompanying drawings and the technical solution.
To make the objects, technical solution and advantages of the invention clearer, the invention is described in detail below with reference to the accompanying drawings and a concrete example.
Fig. 1 is a flow chart of the visual display method for the development process of news events provided by an embodiment of the invention. As shown in Fig. 1, the method comprises the following steps:
Step 1: Obtain the news report web pages related to a particular news event from a news source.
In this step, an interface provided by a news platform can be used to obtain the related news report web pages according to keywords of the particular news event. The news platform may be the news channel of any website or the news data collected by a search engine.
Step 2: De-duplicate the obtained news report web pages to obtain the news report data set.
The de-duplication in this step may use a web page de-duplication method based on a hash of character frequencies. The specific process may include:
First, the occurrences of each text character in every obtained news report web page are counted, and the non-stop-word characters whose occurrence count exceeds a preset requirement are taken as the feature characters of that web page;
Second, all feature characters of each news report web page are arranged by occurrence count into a feature string, and a hash method is applied to map the feature string to a fixed-length feature code;
Finally, the feature codes of all obtained news report web pages are compared pairwise; if the feature codes of two web pages are identical, one of them is considered a duplicate page and is removed from the news report data set.
Of course, the de-duplication method based on a hash of character frequencies is only a preferred embodiment provided by the invention; other web page de-duplication methods may also be used.
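The de-duplication procedure of Step 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the stop-character set `STOP_CHARS`, the threshold `MIN_COUNT`, the use of MD5 as the hash method, and the tie-breaking rule for characters with equal counts are all hypothetical choices, since the text only speaks of a "preset requirement" and "a hash method".

```python
import hashlib
from collections import Counter

# ASSUMPTION: illustrative stop characters and threshold; the patent does not list them.
STOP_CHARS = set(" \t\n,.;:!?\"'()，。；：！？、“”‘’（）的了是在")
MIN_COUNT = 2


def feature_code(text: str, min_count: int = MIN_COUNT) -> str:
    """Return a fixed-length code built from the frequent non-stop characters."""
    counts = Counter(ch for ch in text if ch not in STOP_CHARS)
    frequent = [(ch, n) for ch, n in counts.items() if n > min_count]
    # Sort by descending count; break ties by character so the order is deterministic.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    feature_string = "".join(ch for ch, _ in frequent)
    # ASSUMPTION: MD5 stands in for the unspecified "hash method".
    return hashlib.md5(feature_string.encode("utf-8")).hexdigest()


def deduplicate(pages: list[str]) -> list[str]:
    """Keep the first page for each feature code; drop later pages with the same code."""
    seen: set[str] = set()
    unique = []
    for page in pages:
        code = feature_code(page)
        if code not in seen:
            seen.add(code)
            unique.append(page)
    return unique
```

Because only the ordered set of frequent characters is hashed, two pages whose frequent characters and their ranking coincide receive the same code even if rare characters differ. One consequence of this sketch is that very short pages with no character above the threshold all map to the code of the empty string, so a real system would probably want to handle them separately.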
Step 3: Determine time-ordered data subsets of the news report data set according to report time and importance.
In this step, the data subsets may be determined as follows:
First, the web page texts in the news report data set are divided into data subsets by news report time: web page texts with the same report time are placed in the same data subset, labeled with that report time;
Second, all web page texts in each data subset are segmented into words, and the sentences that contain news event elements such as persons and places are marked;
Third, if the number of sentences in a data subset that contain news elements such as persons and places exceeds a preset threshold, the data subset is judged to be an important data subset and is retained; otherwise, the data subset for that report time is discarded;
Finally, the retained data subsets are arranged in order of report time.
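The grouping and filtering of Step 3 might look like the sketch below. The input layout is an assumption made for illustration: each report is a `(report_time, sentences)` pair, and each sentence is a dict whose hypothetical `elements` field already lists the persons and places found by an upstream word-segmentation and entity-tagging stage, which is not shown.

```python
from collections import defaultdict


def select_important_subsets(reports, threshold):
    """Group sentences by report time and keep only the important subsets.

    reports:   iterable of (report_time, sentences); each sentence is a dict
               with 'text' and 'elements' (hypothetical field names).
    threshold: a subset is kept when more than this many of its sentences
               contain at least one person or place element.
    Returns a dict ordered by report time.
    """
    subsets = defaultdict(list)
    for report_time, sentences in reports:
        subsets[report_time].extend(sentences)

    important = {}
    for report_time in sorted(subsets):  # chronological if times sort naturally
        sentences = subsets[report_time]
        with_elements = sum(1 for s in sentences if s["elements"])
        if with_elements > threshold:
            important[report_time] = sentences
    return important
```

Report times are assumed to be values that sort chronologically, such as ISO date strings like `"2011-07-23"`.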
Step 4: Extract news event elements such as persons and places from each data subset and form the event summary for that time point.
In this step, the event summary for each time point may be extracted with a multi-document summarization method based on news event elements. The specific process includes:
First, news event elements such as persons and places are extracted from each data subset to obtain sets of news element terms (for example, the set of place element terms $L = \{l_1, l_2, \ldots, l_n\}$), and the element term sets are weighted. Taking the set of place element terms as an example, the weight of each place element is computed as:
$$p(l_i) = \frac{n_{l_i}}{|L|} \tag{1}$$
In formula (1), $n_{l_i}$ is the number of occurrences of place element $l_i$ in the data subset and $|L|$ is the total number of place elements in the data subset. $p(l_i)$ is taken as the weight of element $l_i$.
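Formula (1) is a relative frequency and can be sketched in a few lines of Python. The sketch assumes that $|L|$ counts every mention of a place element in the data subset (rather than the number of distinct places), which is the reading under which the weights sum to 1; the function name `element_weights` is hypothetical.

```python
from collections import Counter


def element_weights(mentions):
    """Formula (1): weight of each element = its mentions / all element mentions.

    mentions: list with one entry per occurrence of an element in the subset,
              e.g. ["Wenzhou", "Wenzhou", "Beijing"].
    """
    counts = Counter(mentions)
    total = sum(counts.values())  # ASSUMPTION: |L| is the total number of mentions
    if total == 0:
        return {}
    return {element: n / total for element, n in counts.items()}
```

The same function can be applied separately to the person mentions and the place mentions of a data subset.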
Second, the weight of each word in each data subset is computed. The TF-IDF method is used to compute the score $Score(w_i)$ of each word $w_i$ in the data subset, which can be expressed as:
$$Score(w_i) = \frac{TF(w_i) \times IDF(w_i)}{|W|} \tag{2}$$
In formula (2), $TF(w_i)$ is the number of occurrences of word $w_i$ in the data subset, $IDF(w_i)$ is the inverse of the frequency of data subsets, within the whole data set, in which $w_i$ occurs, and $|W|$ is the total number of words in the data subset. $Score(w_i)$ is taken as the weight of word $w_i$. If $w_i$ is an element term, its weight is adjusted using the weight of the news element; otherwise its weight is unchanged. Taking the case in which $w_i$ is a place element as an example, the adjustment can be expressed as:
$$\pi(w_i) = \begin{cases} (1 + p(w_i)) \cdot Score(w_i), & w_i \in L \\ Score(w_i), & w_i \notin L \end{cases} \tag{3}$$
In formula (3), $\pi(w_i)$ denotes the weight of word $w_i$.
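Formulas (2) and (3) can be sketched together as below. The main assumption is the form of the inverse frequency: the text calls it "the inverse of the data subset frequency", so the sketch uses the plain ratio (number of subsets) / (number of subsets containing the word) with no logarithm, whereas many TF-IDF variants apply one. The function names are hypothetical.

```python
from collections import Counter


def tfidf_scores(subset_words, all_subsets):
    """Formula (2): Score(w) = TF(w) * IDF(w) / |W| for every word in one subset.

    subset_words: list of words in the subset being scored.
    all_subsets:  list of word lists, one per data subset (including this one).
    """
    tf = Counter(subset_words)
    total_words = len(subset_words)
    vocabularies = [set(words) for words in all_subsets]
    scores = {}
    for word, count in tf.items():
        # Number of subsets containing the word; at least 1 to avoid dividing by zero.
        containing = max(1, sum(1 for vocab in vocabularies if word in vocab))
        idf = len(vocabularies) / containing  # ASSUMPTION: plain inverse, no logarithm
        scores[word] = count * idf / total_words
    return scores


def adjust_weights(scores, element_weight):
    """Formula (3): boost element terms by (1 + p(w)); leave other words unchanged."""
    return {
        word: (1 + element_weight[word]) * score if word in element_weight else score
        for word, score in scores.items()
    }
```

Here `element_weight` is a mapping from element term to its relative frequency, in the sense of formula (1). Words that are not element terms pass through `adjust_weights` with their TF-IDF score unchanged.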
Third, each sentence $S_j$ is given a weight based on the weights of the words it contains, namely the average of the weights of those words:
$$\pi(S_j) = \frac{\sum_{i=1}^{n} w_i}{n}, \quad w_i \in S_j \tag{4}$$
In formula (4), $\pi(S_j)$ denotes the weight of sentence $S_j$, $n$ is the number of words in $S_j$, and the summand $w_i$ stands for the weight of the $i$-th word of $S_j$.
Finally, the sentences are sorted in descending order of weight, and the top-ranked sentences are chosen as the event summary of the time point.
Of course, the multi-document summarization method based on news event elements is only a preferred embodiment provided by the invention; other event summary extraction methods may also be used.
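The last two parts of Step 4, averaging word weights per sentence as in formula (4) and keeping the top-ranked sentences, can be sketched as follows. Sentences are assumed to be already segmented into lists of words, and the cut-off `top_k` is a hypothetical parameter standing in for "the top-ranked sentences"; words missing from the weight table are treated as having weight 0, which is also an assumption.

```python
def sentence_weight(words, word_weight):
    """Formula (4): mean of the weights of the words in the sentence."""
    if not words:
        return 0.0
    return sum(word_weight.get(w, 0.0) for w in words) / len(words)


def summarize(sentences, word_weight, top_k=3):
    """Return the top_k sentences in descending order of weight.

    sentences:   list of sentences, each a list of words.
    word_weight: dict mapping word -> adjusted weight, as in formula (3).
    """
    ranked = sorted(
        sentences,
        key=lambda words: sentence_weight(words, word_weight),
        reverse=True,
    )
    return ranked[:top_k]
```

Selecting by a weight threshold instead of a fixed count, as the summary of the invention describes, would only change the final slice into a filter.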
Step 5: Taking the news elements such as persons and places that meet the importance requirement, together with the event summary sentences, as nodes, and the association relations between them as edges, display the event at each time point visually.
In this step, the visual display of the event at each time point may proceed as follows:
First, the relative frequency with which each news element such as a person or place occurs in the data subset of a given time point is computed and used as the measure of that element's importance;
Second, the news elements are sorted by importance; elements whose importance exceeds a preset requirement are retained and those whose importance does not are discarded, which yields the news element nodes for the time point;
Third, the summary sentences for the time point are generated with the event summary generation method of Step 4, and the weight of each summary sentence is used as the measure of its importance, which yields the event summary sentence nodes for the time point;
Finally, taking the association relations between the news element nodes and the event summary sentence nodes as edges, the visual display figure of the event at each time point is drawn. The association relation between nodes can be described as follows: if an event summary sentence node contains a person (or place), that sentence node is connected by a line to the corresponding person (or place) node; otherwise the nodes are not connected. In addition, the visual display figure is built on a centrality analysis of the nodes and edges: the more important a node is, the larger it is drawn and the closer to the middle of the figure it is placed. Steps 1 to 5 are carried out by a computer program written for the purpose.
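The node and edge construction of Step 5 can be sketched with plain Python data structures. This is a simplified illustration: the function name, the dict layout and the `min_importance` parameter are hypothetical, containment is tested by substring match on the sentence text (a real system would match against the tagged elements of the sentence), and the centrality-based sizing and layout are not shown.

```python
def build_event_graph(element_importance, summary, min_importance):
    """Build the nodes and edges for one time point.

    element_importance: dict element name -> relative frequency in the subset.
    summary:            list of (sentence_text, sentence_weight) pairs.
    min_importance:     elements must exceed this value to become nodes.
    Returns (nodes, edges); nodes maps a label to its kind and importance,
    edges is a list of (sentence_text, element_name) pairs.
    """
    nodes = {}
    for name, importance in element_importance.items():
        if importance > min_importance:
            nodes[name] = {"kind": "element", "importance": importance}
    for text, weight in summary:
        nodes[text] = {"kind": "summary", "importance": weight}

    edges = [
        (text, name)
        for text, _ in summary
        for name, attrs in nodes.items()
        # SIMPLIFICATION: substring match stands in for "the sentence contains the element".
        if attrs["kind"] == "element" and name in text
    ]
    return nodes, edges
```

The `importance` stored on each node is the quantity that the text maps to node size and closeness to the centre of the figure.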
Step 6: Connect the associated nodes in the visualization graphs of adjacent time points to display the development process of the particular news event.
In this step, the nodes in the visualization graphs of adjacent time points may be connected as follows: if the same node appears in the visualization graphs of two adjacent time points, the two identical nodes are connected; otherwise they are not connected.
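The linking rule of Step 6 amounts to intersecting the node sets of consecutive time points. A minimal sketch, assuming each time point's graph has been reduced to a `(time_label, set_of_node_names)` pair in chronological order (a hypothetical representation):

```python
def link_adjacent(graphs):
    """Connect identical nodes in the graphs of adjacent time points.

    graphs: list of (time_label, node_names) in chronological order,
            where node_names is a set of node labels.
    Returns a list of ((t1, name), (t2, name)) links.
    """
    links = []
    for (t1, nodes1), (t2, nodes2) in zip(graphs, graphs[1:]):
        # Sorting the shared names only makes the output order deterministic.
        for name in sorted(nodes1 & nodes2):
            links.append(((t1, name), (t2, name)))
    return links
```

Only neighbouring time points are compared, so a node that disappears for one time point and then returns is not linked across the gap.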
Fig. 2 shows a preferred embodiment of the visual display figure. As shown, triangles represent the person elements in the data subset of a given time point, and the size of a triangle indicates the importance of the person; squares represent the place elements in that data subset, and the size of a square likewise indicates the importance of the place; circles represent the event summary sentences extracted from that data subset, and the size of a circle indicates the importance of the summary sentence. The most important summary sentence in the figure is "Minister of Railways Sheng Guangzu and Vice Minister of Railways Hu Yadong rush to the scene of the Wenzhou train accident to direct operations", which contains the two person elements "Sheng Guangzu" and "Hu Yadong" and the place element "Wenzhou"; the largest circle is therefore connected to the two triangles representing "Sheng Guangzu" and "Hu Yadong" and to the square representing "Wenzhou".
Fig. 3 shows a preferred embodiment of connecting the visualization graphs of adjacent time points. From top to bottom, the figure gives part of the visual display of the "Wenzhou train rear-end collision and derailment" event, and the arrows connect identical places at adjacent time points.
The visual display method provided by this embodiment of the invention helps readers by displaying the development process of an entire news event, and the association relations among elements such as persons and places, comprehensively, concisely and intuitively.

Claims (3)

1. A visual display method for the development process of news events, characterized in that: first, news report web pages related to a particular news event are obtained from a news source and de-duplicated to obtain a duplicate-free news report data set for the event; the news report data set is then divided into data subsets by news report time, yielding data subsets ordered by report time; the person and place news event elements of each report time are extracted from the corresponding data subset, and an event summary for that report time is generated on this basis; finally, in the data subset corresponding to each news report time, the person and place news elements and the event summary sentences that meet the importance requirement are taken as nodes and the association relations between them as edges, and the event at that time point is displayed visually; the above steps are carried out by a computer program written for the purpose;
The selection of data subsets specifically includes: if the number of sentences in a data subset that contain a person or place exceeds a preset threshold, the data subset is judged to be an important data subset, and the news report time corresponding to it is judged to be an important report time of the particular news event;
The generation of the event summary specifically includes: sorting the sentences of a data subset in descending order of importance and selecting the sentences whose importance exceeds a preset requirement as the event summary of the data subset for that report time; the importance of a sentence is defined in terms of two factors, a word-frequency weight and a news-element weight: the word-frequency weight is computed with the TF-IDF weighting method, and the news-element weight is the relative frequency with which person and place news elements occur in the data subset; the specific process includes:
First, person and place news event elements are extracted from each data subset to obtain sets of news element terms, for example the set of place element terms $L = \{l_1, l_2, \ldots, l_n\}$, and the element term sets are weighted; the weight of each place element is computed as:
$$p(l_i) = \frac{n_{l_i}}{|L|} \tag{1}$$
In formula (1), $n_{l_i}$ is the number of occurrences of place element $l_i$ in the data subset and $|L|$ is the total number of place elements in the data subset; $p(l_i)$ is taken as the weight of element $l_i$;
Second, the weight of each word in each data subset is computed; the TF-IDF method is used to compute the score $Score(w_i)$ of each word $w_i$ in the data subset, which can be expressed as:
$$Score(w_i) = \frac{TF(w_i) \times IDF(w_i)}{|W|} \tag{2}$$
In formula (2), $TF(w_i)$ is the number of occurrences of word $w_i$ in the data subset, $IDF(w_i)$ is the inverse of the frequency of data subsets, within the whole data set, in which $w_i$ occurs, and $|W|$ is the total number of words in the data subset; $Score(w_i)$ is taken as the weight of word $w_i$; if $w_i$ is an element term, its weight is adjusted using the weight of the news element, and otherwise its weight is unchanged; taking the case in which $w_i$ is a place element as an example, the adjustment can be expressed as:
$$\pi(w_i) = \begin{cases} (1 + p(w_i)) \cdot Score(w_i), & w_i \in L \\ Score(w_i), & w_i \notin L \end{cases} \tag{3}$$
In formula (3), $\pi(w_i)$ denotes the weight of word $w_i$;
Third, each sentence $S_j$ is given a weight based on the weights of the words it contains, namely the average of the weights of those words:
$$\pi(S_j) = \frac{\sum_{i=1}^{n} w_i}{n}, \quad w_i \in S_j \tag{4}$$
In formula (4), $\pi(S_j)$ denotes the weight of sentence $S_j$ and $n$ is the number of words in $S_j$;
Finally, the sentences are sorted in descending order of weight, and the top-ranked sentences are chosen as the event summary of the time point.
2. The visual display method for the development process of news events according to claim 1, characterized in that obtaining and de-duplicating the news report web pages related to a particular news event specifically includes: counting the occurrences of each character in the text of every news report web page, and taking the characters whose occurrence count exceeds a preset requirement as the feature characters of that web page; arranging all feature characters by occurrence count into a feature string, and using a hash method to generate a fixed-length feature code from it; and, if the feature codes of two news report web pages are identical, treating one of them as a duplicate page.
3. The visual display method for the development process of news events according to claim 1 or 2, characterized in that, in the visual display of the development process of the news event, the importance of a person or place news element is computed as the relative frequency with which it occurs in the data subset; event summary sentences are selected in the same way as in the event summary generation process of claim 1; centrality analysis is performed on the determined nodes and edges, so that the more important a node is, the larger it is drawn and the closer to the middle of the figure it is placed; and, if an event summary sentence node contains a person or place, that sentence node is connected by a line to the corresponding person or place node, and otherwise the nodes are not connected.
CN201310303085.6A 2013-07-18 2013-07-18 News event development process-oriented visual display method Active CN103473263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310303085.6A CN103473263B (en) 2013-07-18 2013-07-18 News event development process-oriented visual display method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310303085.6A CN103473263B (en) 2013-07-18 2013-07-18 News event development process-oriented visual display method

Publications (2)

Publication Number Publication Date
CN103473263A CN103473263A (en) 2013-12-25
CN103473263B true CN103473263B (en) 2017-02-08

Family

ID=49798111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310303085.6A Active CN103473263B (en) 2013-07-18 2013-07-18 News event development process-oriented visual display method

Country Status (1)

Country Link
CN (1) CN103473263B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104182504B (en) * 2014-08-18 2017-06-06 合肥工业大学 A kind of dynamic tracking of media event and summary algorithm
CN104408093B (en) * 2014-11-14 2018-01-26 中国科学院计算技术研究所 A kind of media event key element abstracting method and device
CN106156042B (en) * 2015-03-26 2020-02-07 科大讯飞股份有限公司 Hotspot information display method and system
CN104915446B (en) * 2015-06-29 2019-01-29 华南理工大学 Event Evolvement extraction method and its system based on news
WO2017107010A1 (en) * 2015-12-21 2017-06-29 浙江核新同花顺网络信息股份有限公司 Information analysis system and method based on event regression test
CN105787095B (en) * 2016-03-16 2019-09-27 广州索答信息科技有限公司 The automatic generation method and device of internet news
CN105912526A (en) * 2016-04-15 2016-08-31 北京大学 Sports game live broadcasting text based sports news automatic constructing method and device
CN106709968A (en) * 2016-11-30 2017-05-24 剧加科技(厦门)有限公司 Data visualization method and system for play story information
CN106874419B (en) * 2017-01-22 2019-09-10 北京航空航天大学 A kind of real-time hot spot polymerization of more granularities
CN110020104B (en) * 2017-09-05 2023-04-07 腾讯科技(北京)有限公司 News processing method and device, storage medium and computer equipment
CN110110089B (en) * 2018-01-09 2021-03-30 网智天元科技集团股份有限公司 Cultural relation graph generation method and system
CN108170838B (en) * 2018-01-12 2022-07-08 平安科技(深圳)有限公司 Topic evolution visualization display method, application server and computer readable storage medium
CN108427761B (en) * 2018-03-21 2022-01-14 腾讯科技(深圳)有限公司 News event processing method, terminal, server and storage medium
CN110162651B (en) * 2019-04-23 2023-07-14 南京邮电大学 News content image-text disagreement identification system and identification method based on semantic content abstract
CN111931092B (en) * 2020-07-07 2022-07-12 浙江大学 Data visualization exploration system based on Scrollytelling technology
CN112328856A (en) * 2020-10-30 2021-02-05 中国平安人寿保险股份有限公司 Common event tracking method and device, computer equipment and computer readable medium
CN113626668B (en) * 2021-07-02 2024-05-14 武汉大学 News multi-scale visualization method for map
CN114040518A (en) * 2021-11-26 2022-02-11 中国银行股份有限公司 Network node display method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102646114A (en) * 2012-02-17 2012-08-22 清华大学 News topic timeline abstract generating method based on breakthrough point

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
基于字符统计的新闻网页去重方法研究;蒋金平等;《中国科技论文在线》;20130513;第4页第3段第3-6行、第6段,第5页第2段第1-3行 *
文本可视化在新闻事件演变中的应用;刘晓娟等;《图书情报工作》;20100920;第54卷(第18期);全文 *
新闻话题事件演变关系自动生成***的设计和实现;李斌;《万方数据知识服务平台》;20120531;第22页第3段,第23页第5-6段,第39页第1段 *

Also Published As

Publication number Publication date
CN103473263A (en) 2013-12-25

Similar Documents

Publication Publication Date Title
CN103473263B (en) News event development process-oriented visual display method
JP5886733B2 (en) Video group reconstruction / summarization apparatus, video group reconstruction / summarization method, and video group reconstruction / summarization program
CN110909164A (en) Text enhancement semantic classification method and system based on convolutional neural network
CN102591612B (en) General webpage text extraction method based on punctuation continuity and system thereof
CN103544176A (en) Method and device for generating page structure template corresponding to multiple pages
CN103544255A (en) Text semantic relativity based network public opinion information analysis method
CN104778209A (en) Opinion mining method for ten-million-scale news comments
CN103810251B (en) Method and device for extracting text
CN112667940B (en) Webpage text extraction method based on deep learning
CN103870001A (en) Input method candidate item generating method and electronic device
WO2016057984A1 (en) Methods and systems for base map and inference mapping
CN103678412A (en) Document retrieval method and device
CN111460162B (en) Text classification method and device, terminal equipment and computer readable storage medium
WO2014000130A1 (en) Method or system for automated extraction of hyper-local events from one or more web pages
CN106294330A (en) A kind of scientific text selection method and device
CN102955853A (en) Method and device for generating cross-language abstract
Ma et al. Extracting unstructured data from template generated web documents
CN109522410A (en) Document clustering method and platform, server and computer-readable medium
CN106485525A (en) Information processing method and device
Liu et al. Main content extraction from web pages based on node characteristics
Li et al. Text mining and visualization of papers reviews using R language
D’Silva et al. Development of a Konkani language dataset for automatic text summarization and its challenges
Das et al. Sentiment analysis: what is the end user's requirement?
CN1604073A (en) Method for conducting title and text logic connection for newspaper pages
Hsu et al. Hierarchical comments-based clustering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant