CN104572899A - Article processing method and article processing device - Google Patents

Article processing method and article processing device

Info

Publication number
CN104572899A
Authority
CN
China
Prior art keywords
industry
article
correlation
degree
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410826879.5A
Other languages
Chinese (zh)
Inventor
刘严
刘鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruian Technology Co Ltd
Original Assignee
Beijing Ruian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruian Technology Co Ltd filed Critical Beijing Ruian Technology Co Ltd
Priority to CN201410826879.5A priority Critical patent/CN104572899A/en
Publication of CN104572899A publication Critical patent/CN104572899A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses an article processing method and an article processing device. The article processing method comprises the following steps: receiving a to-be-processed article; extracting industry key words in the to-be-processed article according to a key word library; classifying the to-be-processed article by use of an industry correlation degree discrimination model trained by an industry correlation article sample library on the basis of the industry key words, and obtaining the industry correlation degree of the to-be-processed article. According to the article processing method and the article processing device which are provided by the embodiment of the invention, the article industry correlation degree calculation processing efficiency is improved.

Description

Method and apparatus for processing an article
Technical field
The embodiments of the present invention relate to the field of data mining technology, and in particular to a method and apparatus for processing an article.
Background
The industry correlation degree is a percentage that expresses how strongly things are connected to one another. It is a subjective concept: different people may assign different industry correlation degrees to the same word. The industry correlation degrees of the words or characters stored in an industry dictionary are therefore either provided by domain experts, or the dictionary is compiled by extracting and collecting representative query words. The content of a document can often be fully represented by a few words or phrases, so the industry correlation degree of these words with respect to a field becomes a key factor in deciding whether the document is related to that field.
The industry correlation degree is a commonly used analysis parameter in many fields, such as finance and the Internet. For example, in mergers and acquisitions, advice on whether to proceed with a merger can be given according to the industry correlation degree of the enterprises involved. However, existing industry correlation degree calculation relies mainly on manual work; the calculation process is lengthy, complex, time-consuming, and laborious.
Summary of the invention
In view of this, the embodiments of the present invention propose a method and apparatus for processing an article, so as to determine the industry correlation degree of an article efficiently.
In a first aspect, an embodiment of the present invention provides a method for processing an article, the method comprising:
receiving a to-be-processed article;
extracting industry keywords from the to-be-processed article according to a keyword library;
based on the industry keywords, classifying the to-be-processed article using an industry correlation degree discrimination model trained on an industry-related article sample library, to obtain the industry correlation degree of the to-be-processed article.
In a second aspect, an embodiment of the present invention provides an apparatus for processing an article, the apparatus comprising:
an article receiving module, configured to receive a to-be-processed article;
a keyword extraction module, configured to extract industry keywords from the to-be-processed article according to a keyword library;
an industry correlation degree calculation module, configured to classify, based on the industry keywords, the to-be-processed article using an industry correlation degree discrimination model trained on an industry-related article sample library, to obtain the industry correlation degree of the to-be-processed article.
The method and apparatus provided by the embodiments of the present invention receive a to-be-processed article, extract industry keywords from it according to a keyword library, and, based on those keywords, classify the article using an industry correlation degree discrimination model trained on an industry-related article sample library to obtain the industry correlation degree of the article. This improves the efficiency of determining the industry correlation degree of an article.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a flowchart of the method for processing an article provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of the method for processing an article provided by the second embodiment of the present invention;
Fig. 3 is a flowchart of keyword library generation in the method for processing an article provided by the third embodiment of the present invention;
Fig. 4 is a flowchart of model training in the method for processing an article provided by the fourth embodiment of the present invention;
Fig. 5 is a schematic flowchart of the method for processing an article provided by the fifth embodiment of the present invention;
Fig. 6 is a structural diagram of the apparatus for processing an article provided by the sixth embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire content.
First embodiment
This embodiment provides a method for processing an article. Referring to Fig. 1, the method comprises operations 11 to 13.
Operation 11: receive a to-be-processed article.
The to-be-processed article is an article whose industry correlation degree needs to be determined. It may be an article that a user obtains from the Internet, or an article that a user obtains from another data source.
Operation 12: extract the industry keywords from the to-be-processed article according to the keyword library.
In this embodiment, after the to-be-processed article is received, industry keywords are extracted from it with reference to the keyword library. The keyword library is a dictionary composed of keywords extracted in advance from the industry-related articles in the industry-related article sample library. Note that the keyword library contains not only the industry keywords but also their industry correlation degrees. For example, if the keyword library contains the industry keyword "extraction", it should further record that the industry category related to this keyword is the chemical industry and that the industry correlation degree between this keyword and the chemical industry is 90%.
Preferably, the industry correlation degree of an industry keyword is an industry correlation vector. For example, if the industry keyword "casting" has an industry correlation degree of 90% with the steel industry and 5% with the automobile industry, then its industry correlation degree is a vector that includes at least the 90% correlation with the steel industry and the 5% correlation with the automobile industry.
Further preferably, the industry category related to an industry keyword is taken from an industry category classification table with at least one level. The industry category related to an industry keyword can thus be a multi-level category such as "science and technology-industry-International CES special topic".
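The keyword library and the extraction step of operation 12 can be illustrated with a minimal sketch. The dictionary layout, the name KEYWORD_LIBRARY, the function extract_industry_keywords, and the use of plain substring matching are all assumptions made for illustration; the text does not specify a storage format or a matching algorithm. The two entries reuse the "extraction" and "casting" examples given above.

```python
from typing import Dict, List

# Hypothetical keyword library: keyword -> industry correlation vector,
# stored as {industry category: correlation degree}. Layout is an assumption.
KEYWORD_LIBRARY: Dict[str, Dict[str, float]] = {
    "extraction": {"chemical": 0.90},
    "casting": {"steel": 0.90, "automobile": 0.05},
}


def extract_industry_keywords(
    article: str, library: Dict[str, Dict[str, float]]
) -> List[str]:
    """Return the library keywords that occur in the article.

    Matching is a case-insensitive substring test, chosen only to keep the
    sketch short; a real system would likely tokenize the text first.
    """
    text = article.lower()
    return [keyword for keyword in library if keyword in text]


found = extract_industry_keywords(
    "Casting defects and solvent extraction yields.", KEYWORD_LIBRARY
)
```

Each extracted keyword can then be looked up in the library to obtain its industry correlation vector for the later steps.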
Operation 13: based on the industry keywords, classify the to-be-processed article using the industry correlation degree discrimination model trained on the industry-related article sample library, to obtain the industry correlation degree of the to-be-processed article.
In this embodiment, the industry correlation degree discrimination model has been trained with the industry-related sample articles in a preset industry-related article sample library. The model comprises a classifier. After receiving the to-be-processed article, the classifier outputs the industry category to which the article belongs. Given that category, the model then provides the industry correlation degree of the article.
Preferably, the classifier in the industry correlation degree discrimination model is a support vector machine (SVM) classifier. After the classifier outputs the industry category of the to-be-processed article, a larger weight is assigned to that category, and the industry correlation degrees of the industry keywords contained in the article are then averaged with these weights to obtain the industry correlation degree of the article.
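The weighted averaging described above leaves the exact weights open, so the following is only one possible reading, sketched under stated assumptions: each keyword whose strongest industry equals the category output by the classifier receives a weight of boost (a hypothetical parameter), every other keyword receives weight 1, and the article's industry correlation vector is the weighted mean of the keyword vectors. The function name article_correlation is likewise an assumption, and the classifier itself is not shown; its output is passed in as predicted_category.

```python
from typing import Dict, List


def article_correlation(
    keyword_vectors: List[Dict[str, float]],
    predicted_category: str,
    boost: float = 2.0,
) -> Dict[str, float]:
    """Weighted mean of keyword correlation vectors.

    Keywords whose strongest industry matches the classifier's category
    count `boost` times; all others count once (an assumed weighting).
    """
    totals: Dict[str, float] = {}
    weight_sum = 0.0
    for vector in keyword_vectors:
        if not vector:
            continue
        strongest = max(vector, key=vector.get)
        weight = boost if strongest == predicted_category else 1.0
        weight_sum += weight
        for industry, degree in vector.items():
            totals[industry] = totals.get(industry, 0.0) + weight * degree
    if weight_sum == 0.0:
        return {}
    return {industry: total / weight_sum for industry, total in totals.items()}


result = article_correlation(
    [{"steel": 0.90, "automobile": 0.05}, {"chemical": 0.90}],
    predicted_category="steel",
)
```

With the default boost of 2, the steel-related keyword contributes two thirds of the total weight, so the article's correlation with the steel industry dominates the resulting vector.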
In this embodiment, a to-be-processed article is received, industry keywords are extracted from it according to the keyword library, and, based on those keywords, the article is classified using the industry correlation degree discrimination model trained on the industry-related sample articles in the industry-related article sample library. The industry correlation degree of the to-be-processed article is thereby determined efficiently.
Second embodiment
This embodiment provides a technical solution of the method for processing an article. Building on the above embodiment of the present invention, the method further comprises, before receiving the to-be-processed article: extracting industry keywords from industry-related sample articles to generate the keyword library; and, before receiving the to-be-processed article and after extracting the industry keywords from the industry-related sample articles: training the industry correlation degree discrimination model with the industry-related sample articles.
Referring to Fig. 2, the method comprises operations 21 to 25.
Operation 21: extract industry keywords from the industry-related sample articles in the industry-related article sample library to generate the keyword library.
Processing the to-be-processed article requires a keyword library generated in advance. In this embodiment, keywords are extracted from the industry-related sample articles in the industry-related article sample library before the to-be-processed article is received. Preferably, which keywords are taken as industry keywords is determined according to how frequently the different keywords occur in the industry-related sample articles.
When the industry keywords are extracted and the keyword library is generated, the industry correlation degree of each industry keyword in the keyword library also needs to be determined. Preferably, the industry correlation degree of an industry keyword is an industry correlation vector.
Operation 22: train the industry correlation degree discrimination model with the industry-related sample articles in the industry-related article sample library.
By the time the keyword library has been generated from the industry-related sample articles in the sample library, those sample articles have already been assigned industry correlation degrees. They can therefore be used as training data for the industry correlation degree discrimination model.
Operation 23: receive a to-be-processed article.
Operation 24: extract the industry keywords from the to-be-processed article according to the keyword library.
Operation 25: based on the industry keywords, classify the to-be-processed article using the industry correlation degree discrimination model trained on the industry-related article sample library, to obtain the industry correlation degree of the to-be-processed article.
In this embodiment, before the to-be-processed article is received, industry keywords are extracted from the industry-related sample articles to generate the keyword library, and the industry correlation degree discrimination model is trained with the industry-related sample articles. The industry correlation degree of the to-be-processed article is thereby determined efficiently.
Third embodiment
This embodiment provides a technical solution of the method for processing an article. In this technical solution, extracting keywords from the industry-related sample articles in the industry-related article sample library to generate the keyword library comprises: matching the industry-related sample articles in the sample library against a preset keyword library to extract keywords from the sample articles; clustering the sample articles according to the extracted keywords to determine the industry correlation degree of the sample articles; extracting new keywords from the sample articles according to occurrence frequency, and determining the industry correlation degree of the new keywords according to the industry correlation degree of the sample articles; and adding the new keywords to the keyword library and repeating the above operations until no further new keywords can be extracted from the sample articles.
Referring to Fig. 3, extracting keywords from the industry-related sample articles in the industry-related article sample library to generate the keyword library comprises operations 31 to 34.
Operation 31: match the industry-related sample articles in the industry-related article sample library against a preset keyword library to extract industry keywords from the sample articles.
Before the keyword library is generated, the system is preset with a small-scale keyword library. This keyword library stores the most commonly used industry keywords together with their industry correlation degree parameters.
The industry-related article sample library stores different industry-related sample articles. These sample articles are text-matched against the industry keywords in the preset keyword library, so that industry keywords are extracted from the sample articles.
Operation 32: cluster the industry-related sample articles according to the extracted industry keywords to determine the industry correlation degree of the sample articles.
After keyword extraction has been performed on the industry-related sample articles, the extracted industry keywords are used as the representation of each sample article, and text clustering is performed on the sample articles. After text clustering, the sample articles gathered into one cluster are the sample articles assigned to the same industry category.
Because the industry category is taken from an industry category classification table with multiple levels, clustering needs to be performed again, during the clustering operation, on the sample articles that fall into the same category.
After the industry category of an industry-related sample article has been determined, its industry correlation degree is determined. Preferably, the industry correlation degree of a sample article can be obtained as the weighted mean of the industry correlation degrees of the industry keywords contained in that sample article.
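The clustering in operation 32 can be illustrated as follows. The text does not name a clustering algorithm, so this sketch swaps in a simple greedy single-pass scheme over Jaccard similarity of keyword sets purely as a stand-in; the function names jaccard and cluster_by_keywords and the threshold value are assumptions. Each article is represented only by the set of industry keywords extracted from it, as the text describes.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_keywords(
    keyword_sets: List[Set[str]], threshold: float = 0.3
) -> List[List[int]]:
    """Greedy single-pass clustering of articles by their keyword sets.

    An article joins the first cluster whose seed (first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    Returns clusters as lists of article indices.
    """
    clusters: List[List[int]] = []
    for index, keywords in enumerate(keyword_sets):
        for members in clusters:
            if jaccard(keywords, keyword_sets[members[0]]) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


docs = [
    {"casting", "furnace"},
    {"casting", "alloy", "furnace"},
    {"extraction", "solvent"},
]
clusters = cluster_by_keywords(docs)
```

To approximate the multi-level category table, the same function could be applied again to the members of each resulting cluster with a higher threshold, which mirrors the repeated clustering mentioned above.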
Operation 33: extract new industry keywords from the clustered industry-related sample articles according to occurrence frequency, and determine the industry correlation degree of the new industry keywords according to the industry correlation degree of the sample articles.
After the industry correlation degree of the industry-related sample articles has been determined, new industry keywords are extracted from the sample articles according to how frequently words occur in them. Specifically, when the occurrence frequency of a word is greater than a preset occurrence frequency threshold, the word is taken as a newly extracted keyword.
After a new keyword has been extracted from the industry-related sample articles, its industry correlation degree is determined according to the industry correlation degree of those sample articles. Preferably, the industry correlation degree of the sample article from which the keyword was extracted can be used as the industry correlation degree of the keyword. Further, when the keyword has at least two source sample articles, the industry correlation degrees of those source sample articles are combined by weighted averaging, and the result is used as the industry correlation degree of the new industry keyword.
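Operation 33 can be sketched as below. Articles are assumed to be already split into word lists (word segmentation is outside the scope of this sketch), and the weights of the weighted average are assumed to be the number of times the word occurs in each source article, since the text does not say how the weights are chosen. The names extract_new_keywords and freq_threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def extract_new_keywords(
    articles: List[List[str]],
    article_degrees: List[Dict[str, float]],
    known: Set[str],
    freq_threshold: int = 2,
) -> Dict[str, Dict[str, float]]:
    """Find words occurring more than `freq_threshold` times in total and
    derive their correlation vectors from their source articles.

    Each source article's vector is weighted by how often the word occurs
    in that article (an assumed choice of weights).
    """
    counts = Counter(word for words in articles for word in words)
    new_entries: Dict[str, Dict[str, float]] = {}
    for word, count in counts.items():
        if word in known or count <= freq_threshold:
            continue
        totals: Dict[str, float] = {}
        weight_sum = 0
        for words, degrees in zip(articles, article_degrees):
            weight = words.count(word)
            if weight == 0:
                continue
            weight_sum += weight
            for industry, degree in degrees.items():
                totals[industry] = totals.get(industry, 0.0) + weight * degree
        new_entries[word] = {i: t / weight_sum for i, t in totals.items()}
    return new_entries


new_keywords = extract_new_keywords(
    articles=[["casting", "mold", "mold"], ["mold", "paint"]],
    article_degrees=[{"steel": 0.9}, {"automobile": 0.6}],
    known={"casting"},
)
```

In this example only "mold" exceeds the threshold; it occurs twice in the steel-related article and once in the automobile-related one, so its vector leans toward the steel industry.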
Operation 34: add the new industry keywords to the keyword library and repeat the above operations until no further new industry keywords can be extracted from the industry-related sample articles.
After new industry keywords have been extracted from the industry-related sample articles, they are added to the keyword library. The updated keyword library is then used to extract keywords from the sample articles again, to perform text clustering on the sample articles again, and to extract new keywords from the sample articles once more, so that the keyword library is updated. This is carried out iteratively until no further new industry keywords can be extracted from the sample articles.
In this embodiment, the industry-related sample articles in the industry-related article sample library are matched against a preset keyword library to extract keywords from the sample articles; the sample articles are clustered according to the extracted keywords to determine their industry correlation degree; new keywords are extracted from the sample articles according to occurrence frequency, and their industry correlation degree is determined according to the industry correlation degree of the sample articles; and the new keywords are added to the keyword library. These operations are repeated until no further new keywords can be extracted from the sample articles. In this way, industry keywords are extracted from the industry-related sample articles in the sample library and the keyword library is generated.
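The iteration of operations 31 to 34 amounts to growing the keyword library until a fixed point is reached. The sketch below shows only that control flow. The function grow_keyword_library, the callback find_new, and the max_rounds safeguard are assumptions; in the toy usage, find_new is a deliberately simple co-occurrence rule standing in for the match, cluster, and frequency steps described above.

```python
from typing import Callable, List, Set


def grow_keyword_library(
    seed: Set[str],
    find_new: Callable[[Set[str]], Set[str]],
    max_rounds: int = 100,
) -> Set[str]:
    """Repeat extraction until no new keywords appear.

    `find_new` receives the current library and returns candidate keywords;
    `max_rounds` is an added safeguard against endless loops.
    """
    library = set(seed)
    for _ in range(max_rounds):
        new_keywords = find_new(library) - library
        if not new_keywords:
            break
        library |= new_keywords
    return library


# Toy stand-in for operations 31-33: every word of an article that already
# contains a known keyword becomes a candidate keyword.
sample_articles: List[Set[str]] = [
    {"casting", "mold"},
    {"mold", "die"},
    {"solvent"},
]


def toy_find_new(library: Set[str]) -> Set[str]:
    candidates: Set[str] = set()
    for words in sample_articles:
        if words & library:
            candidates |= words
    return candidates


final_library = grow_keyword_library({"casting"}, toy_find_new)
```

Starting from the seed "casting", the first round adds "mold", the second adds "die", and the third finds nothing new, so the loop stops.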
Fourth embodiment
This embodiment provides a technical solution of the method for processing an article. In this technical solution, training the industry correlation degree discrimination model with the industry-related sample articles in the industry-related article sample library comprises: using the sample articles in the sample library whose industry correlation degree has been determined as training data to train the industry correlation degree discrimination model; calculating the industry correlation degree of the sample articles with the model obtained after training; and using the sample articles whose industry correlation degree is higher than a preset industry correlation degree threshold as training data and repeating the above operations until the model converges.
Referring to Fig. 4, training the industry correlation degree discrimination model with the industry-related sample articles in the industry-related article sample library comprises operations 41 to 43.
Operation 41: use the industry-related sample articles in the sample library whose industry correlation degree has been determined as training data to train the industry correlation degree discrimination model.
The industry correlation degree discrimination model is a classifier: after receiving an article, it outputs the industry category of that article. Preferably, the model is an SVM classifier.
Because industry correlation degrees were determined for the industry-related sample articles in the sample library while the keyword library was being generated, these sample articles can be used as training data for the industry correlation degree discrimination model.
Operation 42: calculate the industry correlation degree of the industry-related sample articles with the industry correlation degree discrimination model obtained after training.
After the industry correlation degree discrimination model has been trained once, the trained model is used to calculate the industry correlation degree of the industry-related sample articles.
Operation 43: use the industry-related sample articles whose industry correlation degree is higher than a preset industry correlation degree threshold as training data and repeat the above operations until the industry correlation degree discrimination model converges.
After the industry correlation degree discrimination model has calculated the industry correlation degree of the industry-related sample articles, different sample articles have different industry correlation degrees. The sample articles whose industry correlation degree is higher than the preset threshold are used as training data, and the model is trained again, until all parameters of the model have converged. Convergence means that, after the model has been retrained with the sample articles whose industry correlation degree is higher than the preset threshold, the parameters of the model no longer change, or their change is smaller than a preset parameter variation range.
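The retraining loop of operations 41 to 43 and the convergence criterion can be sketched as follows. The functions has_converged and self_train, the tolerance tol, and the max_rounds safeguard are assumptions. The train and score callbacks are placeholders for the actual model; the toy usage replaces the SVM with a one-parameter mean model purely so the sketch runs without external libraries, and is not the classifier the text prefers.

```python
from typing import Callable, List, Sequence


def has_converged(
    old: Sequence[float], new: Sequence[float], tol: float = 1e-4
) -> bool:
    """True when every parameter changed by at most `tol`."""
    return len(old) == len(new) and all(
        abs(a - b) <= tol for a, b in zip(old, new)
    )


def self_train(
    samples: List[float],
    train: Callable[[List[float]], List[float]],
    score: Callable[[List[float], float], float],
    threshold: float,
    tol: float = 1e-4,
    max_rounds: int = 50,
) -> List[float]:
    """Train, rescore all samples, keep those above `threshold`, retrain,
    and stop once the parameters stop changing."""
    params = train(samples)  # operation 41: initial training
    for _ in range(max_rounds):
        # operation 42: score every sample with the current model
        selected = [s for s in samples if score(params, s) > threshold]
        if not selected:
            break
        new_params = train(selected)  # operation 43: retrain on the subset
        if has_converged(params, new_params, tol):
            return new_params
        params = new_params
    return params


# Toy stand-in model: one parameter (the mean); score falls with distance.
toy_samples = [0.9, 0.8, 0.85, 0.1]
final_params = self_train(
    toy_samples,
    train=lambda xs: [sum(xs) / len(xs)],
    score=lambda p, x: 1.0 - abs(x - p[0]),
    threshold=0.7,
)
```

In the toy run, the outlying sample 0.1 scores below the threshold after the first round and is dropped, and the second retraining leaves the parameter unchanged, which satisfies the convergence test.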
In this embodiment, the industry-related sample articles in the sample library whose industry correlation degree has been determined are used as training data to train the industry correlation degree discrimination model; the trained model is used to calculate the industry correlation degree of the sample articles; and the sample articles whose industry correlation degree is higher than the industry correlation degree threshold are used as training data. These operations are repeated until the model converges, thereby achieving accurate training of the industry correlation degree discrimination model.
Fifth embodiment
This embodiment provides a technical solution of the method for processing an article. Referring to Fig. 5, the method comprises operations 50 to 59.
Operation 50: match the industry-related sample articles in the industry-related article sample library against a preset keyword library to extract industry keywords from the sample articles.
Operation 51: cluster the industry-related sample articles according to the extracted industry keywords to determine the industry correlation degree of the sample articles.
Operation 52: extract new industry keywords from the clustered industry-related sample articles according to occurrence frequency, and determine the industry correlation degree of the new industry keywords according to the industry correlation degree of the sample articles.
Operation 53: add the new industry keywords to the keyword library and repeat the above operations until no further new industry keywords can be extracted from the industry-related sample articles.
Operation 54: use the industry-related sample articles whose industry correlation degree has been determined as training data to train the industry correlation degree discrimination model.
Operation 55: calculate the industry correlation degree of the industry-related sample articles with the industry correlation degree discrimination model obtained after training.
Operation 56: use the industry-related sample articles whose industry correlation degree is higher than a preset industry correlation degree threshold as training data and repeat the above operations until the industry correlation degree discrimination model converges.
Operation 57: receive a to-be-processed article.
Operation 58: extract the industry keywords from the to-be-processed article according to the keyword library.
Operation 59: based on the industry keywords, classify the to-be-processed article using the industry correlation degree discrimination model trained on the industry-related article sample library, to obtain the industry correlation degree of the to-be-processed article.
In this embodiment, the keyword library is built iteratively and the industry correlation degree discrimination model is trained iteratively before the to-be-processed article is received, so that the industry correlation degree of the to-be-processed article is determined efficiently.
6th embodiment
Present embodiments provide a kind of technical scheme of the device of process article.See Fig. 6, the device of described process article comprises: article receiver module 63, keyword extracting module 64 and industry relatedness computation module 65.
Described article receiver module 63 is for receiving pending article.
Described keyword extracting module 64 is for extracting the industry keyword in described pending article according to keywords database.
Described industry relatedness computation module 65, for based on described industry keyword, utilizes the industry degree of correlation discrimination model after by the training of industry related article Sample Storehouse to classify to described pending article, obtains the industry degree of correlation of described pending article.
Further, the device of described process article also comprises: keywords database generation module 61.
Described keywords database generation module 61, for before receiving pending article, extracts industry keyword, to generate keywords database from the industry correlated samples article described industry related article Sample Storehouse.
Further, described keyword generation module 61 specifically for:
Described industry correlated samples article is mated with preset keywords database, to extract industry keyword from described industry correlated samples article;
Cluster is carried out to described industry correlated samples article, to determine the industry degree of correlation of described industry correlated samples article according to the industry keyword extracted;
According to occurring that word frequency extracts new industry keyword from described industry correlated samples article, and determine the industry degree of correlation of new industry keyword according to the industry degree of correlation of described industry correlated samples article;
New industry keyword is added into described keywords database, repeats aforesaid operations, until can not the industry keyword made new advances be being extracted from described industry correlated samples article.
Further, the device for processing articles also comprises a model training module 62.
The model training module 62 is configured to train the industry correlation degree discrimination model using the industry-related sample articles, before the article to be processed is received and after industry keywords have been extracted from the industry-related sample articles in the industry-related article sample library.
Further, the model training module 62 is specifically configured to:
Use the industry-related sample articles in the industry-related article sample library whose industry correlation degree has been determined as training data to train the industry correlation degree discrimination model;
Use the industry correlation degree discrimination model obtained from training to calculate the industry correlation degree of the industry-related sample articles;
Use the industry-related sample articles whose industry correlation degree is higher than a preset industry correlation threshold as training data, and repeat the above operations until the industry correlation degree discrimination model converges.
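The train, score, filter, and retrain loop of the model training module can be sketched as below. The source names neither the model family nor the convergence criterion, so both are assumptions here: the model is a table of per-keyword mean degrees, and training is treated as converged once the set of samples selected above the threshold stops changing. All names and the `max_rounds` guard are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Sample = Set[str]  # an article represented by its extracted industry keywords


def train(data: List[Tuple[Sample, float]]) -> Dict[str, float]:
    """Fit keyword weights: mean degree of the training articles containing
    each keyword (a stand-in model, not the one of this embodiment)."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for keywords, degree in data:
        for k in keywords:
            sums[k] = sums.get(k, 0.0) + degree
            counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def predict(model: Dict[str, float], keywords: Sample) -> float:
    """Industry correlation degree: mean weight of the article's keywords."""
    if not keywords:
        return 0.0
    return sum(model.get(k, 0.0) for k in keywords) / len(keywords)


def iterative_training(samples: List[Sample],
                       initial_degrees: List[float],
                       threshold: float = 0.5,
                       max_rounds: int = 20) -> Dict[str, float]:
    # Round 0: train on the samples whose degree is already determined.
    model = train(list(zip(samples, initial_degrees)))
    previous: Optional[List[int]] = None
    for _ in range(max_rounds):
        # Score every sample article with the current model.
        scored = [(s, predict(model, s)) for s in samples]
        # Keep only the samples above the preset threshold as training data.
        selected = [i for i, (_, d) in enumerate(scored) if d > threshold]
        # Assumed convergence test: the selected set no longer changes.
        if not selected or selected == previous:
            break
        model = train([scored[i] for i in selected])
        previous = selected
    return model
```

Filtering by the threshold in each round gradually removes weakly related samples from the training data, so keywords that appear only in such samples drop out of the model.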
Those of ordinary skill in the art will understand that each module or step of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Each embodiment in this specification is described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention; to those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for processing an article, characterized by comprising:
Receiving an article to be processed;
Extracting industry keywords from the article to be processed according to a keyword database;
Based on the industry keywords, classifying the article to be processed using an industry correlation degree discrimination model trained on an industry-related article sample library, to obtain the industry correlation degree of the article to be processed.
2. The method according to claim 1, characterized in that, before receiving the article to be processed, the method further comprises:
Extracting industry keywords from industry-related sample articles in the industry-related article sample library, to generate the keyword database.
3. The method according to claim 2, characterized in that extracting industry keywords from the industry-related sample articles in the industry-related article sample library to generate the keyword database comprises:
Matching the industry-related sample articles in the industry-related article sample library against a preset keyword database, to extract industry keywords from the industry-related sample articles;
Clustering the industry-related sample articles according to the extracted industry keywords, to determine the industry correlation degree of the industry-related sample articles;
Extracting new industry keywords from the clustered industry-related sample articles according to word occurrence frequency, and determining the industry correlation degree of the new industry keywords according to the industry correlation degree of the industry-related sample articles;
Adding the new industry keywords to the keyword database, and repeating the above operations until no new industry keyword can be extracted from the industry-related sample articles.
4. The method according to claim 3, characterized in that, before receiving the article to be processed and after extracting industry keywords from the industry-related sample articles, the method further comprises:
Training the industry correlation degree discrimination model using the industry-related sample articles in the industry-related article sample library.
5. The method according to claim 4, characterized in that training the industry correlation degree discrimination model using the industry-related sample articles in the industry-related article sample library comprises:
Using the industry-related sample articles in the industry-related article sample library whose industry correlation degree has been determined as training data to train the industry correlation degree discrimination model;
Using the industry correlation degree discrimination model obtained from training to calculate the industry correlation degree of the industry-related sample articles;
Using the industry-related sample articles whose industry correlation degree is higher than a preset industry correlation threshold as training data, and repeating the above operations until the industry correlation degree discrimination model converges.
6. A device for processing an article, characterized by comprising:
An article receiving module, configured to receive an article to be processed;
A keyword extraction module, configured to extract industry keywords from the article to be processed according to a keyword database;
An industry correlation degree calculation module, configured to classify the article to be processed, based on the industry keywords, using an industry correlation degree discrimination model trained on an industry-related article sample library, to obtain the industry correlation degree of the article to be processed.
7. The device according to claim 6, characterized by further comprising:
A keyword database generation module, configured to extract industry keywords from industry-related sample articles in the industry-related article sample library before the article to be processed is received, to generate the keyword database.
8. The device according to claim 7, characterized in that the keyword database generation module is specifically configured to:
Match the industry-related sample articles against a preset keyword database, to extract industry keywords from the industry-related sample articles;
Cluster the industry-related sample articles according to the extracted industry keywords, to determine the industry correlation degree of the industry-related sample articles;
Extract new industry keywords from the clustered industry-related sample articles according to word occurrence frequency, and determine the industry correlation degree of the new industry keywords according to the industry correlation degree of the industry-related sample articles;
Add the new industry keywords to the keyword database, and repeat the above operations until no new industry keyword can be extracted from the industry-related sample articles.
9. The device according to claim 8, characterized by further comprising:
A model training module, configured to train the industry correlation degree discrimination model using the industry-related sample articles, before the article to be processed is received and after industry keywords have been extracted from the industry-related sample articles in the industry-related article sample library.
10. The device according to claim 9, characterized in that the model training module is specifically configured to:
Use the industry-related sample articles whose industry correlation degree has been determined as training data to train the industry correlation degree discrimination model;
Use the industry correlation degree discrimination model obtained from training to calculate the industry correlation degree of the industry-related sample articles;
Use the industry-related sample articles whose industry correlation degree is higher than a preset industry correlation threshold as training data, and repeat the above operations until the industry correlation degree discrimination model converges.
CN201410826879.5A 2014-12-25 2014-12-25 Article processing method and article processing device Pending CN104572899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410826879.5A CN104572899A (en) 2014-12-25 2014-12-25 Article processing method and article processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410826879.5A CN104572899A (en) 2014-12-25 2014-12-25 Article processing method and article processing device

Publications (1)

Publication Number Publication Date
CN104572899A true CN104572899A (en) 2015-04-29

Family

ID=53088961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410826879.5A Pending CN104572899A (en) 2014-12-25 2014-12-25 Article processing method and article processing device

Country Status (1)

Country Link
CN (1) CN104572899A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6990496B1 (en) * 2000-07-26 2006-01-24 Koninklijke Philips Electronics N.V. System and method for automated classification of text by time slicing
CN101814083A (en) * 2010-01-08 2010-08-25 上海复歌信息科技有限公司 Automatic webpage classification method and system
CN103186675A (en) * 2013-04-03 2013-07-03 南京安讯科技有限责任公司 Automatic webpage classification method based on network hot word identification
CN103365867A (en) * 2012-03-29 2013-10-23 腾讯科技(深圳)有限公司 Method and device for emotion analysis of user evaluation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009248A (en) * 2017-11-30 2018-05-08 国信优易数据有限公司 A kind of data classification method and system
CN108304868A (en) * 2018-01-25 2018-07-20 阿里巴巴集团控股有限公司 Model training method, data type recognition methods and computer equipment
CN109543049A (en) * 2018-11-23 2019-03-29 广东小天才科技有限公司 A kind of method and system for writing techniques automatic push material
CN109543049B (en) * 2018-11-23 2021-09-07 广东小天才科技有限公司 Method and system for automatically pushing materials according to writing characteristics

Similar Documents

Publication Publication Date Title
CN104156436B (en) Social association cloud media collaborative filtering and recommending method
CN106407484B (en) Video tag extraction method based on barrage semantic association
CN103207899B (en) Text recommends method and system
Fleischhacker et al. Detecting errors in numerical linked data using cross-checked outlier detection
CN103106262B (en) The method and apparatus that document classification, supporting vector machine model generate
CN105824802A (en) Method and device for acquiring knowledge graph vectoring expression
CN103235812B (en) Method and system for identifying multiple query intents
CN103455562A (en) Text orientation analysis method and product review orientation discriminator on basis of same
CN104391942A (en) Short text characteristic expanding method based on semantic atlas
CN103310003A (en) Method and system for predicting click rate of new advertisement based on click log
Yang et al. Finding interesting posts in twitter based on retweet graph analysis
JP2013529805A5 (en) Search method, search system and computer program
CN104484343A (en) Topic detection and tracking method for microblog
CN103914494A (en) Method and system for identifying identity of microblog user
CN104978332B (en) User-generated content label data generation method, device and correlation technique and device
CN102567304A (en) Filtering method and device for network malicious information
US20180210897A1 (en) Model generation method, word weighting method, device, apparatus, and computer storage medium
CN106294418B (en) Search method and searching system
Ferrara et al. Automatic wrapper adaptation by tree edit distance matching
CN109165382A (en) A kind of similar defect report recommended method that weighted words vector sum latent semantic analysis combines
CN106055539A (en) Name disambiguation method and apparatus
CN104050556A (en) Feature selection method and detection method of junk mails
CN106156041A (en) Hot information finds method and system
WO2016009419A1 (en) System and method for ranking news feeds
CN104978320A (en) Knowledge recommendation method and equipment based on similarity

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150429