CN106776923A - Search engine keyword optimization using an improved clustering algorithm - Google Patents

Info

Publication number
CN106776923A
Authority
CN
China
Prior art keywords
keyword
search engine
follows
clustering algorithm
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611089248.5A
Other languages
Chinese (zh)
Inventor
金平艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Yonglian Information Technology Co Ltd
Original Assignee
Sichuan Yonglian Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Yonglian Information Technology Co Ltd filed Critical Sichuan Yonglian Information Technology Co Ltd
Priority to CN201611089248.5A priority Critical patent/CN106776923A/en
Publication of CN106776923A publication Critical patent/CN106776923A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for search engine keyword optimization using an improved clustering algorithm. Core keywords are determined from the enterprise's business, and the data items that the search engine reports for the related keywords are collected, such as national monthly search volume, competition level, and estimated cost per click (CPC). The keyword set is then screened and reduced. Each keyword is represented as a five-dimensional vector by adding the number of homepage web pages and the total number of search result pages, and this vector is then reduced from five dimensions to four. Finally, the keywords are clustered with the improved clustering algorithm, which is driven by a global objective function. The algorithm of the invention is simpler and more effective, has low running-time complexity and faster processing, yields classification results that agree better with empirical values, and gives better data processing results. It can help a website raise the ranking of its keywords quickly, bringing traffic and inquiries to the enterprise website and thereby reaching the desired website optimization goal.

Description

Search engine keyword optimization using an improved clustering algorithm
Technical field
The present invention relates to the field of Semantic Web technology, and in particular to a method for search engine keyword optimization using an improved clustering algorithm.
Background art
Search engines play a vital role in increasing website traffic, because users looking for information on the Internet most commonly do so through a search engine. SEO (Search Engine Optimization) therefore plays a very important role in online promotion and receives close attention from website operators. Search engine optimization techniques are divided into black hat and white hat techniques. Black hat techniques are malicious optimizations that violate search engine rules; in keyword optimization they take the form of stuffing keywords into a page or placing irrelevant keywords in it to improve the page's ranking in the search engine index. The major search engines have introduced techniques and rules to penalize websites that use black hat techniques. White hat techniques are the optimizations that search engines accept. Genuine SEO uses legitimate means that make a site easy for search engines to index, making the website friendlier to both users and search engines (Search Engine Friendly), so that it is more readily included and ranked favorably. In today's business world, a top natural ranking in the mainstream search engines for a commercial website's core keywords is extremely valuable, which is why keywords are often called the cornerstone of all search applications. There has been considerable theoretical research on and practical application of keyword optimization in China and abroad, but no effective method has yet been proposed to simplify the keyword analysis workflow, and there is no complete mechanism for managing keyword optimization strategy and progress. To meet this need, the present invention provides a method for search engine keyword optimization using an improved clustering algorithm.
Summary of the invention
To address the technical problem of achieving search engine optimization through keyword optimization, the present invention provides a method for search engine keyword optimization using an improved clustering algorithm.
To solve the above problem, the present invention is implemented through the following technical solution:
Step 1: Determine the core keywords according to the enterprise's business and collect related keywords using a search engine. Each of these keywords has corresponding data items in the search engine, such as national monthly search volume, competition level, and estimated cost per click (CPC).
Step 2: Screen and reduce the set of related keywords found above, taking into account the enterprise's products and market analysis.
Step 3: For the screened keyword set, search each keyword with the search engine and record the number of homepage web pages and the total number of search result pages. Each keyword is thus a five-dimensional vector, which is then reduced to four dimensions.
Step 4: Cluster the above keywords using the improved clustering algorithm, with the following sub-steps:
Step 4.1: Initialize the clusters using a k-means algorithm based on ε-neighborhoods;
Step 4.2: Initialize the objective function of each ε-neighborhood, and select k initial cluster centers from the data object set D according to the decision condition given below;
Step 4.3: Reassign each keyword i (i ∈ (1, 2, ..., m)), selecting its cluster center j′ by the probability function p(i);
Step 4.4: Recalculate the center of each cluster according to the result of the decision function Δ(g);
Step 4.5: If any cluster center has changed, return to step 4.2; otherwise the iteration ends and the clustering result is output.
Step 5: According to the enterprise's specific circumstances, weigh keyword efficiency optimization against value-rate optimization and select a suitable keyword optimization strategy to reach the website optimization goal.
Beneficial effects of the present invention:
1. The algorithm simplifies the keyword analysis workflow and thereby reduces the overall website optimization workload.
2. The algorithm has low running-time complexity and processes data faster.
3. The algorithm has greater practical value.
4. It can help a website raise the ranking of its keywords quickly.
5. It brings traffic and inquiries to the enterprise website, thereby reaching the desired website optimization goal.
6. The classification results of the algorithm agree better with empirical values.
7. The algorithm is simpler and more effective.
8. The data processing results are better.
Brief description of the drawings
Fig. 1 is a structural flow chart of search engine keyword optimization using the improved clustering algorithm.
Fig. 2 is a flow chart of the application of the improved clustering algorithm in cluster analysis.
Detailed description of the embodiments
To solve the technical problem of achieving search engine optimization through keyword optimization, the present invention is described in detail below with reference to Fig. 1 and Fig. 2. The specific implementation steps are as follows:
Step 1: Determine the core keywords according to the enterprise's business and collect related keywords using a search engine. Each of these keywords has corresponding data items in the search engine, such as national monthly search volume, competition level, and estimated cost per click (CPC).
Step 2: Screen and reduce the set of related keywords found above, taking into account the enterprise's products and market analysis.
Step 3: For the screened keyword set, search each keyword with the search engine and record the number of homepage web pages and the total number of search result pages. Each keyword is thus a five-dimensional vector, which is then reduced to four dimensions. The specific calculation is as follows:
Let the number of related keywords be m, giving the following m × 5 matrix:
$$
\begin{pmatrix}
N_1 & Ld_1 & CPC_1 & N_{1S} & N_{1Y} \\
N_2 & Ld_2 & CPC_2 & N_{2S} & N_{2Y} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
N_m & Ld_m & CPC_m & N_{mS} & N_{mY}
\end{pmatrix}
$$
Here $N_i$, $Ld_i$, $CPC_i$, $N_{iS}$, $N_{iY}$ are, in order, the national monthly search volume, competition level, estimated cost per click (CPC), number of homepage web pages, and total number of search result pages of the i-th keyword.
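As a minimal sketch of how the m × 5 matrix above might be assembled in practice, the Python below stores one record per keyword and emits one row per record. The class name `KeywordRecord`, the helper `to_matrix`, and all numeric values are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class KeywordRecord:
    """One related keyword with the five data items named in the text."""
    keyword: str
    monthly_searches: float  # N_i: national monthly search volume
    competition: float       # Ld_i: competition level
    cpc: float               # CPC_i: estimated cost per click
    homepage_pages: float    # N_iS: number of homepage web pages
    total_pages: float       # N_iY: total number of search result pages


def to_matrix(records):
    """Return the m x 5 matrix as a list of rows, one row per keyword."""
    return [
        [r.monthly_searches, r.competition, r.cpc, r.homepage_pages, r.total_pages]
        for r in records
    ]


# Illustrative values only; they are not taken from the patent.
records = [
    KeywordRecord("running shoes", 12000, 0.8, 1.5, 3, 250000),
    KeywordRecord("trail running shoes", 2400, 0.6, 1.1, 1, 40000),
]
matrix = to_matrix(records)
```

Keeping the column order fixed to ($N_i$, $Ld_i$, $CPC_i$, $N_{iS}$, $N_{iY}$) makes each row directly comparable with the matrix definition in the text.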
The vector is then reduced to four dimensions, i.e.
$X_{i \in (1, 2, \ldots, m)}$ is the search efficiency and $Z_{i \in (1, 2, \ldots, m)}$ is the value rate, given by the following formula:
Step 4: Cluster the above keywords using the improved clustering algorithm, with the following sub-steps:
Step 4.1: Initialize the clusters using a k-means algorithm based on ε-neighborhoods.
Step 4.2: Initialize the objective function of each ε-neighborhood, and select k initial cluster centers from the data object set D according to the decision condition below. The specific calculation is as follows:
In the above formula, $n_\varepsilon$ is the number of data objects in each ε-neighborhood, and the compactness term measures the total compactness within each ε-neighborhood. α and β are the influence coefficients of the quantity $n_\varepsilon$ and of the compactness respectively, with α + β = 1; suitable values can be found iteratively by experiment.
In the above formula, the inner-product term is the inner product of the i-th keyword vector in the space and its cluster center vector.
The decision condition is as follows:
γ is a preset threshold. Only ε-neighborhoods that satisfy the above condition are classified as clusters, and k classes are then screened out.
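The objective function and decision condition themselves are not reproduced in this text, so the following Python is only a hedged sketch of the kind of selection step 4.2 describes. It assumes unit-normalized keyword vectors, a Euclidean ε-neighborhood, and an objective of the form α · (share of points in the neighborhood) + β · (mean inner product with the candidate center), accepted when it reaches γ. All function names and the sample data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm else list(v)


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def neighborhood_score(points, idx, eps, alpha, beta):
    """Assumed objective: alpha * share of points inside the eps-neighborhood
    plus beta * mean inner product between members and the candidate center."""
    center = points[idx]
    members = [p for p in points if euclidean(p, center) <= eps]
    n_eps = len(members)  # always >= 1, the candidate itself is a member
    compactness = sum(dot(p, center) for p in members) / n_eps
    return alpha * n_eps / len(points) + beta * compactness


def select_initial_centers(points, k, eps, alpha=0.5, beta=0.5, gamma=0.0):
    """Pick up to k candidates scoring at least gamma, best first,
    skipping candidates inside the neighborhood of an already chosen center."""
    scored = [(neighborhood_score(points, i, eps, alpha, beta), i)
              for i in range(len(points))]
    chosen = []
    for score, i in sorted(scored, reverse=True):
        if score < gamma:
            break  # remaining candidates score even lower
        if all(euclidean(points[i], points[j]) > eps for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return chosen


# Hypothetical 2-D keyword vectors forming two well separated groups.
raw = [(1.0, 0.1), (1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 1.0)]
points = [normalize(p) for p in raw]
centers = select_initial_centers(points, k=2, eps=0.3)
```

Raising `gamma` makes the decision condition stricter, so fewer than k centers may be returned; how the patent handles that case is not stated in the text.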
Step 4.3: Reassign each keyword i (i ∈ (1, 2, ..., m)), selecting its cluster center j′ by the probability function p(i). The specific calculation is as follows:
The cluster center j′ selected is the one for which p(i) takes its maximum value.
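The definition of p(i) is not reproduced in this text. As an illustration of selecting the center with the largest probability, the sketch below assumes a softmax over the inner products between a keyword vector and each cluster center; that choice, and the names `assignment_probabilities` and `assign`, are assumptions rather than the patent's formula.

```python
import math


def assignment_probabilities(vector, centers):
    """Assumed stand-in for p(i): softmax over inner products with each center."""
    sims = [sum(a * b for a, b in zip(vector, c)) for c in centers]
    top = max(sims)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in sims]
    total = sum(weights)
    return [w / total for w in weights]


def assign(vector, centers):
    """Return the index j' of the center with the largest probability."""
    probs = assignment_probabilities(vector, centers)
    return max(range(len(centers)), key=probs.__getitem__)


# Hypothetical centers and keyword vectors.
centers = [[1.0, 0.0], [0.0, 1.0]]
first = assign([0.9, 0.1], centers)
second = assign([0.2, 0.8], centers)
```

Because the softmax is monotone in the similarity, taking the maximum of this assumed p(i) is equivalent to picking the center with the largest inner product.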
Step 4.4: Recalculate the center of each cluster according to the result of the decision function Δ(g). The specific calculation is as follows:
$g_{i \in k}$ is the global objective function obtained at the N-th iteration, and the per-cluster term is the objective function of the j-th cluster at the N-th iteration.
$$\Delta(g) = g_{i \in k}^{N} - g_{i \in k}^{N-1} > 0$$
If the above condition is satisfied, the center of each cluster is recalculated.
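The stopping rule based on Δ(g) can be sketched as a loop that accepts new centers only while the global objective keeps increasing. In the Python below, the global objective is assumed to be the sum of inner products between each keyword vector and its assigned center, and centers are recomputed as normalized means (as in spherical k-means); both are illustrative assumptions, since the patent's own objective is not reproduced in this text.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm else list(v)


def assign_all(points, centers):
    """Label each point with the index of its most similar center."""
    return [max(range(len(centers)), key=lambda j: dot(p, centers[j]))
            for p in points]


def global_objective(points, centers, labels):
    """Assumed g: sum of inner products between points and their centers."""
    return sum(dot(p, centers[l]) for p, l in zip(points, labels))


def cluster(points, centers, max_iter=100):
    labels = assign_all(points, centers)
    g_prev = global_objective(points, centers, labels)
    for _ in range(max_iter):
        candidate = []
        for j, old in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                candidate.append(normalize(mean))
            else:
                candidate.append(old)  # keep the center of an empty cluster
        new_labels = assign_all(points, candidate)
        g_new = global_objective(points, candidate, new_labels)
        if g_new - g_prev <= 0:  # Delta(g) > 0 no longer holds: stop
            break
        centers, labels, g_prev = candidate, new_labels, g_new
    return centers, labels, g_prev


# Hypothetical unit-normalized keyword vectors in two groups.
raw = [(1.0, 0.1), (1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 1.0)]
points = [normalize(p) for p in raw]
initial = [points[0], points[3]]
centers, labels, g = cluster(points, initial)
```

With this assumed objective, each accepted update cannot lower g, so the loop ends as soon as recomputing the centers no longer improves it.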
Step 4.5: If any cluster center has changed, return to step 4.2; otherwise the iteration ends and the clustering result is output.
Step 5: According to the enterprise's specific circumstances, weigh keyword efficiency optimization against value-rate optimization and select a suitable keyword optimization strategy to reach the website optimization goal.
The pseudocode of search engine keyword optimization using the improved clustering algorithm is as follows:
Input: the core keywords extracted from the website; the clusters initialized based on ε-neighborhoods; the initialized objective function of each ε-neighborhood.
Output: the k clusters for which the sum of the global objective function $g_{i \in k}$ is largest.

Claims (2)

1. A method for search engine keyword optimization using an improved clustering algorithm, relating to the field of Semantic Web technology, characterized in that it comprises the following steps:
Step 1: Determine the core keywords according to the enterprise's business and collect related keywords using a search engine. Each of these keywords has corresponding data items in the search engine, such as national monthly search volume, competition level, and estimated cost per click (CPC).
Step 2: Screen and reduce the set of related keywords found above, taking into account the enterprise's products and market analysis;
Step 3: For the screened keyword set, search each keyword with the search engine and record the number of homepage web pages and the total number of search result pages. Each keyword is thus a five-dimensional vector, which is then reduced to four dimensions. The specific calculation is as follows:
Let the number of related keywords be m, giving the following m × 5 matrix:
$$
\begin{pmatrix}
N_1 & Ld_1 & CPC_1 & N_{1S} & N_{1Y} \\
N_2 & Ld_2 & CPC_2 & N_{2S} & N_{2Y} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
N_m & Ld_m & CPC_m & N_{mS} & N_{mY}
\end{pmatrix}
$$
Here $N_i$, $Ld_i$, $CPC_i$, $N_{iS}$, $N_{iY}$ are, in order, the national monthly search volume, competition level, estimated cost per click, number of homepage web pages, and total number of search result pages of the i-th keyword.
The vector is then reduced to four dimensions, i.e.
$X_{i \in (1, 2, \ldots, m)}$ is the search efficiency and $Z_{i \in (1, 2, \ldots, m)}$ is the value rate, given by the following formula:
Step 4: Cluster the above keywords using the improved clustering algorithm, with the following sub-steps:
Step 4.1: Initialize the clusters using a k-means algorithm based on ε-neighborhoods;
Step 4.2: Initialize the objective function of each ε-neighborhood, and select k initial cluster centers from the data object set D according to the decision condition;
Step 4.3: Reassign each keyword i (i ∈ (1, 2, ..., m)), selecting its cluster center j′ by the probability function p(i);
Step 4.4: Recalculate the center of each cluster according to the result of the decision function Δ(g);
Step 4.5: If any cluster center has changed, return to step 4.2; otherwise the iteration ends and the clustering result is output.
Step 5: According to the enterprise's specific circumstances, weigh keyword efficiency optimization against value-rate optimization and select a suitable keyword optimization strategy to reach the website optimization goal.
2. The method for search engine keyword optimization using an improved clustering algorithm according to claim 1, characterized in that the specific calculation process of step 4 is as follows:
Step 4: Cluster the above keywords using the improved clustering algorithm, with the following sub-steps:
Step 4.1: Initialize the clusters using a k-means algorithm based on ε-neighborhoods.
Step 4.2: Initialize the objective function of each ε-neighborhood, and select k initial cluster centers from the data object set D according to the decision condition below. The specific calculation is as follows:
In the above formula, $n_\varepsilon$ is the number of data objects in each ε-neighborhood, and the compactness term measures the total compactness within each ε-neighborhood. α and β are the influence coefficients of the quantity $n_\varepsilon$ and of the compactness respectively, with α + β = 1; suitable values can be found iteratively by experiment.
In the above formula, the inner-product term is the inner product of the i-th keyword vector in the space and its cluster center vector.
The decision condition is as follows:
γ is a preset threshold. Only ε-neighborhoods that satisfy the above condition are classified as clusters, and k classes are then screened out.
Step 4.3: Reassign each keyword i (i ∈ (1, 2, ..., m)), selecting its cluster center j′ by the probability function p(i). The specific calculation is as follows:
The cluster center j′ selected is the one for which p(i) takes its maximum value.
Step 4.4: Recalculate the center of each cluster according to the result of the decision function Δ(g). The specific calculation is as follows:
$g_{i \in k}$ is the global objective function obtained at the N-th iteration, and the per-cluster term is the objective function of the j-th cluster at the N-th iteration.
$$\Delta(g) = g_{i \in k}^{N} - g_{i \in k}^{N-1} > 0$$
If the above condition is satisfied, the center of each cluster is recalculated.
Step 4.5: If any cluster center has changed, return to step 4.2; otherwise the iteration ends and the clustering result is output.
CN201611089248.5A 2016-11-30 2016-11-30 Search engine keyword optimization using an improved clustering algorithm Pending CN106776923A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611089248.5A CN106776923A (en) 2016-11-30 2016-11-30 Search engine keyword optimization using an improved clustering algorithm

Publications (1)

Publication Number Publication Date
CN106776923A true CN106776923A (en) 2017-05-31

Family

ID=58913423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611089248.5A Pending CN106776923A (en) Search engine keyword optimization using an improved clustering algorithm

Country Status (1)

Country Link
CN (1) CN106776923A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297438A (en) * 2021-05-21 2021-08-24 深圳市智尊宝数据开发有限公司 Information retrieval method, electronic equipment and related products

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218435A (en) * 2013-04-15 2013-07-24 上海嘉之道企业管理咨询有限公司 Method and system for clustering Chinese text data
CN103258000A (en) * 2013-03-29 2013-08-21 北界创想(北京)软件有限公司 Method and device for clustering high-frequency keywords in webpages

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
林元国 等: "K-means算法在关键词优化中的应用", 《计算机***应用》 *
邓健爽 等: "基于搜索引擎的关键词自动聚类法", 《计算机科学》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297438A (en) * 2021-05-21 2021-08-24 深圳市智尊宝数据开发有限公司 Information retrieval method, electronic equipment and related products
CN113297438B (en) * 2021-05-21 2022-02-22 深圳市智尊宝数据开发有限公司 Information retrieval method, electronic equipment and related products

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170531