CN106777317A - Improved c mean algorithms realize that search engine keywords optimize - Google Patents

Publication number
CN106777317A
Authority
CN
China
Prior art keywords
keyword
improved
search engine
classes
mean algorithms
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710003652.4A
Other languages
Chinese (zh)
Inventor
金平艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Yonglian Information Technology Co Ltd
Original Assignee
Sichuan Yonglian Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Yonglian Information Technology Co Ltd filed Critical Sichuan Yonglian Information Technology Co Ltd
Priority to CN201710003652.4A priority Critical patent/CN106777317A/en
Publication of CN106777317A publication Critical patent/CN106777317A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Search engine keyword optimization is implemented with an improved c-means algorithm. Core keywords are determined from the enterprise's business, together with the data items associated with each searched keyword, such as national monthly search volume, competition level, and estimated cost per click (CPC). The keyword set is then reduced in dimension: each keyword is first represented as a five-dimensional vector by adding the number of homepage webpages and the total number of search result pages, and is then reduced from five dimensions to four. Using the improved c-means algorithm, the membership matrix J is computed, the membership constraints are constructed, and the overall objective function over the c classes is formed; a system of equations with m Lagrange multipliers yields the necessary condition for the overall objective function to reach its maximum. If the decision condition is satisfied, the optimal clustering result is output; otherwise the optimum has not been found. The invention avoids premature convergence of the clustering result, improves the signal-to-noise ratio of the data, has low run-time complexity, can quickly raise keyword rankings, and has greater practical value.

Description

Search engine keyword optimization using an improved c-means algorithm
Technical field
The present invention relates to the field of semantic web technology, and in particular to search engine keyword optimization using an improved c-means algorithm.
Background
In recent years, with strong national support for the Internet industry, network speeds have risen sharply while network fees have fallen. The knowledge economy and informatization have developed rapidly, online information has grown explosively, and humanity has entered the era of big data. The abundance of online information enriches people's information sources but also makes it hard to obtain information quickly. The precise, user-friendly information retrieval services of search engines have won acceptance from a large number of users, and rising usage has in turn driven the rapid development of search engines.
Many studies have found that search engine users generally pay attention only to the top-ranked websites on a results page, and these websites also have relatively high click-through rates. Improving a website's ranking in keyword search results is therefore a concern for enterprises, and many enterprises actively engage in search engine marketing to improve their ranking in search results and gain traffic. Search engine optimization (SEO) refers to applying a series of optimizations to a website so as to improve its ranking for the relevant keywords on search engines and ultimately achieve the website's marketing goals; SEO is, in the end, the optimization of keywords. However, keyword selection mostly relies on experience and subjective judgment, and there is as yet no well-developed mechanism for managing keyword optimization strategy and progress. To make keyword selection more scientific and objective, the present invention provides search engine keyword optimization using an improved c-means algorithm.
Summary of the invention
To address the technical problem of achieving search engine optimization through keyword optimization, the present invention provides search engine keyword optimization using an improved c-means algorithm.
To solve the above problem, the present invention is implemented through the following technical solution:
Step 1: Determine the core keywords from the enterprise's business and collect related keywords using a search engine. Each keyword has corresponding data items in the search engine, such as national monthly search volume, competition level, and estimated cost per click (CPC).
Step 2: Based on the enterprise's products and market analysis, screen the set of related keywords found above to reduce its size.
Step 3: For the screened keyword set, search each keyword with the search engine and record the number of homepage webpages and the total number of search result pages. Each keyword thereby becomes a five-dimensional vector, which is then reduced again to four dimensions.
Step 4: Cluster the above keywords using the improved c-means algorithm. The sub-steps are as follows:
Step 4.1: Initialize the clustering into c classes using a k-means algorithm based on ε-neighborhoods.
Step 4.2: Initialize the membership matrix J with values in [0, 1] so that it satisfies the membership constraints.
Step 4.3: Initialize the objective function $L(S^2)_{start}$ of each ε-neighborhood and construct the overall objective function over the c classes.
Step 4.4: Use the decision function $\Delta(S^2)$ to judge the accuracy of the above result.
Step 5: According to the enterprise's specific circumstances, weigh keyword search-efficiency optimization against value-rate optimization, and select a suitable keyword optimization strategy to reach the website optimization goal.
The beneficial effects of the present invention are:
1. The algorithm simplifies the keyword analysis workflow and thereby reduces the overall website optimization workload.
2. The algorithm has low run-time complexity and processes data faster.
3. The algorithm has greater practical value.
4. It can help a website quickly raise the ranking of its keywords in a short time.
5. It brings a certain amount of traffic and inquiries to the enterprise website, thereby achieving the desired website optimization goal.
6. It avoids premature convergence of the clustering result.
7. It reduces the influence of outliers on the clustering result and improves the signal-to-noise ratio of the resulting data.
Brief description of the drawings
Fig. 1 is a flow chart of search engine keyword optimization using the improved c-means algorithm.
Fig. 2 is a flow chart of the application of the improved c-means algorithm in cluster analysis.
Detailed description of the embodiments
To solve the technical problem of achieving search engine optimization through keyword optimization, the present invention is described in detail below with reference to Fig. 1 and Fig. 2. The implementation steps are as follows:
Step 1: Determine the core keywords from the enterprise's business and collect related keywords using a search engine. Each keyword has corresponding data items in the search engine, such as national monthly search volume, competition level, and estimated cost per click (CPC).
Step 2: Based on the enterprise's products and market analysis, screen the set of related keywords found above to reduce its size.
Step 3: For the screened keyword set, search each keyword with the search engine and record the number of homepage webpages and the total number of search result pages. Each keyword thereby becomes a five-dimensional vector, which is then reduced again to four dimensions. The calculation proceeds as follows:
Let the number of related keywords be m. This gives the following m × 5 matrix:
$\begin{pmatrix} N_1 & Ld_1 & CPC_1 & N_{1S} & N_{1Y} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ N_m & Ld_m & CPC_m & N_{mS} & N_{mY} \end{pmatrix}$
$N_i$, $Ld_i$, $CPC_i$, $N_{iS}$, $N_{iY}$ are, in order, the national monthly search volume, competition level, estimated cost per click (CPC), number of homepage webpages, and total number of search result pages for the i-th keyword.
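As an illustration of Step 3, the following minimal Python sketch assembles the m × 5 matrix from per-keyword records. The `KeywordStats` type, its field names, and the sample numbers are hypothetical and made up purely for illustration; they are not taken from the patent.

```python
from typing import List, NamedTuple


class KeywordStats(NamedTuple):
    """One keyword and its five data items (hypothetical field names)."""
    keyword: str
    monthly_searches: float   # N_i: national monthly search volume
    competition: float        # Ld_i: competition level
    cpc: float                # CPC_i: estimated cost per click
    homepage_results: float   # N_iS: number of homepage webpages
    total_results: float      # N_iY: total number of search result pages


def build_feature_matrix(stats: List[KeywordStats]) -> List[List[float]]:
    """Return the m x 5 matrix with one row (N_i, Ld_i, CPC_i, N_iS, N_iY) per keyword."""
    return [
        [s.monthly_searches, s.competition, s.cpc, s.homepage_results, s.total_results]
        for s in stats
    ]


# Made-up values, for illustration only.
sample = [
    KeywordStats("running shoes", 12000, 0.8, 1.5, 40, 900000),
    KeywordStats("trail running shoes", 3000, 0.6, 1.1, 25, 150000),
    KeywordStats("buy shoes online", 8000, 0.9, 2.0, 60, 2000000),
]
matrix = build_feature_matrix(sample)
```

Keeping the keyword string outside the numeric matrix makes it easy to map cluster labels back to keywords by row index later on.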
The vector is then reduced again to four dimensions, i.e.
$X_i$ ($i \in \{1, 2, \dots, m\}$) is the search efficiency and $Z_i$ ($i \in \{1, 2, \dots, m\}$) is the value rate, given by the following formulas:
Step 4: Cluster the above keywords using the improved c-means algorithm. The sub-steps are as follows:
Step 4.1: Initialize the clustering into c classes using a k-means algorithm based on ε-neighborhoods.
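The text does not spell out how the ε-neighborhood (the region within distance ε of a point) is used during initialization. One plausible reading, offered here only as an assumption, is density-based seeding: pick as initial centers the points with the most neighbors within ε, skip candidates that lie within ε of an already chosen seed, and then run ordinary k-means. The function names and the toy points below are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def eps_neighborhood_seeds(points, c, eps):
    """Assumed seeding rule: densest points first, skipping any point within eps of a chosen seed.

    May return fewer than c seeds if eps is too large for the data.
    """
    density = [sum(1 for q in points if dist(p, q) <= eps) for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        if all(dist(points[i], s) > eps for s in seeds):
            seeds.append(points[i])
        if len(seeds) == c:
            break
    return seeds


def kmeans(points, centers, iterations=10):
    """Plain k-means (Lloyd iterations) started from the given centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep the old center if empty.
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k])) for p in points]
    return centers, labels


# Toy 2-D data: two dense groups and one isolated point.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [9.0, 0.0]]
seeds = eps_neighborhood_seeds(points, c=2, eps=0.5)
centers, labels = kmeans(points, seeds)
```

Because seeds are taken from dense regions, the isolated point at (9, 0) is never chosen as an initial center, which is in the spirit of the stated goal of reducing the influence of outliers.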
Step 4.2: Initialize the membership matrix J with values in [0, 1] so that it satisfies the membership constraints. The calculation proceeds as follows:
In the above formula, $w_{ij}$ is the degree to which keyword i belongs to class j, with $j \in \{1, 2, \dots, c\}$ and $i \in \{1, 2, \dots, m\}$; $d_{ij}$ is the distance from keyword i to the center of class j.
The initialized membership matrix J is m × c:
The membership constraints are:
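A minimal sketch of this initialization, assuming the constraint is the usual fuzzy c-means normalization in which the membership degrees $w_{ij}$ of each keyword sum to 1 over the c classes. That normalization is the textbook form and is an assumption here; the function name `init_membership` is hypothetical.

```python
import random


def init_membership(m, c, seed=None):
    """Return an m x c matrix J with entries in [0, 1] whose rows each sum to 1.

    Assumes the textbook fuzzy c-means constraint sum_j w_ij = 1 for every keyword i.
    """
    rng = random.Random(seed)
    J = []
    for _ in range(m):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        # Dividing by the row total enforces the normalization constraint.
        J.append([w / total for w in row])
    return J


J = init_membership(m=4, c=3, seed=0)
```

Passing a fixed `seed` makes the initialization reproducible, which helps when comparing clustering runs.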
Step 4.3: Initialize the objective function $L(S^2)_{start}$ of each ε-neighborhood and construct the overall objective function over the c classes. The calculation proceeds as follows:
In the above formula, $N_{\varepsilon j}$ is the number of data objects in the ε-neighborhood of class j, $x_{ih}$ is the vector of a data object in the ε-neighborhood of class j, and $y_{ih}$ is the vector of the corresponding cluster-center data object in the ε-neighborhood of class j.
The overall objective function over the c classes, $L(S^2)_{\sum j \in c}$, is constructed as:
Combining the constraints, the following new objective function is constructed, from which the necessary condition for $L(S^2)_{\sum j \in c}$ to reach its maximum can be obtained:
In the above formula, $\lambda_i$ ($i = 1, 2, \dots, m$) are the Lagrange multipliers of the membership constraints. Differentiating with respect to all parameters, the necessary condition for the expression to reach its maximum is:
The vector in the above formula is the one corresponding to keyword i.
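For orientation, the sketch below shows the update rules of textbook fuzzy c-means, which are derived by the same Lagrange-multiplier construction applied to the standard weighted squared-distance objective. This is the well-known baseline, not necessarily the ε-neighborhood variant described here; the fuzzifier value of 2 and all function names are assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_membership(points, centers, fuzzifier=2.0):
    """Textbook FCM rule: w_ij = 1 / sum_k (d_ij / d_ik)^(2 / (fuzzifier - 1))."""
    power = 2.0 / (fuzzifier - 1.0)
    J = []
    for p in points:
        d = [dist(p, v) for v in centers]
        if any(x == 0.0 for x in d):
            # The point coincides with one or more centers: split full membership among them.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            J.append([h / sum(hits) for h in hits])
        else:
            J.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return J


def update_centers(points, J, fuzzifier=2.0):
    """Textbook FCM rule: each center is the mean of all points weighted by w_ij^fuzzifier."""
    centers = []
    for j in range(len(J[0])):
        weights = [row[j] ** fuzzifier for row in J]
        total = sum(weights)
        centers.append([
            sum(w * x for w, x in zip(weights, column)) / total
            for column in zip(*points)
        ])
    return centers


# Toy data: two well-separated pairs of points.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers = [[0.0, 0.5], [10.0, 10.5]]
J = update_membership(points, centers)
new_centers = update_centers(points, J)
```

Alternating these two updates is what each iteration of a fuzzy c-means style algorithm does; the rows of `J` sum to 1 by construction, so the membership constraints stay satisfied throughout.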
Step 4.4: Use the decision function $\Delta(S^2)$ to judge the accuracy of the above result. The calculation proceeds as follows:
Decision function $\Delta(S^2)$:
$\Delta(S^2) = L(S^2)^{new}_{\sum j \in c} - L(S^2)^{old}_{\sum j \in c} < \theta$
Here $L(S^2)^{new}_{\sum j \in c}$ is the new overall objective function and $L(S^2)^{old}_{\sum j \in c}$ is the overall objective function obtained in the previous iteration. θ is a sufficiently small number. The optimal classification has been found only if the above condition is satisfied; otherwise it has not been found.
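The stopping rule can be sketched as a small driver loop. The sketch assumes the test is applied to the absolute change of the objective between consecutive iterations, and adds a `max_iter` cap as a safeguard; both are assumptions, as are the function names. The toy `step` function merely stands in for one clustering iteration that returns the current objective value.

```python
def iterate_until_stable(step, theta=1e-6, max_iter=100):
    """Call step() repeatedly until the objective changes by less than theta.

    step: callable performing one iteration and returning the current objective value.
    Returns (last objective value, number of step() calls, whether the test was met).
    """
    old = step()
    for n in range(1, max_iter):
        new = step()
        if abs(new - old) < theta:  # assumed form of the decision test
            return new, n + 1, True
        old = new
    return old, max_iter, False


def make_toy_step():
    """Stand-in for one clustering iteration: objective values 1 + 0.5**k, k = 1, 2, ..."""
    state = {"k": 0}

    def step():
        state["k"] += 1
        return 1.0 + 0.5 ** state["k"]

    return step


value, calls, converged = iterate_until_stable(make_toy_step(), theta=1e-3)
```

In a full implementation, `step` would update the memberships and centers and then evaluate the overall objective; returning the `converged` flag lets the caller distinguish a met condition from an exhausted iteration budget.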
The detailed flow of the improved c-means algorithm is shown in Fig. 2.
Step 5: According to the enterprise's specific circumstances, weigh keyword search-efficiency optimization against value-rate optimization, and select a suitable keyword optimization strategy to reach the website optimization goal.
Pseudocode for search engine keyword optimization using the improved c-means algorithm:
Input: the core keywords extracted from the website; c classes initialized on the basis of ε-neighborhoods.
Output: high-quality keywords after a series of optimizations.

Claims (2)

1. Search engine keyword optimization using an improved c-means algorithm, relating to the field of semantic web technology, characterized in that it comprises the following steps:
Step 1: Determine the core keywords from the enterprise's business and collect related keywords using a search engine. Each keyword has corresponding data items in the search engine, such as national monthly search volume, competition level, and estimated cost per click (CPC).
Step 2: Based on the enterprise's products and market analysis, screen the set of related keywords found above to reduce its size.
Step 3: For the screened keyword set, search each keyword with the search engine and record the number of homepage webpages and the total number of search result pages. Each keyword thereby becomes a five-dimensional vector, which is then reduced again to four dimensions. The calculation proceeds as follows:
Let the number of related keywords be m. This gives the following m × 5 matrix:
$N_i$, $Ld_i$, $CPC_i$, $N_{iS}$, $N_{iY}$ are, in order, the national monthly search volume, competition level, estimated cost per click (CPC), number of homepage webpages, and total number of search result pages for the i-th keyword.
The vector is then reduced again to four dimensions, i.e.
$X_i$ ($i \in \{1, 2, \dots, m\}$) is the search efficiency and $Z_i$ ($i \in \{1, 2, \dots, m\}$) is the value rate, given by the following formulas:
Step 4: Cluster the above keywords using the improved c-means algorithm. The sub-steps are as follows:
Step 4.1: Initialize the clustering into c classes using a k-means algorithm based on ε-neighborhoods.
Step 4.2: Initialize the membership matrix J with values in [0, 1] so that it satisfies the membership constraints.
Step 4.3: Initialize the objective function $L(S^2)_{start}$ of each ε-neighborhood and construct the overall objective function over the c classes.
Step 4.4: Use the decision function $\Delta(S^2)$ to judge the accuracy of the above result.
Step 5: According to the enterprise's specific circumstances, weigh keyword search-efficiency optimization against value-rate optimization, and select a suitable keyword optimization strategy to reach the website optimization goal.
2. Search engine keyword optimization using the improved c-means algorithm according to claim 1, characterized in that the calculation in Step 4 proceeds as follows:
Step 4: Cluster the above keywords using the improved c-means algorithm. The sub-steps are as follows:
Step 4.1: Initialize the clustering into c classes using a k-means algorithm based on ε-neighborhoods.
Step 4.2: Initialize the membership matrix J with values in [0, 1] so that it satisfies the membership constraints. The calculation proceeds as follows:
In the above formula, $w_{ij}$ is the degree to which keyword i belongs to class j, with $j \in \{1, 2, \dots, c\}$ and $i \in \{1, 2, \dots, m\}$;
$d_{ij}$ is the distance from keyword i to the center of class j.
The initialized membership matrix J is m × c:
The membership constraints are:
Step 4.3: Initialize the objective function $L(S^2)_{start}$ of each ε-neighborhood and construct the overall objective function over the c classes. The calculation proceeds as follows:
In the above formula, $N_{\varepsilon j}$ is the number of data objects in the ε-neighborhood of class j, $x_{ih}$ is the vector of a data object in the ε-neighborhood of class j, and $y_{ih}$ is the vector of the corresponding cluster-center data object in the ε-neighborhood of class j.
The overall objective function over the c classes, $L(S^2)_{\sum j \in c}$, is constructed as:
Combining the constraints, the following new objective function is constructed, from which the necessary condition for $L(S^2)_{\sum j \in c}$ to reach its maximum can be obtained:
In the above formula, $\lambda_i$ ($i = 1, 2, \dots, m$) are the Lagrange multipliers of the membership constraints. Differentiating with respect to all parameters, the necessary condition for the expression to reach its maximum is:
The vector in the above formula is the one corresponding to keyword i.
Step 4.4: Use the decision function $\Delta(S^2)$ to judge the accuracy of the above result. The calculation proceeds as follows:
Decision function $\Delta(S^2)$:
Here $L(S^2)^{new}_{\sum j \in c}$ is the new overall objective function and $L(S^2)^{old}_{\sum j \in c}$ is the overall objective function obtained in the previous iteration. θ is a sufficiently small number. The optimal classification has been found only if the above condition is satisfied; otherwise it has not been found.
The detailed flow of the improved c-means algorithm is shown in Fig. 2.
CN201710003652.4A 2017-01-03 2017-01-03 Improved c mean algorithms realize that search engine keywords optimize Pending CN106777317A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710003652.4A CN106777317A (en) 2017-01-03 2017-01-03 Improved c mean algorithms realize that search engine keywords optimize

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710003652.4A CN106777317A (en) 2017-01-03 2017-01-03 Improved c mean algorithms realize that search engine keywords optimize

Publications (1)

Publication Number Publication Date
CN106777317A true CN106777317A (en) 2017-05-31

Family

ID=58949629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710003652.4A Pending CN106777317A (en) 2017-01-03 2017-01-03 Improved c mean algorithms realize that search engine keywords optimize

Country Status (1)

Country Link
CN (1) CN106777317A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218435A (en) * 2013-04-15 2013-07-24 上海嘉之道企业管理咨询有限公司 Method and system for clustering Chinese text data
CN103258000A (en) * 2013-03-29 2013-08-21 北界创想(北京)软件有限公司 Method and device for clustering high-frequency keywords in webpages

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258000A (en) * 2013-03-29 2013-08-21 北界创想(北京)软件有限公司 Method and device for clustering high-frequency keywords in webpages
CN103218435A (en) * 2013-04-15 2013-07-24 上海嘉之道企业管理咨询有限公司 Method and system for clustering Chinese text data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
林元国 等: "K-means算法在关键词优化中的应用", 《计算机***应用》 *
邓健爽 等: "基于搜索引擎的关键词自动聚类法", 《计算机科学》 *


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170531

WD01 Invention patent application deemed withdrawn after publication