CN103116587B - Method for mining omissible keywords, data search method, and device - Google Patents


Info

Publication number
CN103116587B
Authority
CN
China
Prior art keywords
search results
keyword
search
keywords
similarity
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110365011.6A
Other languages
Chinese (zh)
Other versions
CN103116587A (en)
Inventor
王跃
岳淑珍
金凯民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201110365011.6A
Publication of CN103116587A
Priority to HK13108380.6A (HK1181152A1)
Application granted
Publication of CN103116587B
Status: Active
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application proposes a method for mining omissible keywords, a data search method, and corresponding devices. The first search result, corresponding to a first query request that contains at least two keywords, is compared for similarity with the second search results corresponding to second query requests whose keywords are subsets of those keywords. The resulting similarity values are compared with a set threshold to determine which keywords in the first query request can be omitted. During data search, the query can then be executed using only the keywords other than the omissible ones. Because fewer keywords are used, the search engine spends less time on the search, search efficiency improves, and the engine's resource consumption drops; in addition, searching with fewer keywords can return results that better meet the user's needs.

Description

Method for mining omissible keywords, data search method, and device
Technical field
The application relates to the field of computer technology, and in particular to a method for mining omissible keywords, a data search method, and corresponding devices.
Background
With the rapid development of information technology, the Internet has permeated every aspect of daily life. Faced with a vast amount of information, a user only needs to type keywords describing the desired content into the search box of an Internet search engine to obtain search results, such as web pages, that contain those keywords.
In the prior art, a search engine performs a keyword search as follows:
First, the search engine receives a query request containing the keywords to be searched.
Then, the search engine uses the keywords to retrieve web pages that match them from a search database as the search results.
In this approach, if the user's search condition contains only one keyword, the search engine returns results that match that keyword; if it contains several keywords, every returned result must match each of them.
When the user's search condition contains multiple keywords, the search engine uses every keyword as a search condition. Some of those keywords, however, may be omissible when combined with the others, that is, they have no significant effect on the final search results. Searching on every keyword anyway consumes additional search-engine resources, increases search time, and lowers search efficiency. Moreover, an omissible keyword still restricts the search results, so results the user actually wants may be missed because the omissible keyword was used as a search condition.
For example, suppose a user's search request contains three keywords: "Nokia", "A model", and "mobile phone". On receiving the request, the search engine retrieves from the search database the web pages that match all three keywords. Yet "Nokia" and "A model" already indicate that the user wants results about a Nokia mobile phone, so "mobile phone" is omissible relative to "Nokia" and "A model". Under the approach above, the search engine allocates extra resources to search for "mobile phone", increasing search time and reducing efficiency. At the same time, because the user actually needs results related to "Nokia" and "A model", the keyword "mobile phone" restricts the final results: pages that contain "Nokia" and "A model" but not "mobile phone" are excluded, which reduces the accuracy of the results to some extent.
In addition, because the keywords in query requests vary endlessly, there is currently no effective way to decide which of the keywords a user enters are omissible, and identifying omissible keywords manually is impractical in an actual search process.
Summary of the invention
Embodiments of the application provide a method for mining omissible keywords, a data search method, and corresponding devices, to solve the prior-art problems that data search takes a long time, has low efficiency and high resource usage, and returns results that do not meet user needs well.
A method for mining omissible keywords, comprising:
determining the first search result corresponding to a first query request containing at least two keywords;
determining the second search result corresponding to each of at least one second query request, where the keywords in each second query request are a subset of the at least two keywords;
computing the similarity between the first search result and each second search result, and determining the second search results whose similarity to the first search result reaches a set threshold;
taking, as omissible keywords, those of the at least two keywords corresponding to the first search result that remain after excluding the keywords of at least one of the determined second search results whose similarity reaches the set threshold.
A data search method, comprising:
upon receiving a query request containing at least two keywords, and when omissible keywords have been determined for these keywords by the above method for mining omissible keywords, judging whether the at least two keywords include an omissible keyword;
when the judgment is that an omissible keyword exists, searching with the keywords in the query request other than the omissible keywords.
A device for mining omissible keywords, comprising:
a first search result determination module, configured to determine the first search result corresponding to a first query request containing at least two keywords;
a second search result determination module, configured to determine the second search result corresponding to each of at least one second query request, where the keywords in each second query request are a subset of the at least two keywords;
a similarity computing module, configured to compute the similarity between the first search result and each second search result, and to determine the second search results whose similarity to the first search result reaches a set threshold;
an omissible keyword determination module, configured to take, as omissible keywords, those of the at least two keywords corresponding to the first search result that remain after excluding the keywords of at least one of the determined second search results whose similarity reaches the set threshold.
A data search device, comprising:
an omissible keyword judgment module, configured to, upon receiving a query request containing at least two keywords and when omissible keywords have been determined for these keywords by the above device for mining omissible keywords, judge whether the at least two keywords include an omissible keyword;
a search module, configured to, when the judgment is that an omissible keyword exists, search with the keywords in the query request other than the omissible keywords.
The beneficial effects of the application are as follows:
In the scheme provided by the embodiments, the first search result corresponding to a first query request containing at least two keywords is compared for similarity with the second search results corresponding to second query requests containing subsets of those keywords. The resulting similarity values are compared with a set threshold to determine the omissible keywords among the keywords of the first query request. During data search, the query can then be executed with the keywords other than the omissible ones. Because fewer keywords are used, the search engine spends less time on the search, search efficiency improves, and resource consumption drops; in addition, searching with fewer keywords can return results that better meet the user's needs.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of the method for mining omissible keywords in Embodiment 1 of the application;
Fig. 2 is a flowchart of the steps of the data search method in Embodiment 2;
Fig. 3 is a structural diagram of the device for mining omissible keywords in Embodiment 3;
Fig. 4 is a structural diagram of the data search device in Embodiment 4.
Detailed description
To achieve the object of the application, the embodiments propose a scheme for mining omissible keywords and for data search. The first search result corresponding to a first query request containing at least two keywords is compared for similarity with the second search results corresponding to second query requests containing subsets of those keywords, and the second search results that are highly similar to the first one are identified. The keywords of an identified second search result can be regarded as sufficient for carrying out the first query request. The remaining keywords of the first query request, after excluding those keywords, can therefore be considered to have little effect on the search results, that is, they are omissible. During data search, the query can then be executed with the keywords other than the omissible ones. Because fewer keywords are used, the search engine spends less time on the search, search efficiency improves, and resource consumption drops; in addition, searching with fewer keywords can return results that better meet the user's needs.
The scheme of the embodiments is described in detail below with reference to the drawings.
Embodiment one
As shown in Fig. 1, the method for mining omissible keywords in Embodiment 1 comprises the following steps:
Step 101: determine the first search result corresponding to a first query request containing at least two keywords.
When mining omissible keywords for a set of keywords, a search is run with those keywords as the query condition to obtain the corresponding search result. The search result may be, but is not limited to, a set of web pages.
Step 102: determine the second search result corresponding to each of at least one second query request, where the keywords in each second query request are a subset of the at least two keywords.
The first search result of step 101 and the second search results of step 102 can be stored as log information.
Preferably, to reduce the amount of subsequent similarity computation, no two second query requests should contain the same set of keywords, which avoids repeated work.
A subset of the at least two keywords means that the keywords in any subset are only some, not all, of the at least two keywords.
For example, if the keywords in the first query request are "A", "B", and "C", the subsets of these three keywords are:
Subset 1: " A ";
Subset 2: " B ";
Subset 3: " C ";
Subset 4: " A ", " B ";
Subset 5: " A ", " C ";
Subset 6: " B ", " C ".
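The subsets listed above are exactly the non-empty proper subsets of the first query's keywords. A minimal Python sketch that enumerates them in the same order follows; collapsing duplicate keywords first is an assumption made in line with the preference that no two second query requests be identical, and the function name is hypothetical.

```python
from itertools import combinations


def proper_subsets(keywords):
    """Return every non-empty proper subset of `keywords`, smallest first.

    Each subset is the keyword list of one candidate second query request.
    Duplicate keywords are dropped so no two second queries are identical.
    """
    unique = list(dict.fromkeys(keywords))  # de-duplicate, keep input order
    subsets = []
    for size in range(1, len(unique)):      # sizes 1 .. n-1: proper subsets only
        subsets.extend(combinations(unique, size))
    return subsets


print(proper_subsets(["A", "B", "C"]))
# [('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C')]
```

For n distinct keywords this yields 2^n - 2 second queries, which is why the text allows computing only a selection of them.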
In step 102, the keywords of a second query request can be those of any of the above subsets.
Preferably, the second search results of all 6 second query requests (one per subset above) can be determined; alternatively, only some of these 6 second query requests can be selected, and the second search results of the selected ones determined.
Step 103: compute the similarity between the first search result and each second search result, and determine the second search results whose similarity to the first search result reaches the set threshold.
Continuing the example of the first query request containing the three keywords "A", "B", and "C", suppose that:
the search result of the first query request is Result_0;
the search result of the second query request containing subset 1 is Result_1;
the search result of the second query request containing subset 2 is Result_2;
the search result of the second query request containing subset 3 is Result_3;
the search result of the second query request containing subset 4 is Result_4;
the search result of the second query request containing subset 5 is Result_5;
the search result of the second query request containing subset 6 is Result_6.
Step 103 then computes the similarity of Result_0 with each of Result_1 to Result_6.
For any second search result, whether its similarity to the first search result reaches the threshold is determined as follows:
Compute the similarity between the page titles in the first search result and the page titles in the second search result. If the page-title similarity reaches the set threshold, the similarity between this second search result and the first search result is deemed to reach the set threshold; otherwise, it is deemed not to reach it.
Specifically, the page-title similarity is computed as follows:
Step 1: segment the page titles in the first search result into terms and assign a weight to each term, obtaining the first feature vector for the first search result; likewise, segment the page titles in the second search result and assign a weight to each term, obtaining the second feature vector for the second search result.
Term weights can be assigned by, but are not limited to, the following method.
Assigning weights to the terms of the page titles in the first search result:
Segmenting the page titles in the first search result yields a first term set, which contains all terms obtained from those titles. For any term in the first term set, determine its term frequency (TF) and inverse document frequency (IDF), and use the product TF × IDF as the term's weight. TF is proportional to how often the term occurs across the page titles in the first search result; IDF is inversely proportional to how often the term occurs across the web pages in the database the search engine uses for keyword search.
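A minimal sketch of the TF × IDF weighting, assuming the titles are already segmented into tokens. The text only states that TF is proportional, and IDF inversely proportional, to the respective frequencies; the normalized TF and the smoothed logarithmic IDF below are common choices, not formulas given in the patent, and the document-frequency table and page count are hypothetical inputs.

```python
import math
from collections import Counter


def title_term_weights(segmented_titles, doc_freq, total_docs):
    """Weight each title term of one search result by TF x IDF.

    segmented_titles: one token list per page title in the result set.
    doc_freq: hypothetical map, term -> number of indexed pages containing it.
    total_docs: hypothetical number of pages in the search database.
    """
    counts = Counter(tok for title in segmented_titles for tok in title)
    total = sum(counts.values())
    weights = {}
    for term, count in counts.items():
        tf = count / total  # share of all title tokens that are this term
        # Smoothed log IDF (assumed form): smaller for terms found on more pages.
        idf = math.log((1 + total_docs) / (1 + doc_freq.get(term, 0)))
        weights[term] = tf * idf
    return weights


titles = [["nokia", "n8", "phone"], ["nokia", "n8", "review"]]
df = {"nokia": 100, "n8": 10, "phone": 500, "review": 300}
weights = title_term_weights(titles, df, total_docs=1000)
# "n8" is the rarest term in the database, so it gets the largest weight.
```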
Once every term has a weight, the weights are used as the elements of a feature vector, giving a feature vector for each search result.
For example, if the first term set contains n terms with weights $W_{Q11}, W_{Q12}, W_{Q13}, \ldots, W_{Q1n}$, the first feature vector is $V(Q_1) = (W_{Q11}, W_{Q12}, W_{Q13}, \ldots, W_{Q1n})$; likewise, if the second term set contains n terms with weights $W_{Q21}, W_{Q22}, W_{Q23}, \ldots, W_{Q2n}$, the second feature vector is $V(Q_2) = (W_{Q21}, W_{Q22}, W_{Q23}, \ldots, W_{Q2n})$.
Step 2: compute the cosine similarity of the first and second feature vectors to obtain the similarity between the page titles in the first search result and those in the second search result.
With the feature vectors defined in Step 1, the cosine similarity is:
$$\cos(V(Q_1), V(Q_2)) = \frac{W_{Q11} \times W_{Q21} + W_{Q12} \times W_{Q22} + \cdots + W_{Q1n} \times W_{Q2n}}{\sqrt{W_{Q11}^2 + W_{Q12}^2 + \cdots + W_{Q1n}^2} \times \sqrt{W_{Q21}^2 + W_{Q22}^2 + \cdots + W_{Q2n}^2}}$$
The closer the cosine value is to 1, the more similar the first and second search results are considered to be.
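The formula above translates directly into code. This sketch assumes the two vectors have already been aligned to the same term order, as the zero-weight rule that follows in the text provides; returning 0.0 when either vector is all zeros is an assumption, since the patent does not address that case.

```python
import math


def cosine(v1, v2):
    """Cosine similarity of two aligned weight vectors V(Q1) and V(Q2)."""
    if len(v1) != len(v2):
        raise ValueError("vectors must be aligned to the same terms")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # empty title vector treated as dissimilar (assumption)
    return dot / (norm1 * norm2)


print(cosine([1.0, 1.0], [1.0, 0.0]))  # about 0.7071
```

Because TF × IDF weights are non-negative, the result always lies between 0 and 1, which is what makes a single similarity threshold meaningful.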
For the cosine computation, a term that appears in only one of the two term sets is given a weight of 0 in the other.
For example, suppose that segmenting the page titles in the first search result of the first query request (keywords "A", "B", "C") yields the first term set {a, b, c, d}, and segmenting the page titles in the second search result of the second query request (keywords "A", "B") yields the second term set {a, c, d, e}.
Terms a, c, and d appear in both sets; term b appears only in the first term set, and term e only in the second. Therefore, the weight of b in the second term set is set to 0, and the weight of e in the first term set is set to 0.
The term weights for the first feature vector are then:
a: $W_{Q11}$; b: $W_{Q12}$; c: $W_{Q13}$; d: $W_{Q14}$; e: 0.
The term weights for the second feature vector are:
a: $W_{Q21}$; b: 0; c: $W_{Q23}$; d: $W_{Q24}$; e: $W_{Q25}$.
The resulting first feature vector is $V(Q_1) = (W_{Q11}, W_{Q12}, W_{Q13}, W_{Q14}, 0)$ and the second is $V(Q_2) = (W_{Q21}, 0, W_{Q23}, W_{Q24}, W_{Q25})$. Computing the cosine similarity of these two vectors gives the similarity of the two search results.
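The zero-weight rule amounts to aligning both weight maps over the union of their terms. A minimal sketch, in which the numeric weights are made-up stand-ins for the W values of the example and the function name is hypothetical:

```python
def align(weights1, weights2):
    """Build V(Q1) and V(Q2) over the union of both title term sets.

    A term missing from one term set gets weight 0 in that vector,
    as with terms b and e in the example.
    """
    vocab = list(dict.fromkeys([*weights1, *weights2]))  # stable union order
    v1 = [weights1.get(t, 0.0) for t in vocab]
    v2 = [weights2.get(t, 0.0) for t in vocab]
    return vocab, v1, v2


vocab, v1, v2 = align({"a": 0.5, "b": 0.2, "c": 0.4, "d": 0.1},
                      {"a": 0.6, "c": 0.3, "d": 0.2, "e": 0.3})
print(vocab)  # ['a', 'b', 'c', 'd', 'e']
print(v1)     # [0.5, 0.2, 0.4, 0.1, 0.0]
print(v2)     # [0.6, 0.0, 0.3, 0.2, 0.3]
```

A zero entry contributes nothing to the dot product, so terms found in only one result lower the similarity only through the vector norms.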
Note that the similarity algorithm used in step 103 is not limited to cosine similarity; latent semantic analysis (LSA) or probabilistic latent semantic analysis (PLSA) can also be used.
Step 104: among the at least two keywords corresponding to the first search result, take as omissible keywords those that remain after excluding the keywords of at least one determined second search result whose similarity to the first search result reaches the set threshold.
Step 103 may identify several second search results whose similarity to the first search result reaches the set threshold. In that case, step 104 can select at least one of them to determine the omissible keywords. Selection methods include, but are not limited to, the following:
1. Select any one of the second search results whose similarity to the first search result reaches the set threshold.
2. Among the second search results whose similarity to the first search result reaches the set threshold, select at least one whose query has the fewest keywords. For example, suppose three second search results reach the threshold, with the following keywords:
second search result _1 has the keyword "A"; second search result _2 has the keyword "B"; second search result _3 has the keywords "A" and "B". Step 104 can then select second search result _1 or second search result _2.
Suppose that in step 104 the keywords corresponding to the first search result are "A", "B", and "C", and the selected second search result whose similarity reaches the set threshold has the keyword "A". Then "B" and "C" are determined to be the omissible keywords among "A", "B", and "C".
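Selection method 2 and the final set difference of step 104 can be sketched as follows. The similarity values are hypothetical, the function names are illustrative, and reading "reaches the threshold" as greater than or equal is an assumption.

```python
def select_fewest_keywords(similarities, threshold):
    """Selection method 2: among second queries whose result similarity
    reaches the threshold, return those with the fewest keywords.

    similarities: {tuple of second-query keywords: similarity to first result}
    """
    # "Reaches" is read as >= here (an assumption).
    qualifying = [kws for kws, sim in similarities.items() if sim >= threshold]
    if not qualifying:
        return []
    fewest = min(len(kws) for kws in qualifying)
    return [kws for kws in qualifying if len(kws) == fewest]


def omissible_keywords(first_keywords, selected_second_keywords):
    """Keywords of the first query that the selected second query left out."""
    kept = set(selected_second_keywords)
    return [kw for kw in first_keywords if kw not in kept]


sims = {("A",): 0.91, ("B",): 0.88, ("A", "B"): 0.95, ("C",): 0.20}
candidates = select_fewest_keywords(sims, threshold=0.85)
print(candidates)                                          # [('A',), ('B',)]
print(omissible_keywords(["A", "B", "C"], candidates[0]))  # ['B', 'C']
```

Preferring the smallest qualifying subset maximizes the number of keywords that can be dropped while keeping results close to the original query.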
Note that an omissible keyword determined in this way is not omissible in isolation; after steps 101 to 104, it is omissible relative to the other keywords contained in the same query request.
Step 105: establish the correspondence between the at least two keywords in the first query request and the omissible keywords among them.
This step is optional. After steps 101 to 104, the correspondence between the at least two keywords in the first query request and their omissible keywords can be established and stored, so that during data search the correspondence can be consulted directly to decide whether the keywords in a query request include omissible ones.
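One possible in-memory form of the step 105 correspondence, sketched under the assumption that keyword order within a query does not matter, so a frozenset serves as the lookup key. The class and method names are hypothetical; a production system would more likely persist this mapping alongside the search logs.

```python
class OmissionTable:
    """Hypothetical store for the step 105 correspondence."""

    def __init__(self):
        self._table = {}  # frozenset(query keywords) -> frozenset(omissible)

    def record(self, keywords, omissible):
        # frozenset keys make lookups independent of keyword order.
        self._table[frozenset(keywords)] = frozenset(omissible)

    def was_mined(self, keywords):
        """True if mining has been run for exactly this keyword set."""
        return frozenset(keywords) in self._table

    def omissible_for(self, keywords):
        """Omissible keywords for this keyword set (empty if none or not mined)."""
        return self._table.get(frozenset(keywords), frozenset())


table = OmissionTable()
table.record(["Nokia", "N8", "mobile phone"], ["mobile phone"])
print(table.omissible_for(["N8", "Nokia", "mobile phone"]))  # frozenset({'mobile phone'})
```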
The scheme of Embodiment 1 can be applied to, among others, websites that offer category-specific queries, in particular in the following two situations:
Situation 1:
Mining omissible keywords among keywords in a query request that are unrelated to any category or context.
For example, suppose the user is on the home page of a shopping website and enters the two keywords "suit" and "2011" in the search box. In the specific context of shopping, the keyword "2011" seems meaningless, and the scheme of Embodiment 1 can be applied to mine omissible keywords.
Situation 2:
When the query is made from a particular category page, the keywords in the query request that relate to that category can be mined for omissibility.
For example, suppose the user is on the mobile-phone category page of a shopping website and enters the three keywords "Nokia", "N8", and "mobile phone" in the search box. On the mobile-phone category page, the keyword "mobile phone" is redundant information, and the scheme of Embodiment 1 can be applied to mine omissible keywords.
As another example, suppose the user is on the mobile-phone category page of a shopping website and enters the two keywords "Apple" and "iphone" in the search box. On that page, when the keyword "iphone" is present, "Apple" can be considered omissible, and the scheme of Embodiment 1 can likewise be applied.
Embodiment two
As shown in Fig. 2, the data search method in Embodiment 2 comprises the following steps:
Step 201: upon receiving a query request containing at least two keywords, judge whether omissible keyword mining has been performed on these keywords by the scheme of Embodiment 1. If so, go to step 202; otherwise, go to step 204.
In step 201, the correspondence established by the scheme of Embodiment 1 between the at least two keywords of a first query request and their omissible keywords can be queried to determine whether omissible keyword mining has been performed on the keywords of the currently received query request.
Step 202: judge whether the at least two keywords in the query request received in step 201 include an omissible keyword. If so, go to step 203; otherwise, go to step 204.
In step 202, the same correspondence can be queried to determine whether the keywords in the currently received query request include an omissible keyword.
Step 203: search with the keywords in the query request received in step 201 other than the omissible keywords, completing the data search.
Step 204: search with all keywords in the query request received in step 201, completing the data search.
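Steps 201 to 204 reduce to a table lookup followed by one of two searches. In this sketch the correspondence is assumed to be a plain dict from a frozenset of query keywords to a frozenset of omissible keywords, and `run_search` and `echo` are hypothetical stand-ins for the search engine.

```python
def search_with_omission(keywords, correspondence, run_search):
    """Steps 201-204 of Embodiment 2 (a sketch).

    correspondence: {frozenset(query keywords): frozenset(omissible keywords)}
    run_search: hypothetical callable that executes a keyword search.
    """
    key = frozenset(keywords)
    if key in correspondence:                    # step 201: keywords were mined
        omissible = correspondence[key]
        if omissible:                            # step 202: any omissible keyword?
            reduced = [k for k in keywords if k not in omissible]
            return run_search(reduced)           # step 203: search without them
    return run_search(keywords)                  # step 204: search with all keywords


def echo(kws):
    """Stand-in search engine that just returns the keywords it was given."""
    return list(kws)


correspondence = {frozenset({"Nokia", "N8", "mobile phone"}): frozenset({"mobile phone"})}
print(search_with_omission(["Nokia", "N8", "mobile phone"], correspondence, echo))  # ['Nokia', 'N8']
print(search_with_omission(["suit", "2011"], correspondence, echo))                 # ['suit', '2011']
```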
In Embodiment 2, when the search engine receives a query request for data search, it uses the result of Embodiment 1 to determine whether the keywords in the current query request include omissible ones. If they do, it searches using only the keywords other than the omissible ones. On the one hand, this reduces the number of keywords in the search condition, shortening search time, improving search efficiency, and lowering system resource usage; on the other hand, fewer keywords impose fewer restrictions on the search results, so the results better meet the user's needs.
Embodiment three
Based on the same idea with the embodiment of the present application one, the embodiment of the present application three propose a kind of can the excavating equipment of default keyword, as shown in Figure 3, comprise: the first Search Results determination module 11, second Search Results determination module 12, similarity computing module 13 and can default keyword determination module 14, wherein:
First Search Results determination module 11, the first Search Results that the first inquiry request for determining to comprise at least two keywords is corresponding.
Second Search Results determination module 12, for determining the second Search Results that at least one second inquiry request is corresponding respectively, the keyword in described second inquiry request for described in the subset of at least two keywords.
Similarity computing module 13, for described first Search Results is carried out similarity computing with each second Search Results respectively, determines to reach the second Search Results setting threshold value with the similarity of the first Search Results.
Can default keyword determination module 14, for by least two keywords described in corresponding for described first Search Results, except the keyword reached with the similarity that at least one is determined except keyword corresponding to the second Search Results of setting threshold value, as can default keyword.
Preferably, described similarity computing module 13, specifically for page title in page title in the first Search Results and the second Search Results is carried out similarity computing, if the similarity of page title reaches setting threshold value, then determine that the similarity of this second Search Results and the first Search Results reaches setting threshold value.
Described similarity computing module 13 specifically comprises:
Proper vector determining unit 21, for carrying out participle to page title in the first Search Results, and be each participle setting weighted value, obtain the first eigenvector that the first Search Results is corresponding, and participle is carried out to page title in the second Search Results, and be each participle setting weighted value, obtain the second feature vector that the second Search Results is corresponding;
Similarity arithmetic element 22, for described first eigenvector and second feature vector are carried out cosine similarity computing, obtains the similarity of page title in page title and the second Search Results in the first Search Results.
Preferably, the device for mining dispensable keywords further comprises:
Correspondence establishing module 15, configured to establish a correspondence between the at least two keywords and the dispensable keywords among them.
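The correspondence kept by module 15 could be as simple as the in-memory table sketched below in Python. The dict layout, the function names, and the choice of frozenset keys (so that keyword order in the query does not matter) are illustrative assumptions; the text does not say how the correspondence is stored.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical in-memory table: keyword set of a query -> its dispensable keywords.
# Keys are frozensets, so keyword order in the query does not matter (assumption).
DispensableTable = Dict[FrozenSet[str], FrozenSet[str]]


def record_correspondence(table: DispensableTable,
                          keywords: Iterable[str],
                          dispensable: Iterable[str]) -> None:
    """Store which of the query's keywords are dispensable."""
    key = frozenset(keywords)
    value = frozenset(dispensable)
    if not value <= key:
        raise ValueError("dispensable keywords must come from the query's keywords")
    table[key] = value


def lookup_dispensable(table: DispensableTable,
                       keywords: Iterable[str]) -> FrozenSet[str]:
    """Return the stored dispensable keywords, or an empty set if none were mined."""
    return table.get(frozenset(keywords), frozenset())
```

With this layout, the queries "red dress" and "dress red" share one entry; a production system might instead persist the table in a key-value store, which is likewise an assumption.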
Embodiment four
Based on the same inventive concept as Embodiment two of the present application, Embodiment four proposes a data search device which, as shown in Figure 4, comprises a dispensable keyword judging module 31 and a search module 32, wherein:
Dispensable keyword judging module 31, configured to receive a query request containing at least two keywords and, when dispensable keywords have been determined for the at least two keywords by the dispensable keyword mining device of Embodiment three, to judge whether the at least two keywords include any dispensable keyword;
Search module 32, configured to search, when the judgment result is that dispensable keywords exist, using the keywords in the query request other than the dispensable keywords.
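One way to picture modules 31 and 32 is the Python sketch below: it looks up the query's keyword set in a table of previously mined dispensable keywords and, if any are listed, searches with the remaining keywords only. The table layout (a dict keyed by frozenset), the `search` callable, and the fallback of searching with the full query when nothing is dispensable (or when dropping would leave no keyword at all) are assumptions not specified in the text.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical correspondence table: keyword set -> dispensable keywords.
DispensableTable = Dict[FrozenSet[str], FrozenSet[str]]
SearchFn = Callable[[List[str]], List[str]]


def search_query(keywords: Sequence[str],
                 table: DispensableTable,
                 search: SearchFn) -> List[str]:
    """Search with dispensable keywords removed when the table lists any."""
    # Judging step: does this keyword set have mined dispensable keywords?
    dispensable = table.get(frozenset(keywords), frozenset())
    remaining = [k for k in keywords if k not in dispensable]
    if dispensable and remaining:
        # Search step: use only the keywords other than the dispensable ones.
        return search(remaining)
    # Fallback (assumption): search with the full query when nothing can be dropped.
    return search(list(keywords))
```

Passing a `search` function that simply echoes its input shows which keywords would actually reach the search engine: `["red", "dress"]` becomes `["red"]` when the table marks "dress" as dispensable for that keyword set.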
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by that processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, may make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, those skilled in the art can make various changes and variations to the present application without departing from its spirit and scope. Thus, if these changes and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to encompass them as well.

Claims (10)

1. A method for mining dispensable keywords, characterized by comprising:
Determining first search results corresponding to a first query request containing at least two keywords;
Determining second search results corresponding to each of at least one second query request, where the keywords in each second query request are a subset of the at least two keywords;
Computing the similarity between the first search results and each set of second search results, and determining which second search results have a similarity to the first search results that reaches a set threshold;
Taking as dispensable keywords those of the at least two keywords corresponding to the first search results other than the keywords corresponding to at least one of the determined second search results whose similarity reaches the set threshold.
2. The method for mining dispensable keywords according to claim 1, characterized in that the second search results whose similarity to the first search results reaches the set threshold are determined as follows:
Computing the similarity between the page titles in the first search results and the page titles in the second search results; if the page-title similarity reaches the set threshold, determining that the similarity between those second search results and the first search results reaches the set threshold.
3. The method for mining dispensable keywords according to claim 2, characterized in that the similarity between the page titles in the first search results and the page titles in the second search results is computed as follows:
Performing word segmentation on the page titles in the first search results and assigning a weight to each segmented word to obtain a first feature vector corresponding to the first search results, and performing word segmentation on the page titles in the second search results and assigning a weight to each segmented word to obtain a second feature vector corresponding to the second search results;
Computing the cosine similarity between the first feature vector and the second feature vector to obtain the similarity between the page titles in the first search results and the page titles in the second search results.
4. The method for mining dispensable keywords according to claim 1, characterized in that, after the dispensable keywords among the at least two keywords are determined, the method further comprises:
Establishing a correspondence between the at least two keywords and the dispensable keywords among them.
5. A data search method, characterized by comprising:
Receiving a query request containing at least two keywords and, when dispensable keywords have been determined for the at least two keywords by the method according to any one of claims 1 to 4, judging whether the at least two keywords include any dispensable keyword;
When the judgment result is that dispensable keywords exist, searching using the keywords in the query request other than the dispensable keywords.
6. A device for mining dispensable keywords, characterized by comprising:
A first search result determination module, configured to determine first search results corresponding to a first query request containing at least two keywords;
A second search result determination module, configured to determine second search results corresponding to each of at least one second query request, where the keywords in each second query request are a subset of the at least two keywords;
A similarity computing module, configured to compute the similarity between the first search results and each set of second search results, and to determine which second search results have a similarity to the first search results that reaches a set threshold;
A dispensable keyword determination module, configured to take as dispensable keywords those of the at least two keywords corresponding to the first search results other than the keywords corresponding to at least one of the determined second search results whose similarity reaches the set threshold.
7. The device for mining dispensable keywords according to claim 6, characterized in that the similarity computing module is specifically configured to compute the similarity between the page titles in the first search results and the page titles in the second search results and, if the page-title similarity reaches the set threshold, to determine that the similarity between those second search results and the first search results reaches the set threshold.
8. The device for mining dispensable keywords according to claim 7, characterized in that the similarity computing module specifically comprises:
A feature vector determining unit, configured to perform word segmentation on the page titles in the first search results and assign a weight to each segmented word, obtaining a first feature vector corresponding to the first search results, and to perform word segmentation on the page titles in the second search results and assign a weight to each segmented word, obtaining a second feature vector corresponding to the second search results;
A similarity calculation unit, configured to compute the cosine similarity between the first feature vector and the second feature vector, obtaining the similarity between the page titles in the first search results and the page titles in the second search results.
9. The device for mining dispensable keywords according to claim 6, characterized in that the device further comprises:
A correspondence establishing module, configured to establish a correspondence between the at least two keywords and the dispensable keywords among them.
10. A data search device, characterized by comprising:
A dispensable keyword judging module, configured to receive a query request containing at least two keywords and, when dispensable keywords have been determined for the at least two keywords by the device according to any one of claims 6 to 9, to judge whether the at least two keywords include any dispensable keyword;
A search module, configured to search, when the judgment result is that dispensable keywords exist, using the keywords in the query request other than the dispensable keywords.
CN201110365011.6A 2011-11-17 2011-11-17 A kind of can the method for digging of default keyword, data search method and equipment Active CN103116587B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201110365011.6A CN103116587B (en) 2011-11-17 2011-11-17 A kind of can the method for digging of default keyword, data search method and equipment
HK13108380.6A HK1181152A1 (en) 2011-11-17 2013-07-17 Method for excavating a dispensable keyword, method and device for searching data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110365011.6A CN103116587B (en) 2011-11-17 2011-11-17 A kind of can the method for digging of default keyword, data search method and equipment

Publications (2)

Publication Number Publication Date
CN103116587A CN103116587A (en) 2013-05-22
CN103116587B true CN103116587B (en) 2015-09-09

Family

ID=48414964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110365011.6A Active CN103116587B (en) 2011-11-17 2011-11-17 A kind of can the method for digging of default keyword, data search method and equipment

Country Status (2)

Country Link
CN (1) CN103116587B (en)
HK (1) HK1181152A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224555B (en) * 2014-06-12 2019-12-10 北京搜狗科技发展有限公司 Searching method, device and system
CN104166712B (en) * 2014-08-13 2018-01-30 东北电力大学 Indexing of Scien. and Tech. Literature method and system
CN106445973B (en) * 2015-08-12 2019-08-09 阿里巴巴集团控股有限公司 The monitoring method and device of search engine
CN107545035A (en) * 2017-07-25 2018-01-05 无锡天脉聚源传媒科技有限公司 A kind of information search method and device
CN111368100A (en) * 2020-02-28 2020-07-03 青岛聚看云科技有限公司 Media asset merging method and device thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4222811B2 (en) * 2002-10-30 2009-02-12 株式会社リコー Keyword extracting apparatus, program, and recording medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110055188A1 (en) * 2009-08-31 2011-03-03 Seaton Gras Construction of boolean search strings for semantic search

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4222811B2 (en) * 2002-10-30 2009-02-12 株式会社リコー Keyword extracting apparatus, program, and recording medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
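Document Classification and Retrieval in the Chinese Science and Technology Periodical Database (中文科技期刊数据库文献分类与检索); Lü Yue'e; Journal of Linyi Normal University; 2008-12-31; full text *
Research on Keyword Extraction Techniques for Chinese and English News Web Pages (中英文新闻网页关键词抽取技术研究); Li Xinghua; Wanfang Dissertation Database; 2009-10-19; full text *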

Also Published As

Publication number Publication date
CN103116587A (en) 2013-05-22
HK1181152A1 (en) 2013-11-01

Similar Documents

Publication Publication Date Title
US10055762B2 (en) Deep application crawling
CN103116587B (en) A kind of can the method for digging of default keyword, data search method and equipment
CN109299383B (en) Method and device for generating recommended word, electronic equipment and storage medium
CN103870505A (en) Query term recommending method and query term recommending system
CN101996195A (en) Searching method and device of voice information in audio files and equipment
Konwar et al. Continuity and Banach contraction principle in intuitionistic fuzzy n-normed linear spaces
CN105022801A (en) Hot video mining method and hot video mining device
US20110208715A1 (en) Automatically mining intents of a group of queries
CN103123632B (en) Search center word defining method and device, searching method and search equipment
KR20120042307A (en) System and method for recommending locality-based key word
CN102193929A (en) Method and equipment for determining word information entropy and searching by using word information entropy
CN112818226B (en) Data processing method, recommendation device, electronic equipment and storage medium
CN103136342A (en) Searching method, system and searching server of application programs (APP)
CN104424302A (en) Method and device for matching homogeneous data objects
KR102089348B1 (en) Search engine system and method based on distributed data storing apparatus search method thereof
CN103207881A (en) Query method and unit
CN104915860A (en) Commodity recommendation method and device
CN103853769A (en) Method and device for processing map query request
CN104142945A (en) Search method and device based on search term
Safar et al. Optimized skyline queries on road networks using nearest neighbors
CN104123285A (en) Navigation method and device for search results
CN104484413A (en) Method and device for obtaining searching results
CN110287444A (en) Website detection method, device and storage medium
CN106897198B (en) Log data processing method and device
US20130151517A1 (en) File search apparatus and method using tag graph

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1181152

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1181152

Country of ref document: HK