CN105528430B - Method and apparatus for determining the weight of search terms

Publication number
CN105528430B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510917486.XA
Other languages
Chinese (zh)
Other versions
CN105528430A (en)
Inventor
陈进平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201510917486.XA
Publication of CN105528430A
Application granted
Publication of CN105528430B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and apparatus for determining the weight of search terms. The method comprises: obtaining a set of search data pairs, wherein each search data pair includes a search query and the corresponding search result content; determining, according to the set of search data pairs, the probability that each search term contained in each search query appears in the search result content; and determining the weight of each search term according to that probability. The technical solution takes full account of the importance of each search term that appears in the search results: it mines, on a large scale, the probability that the search terms contained in each segment of the search queries appear in the search results, and determines the weight of each search term from the mined probabilities.

Description

Method and apparatus for determining the weight of search terms
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and apparatus for determining the weight of search terms.
Background
With the development of computer network technology, searching for data over the network has become increasingly common. As the volume of information on the network grows, the data available for users to search grows with it. How to provide users with the most accurate information from massive data according to their needs, and thereby improve search efficiency, has become a problem that every major search engine must solve.
In the prior art, search results are provided according to the weight of each search term (term) in the search query, so as to give users the most accurate search results from massive data. However, how to calculate the weight of each search term in a search query so that accurate search results can be provided is a problem that urgently needs to be solved.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a method and apparatus for determining the weight of search terms that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for determining the weight of search terms is provided, the method comprising:
obtaining a set of search data pairs, wherein each search data pair includes a search query and the corresponding search result content;
determining, according to the set of search data pairs, the probability that each search term contained in each search query appears in the search result content;
determining the weight of each search term according to the probability that the search term appears in the search result content.
Optionally, determining, according to the set of search data pairs, the probability that each search term contained in each search query appears in the search result content comprises:
for each search data pair in the set, determining every continuous segment that can be obtained from the search query of that pair;
outputting key-value pairs, in which the key is a segment and the value records whether each search term contained in the segment appears in the search result content corresponding to the search query that contains the segment;
in the set of output key-value pairs, collecting statistics on the values of the key-value pairs that share the same key, so as to obtain the probability that each search term in the key appears in the search result content.
Optionally, obtaining the set of search data pairs comprises:
obtaining search data pairs from search engine click logs to form the set.
Optionally, using as the value whether each search term contained in the segment appears in the search result content corresponding to the search query that contains the segment comprises:
determining the number N of search terms contained in the segment, N being a natural number;
using an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term appears in the corresponding search result content.
Optionally, collecting statistics on the values of the key-value pairs that share the same key, so as to obtain the probability that each search term in the key appears in the search result content, comprises:
for each search term in the shared key, counting the number of times the values of the key-value pairs with that key show the search term as appearing in the search result content, and recording this count as the first value;
counting the number of key-value pairs with that key, and recording this count as the second value;
determining the probability that the search term appears in the search result content according to the ratio of the first value to the second value.
Optionally, using as the value whether each search term contained in the segment appears in the search result content corresponding to the search query that contains the segment comprises: determining the number N of search terms contained in the segment, N being a natural number; and using an N-bit binary number as the value, where a bit value of 1 indicates that the corresponding search term appears in the corresponding search result content and a bit value of 0 indicates that it does not;
collecting statistics on the values of the key-value pairs that share the same key, so as to obtain the probability that each search term in the key appears in the search result content, then comprises: for each search term in the shared key, counting the number of key-value pairs with that key whose value has a 1 in the bit corresponding to the search term, and recording this count as the first value;
counting the number of key-value pairs with that key, and recording this count as the second value;
determining the probability that the search term appears in the search result content according to the ratio of the first value to the second value.
Optionally, the search result content is any one of the following:
the title of the search result page;
the abstract of the search result page;
the full content of the search result page.
Optionally, the method further comprises:
saving each search term and its corresponding weight in a weight database;
when a search query is received, splitting the search query into multiple search terms;
obtaining the weights corresponding to the multiple search terms from the weight database;
performing search processing according to the weights corresponding to the multiple search terms.
According to another aspect of the present invention, an apparatus for determining the weight of search terms is provided, the apparatus comprising:
a data capture unit, adapted to obtain a set of search data pairs, wherein each search data pair includes a search query and the corresponding search result content;
a probability determining unit, adapted to determine, according to the set of search data pairs, the probability that each search term contained in each search query appears in the search result content;
a weight determining unit, adapted to determine the weight of each search term according to the probability that the search term appears in the search result content.
Optionally, the probability determining unit further comprises:
a key-value pair output unit, adapted to determine, for each search data pair in the set, every continuous segment that can be obtained from the search query of that pair, and to output key-value pairs in which the key is a segment and the value records whether each search term contained in the segment appears in the search result content corresponding to the search query that contains the segment;
a statistics unit, adapted to collect statistics, in the set of output key-value pairs, on the values of the key-value pairs that share the same key, so as to obtain the probability that each search term in the key appears in the search result content.
Optionally, the data capture unit is adapted to obtain search data pairs from search engine click logs to form the set.
Optionally, the key-value pair output unit is adapted to determine the number N of search terms contained in the segment, N being a natural number, and to use an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term appears in the corresponding search result content.
Optionally, the statistics unit is adapted to: for each search term in the shared key, count the number of times the values of the key-value pairs with that key show the search term as appearing in the search result content, and record this count as the first value;
count the number of key-value pairs with that key, and record this count as the second value; and determine the probability that the search term appears in the search result content according to the ratio of the first value to the second value.
Optionally, the key-value pair output unit is adapted to determine the number N of search terms contained in the segment, N being a natural number, and to use an N-bit binary number as the value, where a bit value of 1 indicates that the corresponding search term appears in the corresponding search result content and a bit value of 0 indicates that it does not;
the statistics unit is adapted to: for each search term in the shared key, count the number of key-value pairs with that key whose value has a 1 in the bit corresponding to the search term, and record this count as the first value; count the number of key-value pairs with that key, and record this count as the second value; and determine the probability that the search term appears in the search result content according to the ratio of the first value to the second value.
Optionally, the search result content is any one of the following:
the title of the search result page;
the abstract of the search result page;
the full content of the search result page.
Optionally,
the weight determining unit is further adapted to save each search term and its corresponding weight in a weight database;
the apparatus further comprises:
a storage unit, adapted to store the weight database;
a search processing unit, adapted to, when a search query is received, split the search query into multiple search terms, obtain the weights corresponding to the multiple search terms from the weight database, and perform search processing according to those weights.
According to the technical solution of the present invention, a set of search data pairs is obtained; the probability that each search term contained in each search query appears in the search result content is determined according to the set; and the weight of each search term is determined according to that probability. The technical solution takes full account of the importance of each search term that appears in the search results: it mines, on a large scale, the probability that the search terms contained in each segment of the search queries appear in the search results, and determines the weight of each search term from the mined probabilities.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set forth below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numbers refer to the same parts. In the drawings:
Fig. 1 shows a flow chart of a method for determining the weight of search terms according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of an apparatus for determining the weight of search terms according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the probability determining unit of an apparatus for determining the weight of search terms according to another embodiment of the present invention;
Fig. 4 shows a schematic diagram of an apparatus for determining the weight of search terms according to another embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Fig. 1 shows a flow chart of a method for determining the weight of search terms according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S110: obtain a set of search data pairs, wherein each search data pair includes a search query and the corresponding search result content.
The obtained set of search data pairs includes one or more search data pairs.
Step S120: according to the set of search data pairs, determine the probability that each search term contained in each search query appears in the search result content.
Step S130: determine the weight of each search term according to the probability that the search term appears in the search result content.
The method shown in Fig. 1 provides an accurate way to calculate the probability that each search term in a search query appears in the search results. The weight of each search term can then be determined from the probability obtained by this method, so that search results are provided according to the weights of the search terms, which greatly improves the accuracy of the search engine.
In one embodiment of the present invention, step S120 of the method shown in Fig. 1, determining according to the set of search data pairs the probability that each search term contained in each search query appears in the search result content, comprises:
Step S121: for each search data pair in the set, determine every continuous segment that can be obtained from the search query of that pair.
Step S122: output key-value pairs, in which the key is a segment and the value records whether each search term contained in the segment appears in the search result content corresponding to the search query that contains the segment.
Step S123: in the set of output key-value pairs, collect statistics on the values of the key-value pairs that share the same key to obtain the probability that each search term in the key appears in the search result content.
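Step S121 can be illustrated with a minimal Python sketch. It assumes the search query has already been split into a list of search terms; the function name `contiguous_segments` is hypothetical and is not taken from the patent.

```python
from typing import List, Tuple


def contiguous_segments(terms: List[str]) -> List[Tuple[str, ...]]:
    """Return every continuous run of terms, shortest segments first."""
    n = len(terms)
    return [
        tuple(terms[start:start + length])
        for length in range(1, n + 1)          # segment length: 1 .. n
        for start in range(n - length + 1)     # every valid start position
    ]


segments = contiguous_segments(["A", "B", "C"])
print(segments)
```

A query of n search terms yields n(n+1)/2 segments under this enumeration, so three terms give six segments.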
In one embodiment of the present invention, step S110 of the method shown in Fig. 1, obtaining the set of search data pairs, comprises: obtaining search data pairs from search engine click logs to form the set.
Forming the set from search engine click logs makes a large volume of data available, and the data is easy to obtain. Because click logs reflect the correlation between user requests and the results users click, the data better matches user needs and has a high degree of relevance.
Here, in each search data pair obtained from the search engine click logs, the search query is the query word entered by the user, and the search result content is the search result that the user finally clicked. It can be seen that a set formed from search engine click logs matches users' expectations of search results.
In one embodiment of the present invention, using as the value in step S122 whether each search term contained in the segment appears in the search result content corresponding to the search query that contains the segment comprises:
Step S1221: determine the number N of search terms contained in the segment, N being a natural number.
Step S1222: use an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term appears in the corresponding search result content.
For example, if a segment contains three search terms, a three-bit binary number is used as the value, where 1 indicates that the corresponding search term appears in the search result of the search data pair and 0 indicates that it does not. The value 110 then indicates that the first and second search terms of the segment appear in the search result of the search data pair and the third search term does not.
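The bit encoding of steps S1221 and S1222 might be sketched as follows. This is an illustration under stated assumptions: the search result content is assumed to be available as a set of terms, the value is represented as a string of '0' and '1' characters, and the name `encode_value` is hypothetical.

```python
from typing import Sequence, Set


def encode_value(segment: Sequence[str], result_terms: Set[str]) -> str:
    """Build an N-bit string: bit i is '1' if the i-th term of the segment
    appears in the search result content, otherwise '0'."""
    return "".join("1" if term in result_terms else "0" for term in segment)


# Three-term segment whose first two terms appear in the result content.
value = encode_value(["t1", "t2", "t3"], {"t1", "t2", "other"})
print(value)  # 110
```

A string keeps leading zeros visible (for example `011`), which an integer representation would not.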
In one embodiment of the present invention, collecting statistics in step S123 on the values of the key-value pairs that share the same key, so as to obtain the probability that each search term in the key appears in the search result content, comprises:
Step S1231: for each search term in the shared key, count the number of times the values of the key-value pairs with that key show the search term as appearing in the search result content, and record this count as the first value.
Step S1232: count the number of key-value pairs with that key, and record this count as the second value.
Step S1233: determine the probability that the search term appears in the search result content according to the ratio of the first value to the second value.
Taking the segment "ABC" as an example, suppose there are the following key-value pairs: [ABC, 111], [ABC, 100], [ABC, 011], [ABC, 101], [ABC, 110].
The probability that search term A appears in the search results = 4/5 = 0.8;
the probability that search term B appears in the search results = 3/5 = 0.6;
the probability that search term C appears in the search results = 3/5 = 0.6.
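The counting in steps S1231 to S1233 can be checked against the "ABC" example with a short Python sketch. It assumes key-value pairs are given as (key, bit string) tuples; the function name `term_probabilities` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def term_probabilities(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[float]]:
    """For each key, return the per-position ratio of '1' bits to pair count."""
    ones: Dict[str, List[int]] = {}   # key -> per-position count of '1' bits (first value)
    totals: Dict[str, int] = {}       # key -> number of pairs with that key (second value)
    for key, value in pairs:
        counts = ones.setdefault(key, [0] * len(value))
        for position, bit in enumerate(value):
            if bit == "1":
                counts[position] += 1
        totals[key] = totals.get(key, 0) + 1
    return {key: [c / totals[key] for c in counts] for key, counts in ones.items()}


pairs = [("ABC", "111"), ("ABC", "100"), ("ABC", "011"), ("ABC", "101"), ("ABC", "110")]
probs = term_probabilities(pairs)
print(probs["ABC"])  # [0.8, 0.6, 0.6]
```

Because the pairs are grouped by key before counting, the same computation would fit a map-and-reduce style job over a large click log, although the patent text does not name a specific framework.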
In one embodiment of the present invention, using as the value in step S122 whether each search term contained in the segment appears in the search result content corresponding to the search query that contains the segment comprises:
Step S1221': determine the number N of search terms contained in the segment, N being a natural number.
Step S1222': use an N-bit binary number as the value, where a bit value of 1 indicates that the corresponding search term appears in the corresponding search result content and a bit value of 0 indicates that it does not.
In that case, collecting statistics in step S123 on the values of the key-value pairs that share the same key, so as to obtain the probability that each search term in the key appears in the search result content, comprises:
Step S1231': for each search term in the shared key, count the number of key-value pairs with that key whose value has a 1 in the bit corresponding to the search term, and record this count as the first value.
Step S1232': count the number of key-value pairs with that key, and record this count as the second value.
Step S1233': determine the probability that the search term appears in the search result content according to the ratio of the first value to the second value.
In one embodiment of the present invention, the search result content in the method shown in Fig. 1 is any one of the following: the title of the search result page clicked by the user; the abstract of the search result page clicked by the user; the full content of the search result page clicked by the user.
For example, a set of search data pairs is obtained from search engine click logs. Take as an example a search data pair whose search query is "ABCDE" and whose search result page title is "FGACDHJ". The continuous segments obtainable from the search query "ABCDE" are:
1. Segments containing 1 search term: A, B, C, D, E.
2. Segments containing 2 search terms: AB, BC, CD, DE.
3. Segments containing 3 search terms: ABC, BCD, CDE.
4. Segments containing 4 search terms: ABCD, BCDE.
5. Segments containing 5 search terms: ABCDE.
Each segment is used as a key, whether each search term contained in the segment appears in the search result content of the search data pair is used as the value, and the key-value pairs for the search data pair are output. In this example, the search terms that appear in the search result are A, C, and D, so the bits corresponding to A, C, and D are 1 and the bits corresponding to the remaining search terms, which do not appear, are 0. Processing this search query and its search result content yields the following key-value pairs:
1. Segments containing 1 search term: A:1, B:0, C:1, D:1, E:0.
2. Segments containing 2 search terms: AB:10, BC:01, CD:11, DE:10.
3. Segments containing 3 search terms: ABC:101, BCD:011, CDE:110.
4. Segments containing 4 search terms: ABCD:1011, BCDE:0110.
5. Segments containing 5 search terms: ABCDE:10110.
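The worked example above can be reproduced with a small Python sketch. It treats each letter as one search term, which is a simplification for this toy example only, and the function name `key_value_pairs` is hypothetical.

```python
from typing import List, Sequence, Tuple


def key_value_pairs(query_terms: Sequence[str],
                    result_terms: Sequence[str]) -> List[Tuple[str, str]]:
    """Emit one (segment, bit string) pair per continuous segment of the query."""
    present = set(result_terms)
    n = len(query_terms)
    pairs = []
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            segment = query_terms[start:start + length]
            key = "".join(segment)
            value = "".join("1" if term in present else "0" for term in segment)
            pairs.append((key, value))
    return pairs


# Query "ABCDE" paired with the result page title "FGACDHJ".
pairs = key_value_pairs(list("ABCDE"), list("FGACDHJ"))
for key, value in pairs:
    print(f"{key}:{value}")
```

The fifteen printed pairs match the list above, from `A:1` through `ABCDE:10110`.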
The above processing is applied to every search data pair. After all search queries in the set and their corresponding search result content have been processed, a set of key-value pairs is obtained. In this set, statistics are collected on the values of the key-value pairs that share the same key: for each search term in the shared key, the number of key-value pairs with that key whose value has a 1 in the bit corresponding to the search term is counted and recorded as the first value; the number of key-value pairs with that key is counted and recorded as the second value; and the probability that the search term appears in the search result content is determined according to the ratio of the first value to the second value. Taking A, B, and C as an example, suppose that counting over the key-value pairs with the key "ABC" yields data similar to the following:
ABC: 0.7, 0.3, 0.9.
These data mean that, among all search data pairs whose search query contains the segment "ABC", the probability that the clicked search result contains A is 0.7, the probability that it contains B is 0.3, and the probability that it contains C is 0.9. It can therefore be considered that A and C are of relatively high importance and B is of relatively low importance.
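The text does not give a formula for turning these probabilities into weights. The sketch below shows one possible mapping, purely as an assumption: normalize the probabilities within a segment so that the weights sum to 1. The function name `normalized_weights` and the normalization itself are hypothetical and are not taken from the patent.

```python
from typing import Dict, Sequence


def normalized_weights(terms: Sequence[str],
                       probabilities: Sequence[float]) -> Dict[str, float]:
    """Assumed mapping: weight = probability / sum of probabilities in the segment."""
    total = sum(probabilities)
    if total == 0:
        # No term ever appeared in the results; give every term zero weight.
        return {term: 0.0 for term in terms}
    return {term: p / total for term, p in zip(terms, probabilities)}


weights = normalized_weights(["A", "B", "C"], [0.7, 0.3, 0.9])
for term, weight in weights.items():
    print(term, round(weight, 3))
```

Under this assumed mapping the ordering of the probabilities is preserved, so C receives the largest weight and B the smallest, consistent with the importance ranking described above.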
In one embodiment of the present invention, each search term and its corresponding weight are saved in a weight database. On this basis, the method further comprises:
Step S140: when a search query is received, split the search query into multiple search terms.
Step S150: obtain the weights corresponding to the multiple search terms from the weight database.
Step S160: perform search processing according to the weights corresponding to the multiple search terms.
The weight of each search term in a segment is calculated from the obtained probabilities. Because the probability calculation takes into account the importance of each search term in the search results, the resulting weights better match user needs and are highly accurate. The weights are saved in the weight database and used for search processing during online search, which can effectively improve the search quality of the search engine.
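Steps S140 to S160 might look like the following Python sketch. Everything concrete here is an assumption made for illustration: the weight database is an in-memory dictionary, the query is split on whitespace, unknown terms get a default weight of 0.0, and documents are scored by summing the weights of the query terms they contain. The patent text does not specify the splitting method, the storage, or the scoring function.

```python
from typing import Dict, List, Set

# Hypothetical weight database: search term -> weight (illustrative numbers only).
WEIGHT_DB: Dict[str, float] = {"A": 0.7, "B": 0.3, "C": 0.9}
DEFAULT_WEIGHT = 0.0  # assumed fallback for terms missing from the database


def split_query(query: str) -> List[str]:
    """Stand-in splitter (assumption): split the query on whitespace (S140)."""
    return query.split()


def query_weights(query: str, weight_db: Dict[str, float]) -> Dict[str, float]:
    """Look up the weight of every term in the query (S150)."""
    return {term: weight_db.get(term, DEFAULT_WEIGHT) for term in split_query(query)}


def score(document_terms: Set[str], weights: Dict[str, float]) -> float:
    """Assumed scoring (S160): sum the weights of query terms found in the document."""
    return sum(w for term, w in weights.items() if term in document_terms)


weights = query_weights("A B C", WEIGHT_DB)
documents = {"doc1": {"A", "C"}, "doc2": {"B"}}
ranking = sorted(documents, key=lambda name: score(documents[name], weights), reverse=True)
print(ranking)  # ['doc1', 'doc2']
```

In this toy ranking, the document containing the two high-weight terms is placed ahead of the document containing only the low-weight term.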
Fig. 2 shows a schematic diagram of an apparatus for determining the weight of search terms according to an embodiment of the present invention. As shown in Fig. 2, the apparatus 200 for determining the weight of search terms comprises:
a data capture unit 210, adapted to obtain a set of search data pairs, wherein each search data pair includes a search query and the corresponding search result content;
a probability determining unit 220, adapted to determine, according to the set of search data pairs, the probability that each search term contained in each search query appears in the search result content;
a weight determining unit 230, adapted to determine the weight of each search term according to the probability that the search term appears in the search result content.
Fig. 3 shows a schematic diagram of the probability determining unit of an apparatus for determining the weight of search terms according to another embodiment of the present invention. As shown in Fig. 3, the probability determining unit 220 further comprises:
a key-value pair output unit 221 and a statistics unit 222.
The key-value pair output unit 221 is adapted to determine, for each search data pair in the set, every continuous segment that can be obtained from the search query of that pair, and to output key-value pairs in which the key is a segment and the value records whether each search term contained in the segment appears in the search result content corresponding to the search query that contains the segment.
The statistics unit 222 is adapted to collect statistics, in the set of output key-value pairs, on the values of the key-value pairs that share the same key, so as to obtain the probability that each search term in the key appears in the search result content.
In one embodiment of the present invention, the data capture unit 210 is adapted to obtain search data pairs from search engine click logs to form the set.
In one embodiment of the present invention, the key-value pair output unit 221 is adapted to determine the number N of search terms contained in the segment, N being a natural number, and to use an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term appears in the corresponding search result content.
In one embodiment of the present invention, the statistics unit 222 is adapted to: for each search term in the shared key, count the number of times the values of the key-value pairs with that key show the search term as appearing in the search result content, and record this count as the first value; count the number of key-value pairs with that key, and record this count as the second value; and determine the probability that the search term appears in the search result content according to the ratio of the first value to the second value.
For example, the key-value pair output unit 221 is adapted to determine the number N of search terms contained in the segment, N being a natural number, and to use an N-bit binary number as the value, where a bit value of 1 indicates that the corresponding search term appears in the corresponding search result content and a bit value of 0 indicates that it does not.
Correspondingly, the statistics unit 222 is adapted to: for each search term in the shared key, count the number of key-value pairs with that key whose value has a 1 in the bit corresponding to the search term, and record this count as the first value; count the number of key-value pairs with that key, and record this count as the second value; and determine the probability that the search term appears in the search result content according to the ratio of the first value to the second value.
In one embodiment of the present invention, the search result content is any one of the following:
the title of the search result page;
the abstract of the search result page;
the full content of the search result page.
Fig. 4 shows a schematic diagram of an apparatus for determining the weight of search terms according to another embodiment of the present invention. As shown in Fig. 4, the apparatus 300 for determining the weight of search terms comprises: a data capture unit 310, a probability determining unit 320, a weight determining unit 330, a storage unit 340, and a search processing unit 350.
The data capture unit 310, the probability determining unit 320, and the weight determining unit 330 are respectively identical to the data capture unit 210, the probability determining unit 220, and the weight determining unit 230 described above and are not described again here.
In this embodiment, the weight determining unit 330 is further adapted to save each search term and its corresponding weight in a weight database;
the storage unit 340 is adapted to store the weight database;
the search processing unit 350 is adapted to, when a search query is received, split the search query into multiple search terms, obtain the weights corresponding to the multiple search terms from the weight database, and perform search processing according to those weights.
It should be noted that the embodiments of the apparatus shown in Figs. 2 to 4 correspond to the embodiments of the method shown in Fig. 1, which have been described in detail above and are not described again here.
In summary, according to the technical solution of the present invention, a set of search data pairs is obtained; the probability that each search term contained in each search query appears in the search result content is determined according to the set; and the weight of each search term is determined according to that probability and saved in a weight database. The present invention obtains the set of search data pairs from search engine click logs, so a large volume of data is available and the data is easy to obtain; because click logs reflect the correlation between user requests and the results users click, the data better matches user needs and has a high degree of relevance. The technical solution takes full account of the importance of each search term that appears in the search results, and mines, on a large scale, the probability that the search terms contained in each segment of the search queries appear in the search results. The weight of each search term in a segment is then calculated from the obtained probabilities; the resulting weights better match user needs and are highly accurate. The weights are saved in the weight database and used for search processing during online search, which can effectively improve the search quality of the search engine.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual apparatus, or other device. Various general-purpose apparatuses may also be used with the teachings herein. From the description above, the structure required to construct such apparatuses is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
In the description provided here, numerous specific details are set forth. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments of the invention. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
In addition, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments but not other features, combinations of features of different embodiments are meant to fall within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device for determining the weight of search terms according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

Claims (14)

1. A method of determining the weight of search terms, wherein the method comprises:
Obtaining a set of search data pairs, wherein each search data pair includes a search word and the corresponding search result content;
Determining, according to the set of search data pairs, the probability that each search term contained in each search word occurs in the search result content;
Determining the weight of each search term according to the probability that the search term occurs in the search result content;
Wherein determining, according to the set of search data pairs, the probability that each search term contained in each search word occurs in the search result content comprises:
For each search data pair in the set of search data pairs, determining each continuous segment obtainable from the search word of that search data pair;
Taking a segment as the key, taking as the value whether each search term contained in the segment occurs in the search result content corresponding to the search word in which the segment is located, and outputting the key-value pair;
In the set of output key-value pairs, obtaining the probability that each search term in a key occurs in the search result content by collecting statistics on the values of the key-value pairs having that same key.
2. The method of claim 1, wherein obtaining the set of search data pairs comprises:
Obtaining search data pairs from the click logs of a search engine to form the set.
3. The method of claim 1, wherein taking as the value whether each search term contained in the segment occurs in the search result content corresponding to the search word in which the segment is located comprises:
Determining the number N of search terms contained in the segment, N being a natural number;
Using an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term occurs in the corresponding search result content.
4. The method of claim 1, wherein obtaining the probability that each search term in the key occurs in the search result content by collecting statistics on the values of the key-value pairs having the same key comprises:
For each search term in the same key, counting the number of times the search term is shown, in the values of the key-value pairs having that key, as occurring in the search result content, and recording it as a first numerical value;
Counting the number of key-value pairs having that key, and recording it as a second numerical value;
Determining the probability that the search term occurs in the search result content according to the ratio of the first numerical value to the second numerical value.
5. The method of claim 1, wherein:
Taking as the value whether each search term contained in the segment occurs in the search result content corresponding to the search word in which the segment is located comprises: determining the number N of search terms contained in the segment, N being a natural number; and using an N-bit binary number as the value, wherein a bit value of 1 indicates that the corresponding search term occurs in the corresponding search result content and a bit value of 0 indicates that it does not;
Obtaining the probability that each search term in the key occurs in the search result content by collecting statistics on the values of the key-value pairs having the same key comprises: for each search term in the same key, counting the number of times the bit corresponding to the search term is 1 in the values of the key-value pairs having that key, and recording it as a first numerical value;
Counting the number of key-value pairs having that key, and recording it as a second numerical value;
Determining the probability that the search term occurs in the search result content according to the ratio of the first numerical value to the second numerical value.
6. The method of claim 1, wherein the search result content is any one of the following:
The title of a search result page;
The abstract of a search result page;
The full content of a search result page.
7. The method of any one of claims 1-6, wherein the method further comprises:
Saving each search term and its corresponding weight in a weight database;
When a search word is received, splitting the search word into multiple search terms;
Obtaining the weights corresponding to the multiple search terms from the weight database;
Performing search processing according to the weights corresponding to the multiple search terms.
8. A device for determining the weight of search terms, wherein the device comprises:
A data acquisition unit, adapted to obtain a set of search data pairs, wherein each search data pair includes a search word and the corresponding search result content;
A probability determination unit, adapted to determine, according to the set of search data pairs, the probability that each search term contained in each search word occurs in the search result content;
A weight determination unit, adapted to determine the weight of each search term according to the probability that the search term occurs in the search result content;
Wherein the probability determination unit further comprises:
A key-value pair output unit, adapted to, for each search data pair in the set of search data pairs, determine each continuous segment obtainable from the search word of that search data pair; and to take a segment as the key, take as the value whether each search term contained in the segment occurs in the search result content corresponding to the search word in which the segment is located, and output the key-value pair;
A statistics unit, adapted to obtain, in the set of output key-value pairs, the probability that each search term in a key occurs in the search result content by collecting statistics on the values of the key-value pairs having that same key.
9. The device of claim 8, wherein
The data acquisition unit is adapted to obtain search data pairs from the click logs of a search engine to form the set.
10. The device of claim 8, wherein
The key-value pair output unit is adapted to determine the number N of search terms contained in the segment, N being a natural number; and to use an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term occurs in the corresponding search result content.
11. The device of claim 8, wherein
The statistics unit is adapted to, for each search term in the same key, count the number of times the search term is shown, in the values of the key-value pairs having that key, as occurring in the search result content, and record it as a first numerical value;
Count the number of key-value pairs having that key, and record it as a second numerical value; and determine the probability that the search term occurs in the search result content according to the ratio of the first numerical value to the second numerical value.
12. The device of claim 8, wherein
The key-value pair output unit is adapted to determine the number N of search terms contained in the segment, N being a natural number; and to use an N-bit binary number as the value, wherein a bit value of 1 indicates that the corresponding search term occurs in the corresponding search result content and a bit value of 0 indicates that it does not;
The statistics unit is adapted to, for each search term in the same key, count the number of times the bit corresponding to the search term is 1 in the values of the key-value pairs having that key, and record it as a first numerical value; count the number of key-value pairs having that key, and record it as a second numerical value; and determine the probability that the search term occurs in the search result content according to the ratio of the first numerical value to the second numerical value.
13. The device of claim 8, wherein the search result content is any one of the following:
The title of a search result page;
The abstract of a search result page;
The full content of a search result page.
14. The device of any one of claims 8-13, wherein
The weight determination unit is further adapted to save each search term and its corresponding weight in a weight database;
The device further comprises:
A storage unit, adapted to store the weight database;
A search processing unit, adapted to, when a search word is received, split the search word into multiple search terms; obtain the weights corresponding to the multiple search terms from the weight database; and perform search processing according to the weights corresponding to the multiple search terms.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510917486.XA CN105528430B (en) 2015-12-10 2015-12-10 A kind of method and apparatus of the weight of determining search terms

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510917486.XA CN105528430B (en) 2015-12-10 2015-12-10 A kind of method and apparatus of the weight of determining search terms

Publications (2)

Publication Number Publication Date
CN105528430A CN105528430A (en) 2016-04-27
CN105528430B CN105528430B (en) 2019-05-31

Family

ID=55770653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510917486.XA Active CN105528430B (en) 2015-12-10 2015-12-10 A kind of method and apparatus of the weight of determining search terms

Country Status (1)

Country Link
CN (1) CN105528430B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933714B (en) * 2019-03-18 2021-04-20 北京搜狗科技发展有限公司 Entry weight calculation method, entry weight search method and related device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5278980A (en) * 1991-08-16 1994-01-11 Xerox Corporation Iterative technique for phrase query formation and an information retrieval system employing same
CN102193932A (en) * 2010-03-09 2011-09-21 北京金山软件有限公司 Method and system for determining search term
CN102289436A (en) * 2010-06-18 2011-12-21 阿里巴巴集团控股有限公司 Method and device for determining weighted value of search term and method and device for generating search results
CN103150362A (en) * 2013-02-28 2013-06-12 北京奇虎科技有限公司 Video search method and system
CN104361115A (en) * 2014-12-01 2015-02-18 北京奇虎科技有限公司 Entry weight definition method and device based on co-clicking
CN104376115A (en) * 2014-12-01 2015-02-25 北京奇虎科技有限公司 Fuzzy word determining method and device based on global search
CN104615723A (en) * 2015-02-06 2015-05-13 百度在线网络技术(北京)有限公司 Determining method and device of search term weight value
CN104636403A (en) * 2013-11-15 2015-05-20 腾讯科技(深圳)有限公司 Query request processing method and device
CN105095381A (en) * 2015-06-30 2015-11-25 北京奇虎科技有限公司 Method and device for new word identification

Also Published As

Publication number Publication date
CN105528430A (en) 2016-04-27

Similar Documents

Publication Publication Date Title
CN105956161B (en) A kind of information recommendation method and device
US20140108418A1 (en) Searching code by specifying its behavior
CN107861981A (en) A kind of data processing method and device
CN103150362B (en) A kind of video searching method and system
CN109918594B (en) Information display method and device
CN105095381B (en) New word identification method and device
CN103617213B (en) Method and system for identifying newspage attributive characters
EP2631815A1 (en) Method and device for ordering search results, method and device for providing information
CN104361115A (en) Entry weight definition method and device based on co-clicking
WO2014201833A1 (en) Method and device for processing data
CN106776559A (en) The method and device of text semantic Similarity Measure
CN104376115A (en) Fuzzy word determining method and device based on global search
CN103870607A (en) Sequencing method and device of search results of multiple search engines
CN103942264A (en) Method and device for pushing webpages containing news information
CN107341181A (en) Method, apparatus, computer-readable recording medium and computer equipment are recommended in search
CN105786910B (en) Entry weighing computation method and device
CN103870563B (en) It is determined that the method and apparatus of the theme distribution of given text
CN106919576A (en) Using the method and device of two grades of classes keywords database search for application now
CN105528430B (en) A kind of method and apparatus of the weight of determining search terms
CN105488209B (en) A kind of analysis method and device of word weight
CN106557483A (en) A kind of data processing, data query method and apparatus
CN109522275A (en) Label method for digging, electronic equipment and the storage medium of content are produced based on user
CN108647227A (en) A kind of recommendation method and device
WO2018205391A1 (en) Method, system and apparatus for evaluating accuracy of information retrieval, and computer-readable storage medium
CN111724143A (en) RPA-based flow element positioning method and device, computing equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220726

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.