CN105528430B - Method and apparatus for determining the weight of search terms
- Publication number
- CN105528430B CN201510917486.XA CN201510917486A
- Authority
- CN
- China
- Prior art keywords
- search
- value
- key
- search terms
- terms
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and apparatus for determining the weight of search terms. The method comprises: obtaining a set of search data pairs, where each search data pair includes a search query and the corresponding search result content; determining, from the set of search data pairs, the probability that each search term contained in each search query appears in the search result content; and determining the weight of each search term from that probability. The technical solution takes into account the importance of each search term that appears in the search results: it mines, on a large scale, the probability that each fragment of the search queries, and each search term the fragment contains, appears in the search results, and determines the weight of each search term from the mined probabilities.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and apparatus for determining the weight of search terms.
Background art
With the development of computer network technology, searching for data over the network has become increasingly common. As the volume of information on the network grows, the amount of data that users can search grows with it. How to provide users with the most accurate information from massive data according to their needs, and thereby improve search efficiency, has become a problem that the major search engines must solve.
In the prior art, search results are provided according to the weight of each search term (term) in the search query, so that users receive the most accurate search result information from massive data. However, how to calculate the weight of each search term in a search query so that accurate search results can be provided is a problem in urgent need of a solution.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a method and apparatus for determining the weight of search terms that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for determining the weight of search terms is provided. The method comprises:
obtaining a set of search data pairs, where each search data pair includes a search query and the corresponding search result content;
determining, from the set of search data pairs, the probability that each search term contained in each search query appears in the search result content;
determining the weight of each search term from the probability that the search term appears in the search result content.
Optionally, determining, from the set of search data pairs, the probability that each search term contained in each search query appears in the search result content includes:
for each search data pair in the set, determining every contiguous fragment that can be obtained from the search query of that pair;
outputting key-value pairs, where the key is a fragment and the value records whether each search term contained in the fragment appears in the search result content corresponding to the search query from which the fragment was taken;
in the set of output key-value pairs, computing statistics over the values of the key-value pairs that share the same key, to obtain the probability that each search term in that key appears in the search result content.
Optionally, obtaining the set of search data pairs includes:
obtaining search data pairs from search engine click logs to form the set.
Optionally, using as the value whether each search term contained in the fragment appears in the search result content corresponding to the search query from which the fragment was taken includes:
determining the number N of search terms contained in the fragment, N being a natural number;
using an N-bit binary number as the value, where the two possible values of each bit indicate whether the corresponding search term appears in the corresponding search result content.
Optionally, computing statistics over the values of the key-value pairs that share the same key, to obtain the probability that each search term in that key appears in the search result content, includes:
for each search term in the shared key, counting the number of key-value pairs with that key whose value indicates that the search term appears in the search result content, and recording the count as a first value;
counting the number of key-value pairs with that key, and recording the count as a second value;
determining the probability that the search term appears in the search result content from the ratio of the first value to the second value.
Optionally, using as the value whether each search term contained in the fragment appears in the search result content corresponding to the search query from which the fragment was taken includes: determining the number N of search terms contained in the fragment, N being a natural number; and using an N-bit binary number as the value, where a bit value of 1 indicates that the corresponding search term appears in the corresponding search result content and a bit value of 0 indicates that it does not.
Computing statistics over the values of the key-value pairs that share the same key, to obtain the probability that each search term in that key appears in the search result content, then includes: for each search term in the shared key, counting the number of key-value pairs with that key in whose value the bit for the search term is 1, and recording the count as a first value;
counting the number of key-value pairs with that key, and recording the count as a second value;
determining the probability that the search term appears in the search result content from the ratio of the first value to the second value.
Optionally, the search result content is any one of the following:
the title of the search result page;
the summary of the search result page;
the full content of the search result page.
Optionally, the method further comprises:
saving each search term and its corresponding weight in a weight database;
when a search query is received, splitting the search query into multiple search terms;
obtaining the weights corresponding to the multiple search terms from the weight database;
performing search processing according to the weights corresponding to the multiple search terms.
According to another aspect of the invention, a device for determining the weight of search terms is provided. The device comprises:
a data acquisition unit, adapted to obtain a set of search data pairs, where each search data pair includes a search query and the corresponding search result content;
a probability determination unit, adapted to determine, from the set of search data pairs, the probability that each search term contained in each search query appears in the search result content;
a weight determination unit, adapted to determine the weight of each search term from the probability that the search term appears in the search result content.
Optionally, the probability determination unit further comprises:
a key-value pair output unit, adapted to determine, for each search data pair in the set, every contiguous fragment that can be obtained from the search query of that pair, and to output key-value pairs in which the key is a fragment and the value records whether each search term contained in the fragment appears in the search result content corresponding to the search query from which the fragment was taken;
a statistics unit, adapted to compute, in the set of output key-value pairs, statistics over the values of the key-value pairs that share the same key, to obtain the probability that each search term in that key appears in the search result content.
Optionally, the data acquisition unit is adapted to obtain search data pairs from search engine click logs to form the set.
Optionally, the key-value pair output unit is adapted to determine the number N of search terms contained in the fragment, N being a natural number, and to use an N-bit binary number as the value, where the two possible values of each bit indicate whether the corresponding search term appears in the corresponding search result content.
Optionally, the statistics unit is adapted to: for each search term in the shared key, count the number of key-value pairs with that key whose value indicates that the search term appears in the search result content, and record the count as a first value; count the number of key-value pairs with that key, and record the count as a second value; and determine the probability that the search term appears in the search result content from the ratio of the first value to the second value.
Optionally, the key-value pair output unit is adapted to determine the number N of search terms contained in the fragment, N being a natural number, and to use an N-bit binary number as the value, where a bit value of 1 indicates that the corresponding search term appears in the corresponding search result content and a bit value of 0 indicates that it does not.
The statistics unit is then adapted to: for each search term in the shared key, count the number of key-value pairs with that key in whose value the bit for the search term is 1, and record the count as a first value; count the number of key-value pairs with that key, and record the count as a second value; and determine the probability that the search term appears in the search result content from the ratio of the first value to the second value.
Optionally, the search result content is any one of the following:
the title of the search result page;
the summary of the search result page;
the full content of the search result page.
Optionally, the weight determination unit is further adapted to save each search term and its corresponding weight in a weight database;
and the device further comprises:
a storage unit, adapted to store the weight database;
a search processing unit, adapted to split a received search query into multiple search terms, obtain the weights corresponding to the multiple search terms from the weight database, and perform search processing according to those weights.
According to the technical solution of the present invention, a set of search data pairs is obtained; the probability that each search term contained in each search query appears in the search result content is determined from the set; and the weight of each search term is determined from that probability. The technical solution takes into account the importance of each search term that appears in the search results: it mines, on a large scale, the probability that each fragment of the search queries, and each search term the fragment contains, appears in the search results, and determines the weight of each search term from the mined probabilities.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features and advantages of the invention are easier to understand, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numbers denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a method for determining the weight of search terms according to an embodiment of the invention;
Fig. 2 shows a schematic diagram of a device for determining the weight of search terms according to an embodiment of the invention;
Fig. 3 shows a schematic diagram of the probability determination unit of a device for determining the weight of search terms according to another embodiment of the invention;
Fig. 4 shows a schematic diagram of a device for determining the weight of search terms according to another embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Fig. 1 shows a flowchart of a method for determining the weight of search terms according to an embodiment of the invention. As shown in Fig. 1, the method includes:
Step S110: obtain a set of search data pairs, where each search data pair includes a search query and the corresponding search result content.
The obtained set includes one or more search data pairs.
Step S120: determine, from the set of search data pairs, the probability that each search term contained in each search query appears in the search result content.
Step S130: determine the weight of each search term from the probability that the search term appears in the search result content.
The method shown in Fig. 1 provides a way to accurately calculate the probability that each search term in a search query appears in the search results. The weight of each search term can then be determined from the probabilities obtained, and search results can be provided according to those weights, which greatly improves the accuracy of the search engine.
In one embodiment of the invention, step S120 of the method shown in Fig. 1 (determining, from the set of search data pairs, the probability that each search term contained in each search query appears in the search result content) includes:
Step S121: for each search data pair in the set, determine every contiguous fragment that can be obtained from the search query of that pair.
Step S122: output key-value pairs, where the key is a fragment and the value records whether each search term contained in the fragment appears in the search result content corresponding to the search query from which the fragment was taken.
Step S123: in the set of output key-value pairs, compute statistics over the values of the key-value pairs that share the same key, to obtain the probability that each search term in that key appears in the search result content.
In one embodiment of the invention, step S110 of the method shown in Fig. 1 (obtaining a set of search data pairs) includes: obtaining search data pairs from search engine click logs to form the set.
Building the set from search engine click logs has several advantages: the amount of usable data is large, the data is easy to obtain, and, because click logs capture the correlation between the user's request and the clicked result, the data better matches user needs and has high relevance.
In each search data pair obtained from the click logs, the search query is the query entered by the user, and the search result content is the search result that the user finally clicked. A set of search data pairs built from search engine click logs therefore matches users' expectations of the search results.
In one embodiment of the invention, using as the value, in step S122, whether each search term contained in the fragment appears in the search result content corresponding to the search query from which the fragment was taken includes:
Step S1221: determine the number N of search terms contained in the fragment, N being a natural number.
Step S1222: use an N-bit binary number as the value, where the two possible values of each bit indicate whether the corresponding search term appears in the corresponding search result content.
For example, if a fragment contains three search terms, a three-bit binary number is used as the value, where 1 indicates that the corresponding search term appears in the search result of the search data pair and 0 indicates that it does not. The value 110 then indicates that the first and second search terms of the fragment appear in the search result of the search data pair, and the third search term does not.
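The encoding of steps S1221 and S1222 can be sketched as follows. This is only a minimal illustration: the function name `encode_presence`, the use of a bit string rather than an integer, and the placeholder terms `t1`, `t2`, `t3` are assumptions, not part of the described embodiment.

```python
def encode_presence(fragment_terms, result_terms):
    """Build an N-bit string for a fragment of N search terms.

    Bit i is '1' if fragment_terms[i] appears in the search result
    content (given here as a list of terms), and '0' otherwise.
    """
    present = set(result_terms)
    return "".join("1" if term in present else "0" for term in fragment_terms)


# Hypothetical fragment of three terms; only the first two occur in the result.
value = encode_presence(["t1", "t2", "t3"], ["t2", "x", "t1"])
print(value)  # 110
```

A string keeps leading zeros (for example `011`), which an integer representation would lose unless N is stored alongside it.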
In one embodiment of the invention, computing statistics, in step S123, over the values of the key-value pairs that share the same key, to obtain the probability that each search term in that key appears in the search result content, includes:
Step S1231: for each search term in the shared key, count the number of key-value pairs with that key whose value indicates that the search term appears in the search result content, and record the count as a first value.
Step S1232: count the number of key-value pairs with that key, and record the count as a second value.
Step S1233: determine the probability that the search term appears in the search result content from the ratio of the first value to the second value.
Taking the fragment "ABC" as an example, suppose the following key-value pairs exist: [ABC, 111], [ABC, 100], [ABC, 011], [ABC, 101], [ABC, 110]. Then:
Probability that search term A appears in the search results = 4/5 = 0.8;
Probability that search term B appears in the search results = 3/5 = 0.6;
Probability that search term C appears in the search results = 3/5 = 0.6.
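The counting in steps S1231 to S1233 can be reproduced with a short sketch on the five "ABC" key-value pairs above. The function name `term_probabilities` and the in-memory list of pairs are assumptions for illustration; the description does not prescribe how the key-value pairs are stored or grouped.

```python
from collections import defaultdict


def term_probabilities(pairs):
    """Map each key to the per-term appearance probabilities.

    pairs: iterable of (key, value), where value is an N-bit string.
    For each key, position i holds (pairs with bit i == '1') / (pairs with that key).
    """
    totals = defaultdict(int)   # second value: number of pairs per key
    hits = {}                   # first value: per-position count of '1' bits
    for key, value in pairs:
        totals[key] += 1
        counts = hits.setdefault(key, [0] * len(value))
        for i, bit in enumerate(value):
            if bit == "1":
                counts[i] += 1
    return {key: [c / totals[key] for c in counts] for key, counts in hits.items()}


pairs = [("ABC", "111"), ("ABC", "100"), ("ABC", "011"),
         ("ABC", "101"), ("ABC", "110")]
probs = term_probabilities(pairs)
print(probs["ABC"])  # [0.8, 0.6, 0.6]
```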
In one embodiment of the invention, using as the value, in step S122, whether each search term contained in the fragment appears in the search result content corresponding to the search query from which the fragment was taken includes:
Step S1221': determine the number N of search terms contained in the fragment, N being a natural number.
Step S1222': use an N-bit binary number as the value, where a bit value of 1 indicates that the corresponding search term appears in the corresponding search result content and a bit value of 0 indicates that it does not.
Computing statistics, in step S123, over the values of the key-value pairs that share the same key, to obtain the probability that each search term in that key appears in the search result content, then includes:
Step S1231': for each search term in the shared key, count the number of key-value pairs with that key in whose value the bit for the search term is 1, and record the count as a first value.
Step S1232': count the number of key-value pairs with that key, and record the count as a second value.
Step S1233': determine the probability that the search term appears in the search result content from the ratio of the first value to the second value.
In one embodiment of the invention, the search result content in the method shown in Fig. 1 is any one of the following: the title of the search result page clicked by the user; the summary of the search result page clicked by the user; the full content of the search result page clicked by the user.
For example, a set of search data pairs is obtained from search engine click logs. Take one search data pair whose search query is "ABCDE" and whose search result page title is "FGACDHJ". The contiguous fragments obtainable from the search query "ABCDE" are:
1. Fragments containing 1 search term: A, B, C, D, E.
2. Fragments containing 2 search terms: AB, BC, CD, DE.
3. Fragments containing 3 search terms: ABC, BCD, CDE.
4. Fragments containing 4 search terms: ABCD, BCDE.
5. Fragment containing 5 search terms: ABCDE.
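The fragment list above can be generated with a simple sliding window, as in the sketch below. The function name `contiguous_fragments` is an assumption, and the query is assumed to be already split into search terms (here, one letter per term, as in the example).

```python
def contiguous_fragments(terms):
    """Return every contiguous run of terms, ordered by length, then by position."""
    n = len(terms)
    return [tuple(terms[start:start + size])
            for size in range(1, n + 1)
            for start in range(n - size + 1)]


fragments = ["".join(f) for f in contiguous_fragments(list("ABCDE"))]
print(fragments)
# ['A', 'B', 'C', 'D', 'E', 'AB', 'BC', 'CD', 'DE',
#  'ABC', 'BCD', 'CDE', 'ABCD', 'BCDE', 'ABCDE']
```

A query of n search terms yields n(n+1)/2 fragments, so the five-term query produces 15.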
Key-value pairs for this search data pair are output with each fragment as the key and, as the value, whether each search term contained in the fragment appears in the search result content of the pair. In this example, the search terms that appear in the search result are A, C and D, so the bits corresponding to A, C and D are 1 and the bits corresponding to the remaining search terms, which do not appear, are 0. Processing this search query and its search result content outputs the following key-value pairs:
1. Fragments containing 1 search term: A:1, B:0, C:1, D:1, E:0.
2. Fragments containing 2 search terms: AB:10, BC:01, CD:11, DE:10.
3. Fragments containing 3 search terms: ABC:101, BCD:011, CDE:110.
4. Fragments containing 4 search terms: ABCD:1011, BCDE:0110.
5. Fragment containing 5 search terms: ABCDE:10110.
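The key-value pairs above can be produced by the sketch below for the pair ("ABCDE", "FGACDHJ"). The generator name `emit_key_values` is an assumption, as is the simplification that each character is one search term, which allows a fragment to be joined into a string key; with real multi-character terms a tuple of terms would be a safer key.

```python
def emit_key_values(query_terms, result_terms):
    """Yield (fragment, bit string) for every contiguous fragment of the query.

    The bit string has one bit per term in the fragment: '1' if the term
    appears in the search result content, '0' otherwise.
    """
    present = set(result_terms)
    n = len(query_terms)
    for size in range(1, n + 1):
        for start in range(n - size + 1):
            fragment = query_terms[start:start + size]
            value = "".join("1" if t in present else "0" for t in fragment)
            yield "".join(fragment), value


key_values = dict(emit_key_values(list("ABCDE"), list("FGACDHJ")))
print(key_values["ABC"], key_values["BCDE"], key_values["ABCDE"])  # 101 0110 10110
```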
Every search data pair is processed in this way. Once all search queries in the set and their corresponding search result content have been processed, a set of key-value pairs is obtained. Statistics are then computed over the values of the key-value pairs that share the same key: for each search term in the shared key, the number of key-value pairs with that key in whose value the bit for the search term is 1 is counted and recorded as a first value; the number of key-value pairs with that key is counted and recorded as a second value; and the probability that the search term appears in the search result content is determined from the ratio of the first value to the second value. Taking A, B and C as an example, suppose that computing these probabilities over the key-value pairs sharing the key "ABC" yields data such as the following:
ABC: 0.7, 0.3, 0.9.
This data means that, among all search data pairs whose search query contains the fragment "ABC", the probability that the clicked search result contains A is 0.7, the probability that it contains B is 0.3, and the probability that it contains C is 0.9. A and C can therefore be considered relatively important, and B relatively unimportant.
In one embodiment of the invention, each search term and its corresponding weight are saved in a weight database. On that basis, the method further includes:
Step S140: when a search query is received, split the search query into multiple search terms.
Step S150: obtain the weights corresponding to the multiple search terms from the weight database.
Step S160: perform search processing according to the weights corresponding to the multiple search terms.
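Steps S140 to S160 might be sketched as follows. Everything concrete here is an assumption made for illustration: the in-memory dictionary `weight_db`, the placeholder `segment` function (one character per search term), the use of the example probabilities 0.7, 0.3 and 0.9 directly as weights, and the ranking of terms by weight. The description does not specify the segmentation method, the storage format, how a weight is derived from a probability, or how the weights are applied during search processing.

```python
# Hypothetical weight database: search term -> weight.
weight_db = {"A": 0.7, "B": 0.3, "C": 0.9}


def segment(query):
    """Placeholder segmenter: treats each character as one search term."""
    return list(query)


def lookup_weights(query, db, default=0.0):
    """S140 and S150: split the query into terms and fetch each term's weight."""
    return {term: db.get(term, default) for term in segment(query)}


weights = lookup_weights("ABC", weight_db)
# S160 (illustrative only): order the terms by descending weight.
ranked_terms = sorted(weights, key=weights.get, reverse=True)
print(ranked_terms)  # ['C', 'A', 'B']
```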
The weight of each search term in a fragment is calculated from the probabilities obtained. Because the probability calculation takes into account the importance of each search term in the search results, the resulting weights better match user needs and are highly accurate. The weights are saved in the weight database and used for search processing during online search, which can effectively improve the search quality of the search engine.
Fig. 2 shows a schematic diagram of a device for determining the weight of search terms according to an embodiment of the invention. As shown in Fig. 2, the device 200 for determining the weight of search terms includes:
a data acquisition unit 210, adapted to obtain a set of search data pairs, where each search data pair includes a search query and the corresponding search result content;
a probability determination unit 220, adapted to determine, from the set of search data pairs, the probability that each search term contained in each search query appears in the search result content;
a weight determination unit 230, adapted to determine the weight of each search term from the probability that the search term appears in the search result content.
Fig. 3 shows a schematic diagram of the probability determination unit of a device for determining the weight of search terms according to another embodiment of the invention. As shown in Fig. 3, the probability determination unit 220 further comprises a key-value pair output unit 221 and a statistics unit 222.
The key-value pair output unit 221 is adapted to determine, for each search data pair in the set, every contiguous fragment that can be obtained from the search query of that pair, and to output key-value pairs in which the key is a fragment and the value records whether each search term contained in the fragment appears in the search result content corresponding to the search query from which the fragment was taken.
The statistics unit 222 is adapted to compute, in the set of output key-value pairs, statistics over the values of the key-value pairs that share the same key, to obtain the probability that each search term in that key appears in the search result content.
In one embodiment of the invention, the data acquisition unit 210 is adapted to obtain search data pairs from search engine click logs to form the set.
In one embodiment of the invention, the key-value pair output unit 221 is adapted to determine the number N of search terms contained in the fragment, N being a natural number, and to use an N-bit binary number as the value, where the two possible values of each bit indicate whether the corresponding search term appears in the corresponding search result content.
In one embodiment of the invention, the statistics unit 222 is adapted to: for each search term in the shared key, count the number of key-value pairs with that key whose value indicates that the search term appears in the search result content, and record the count as a first value; count the number of key-value pairs with that key, and record the count as a second value; and determine the probability that the search term appears in the search result content from the ratio of the first value to the second value.
For example, the key-value pair output unit 221 is adapted to determine the number N of search terms contained in the fragment, N being a natural number, and to use an N-bit binary number as the value, where a bit value of 1 indicates that the corresponding search term appears in the corresponding search result content and a bit value of 0 indicates that it does not.
Correspondingly, the statistics unit 222 is adapted to: for each search term in the shared key, count the number of key-value pairs with that key in whose value the bit for the search term is 1, and record the count as a first value; count the number of key-value pairs with that key, and record the count as a second value; and determine the probability that the search term appears in the search result content from the ratio of the first value to the second value.
In one embodiment of the invention, the search result content is any one of the following:
the title of the search result page;
the summary of the search result page;
the full content of the search result page.
Fig. 4 shows a schematic diagram of a device for determining the weight of search terms according to another embodiment of the invention. As shown in Fig. 4, the device 300 for determining the weight of search terms includes: a data acquisition unit 310, a probability determination unit 320, a weight determination unit 330, a storage unit 340 and a search processing unit 350.
The data acquisition unit 310, probability determination unit 320 and weight determination unit 330 correspond to, and are the same as, the data acquisition unit 210, probability determination unit 220 and weight determination unit 230 described above, and are not described again here.
In this embodiment, the weight determination unit 330 is further adapted to save each search term and its corresponding weight in a weight database;
the storage unit 340 is adapted to store the weight database;
the search processing unit 350 is adapted to split a received search query into multiple search terms, obtain the weights corresponding to the multiple search terms from the weight database, and perform search processing according to those weights.
It should be noted that the embodiments of the devices shown in Fig. 2 to Fig. 4 correspond to the embodiments of the method shown in Fig. 1, which have been described in detail above and are not described again here.
In summary, according to the technical solution of the present invention, a set of search data pairs is obtained; the probability that each search term contained in each search query appears in the search result content is determined from the set; and the weight of each search term is determined from that probability and saved in a weight database. The invention obtains the set of search data pairs from search engine click logs: the amount of usable data is large, the data is easy to obtain, and, because click logs capture the correlation between the user's request and the clicked result, the data better matches user needs and has high relevance. The technical solution takes into account the importance of each search term that appears in the search results, and mines, on a large scale, the probability that each fragment of the search queries, and each search term the fragment contains, appears in the search results. The weight of each search term in a fragment is calculated from the probabilities obtained; the resulting weights better match user needs and are highly accurate. The weights are saved in the weight database and used for search processing during online search, which can effectively improve the search quality of the search engine.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual device or other equipment. Various general-purpose devices may also be used with the teachings herein. From the description above, the structure required to construct such devices is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that various programming languages may be used to implement the content of the invention described herein, and that the description above of a specific language is given in order to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided here. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of a device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the apparatus for determining the weight of search terms according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.
Claims (14)
1. A method of determining the weight of search terms, the method comprising:
obtaining a set of search data pairs, wherein each search data pair comprises a search query and corresponding search result content;
determining, according to the set of search data pairs, the probability that each search term contained in each search query occurs in the search result content;
determining the weight of each search term according to the probability that the search term occurs in the search result content;
wherein determining, according to the set of search data pairs, the probability that each search term contained in each search query occurs in the search result content comprises:
for each search data pair in the set, determining each contiguous segment that can be obtained from the search query of that search data pair;
outputting a key-value pair whose key is the segment and whose value indicates whether each search term contained in the segment occurs in the search result content corresponding to the search query in which the segment is located;
in the set of output key-value pairs, obtaining, by performing statistics on the values of the key-value pairs having the same key, the probability that each search term in that key occurs in the search result content.
2. the method for claim 1, wherein the set for obtaining search data pair includes:
Search data are obtained from search engine click logs to gather composition.
3. the method for claim 1, wherein each search terms for including with the segment are in the search where the segment
The case where whether occurring in the corresponding search result content of word is that value includes:
Determine that the search item number N, N that include in the segment are natural number;
Using N binary numbers as the value, and corresponding search is indicated with two kinds of possible values of every bit
Whether item occurs in corresponding search result content.
4. the method for claim 1, wherein value by the identical each key-value pair of stat key, obtains the key
In the probability that occurs in search result content of each search terms include:
For each search terms in the identical key, counts the search terms and showed in the value of the identical each key-value pair of the key
For the number occurred in search result content, it is denoted as the first numerical value;
The number for counting the identical each key-value pair of the key, is denoted as second value;
According to the ratio of first numerical value and second value, the probability that the search terms occur in search result content is determined.
5. the method for claim 1, wherein
Whether each search terms for including with the segment go out in the corresponding search result content of search term where the segment
Existing situation is that value comprises determining that the search item number N, N that include are natural number in the segment;Using N binary numbers as institute
The value stated, and with every bit value 1 when indicates that corresponding search terms occur in corresponding search result content, value 0
When indicate do not occur;
The value by the identical each key-value pair of stat key obtains each search terms in the key and goes out in search result content
Existing probability includes: to count the search terms in the identical each key-value pair of the key for each search terms in the identical key
Value in value be 1 number, be denoted as the first numerical value;
The number for counting the identical each key-value pair of the key, is denoted as second value;
According to the ratio of first numerical value and second value, the probability that the search terms occur in search result content is determined.
6. the method for claim 1, wherein described search resultant content is any one in following;
The title of search results pages;
The abstract of search results pages;
The full content of search results pages.
7. The method of any one of claims 1-6, further comprising:
saving each search term and its corresponding weight in a weight database;
upon receiving a search query, segmenting the search query into multiple search terms;
obtaining the weights corresponding to the multiple search terms from the weight database;
performing search processing according to the weights corresponding to the multiple search terms.
8. An apparatus for determining the weight of search terms, the apparatus comprising:
a data acquisition unit adapted to obtain a set of search data pairs, wherein each search data pair comprises a search query and corresponding search result content;
a probability determination unit adapted to determine, according to the set of search data pairs, the probability that each search term contained in each search query occurs in the search result content;
a weight determination unit adapted to determine the weight of each search term according to the probability that the search term occurs in the search result content;
wherein the probability determination unit further comprises:
a key-value pair output unit adapted to, for each search data pair in the set, determine each contiguous segment that can be obtained from the search query of that search data pair, and to output a key-value pair whose key is the segment and whose value indicates whether each search term contained in the segment occurs in the search result content corresponding to the search query in which the segment is located;
a statistics unit adapted to obtain, in the set of output key-value pairs and by performing statistics on the values of the key-value pairs having the same key, the probability that each search term in that key occurs in the search result content.
9. The apparatus of claim 8, wherein
the data acquisition unit is adapted to obtain search data pairs from search engine click logs to form the set.
10. The apparatus of claim 8, wherein
the key-value pair output unit is adapted to determine the number N of search terms contained in the segment, N being a natural number, and to use an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term occurs in the corresponding search result content.
11. The apparatus of claim 8, wherein
the statistics unit is adapted to, for each search term in the same key, count the number of times the values of the key-value pairs having that key indicate that the search term occurs in the search result content, and record this count as a first value;
count the number of key-value pairs having that key, and record this count as a second value; and determine the probability that the search term occurs in the search result content according to the ratio of the first value to the second value.
12. The apparatus of claim 8, wherein
the key-value pair output unit is adapted to determine the number N of search terms contained in the segment, N being a natural number, and to use an N-bit binary number as the value, each bit taking the value 1 to indicate that the corresponding search term occurs in the corresponding search result content and the value 0 to indicate that it does not;
the statistics unit is adapted to, for each search term in the same key, count the number of key-value pairs having that key in whose value the bit for the search term is 1, and record this count as a first value; count the number of key-value pairs having that key, and record this count as a second value; and determine the probability that the search term occurs in the search result content according to the ratio of the first value to the second value.
13. The apparatus of claim 8, wherein the search result content is any one of the following:
the title of a search result page;
the abstract of a search result page;
the full content of a search result page.
14. The apparatus of any one of claims 8-13, wherein
the weight determination unit is further adapted to save each search term and its corresponding weight in a weight database;
the apparatus further comprising:
a storage unit adapted to store the weight database;
a search processing unit adapted to, upon receiving a search query, segment the search query into multiple search terms, obtain the weights corresponding to the multiple search terms from the weight database, and perform search processing according to the weights corresponding to the multiple search terms.
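The N-bit binary value recited in claims 3, 5, 10, and 12 can be illustrated with a short Python sketch. It is an illustration under assumptions rather than the claimed implementation: the claims do not fix a bit order, so the sketch arbitrarily maps the first search term of a segment to the most significant bit; occurrence is tested by substring match; and `encode_value` and `bit_probabilities` are hypothetical names.

```python
def encode_value(segment, content):
    """Pack occurrence flags for a segment into an N-bit integer.

    Assumption: the first search term maps to the most significant bit.
    A bit is 1 if the term occurs in the content, 0 otherwise.
    """
    value = 0
    for term in segment:
        value = (value << 1) | int(term in content)
    return value


def bit_probabilities(n_bits, values):
    """Return, per search term, (values whose bit is 1) / (number of values)."""
    total = len(values)  # the "second value" of claims 5 and 12
    if total == 0:
        return [0.0] * n_bits
    probs = []
    for i in range(n_bits):
        shift = n_bits - 1 - i
        ones = sum((v >> shift) & 1 for v in values)  # the "first value"
        probs.append(ones / total)
    return probs


# Toy result contents for one key (invented for illustration only).
segment = ("cheap", "flights")
contents = ["flights to beijing", "cheap flights and hotels", "train tickets"]
values = [encode_value(segment, c) for c in contents]
probs = bit_probabilities(len(segment), values)
```

Here the three values are `0b01`, `0b11`, and `0b00`, so "cheap" occurs in one of three results and "flights" in two of three. Packing the flags into one integer keeps each key-value pair compact, which matters when the pairs are mined at click-log scale.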
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510917486.XA CN105528430B (en) | 2015-12-10 | 2015-12-10 | A kind of method and apparatus of the weight of determining search terms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510917486.XA CN105528430B (en) | 2015-12-10 | 2015-12-10 | A kind of method and apparatus of the weight of determining search terms |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105528430A CN105528430A (en) | 2016-04-27 |
CN105528430B true CN105528430B (en) | 2019-05-31 |
Family
ID=55770653
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510917486.XA Active CN105528430B (en) | 2015-12-10 | 2015-12-10 | A kind of method and apparatus of the weight of determining search terms |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105528430B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109933714B (en) * | 2019-03-18 | 2021-04-20 | 北京搜狗科技发展有限公司 | Entry weight calculation method, entry weight search method and related device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5278980A (en) * | 1991-08-16 | 1994-01-11 | Xerox Corporation | Iterative technique for phrase query formation and an information retrieval system employing same |
CN102193932A (en) * | 2010-03-09 | 2011-09-21 | 北京金山软件有限公司 | Method and system for determining search term |
CN102289436A (en) * | 2010-06-18 | 2011-12-21 | 阿里巴巴集团控股有限公司 | Method and device for determining weighted value of search term and method and device for generating search results |
CN103150362A (en) * | 2013-02-28 | 2013-06-12 | 北京奇虎科技有限公司 | Video search method and system |
CN104361115A (en) * | 2014-12-01 | 2015-02-18 | 北京奇虎科技有限公司 | Entry weight definition method and device based on co-clicking |
CN104376115A (en) * | 2014-12-01 | 2015-02-25 | 北京奇虎科技有限公司 | Fuzzy word determining method and device based on global search |
CN104615723A (en) * | 2015-02-06 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Determining method and device of search term weight value |
CN104636403A (en) * | 2013-11-15 | 2015-05-20 | 腾讯科技(深圳)有限公司 | Query request processing method and device |
CN105095381A (en) * | 2015-06-30 | 2015-11-25 | 北京奇虎科技有限公司 | Method and device for new word identification |
Also Published As
Publication number | Publication date |
---|---|
CN105528430A (en) | 2016-04-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105956161B (en) | A kind of information recommendation method and device | |
US20140108418A1 (en) | Searching code by specifying its behavior | |
CN107861981A (en) | A kind of data processing method and device | |
CN103150362B (en) | A kind of video searching method and system | |
CN109918594B (en) | Information display method and device | |
CN105095381B (en) | New word identification method and device | |
CN103617213B (en) | Method and system for identifying newspage attributive characters | |
EP2631815A1 (en) | Method and device for ordering search results, method and device for providing information | |
CN104361115A (en) | Entry weight definition method and device based on co-clicking | |
WO2014201833A1 (en) | Method and device for processing data | |
CN106776559A (en) | The method and device of text semantic Similarity Measure | |
CN104376115A (en) | Fuzzy word determining method and device based on global search | |
CN103870607A (en) | Sequencing method and device of search results of multiple search engines | |
CN103942264A (en) | Method and device for pushing webpages containing news information | |
CN107341181A (en) | Method, apparatus, computer-readable recording medium and computer equipment are recommended in search | |
CN105786910B (en) | Entry weighing computation method and device | |
CN103870563B (en) | It is determined that the method and apparatus of the theme distribution of given text | |
CN106919576A (en) | Using the method and device of two grades of classes keywords database search for application now | |
CN105528430B (en) | A kind of method and apparatus of the weight of determining search terms | |
CN105488209B (en) | A kind of analysis method and device of word weight | |
CN106557483A (en) | A kind of data processing, data query method and apparatus | |
CN109522275A (en) | Label method for digging, electronic equipment and the storage medium of content are produced based on user | |
CN108647227A (en) | A kind of recommendation method and device | |
WO2018205391A1 (en) | Method, system and apparatus for evaluating accuracy of information retrieval, and computer-readable storage medium | |
CN111724143A (en) | RPA-based flow element positioning method and device, computing equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20220726; Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015; Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.; Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park); Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.; Patentee before: Qizhi software (Beijing) Co.,Ltd. |