CN105159898B - Search method and apparatus

Info

Publication number: CN105159898B
Authority: CN (China)
Application number: CN201410262143.XA
Other versions: CN105159898A
Other languages: Chinese (zh)
Inventors: 张友书, 张阔
Current Assignee: Beijing Sogou Technology Development Co Ltd
Original Assignee: Beijing Sogou Technology Development Co Ltd
Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Prior art keywords: user, word string, query, information, intention

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a search method and apparatus. The method includes: upon receiving an original query word string submitted by a first user, searching with the original query word string to obtain matching network information; determining, according to the network information, whether the original query word string is a query word string with multiple query intentions; if so, rewriting the original query word string, according to each query intention, into multiple first query word strings each having one of the query intentions; for each first query word string, finding second users who have the same or a similar query intention as that first query word string, where each second user has community information; and synthesizing the network information and the community information corresponding to the second users into a search result. Embodiments of the invention spare the first user from repeatedly performing tedious manual filtering of massive network information, reduce the time and effort the first user spends, and substantially improve the efficiency, quality and capacity of information acquisition.

Description

Search method and apparatus
Technical field
The present invention relates to the technical field of search, and in particular to a search method and a search apparatus.
Background
With the rapid development of networks, the amount of network information has grown sharply. To find the network information they need among this mass of information, users usually rely on a search engine.
A search engine is a system that automatically collects information from the Internet, organizes it, and makes it available for users to query. Network information is vast, varied and unordered: each piece of network information is like an island in a boundless sea, web links are the bridges criss-crossing between those islands, and the search engine draws a clear information map that users can consult at any time.
However, the gap between the growth rate of network information and people's ability to obtain the information they need is becoming more and more pronounced. The excess of network information forces users to perform tedious manual filtering when searching, which costs considerable time and effort and makes searching for network information very inefficient.
Summary of the invention
The technical problem to be solved by embodiments of the present invention is to provide a search method that improves the efficiency of searching network information.
Correspondingly, embodiments of the present invention also provide a search apparatus to ensure the implementation and application of the above method.
To solve the above problem, an embodiment of the present invention discloses a search method, comprising:
upon receiving an original query word string submitted by a first user, searching with the original query word string to obtain matching network information;
determining, according to the network information, whether the original query word string is a query word string with multiple query intentions; if so, rewriting the original query word string, according to each query intention, into multiple first query word strings each having one of the query intentions;
for each first query word string, finding a second user who has the same or a similar query intention as that first query word string; wherein the second user has community information;
synthesizing the network information and the community information corresponding to the second user into a search result.
Preferably, the step of determining whether the original query word string is a query word string with multiple query intentions includes:
obtaining first feature network information matched by the original query word string; the first feature network information includes the top N pieces of network information by ranking and/or the top M pieces of network information by historical click count;
obtaining second feature network information matched by other query word strings; the second feature network information includes the top A pieces of network information by ranking and/or the top B pieces of network information by historical click count;
determining whether the first feature network information includes at least two sets of second feature network information; if so, determining that the original query word string is a query word string with multiple query intentions; wherein M, N, A and B are positive integers.
Preferably, the step of determining whether the original query word string is a query word string with multiple query intentions includes:
looking up, in a preset knowledge base, the entity classes corresponding to the original query word string;
when there are two or more entity classes, determining that the original query word string is a query word string with multiple query intentions.
Preferably, the step of determining whether the original query word string is a query word string with multiple query intentions includes:
looking up, in a preset knowledge base, the feature words associated with the original query word string;
determining whether the number of occurrences of the feature words in web pages across the whole web exceeds a preset quantity threshold; if so, classifying the feature words using the entity classes of the knowledge base;
when at least two classes are obtained, determining that the original query word string is a query word string with multiple query intentions.
Preferably, the step of finding a second user who has the same or a similar query intention as the first query word string includes:
obtaining the first query intention information corresponding to each first query word string of the first user, and the second query intention information of the second user;
calculating the similarity between the first query intention information and the second query intention information;
when the similarity is greater than a preset similarity threshold, determining that the first query word string and the second user have the same or a similar query intention.
Preferably, the first query intention information includes a first feature vector, and the first feature vector is determined according to the first query word string;
the second query intention information includes a second feature vector, and the second feature vector is determined according to a second query word string;
wherein the second query word string is a query word string previously submitted by the second user.
Preferably, the first feature vector includes at least one of the following:
a feature vector associated with the first query word string and its word segments, and a feature vector associated with the network information matched by the first query word string;
the second feature vector includes at least one of the following:
a feature vector associated with the second query word string and its word segments, and a feature vector associated with the network information matched by the second query word string.
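The similarity test on feature vectors described above can be sketched as follows. The text does not name a similarity measure or a vector construction, so cosine similarity over bag-of-words counts is an assumption made here purely for illustration; `feature_vector`, `cosine_similarity`, `has_similar_intent` and the 0.5 threshold are hypothetical names and values.

```python
import math
from collections import Counter

def feature_vector(query_string, matched_titles):
    """Bag-of-words vector built from the query's word segments plus the
    titles of its matched network information (whitespace split stands in
    for real word segmentation)."""
    terms = query_string.lower().split()
    for title in matched_titles:
        terms.extend(title.lower().split())
    return Counter(terms)

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def has_similar_intent(first_vec, second_vec, threshold=0.5):
    """True when the similarity exceeds the preset (assumed) threshold."""
    return cosine_similarity(first_vec, second_vec) > threshold

first_vec = feature_vector("dragon game", ["dragon game download"])
second_vec = feature_vector("dragon game guide", [])
other_vec = feature_vector("weather", [])
```

Here `first_vec` plays the role of the first feature vector, and `second_vec` and `other_vec` stand for vectors derived from query word strings that two different second users submitted earlier.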
Preferably, the step of synthesizing the network information and the community information corresponding to the second user into a search result includes:
calculating, under each query intention, the association closeness between the first user and the second user;
ranking the community information corresponding to the second user by association closeness;
synthesizing the network information and the ranked community information corresponding to the second user into the search result.
Preferably, the step of calculating, under each query intention, the association closeness between the first user and the second user includes:
assigning corresponding weights to the similarity between the first query intention information and the second query intention information under each query intention, and/or the association information between the first user and the second user, and/or the second user's historical operation information on the second query intention;
summing the weighted similarity between the first query intention information and the second query intention information, and/or the weighted association information between the first user and the second user, and/or the weighted historical operation information of the second user on the second query intention, to obtain the association closeness between the first user and the second user under each query intention.
Preferably, the association information between the first user and the second user includes at least one of the following:
the average number of contacts within a preset time period, the average contact duration within a preset time period, the number of common friends, and the place of residence;
the second user's historical operation information on the second query intention includes at least one of the following:
the number of searches corresponding to the second query intention, the browsing duration of the network information corresponding to the second query intention, and the number of consecutive days of searching corresponding to the second query intention.
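The weighted summation and ranking described above might be sketched as below. The weights, the assumption that each component has already been normalized to the range 0 to 1, and the names `association_closeness` and `synthesize` are all illustrative choices, not values given in the text.

```python
def association_closeness(similarity, relation, history, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of intent similarity, user-to-user association
    information and historical operation information (assumed normalized)."""
    w_sim, w_rel, w_hist = weights
    return w_sim * similarity + w_rel * relation + w_hist * history

def synthesize(network_info, candidates):
    """Rank second users by association closeness and merge their community
    information with the network information into one search result."""
    ranked = sorted(
        candidates,
        key=lambda c: association_closeness(c["similarity"], c["relation"], c["history"]),
        reverse=True,
    )
    return {
        "network_info": network_info,
        "community_info": [c["community_info"] for c in ranked],
    }

candidates = [
    {"community_info": "user_a", "similarity": 0.9, "relation": 0.2, "history": 0.1},
    {"community_info": "user_b", "similarity": 0.6, "relation": 0.9, "history": 0.8},
]
result = synthesize(["https://example.com/page"], candidates)
```

With these made-up numbers, `user_b` ranks ahead of `user_a` because closer friendship and a longer search history outweigh a slightly lower intent similarity.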
Preferably, the first user and the second user have a community friend relation.
An embodiment of the present invention also discloses a search apparatus, comprising:
a network information search module, configured to, upon receiving an original query word string submitted by a first user, search with the original query word string to obtain matching network information;
a multi-query-intention determination module, configured to determine, according to the network information, whether the original query word string is a query word string with multiple query intentions, and if so, to call the query word string rewriting module;
a query word string rewriting module, configured to rewrite the original query word string, according to each query intention, into multiple first query word strings each having one of the query intentions;
a user lookup module, configured to find, for each first query word string, a second user who has the same or a similar query intention as that first query word string; wherein the second user has community information;
a search result synthesis module, configured to synthesize the network information and the community information corresponding to the second user into a search result.
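How the five modules hand data to one another can be sketched as a small pipeline. This is only an outline under stated assumptions: each module is passed in as a plain callable, the rewriting is shown as simple concatenation, and `run_search` and the stub functions are hypothetical names used for illustration.

```python
def run_search(original_query, search_fn, detect_intents_fn, find_users_fn):
    """Chain the modules: search, intent determination, rewriting,
    user lookup, and result synthesis."""
    network_info = search_fn(original_query)                   # network information search module
    intents = detect_intents_fn(original_query, network_info)  # multi-query-intention determination module
    community_info = {}
    for intent in intents:                                     # empty when the query has a single intention
        first_query = f"{original_query} {intent}"             # query word string rewriting module
        community_info[first_query] = find_users_fn(first_query)  # user lookup module
    # search result synthesis module
    return {"network_info": network_info, "community_info": community_info}

# Stub modules, for demonstration only.
def fake_search(query):
    return ["https://example.com/" + query.replace(" ", "-")]

def fake_detect(query, network_info):
    return ["film", "game"] if query == "Demi-Gods and Semi-Devils" else []

def fake_find_users(first_query):
    return [{"user": "friend_1", "matched_query": first_query}]

result = run_search("Demi-Gods and Semi-Devils", fake_search, fake_detect, fake_find_users)
```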
Preferably, the multi-query-intention determination module includes:
a first feature network information acquisition submodule, configured to obtain the first feature network information matched by the original query word string; the first feature network information includes the top N pieces of network information by ranking and/or the top M pieces of network information by historical click count;
a second feature network information acquisition submodule, configured to obtain the second feature network information matched by other query word strings; the second feature network information includes the top A pieces of network information by ranking and/or the top B pieces of network information by historical click count;
a feature network information judging submodule, configured to determine whether the first feature network information includes at least two sets of second feature network information, and if so, to call the first decision submodule;
a first decision submodule, configured to determine that the original query word string is a query word string with multiple query intentions; wherein M, N, A and B are positive integers.
Preferably, the multi-query-intention determination module includes:
an entity class lookup submodule, configured to look up, in a preset knowledge base, the entity classes corresponding to the original query word string;
a second decision submodule, configured to determine, when there are two or more entity classes, that the original query word string is a query word string with multiple query intentions.
Preferably, the multi-query-intention determination module includes:
a feature word lookup submodule, configured to look up, in a preset knowledge base, the feature words associated with the original query word string;
a quantity judging submodule, configured to determine whether the number of occurrences of the feature words in web pages across the whole web exceeds a preset quantity threshold, and if so, to call the classification submodule;
a classification submodule, configured to classify the feature words using the entity classes of the knowledge base;
a third decision submodule, configured to determine, when at least two classes are obtained, that the original query word string is a query word string with multiple query intentions.
Preferably, the user lookup module includes:
a query intention information acquisition submodule, configured to obtain the first query intention information corresponding to each first query word string of the first user, and the second query intention information of the second user;
a query intention information similarity calculation submodule, configured to calculate the similarity between the first query intention information and the second query intention information;
a judging submodule, configured to determine, when the similarity is greater than a preset similarity threshold, that the first query word string and the second user have the same or a similar query intention.
Compared with the prior art, embodiments of the present invention have the following advantages:
In embodiments of the present invention, a search is performed with the original query word string submitted by the first user to obtain matching network information. When the original query word string is determined to be a query word string with multiple query intentions, it is rewritten into multiple first query word strings each having one of those query intentions, second users who have the same or a similar query intention as the first user are found, and the network information and the community information of the second users are synthesized into a search result. Thus, when the first user's demand is unclear, the user's community friends are screened by subject class through analysis of the search log according to each class of demand, yielding the second users most relevant to each subject class. The first user's search demand is thereby subdivided, so that contacts whose demands are similar to the current user's can be recommended even when the search demand is not specific. This spares the first user from repeatedly performing tedious manual filtering of massive network information, reduces the time and effort the first user spends, reduces the system resource consumption of the user equipment and the website as well as the network bandwidth occupied, and substantially improves the efficiency, quality and capacity of information acquisition.
Description of the drawings
Fig. 1 is a flow chart of the steps of an embodiment of a search method of the present invention;
Fig. 2 is an example display of community information of the present invention;
Fig. 3 is a structural block diagram of an embodiment of a search apparatus of the present invention.
Detailed description
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flow chart of the steps of an embodiment of a search method of the present invention is shown, which may specifically include the following steps:
Step 101: upon receiving an original query word string submitted by a first user, search with the original query word string to obtain matching network information;
In embodiments of the present invention, the first user may log in to a first client and then submit an original query word string through the first client, requesting a search for network information matching the original query word string.
In embodiments of the present invention, when the original query word string submitted by the first user is received, network information can be quickly retrieved from the index database according to the original query word string, the relevance between the network information and the query can be evaluated, and the results to be output can be ranked accordingly.
Taking a search engine as an example, the search flow of a search engine has two parts: the front-end user request process and the back-end data preparation process.
One. Front-end user request process:
1. Receive the request: receive the query word string that the user enters in the search engine;
2. Analyse the query: perform word segmentation on the query word string;
3. Retrieve: according to the word segmentation result, look up candidate network information related to the segments in the pre-built inverted index;
4. Rank: rank the candidate network information along dimensions such as content relevance and timeliness;
5. Display: show the ranked web pages on the search engine's results page.
Two. Back-end data preparation process:
1. Web page crawling: use crawler technology to crawl network information on the Internet by following the links between web pages, and save it.
2. Index building: analyse the crawled and saved network information, for example by performing word segmentation on web page titles and body text, and build the inverted index from the segmentation results for use by the front-end user request process.
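The two processes above can be illustrated with a minimal sketch of an inverted index and a lookup against it. Several simplifications are assumptions made for illustration: whitespace splitting stands in for real word segmentation, ranking uses only the count of matched terms rather than relevance and timeliness, and the page data and function names are hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return text.lower().split()

def build_inverted_index(pages):
    """Back end: map each term to the set of page ids whose title or body contains it."""
    index = defaultdict(set)
    for page_id, page in pages.items():
        for term in tokenize(page["title"] + " " + page["body"]):
            index[term].add(page_id)
    return index

def search(index, query):
    """Front end: segment the query, retrieve candidates, rank by matched-term count."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for page_id in index.get(term, ()):
            scores[page_id] += 1
    # Descending score, then page id so ties have a stable order.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

pages = {
    "p1": {"title": "Tsinghua University admissions", "body": "undergraduate programs"},
    "p2": {"title": "Campus dining", "body": "Tsinghua dining hall guide"},
    "p3": {"title": "Weather today", "body": "sunny"},
}
index = build_inverted_index(pages)
results = search(index, "Tsinghua admissions")
```

Page `p1` matches both query terms and so ranks ahead of `p2`, which matches only one; `p3` matches neither and is not returned.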
Step 102: determine, according to the network information, whether the original query word string is a query word string with multiple query intentions; if so, execute step 103;
Every search request a user issues may carry a latent query intention; when the original query word string is associated with multiple query intentions, the user's demand is unclear.
For example, suppose the original query word string the user submits is "Demi-Gods and Semi-Devils". The potential demand may fall into three classes: the film "Demi-Gods and Semi-Devils", the TV series "Demi-Gods and Semi-Devils", and the game "Demi-Gods and Semi-Devils". The original query word string "Demi-Gods and Semi-Devils" can be rewritten on this basis.
In a preferred embodiment of the present invention, step 102 may include the following sub-steps:
Sub-step S11: obtain the first feature network information matched by the original query word string; the first feature network information may include the top N pieces of network information by ranking and/or the top M pieces of network information by historical click count;
Sub-step S12: obtain the second feature network information matched by other query word strings; the second feature network information may include the top A pieces of network information by ranking and/or the top B pieces of network information by historical click count;
Sub-step S13: determine whether the first feature network information includes at least two sets of second feature network information; if so, execute sub-step S14;
Sub-step S14: determine that the original query word string is a query word string with multiple query intentions.
It should be noted that M, N, A and B may all be positive integers.
In a specific implementation, whether a query word string is a multi-intention query word string is determined by analysing its search results (that is, the network information matching the query word string) and the users' search logs.
Further, statistics over the search log can yield, for every query word string, the top N pieces of network information in its search results, for example the top 10 URLs (Uniform Resource Locators), and the top M pieces of network information by click count, for example the top 10 URLs. If the top N pieces of network information by ranking and the top M by click count for query word string a include the top A pieces of network information by ranking and the top B by click count for query word string b and for query word string c, then query word string a can be regarded as a multi-intention query word string whose user demand falls into two classes: one related to query word string b and the other related to query word string c.
For example, statistics over the search log yield the top-ranked network information and/or the network information with the most historical clicks, as shown in Table 1.
Table 1: Top-ranked network information and/or network information with the most historical clicks
The following can be derived from Table 1:
the top-ranked network information for "Demi-Gods and Semi-Devils" includes the top-ranked network information for "Demi-Gods and Semi-Devils game" and "Demi-Gods and Semi-Devils TV series";
the network information with the most historical clicks for "Demi-Gods and Semi-Devils" includes the network information with the most historical clicks for "Demi-Gods and Semi-Devils game" and "Demi-Gods and Semi-Devils TV series";
It follows that, when the user searches, the original query word string "Demi-Gods and Semi-Devils" may carry two query demands: "Demi-Gods and Semi-Devils game" and "Demi-Gods and Semi-Devils TV series".
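The containment check in sub-steps S11 to S14 can be sketched as follows. It assumes that "includes" means the other query's feature results form a subset of the original query's feature results, which is one possible reading; the URLs, the `feature_results` mapping and the function name are hypothetical.

```python
def is_multi_intent(original, feature_results, min_intents=2):
    """Return (flag, covered), where covered lists the other query word
    strings whose feature results are all contained in the original's."""
    first = feature_results[original]
    covered = sorted(
        other
        for other, second in feature_results.items()
        if other != original and second and second <= first  # subset test
    )
    return len(covered) >= min_intents, covered

# Top-ranked and/or most-clicked URLs per query word string (made-up data).
feature_results = {
    "Demi-Gods and Semi-Devils": {"url_game_1", "url_game_2", "url_tv_1", "url_tv_2"},
    "Demi-Gods and Semi-Devils game": {"url_game_1", "url_game_2"},
    "Demi-Gods and Semi-Devils TV series": {"url_tv_1", "url_tv_2"},
    "weather forecast": {"url_weather_1"},
}
flag, intents = is_multi_intent("Demi-Gods and Semi-Devils", feature_results)
```

A production system would more likely tolerate partial overlap than require a strict subset; the strict test is used here only to keep the example short.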
In embodiments of the present invention, the query demands of each query word string may be computed in advance and a multi-query-intention dictionary, as shown in Table 2, produced from them. When the original query word string submitted by the user is received, it can be looked up in this dictionary; if it is found, the original query word string can be determined to be a query word string with multiple query intentions.
Table 2: List of multi-intention query word strings
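A lookup against such a precomputed dictionary might look like the sketch below. The single dictionary entry is hypothetical, taken from the running example rather than from Table 2, and `lookup_multi_intent` is an illustrative name.

```python
# Hypothetical precomputed dictionary: original query -> first query word strings.
MULTI_INTENT_DICT = {
    "Demi-Gods and Semi-Devils": [
        "Demi-Gods and Semi-Devils game",
        "Demi-Gods and Semi-Devils TV series",
    ],
}

def lookup_multi_intent(original_query, dictionary=MULTI_INTENT_DICT):
    """Return (is_multi_intent, first_query_strings) for an incoming query."""
    rewrites = dictionary.get(original_query, [])
    return bool(rewrites), rewrites

found, rewrites = lookup_multi_intent("Demi-Gods and Semi-Devils")
```

Doing the analysis offline keeps the per-request cost to a single dictionary lookup.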
In a preferred embodiment of the present invention, step 102 may include the following sub-steps:
Sub-step S21: look up, in a preset knowledge base, the entity classes corresponding to the original query word string;
Sub-step S22: when there are two or more entity classes, determine that the original query word string is a query word string with multiple query intentions.
In embodiments of the present invention, the entity classes corresponding to the original query word string can be obtained by looking them up in the knowledge base, so that the user's query demand can be classified.
For example, when the user searches for "Naruto", a knowledge base lookup shows that "Naruto" has two entity classes: one is manga, the other is animation. The user's demand can therefore be divided into two classes according to the classes in the knowledge base, "manga Naruto" and "animation Naruto", and the original query word string "Naruto" can be rewritten on this basis.
It should be noted that, in knowledge engineering, a knowledge base is a structured, easy-to-operate, easy-to-use and comprehensively organized knowledge cluster. It is a set of interrelated knowledge pieces that are stored, organized, managed and used in computer memory using one or more knowledge representation methods, to meet the needs of solving problems in one or more fields. These knowledge pieces include theoretical knowledge related to the field, factual data, and heuristic knowledge obtained from expert experience, such as the definitions, theorems and algorithms related to a field, as well as common-sense knowledge.
An entity may correspond to a specific individual; for example, in the star class, entities may be Liu Dehua, Zhang Baizhi, Lin Qingxia and so on. Entities also include broader individuals, such as those representing a class: person, film star, singer and so on.
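Sub-steps S21 and S22 can be sketched with a plain dictionary standing in for the knowledge base. The knowledge base contents, the rule of prefixing the class name to form the rewritten string, and the function name are assumptions for illustration.

```python
# Hypothetical knowledge base: entity name -> entity classes.
KNOWLEDGE_BASE = {
    "Naruto": ["manga", "animation"],
    "Liu Dehua": ["film star"],
}

def rewrite_by_entity_class(query, knowledge_base=KNOWLEDGE_BASE):
    """Return one rewritten query per entity class when the query maps to two
    or more classes; return an empty list when it is not multi-intention."""
    classes = knowledge_base.get(query, [])
    if len(classes) < 2:
        return []
    return [f"{entity_class} {query}" for entity_class in classes]

rewrites = rewrite_by_entity_class("Naruto")
```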
In a preferred embodiment of the present invention, step 102 may include the following sub-steps:
Sub-step S31: look up, in a preset knowledge base, the feature words associated with the original query word string;
Sub-step S32: determine whether the number of occurrences of the feature words in web pages across the whole web exceeds a preset quantity threshold; if so, execute sub-step S33;
Sub-step S33: classify the feature words using the entity classes of the knowledge base;
Sub-step S34: when at least two classes are obtained, determine that the original query word string is a query word string with multiple query intentions.
In embodiments of the present invention, the user's demand can be determined by combining the knowledge base with web pages on the Internet.
In a specific implementation, several feature words may first be extracted from the entity content of the knowledge base; then, by analysing Internet web pages, the degree of association between each feature word and the entity, and the strength of the demand, are obtained; finally, a quantity threshold is chosen to determine the final classes of the original query word string.
For example, if the original query word string the user submits when searching is "Tsinghua University", feature words such as "undergraduate education", "graduate education", "Second Gate", "Moonlight over the Lotus Pond", "First Dining Hall" and "Jinchun Garden Restaurant" can be extracted from the entity content of "Tsinghua University" in the knowledge base. Then, by analysing Internet web pages, the number of web pages in which pairs such as [Tsinghua University, undergraduate education], [Tsinghua University, graduate education] and [Tsinghua University, Second Gate] appear together is counted. The more web pages in which the pair appears together, the more closely the feature word is associated with "Tsinghua University". Feature words whose count exceeds the preset quantity threshold can be taken as potential demands of the user.
The feature words are then classified according to the classification system of the knowledge base.
For example, "undergraduate education" and "graduate education" belong to the admissions class, "Second Gate" and "Moonlight over the Lotus Pond" belong to the scenic spot class, and "First Dining Hall" and "Jinchun Garden Restaurant" belong to the dining class.
Finally, the demands related to "Tsinghua University" are divided into three classes: admissions demands, scenic spot demands and dining demands.
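The co-occurrence counting and classification in sub-steps S31 to S34 can be sketched as below. Substring matching over a handful of in-memory page texts stands in for whole-web statistics, and the page texts, class mapping, threshold and function name are all hypothetical.

```python
from collections import defaultdict

def classify_demands(query, feature_words, pages, word_class, threshold):
    """Count the pages where the query and a feature word co-occur, keep the
    words whose count exceeds the threshold, and group them by class."""
    demands = defaultdict(list)
    for word in feature_words:
        count = sum(1 for text in pages if query in text and word in text)
        if count > threshold:
            demands[word_class[word]].append(word)
    return dict(demands)

pages = [
    "Tsinghua University undergraduate education admissions",
    "Tsinghua University undergraduate education overview",
    "Tsinghua University Second Gate photo",
    "Tsinghua University Second Gate history",
    "Tsinghua University First Dining Hall menu",
]
word_class = {
    "undergraduate education": "admissions",
    "Second Gate": "scenic spot",
    "First Dining Hall": "dining",
}
demands = classify_demands("Tsinghua University", list(word_class), pages, word_class, threshold=1)
is_multi_intent = len(demands) >= 2
```

In this toy data "First Dining Hall" co-occurs with the query on only one page, so it falls below the threshold and contributes no demand class.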
Step 103: rewrite the original query word string, according to each query intention, into multiple first query word strings each having one of the query intentions;
After the multi-intention query analysis, one ambiguous query demand can be converted into multiple definite query demands; that is, a first query word string may be a query word string with a definite query demand.
For example, the original query word string "Demi-Gods and Semi-Devils" can be rewritten into the first query word strings "Demi-Gods and Semi-Devils film", "Demi-Gods and Semi-Devils TV series" and "Demi-Gods and Semi-Devils game".
Step 104: for each first query word string, find a second user who has the same or a similar query intention as that first query word string; wherein the second user may have community information;
In embodiments of the present invention, the query intention matching each first query word string can be found from the original query word string, and then, for each different query intention, second users who meet the query intention corresponding to the first query word string are matched.
For example, when the user searches with the original query word string "Demi-Gods and Semi-Devils", the second users found may fall into three groups: those who have the same or a similar query intention as the first query word strings "Demi-Gods and Semi-Devils film", "Demi-Gods and Semi-Devils TV series" and "Demi-Gods and Semi-Devils game" respectively. This ensures that contacts whose demands are similar to the current user's can be recommended even when the search demand is unclear.
In a specific implementation, a community friend relation may exist between the first user and the second user. In that case, embodiments of the present invention may associate a social account, such as that of an instant messaging tool user or a registered user of various kinds of websites (forums, discussion boards, portal websites and so on). Through the associated social account, the community friend relation of the first user can be obtained, and the matching second user is searched for among the first user's community friends.
It should be noted that a community friend relation may include one or more levels of friend relation. For example, first-level friends may be the friends of the current user, second-level friends may be the friends of the current user's friends, and so on; embodiments of the present invention place no limitation on this.
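Collecting friends up to a given level can be sketched as a breadth-first traversal of the friend graph. The adjacency-list representation, the user names and the function name are assumptions made for illustration.

```python
from collections import deque

def friends_within(graph, user, max_level):
    """Return {friend: level} for every user reachable from `user` within
    max_level hops, where level 1 means a direct friend."""
    levels = {}
    seen = {user}
    queue = deque([(user, 0)])
    while queue:
        current, level = queue.popleft()
        if level == max_level:
            continue  # do not expand beyond the requested level
        for friend in graph.get(current, ()):
            if friend not in seen:
                seen.add(friend)
                levels[friend] = level + 1
                queue.append((friend, level + 1))
    return levels

graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
}
candidates = friends_within(graph, "alice", max_level=2)
```

The returned levels could also feed the ranking step, for example by treating first-level friends as more closely associated than second-level ones.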
Certainly, non-community's friend relation, i.e. second user be can have between first user and the second user It can be strange user for the first user, then can search matched the in the embodiment of the present invention in global scope Two users.
Wherein, the second user can have community information, and community can be several social groups or social organization It is gathered in some field and is formed by the collectively owned business in life that is mutually related, such as forum, microblogging, discussion bar, portal Website, instant communicating system etc., i.e. community information may include user's head portrait, user's name, User ID, address etc. Deng.
In one preferred embodiment of the invention, step 104 may include the following sub-steps:
Sub-step S41: respectively obtain the first query intention information corresponding to each first inquiry word string of the first user, and the second query intention information of the second user;
The first query intention information may be information identifying the query intention of the first user for a certain subdivided subject category when the overall query intention is unclear; the second query intention information may be information identifying the query intention of the second user.
In a preferred example of an embodiment of the present invention, the first query intention information may include a first feature vector, and the second query intention information may include a second feature vector;
Wherein, the first feature vector may be vector information identifying the query intention of the first user, and the second feature vector may be vector information identifying the query intention of the second user.
Then in this example,
The first query intention information may include a corresponding first feature vector, and each first feature vector may be determined according to the respective first inquiry word string;
The second query intention information may include a second feature vector, and the second feature vector may be determined according to the second inquiry word string;
Wherein, the second inquiry word string may be an inquiry word string previously submitted by the second user.
In this example, features representing the query intention of an inquiry word string can be found by analysing the inquiry word string, the search results and the search logs; feature values are then calculated so that the inquiry word string is expressed as a feature vector.
The feature vectors related to the query intention of an inquiry word string can be divided into three categories: the first is the feature vector of the inquiry word string itself, the second is the feature vector associated with the segmented terms (participles) of the inquiry word string, and the third is the feature vector associated with the network information matching the inquiry word string. All of these feature vectors can be used to indicate the query intention of the inquiry word string.
In a specific implementation, the first feature vector may include at least one of the following:
The first inquiry word string itself, a feature vector associated with the segmented terms (participles) of the first inquiry word string, and a feature vector associated with the network information matching the first inquiry word string;
The second feature vector may include at least one of the following:
The second inquiry word string itself, a feature vector associated with the segmented terms of the second inquiry word string, and a feature vector associated with the network information matching the second inquiry word string.
In a preferred example of an embodiment of the present invention, the feature vector associated with the segmented terms of the first inquiry word string may include at least one of the following:
Synonymous word strings of the first inquiry word string, segmented terms of the first inquiry word string, parts of speech of the segmented terms of the first inquiry word string, synonyms of the segmented terms of the first inquiry word string, and importance of the segmented terms of the first inquiry word string;
The feature vector associated with the network information matching the first inquiry word string may include at least one of the following:
Titles of the network information matching the first inquiry word string, webpage identifiers of the network information matching the first inquiry word string, historical click information of the network information matching the first inquiry word string, and other inquiry word strings associated with the first inquiry word string;
The feature vector associated with the segmented terms of the second inquiry word string may include at least one of the following:
Synonymous word strings of the second inquiry word string, segmented terms of the second inquiry word string, parts of speech of the segmented terms of the second inquiry word string, synonyms of the segmented terms of the second inquiry word string, and importance of the segmented terms of the second inquiry word string;
The feature vector associated with the network information matching the second inquiry word string may include at least one of the following:
Titles of the network information matching the second inquiry word string, webpage identifiers of the network information matching the second inquiry word string, historical click information of the network information matching the second inquiry word string, and other inquiry word strings associated with the second inquiry word string.
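The feature groups listed above could be held in a single record per inquiry word string. The following is a minimal sketch under that assumption; the class name `QueryFeatures`, its field names and the helper `as_token_set` are hypothetical and are not taken from the original text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class QueryFeatures:
    """Illustrative container for the features of one inquiry word string.

    All field names are hypothetical; the text only lists the kinds of features.
    """
    query: str                                                   # the inquiry word string itself
    synonym_queries: List[str] = field(default_factory=list)     # synonymous word strings
    terms: List[str] = field(default_factory=list)               # segmented terms
    term_pos: List[str] = field(default_factory=list)            # part of speech per term
    term_synonyms: List[str] = field(default_factory=list)       # synonyms of the terms
    term_importance: Dict[str, float] = field(default_factory=dict)
    result_titles: List[str] = field(default_factory=list)       # titles of matched results
    result_urls: List[str] = field(default_factory=list)         # webpage identifiers
    click_counts: Dict[str, int] = field(default_factory=dict)   # URL -> historical clicks
    related_queries: List[str] = field(default_factory=list)     # other associated queries

    def as_token_set(self) -> Set[str]:
        """Flatten the textual features into one set for overlap-based comparison."""
        tokens = {self.query, *self.synonym_queries, *self.terms,
                  *self.term_synonyms, *self.result_titles,
                  *self.result_urls, *self.related_queries}
        return {t for t in tokens if t}


features = QueryFeatures(
    query="the semi-gods and the semi-devils TV play",
    terms=["the semi-gods and the semi-devils", "TV play"],
    term_pos=["noun", "noun"],
)
```

Keeping the textual features separate from the numeric ones (importance, click counts) is only one possible layout; it makes set-overlap comparisons between two inquiry word strings straightforward.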
Examples of the first/second feature vectors are as follows:
1, the inquiry word string itself;
For example, the rewritten first inquiry word string "the semi-gods and the semi-devils TV play" itself.
2, synonymous word strings of the inquiry word string;
In this example, synonymous word strings of the inquiry word string can be looked up in a synonym dictionary prepared in advance. For example, "the semi-gods and the semi-devils" and "new the semi-gods and the semi-devils" are synonyms, and "new the semi-gods and the semi-devils" and "the semi-gods and the semi-devils Zhong Hanliang version" are synonyms (such synonyms change over time; "the semi-gods and the semi-devils" is always synonymous with its newest version).
3, segmented terms of the inquiry word string;
In this example, the inquiry word string can be segmented to obtain its terms. For example, segmenting the inquiry word string "the semi-gods and the semi-devils TV play" yields two terms: [the semi-gods and the semi-devils, TV play].
4, parts of speech of the segmented terms of the inquiry word string;
In this example, part-of-speech analysis can be performed on the segmented terms to obtain the part of speech of each term. For example, the parts of speech corresponding to the segmented terms [the semi-gods and the semi-devils, TV play] are [noun, noun].
5, synonyms of the segmented terms of the inquiry word string;
In this example, synonyms of the segmented terms can be looked up in the synonym dictionary prepared in advance. For example, the synonyms of the segmented terms [the semi-gods and the semi-devils, TV play] are [the semi-gods and the semi-devils, serial].
6, importance of the segmented terms of the inquiry word string;
In this example, the TF (Term Frequency) and IDF (Inverse Document Frequency) of each segmented term can be obtained from search log statistics. TF-IDF is a statistical method for assessing how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears in the corpus. In this example, the importance of each segmented term can therefore be represented by its TF-IDF value. For example, among the segmented terms ["the semi-gods and the semi-devils", "TV play"], the TF-IDF value of "the semi-gods and the semi-devils" is higher than that of "TV play", so "the semi-gods and the semi-devils" is more important than "TV play" and carries more information.
7, titles of the network information matching the inquiry word string;
In this example, the titles of the network information may refer to the titles of the top N (N is a positive integer, e.g. 10) search results returned by the search engine for the inquiry word string, and can be used to locate text and keywords relevant to the inquiry word string. For example, when searching for "the semi-gods and the semi-devils", the titles of the first three returned search results are "new the semi-gods and the semi-devils Zhong Hanliang version (all 42 episodes) watch online-* * video display", "the semi-gods and the semi-devils (2013)-the semi-gods and the semi-devils (2013) complete series (1-42, complete)-* * video" and "the semi-gods and the semi-devils _ episode plots-* * net".
8, webpage identifiers of the network information matching the inquiry word string;
In this example, a webpage identifier may be information that represents one uniquely determined webpage, such as a Uniform Resource Identifier (URI); a URI may in turn be a Uniform Resource Locator (URL) or a Uniform Resource Name (URN), etc. Specifically, the identifiers may be the URLs of the top M (M is a positive integer, e.g. 10) network information items of the search result, which can be used to locate the web addresses and websites relevant to the inquiry word string. For example, when searching for "the semi-gods and the semi-devils", the first three URLs of the search result are:
" http://kan.***.com/search/ keyword=%E5%A4%A9%E9%BE%99%E5% 85%AB%E9%83%A8 ";
"http://tv.***.com/s2013/tlbbwsj2013/";
" http://www.***.com/drama/KysdNWU=/episode ".
9, historical click information of the network information matching the inquiry word string;
In this example, the historical click information may be statistics on which search results were clicked by the users who searched for the inquiry word string. User behaviour is thereby used to measure which network information is more important and more relevant to the inquiry word string. For example, users searched for "the semi-gods and the semi-devils" 10000 times, and the clicks on the first three URLs are shown in Table 3.
Table 3, historical click information table
Table 3 shows that the URL of the second network information item is more relevant to the inquiry word string.
10, other inquiry word strings associated with the inquiry word string;
In this example, it can be determined which other inquiry word strings were also searched by the users who submitted the inquiry word string; these can be used to express concepts related to the inquiry word string. For example, users who searched for "18th National Congress" also searched for "Two Sessions", "spirit of the Party's 18th National Congress", etc.
Of course, the above first/second feature vectors are only examples. When implementing the embodiments of the present invention, those skilled in the art may set or use other first/second feature vectors according to actual conditions and needs, and the embodiments of the present invention place no restriction on this.
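The term-importance feature (item 6 above) can be illustrated with a small TF-IDF calculation. This is a sketch only: the function name `tf_idf`, the add-one smoothing of the IDF and the toy search log are assumptions made for illustration, not details given in the text.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(terms: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Return a TF-IDF weight for each distinct term of one segmented query.

    `corpus` is a list of segmented queries, e.g. taken from a search log.
    """
    counts = Counter(terms)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        tf = count / len(terms)
        doc_freq = sum(1 for doc in corpus if term in doc)
        # Add-one smoothing keeps the logarithm defined for unseen terms.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        weights[term] = tf * idf
    return weights


# Toy search log (made up): "TV play" occurs in many queries, the title in few,
# so the title should receive the higher weight.
log = [
    ["the semi-gods and the semi-devils", "TV play"],
    ["palace", "TV play"],
    ["spy", "TV play"],
    ["the semi-gods and the semi-devils", "game"],
]
weights = tf_idf(["the semi-gods and the semi-devils", "TV play"], log)
```

With this toy log the title term receives a larger weight than "TV play", which matches the ordering described in the example for item 6.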
Sub-step S42: calculate the similarity between the first query intention information and the second query intention information;
In a specific implementation, inquiry word strings can be clustered according to the similarity of their query intentions.
In a preferred example of an embodiment of the present invention, sub-step S42 may further include the following sub-step:
Sub-step S421: calculate the similarity between the first feature vector and the second feature vector.
In this example, for the feature vectors determined from the inquiry word strings, a clustering algorithm (such as a hierarchical clustering algorithm or the k-means algorithm) can be used to calculate the similarity, and the inquiry word strings are then divided into categories according to the similarity.
For example, for the first inquiry word string "the semi-gods and the semi-devils TV play" and the second inquiry word string "the semi-gods and the semi-devils Zhong Hanliang version" in Table 4, the corresponding first feature vector and second feature vector have the following parts in common:
1, the segmented term with high importance is the same, namely "the semi-gods and the semi-devils";
2, the first three network information items of the search results are the same, i.e. the titles and the webpage identifiers of the network information matching the inquiry word strings are the same;
3, the other inquiry word strings associated with "the semi-gods and the semi-devils TV play" include "the semi-gods and the semi-devils Zhong Hanliang version", and both contain the same query word "the semi-gods and the semi-devils".
Table 4, feature vector comparison table
In the clustering process using the clustering algorithm, these common parts can be quantified to calculate the similarity between the first feature vector and the second feature vector.
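One simple way to quantify the common parts of two feature vectors is a weighted overlap of their feature groups. The sketch below uses Jaccard overlap per group, which is a swapped-in technique chosen for illustration and not the clustering algorithm named in the text; the group names, the weights in `GROUP_WEIGHTS` and the placeholder URLs are all assumptions.

```python
from typing import Dict, Set

# Hypothetical weights per feature group; the text does not prescribe any values.
GROUP_WEIGHTS = {"terms": 0.4, "urls": 0.4, "related": 0.2}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of common elements between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_similarity(f1: Dict[str, Set[str]], f2: Dict[str, Set[str]]) -> float:
    """Weighted sum of per-group Jaccard overlaps, in the range [0, 1]."""
    return sum(w * jaccard(f1.get(g, set()), f2.get(g, set()))
               for g, w in GROUP_WEIGHTS.items())


first = {
    "terms": {"the semi-gods and the semi-devils", "TV play"},
    "urls": {"url-1", "url-2", "url-3"},          # placeholder identifiers
    "related": {"the semi-gods and the semi-devils Zhong Hanliang version"},
}
second = {
    "terms": {"the semi-gods and the semi-devils", "Zhong Hanliang version"},
    "urls": {"url-1", "url-2", "url-3"},
    "related": set(),
}
similarity = feature_similarity(first, second)
```

In this toy case the identical top-three results contribute the full weight of the "urls" group, while the single shared term contributes one third of the "terms" weight.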
Sub-step S43: when the similarity is greater than a preset similarity threshold, determine that the first inquiry word string and the second user have the same or a similar query intention.
In a specific implementation, when the similarity exceeds the preset similarity threshold, the first inquiry word string and the second inquiry word string can be clustered into one class, i.e. the query intention of the first user under this subdivision is the same as or similar to that of the second user.
The more similar the first feature vector and the second feature vector are, the more likely the first inquiry word string and the second inquiry word string are to be clustered into one class, and the more similar (or even identical) the first user's query intention under this subdivision is to the query intention of the second user.
For example, the first inquiry word string "the semi-gods and the semi-devils TV play" and the second inquiry word string "the semi-gods and the semi-devils Zhong Hanliang version" can be clustered into one class, and the first inquiry word string "loan application" and the second inquiry word string "loan application process" can be clustered into one class.
In a specific implementation, after a user performs a query, the correspondence between the user and the query intention, and between the inquiry word string/feature vector and the query intention, can be saved, to facilitate the subsequent search for second users having the same or a similar query intention as each first inquiry word string.
For example, the correspondences can be saved in the format shown in Table 5.
Table 5, user-query intention and inquiry word string/feature vector-query intention correspondence list
When searching for second users having the same or a similar query intention as the first user, the second users whose query intention is the same as or similar to that of the first user are calculated from the saved user-query intention and inquiry word string/feature vector-query intention correspondence list and the first feature vector of the first user.
The specific calculation steps are as follows:
1, determine the first feature vector A of the first user;
2, calculate the similarity between A and each of the feature vectors A1, A2, ..., An (n is a positive integer) in the user-query intention and inquiry word string/feature vector-query intention correspondence list, and find the query intention i corresponding to the feature vector Ai (i is a positive integer) with the highest similarity;
3, according to the query intention i obtained in step 2, find the second users with query intention i in the user-query intention and inquiry word string/feature vector-query intention correspondence list.
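The three calculation steps above can be sketched as a lookup over the saved correspondence list. The code assumes numeric feature vectors compared by cosine similarity and a threshold of 0.8; both are illustrative choices, and the function names, the vectors and the intention labels are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_second_users(
    first_vector: List[float],                      # step 1: feature vector A
    intent_vectors: List[Tuple[List[float], str]],  # (feature vector Ai, query intention)
    user_intents: Dict[str, List[str]],             # query intention -> users
    threshold: float = 0.8,                         # hypothetical similarity threshold
) -> Optional[Tuple[str, List[str]]]:
    """Pick the most similar stored vector (step 2), then return its users (step 3)."""
    best_intent, best_score = None, -1.0
    for vector, intent in intent_vectors:
        score = cosine(first_vector, vector)
        if score > best_score:
            best_intent, best_score = intent, score
    if best_intent is None or best_score <= threshold:
        return None
    return best_intent, user_intents.get(best_intent, [])


stored = [([1.0, 1.0, 0.0], "TV play"), ([0.0, 0.1, 1.0], "game")]
users = {"TV play": ["user 1", "user 2"], "game": ["user 3"]}
result = find_second_users([1.0, 0.9, 0.0], stored, users)
```

Returning `None` when no stored vector exceeds the threshold keeps the lookup consistent with the preset-threshold judgment of sub-step S43.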
Step 105: synthesize the network information and the community information corresponding to the second user into a search result.
In the embodiment of the present invention, the network information and the community information of the second user can be used as the final search result.
In one preferred embodiment of the invention, step 105 may include the following sub-steps:
Sub-step S51: calculate the association closeness between the first user and the second user under each query intention;
In the embodiment of the present invention, the factors influencing the association closeness between the first user and the second user may include three parts: the first is the similarity of the query intentions, the second is the familiarity between the first user and the second user, and the third is the second user's familiarity with the query intention.
In a preferred example of an embodiment of the present invention, sub-step S51 may further include the following sub-steps:
Sub-step S511: under each subdivision category corresponding to the original query word string, i.e. under each query intention, configure corresponding weights for the similarity between the first query intention information and the second query intention information, and/or the related information between the first user and the second user, and/or the historical operation information of the second user on the second query intention;
Sub-step S512: sum the weighted similarity between the first query intention information and the second query intention information, and/or the weighted related information between the first user and the second user, and/or the weighted historical operation information of the second user on the second query intention, to obtain the association closeness between the first user and the second user under each query intention.
In this example, the value of each factor in the similarity between the first query intention information and the second query intention information, and/or the related information between the first user and the second user, and/or the historical operation information of the second user on the second query intention can be obtained by analysing historical data and search logs. Weights are then configured according to actual needs and experience, for example a higher importance may be given a larger weight. Finally, the factors are weighted and summed to obtain the association closeness.
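The weighting and summation of sub-steps S511 and S512 can be sketched as follows. The three factor names, the normalisation of each factor to a single number and the weight values are assumptions made for illustration; the text leaves the weights to actual needs and experience.

```python
from dataclasses import dataclass


@dataclass
class ClosenessFactors:
    intent_similarity: float   # similarity of first and second query intention information
    user_familiarity: float    # derived from the related information between the users
    intent_familiarity: float  # derived from the second user's historical operations


# Hypothetical weights; not prescribed by the text.
WEIGHTS = ClosenessFactors(intent_similarity=0.5, user_familiarity=0.3,
                           intent_familiarity=0.2)


def association_closeness(f: ClosenessFactors, w: ClosenessFactors = WEIGHTS) -> float:
    """Weighted sum of the three factors (sub-steps S511 and S512)."""
    return (w.intent_similarity * f.intent_similarity
            + w.user_familiarity * f.user_familiarity
            + w.intent_familiarity * f.intent_familiarity)


# Made-up factor values: user A differs from user B only in intent similarity.
user_a = ClosenessFactors(intent_similarity=0.9, user_familiarity=0.5, intent_familiarity=0.8)
user_b = ClosenessFactors(intent_similarity=0.6, user_familiarity=0.5, intent_familiarity=0.8)
closeness_a = association_closeness(user_a)
closeness_b = association_closeness(user_b)
```

Because the "and/or" wording allows any subset of the factors, a factor that is not used can simply be given a weight of zero in this layout.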
In practical applications, the similarity between the first query intention information and the second query intention information can be calculated in step 104. The more similar the inquiry word strings are, the more similar the query intentions are.
For example, under the subdivision category "TV play" corresponding to the original query word string "the semi-gods and the semi-devils", second user A searched for "the semi-gods and the semi-devils TV play complete series" and second user B searched for "the semi-gods and the semi-devils introduction". The query intention of second user A is then closer to that of the first user than the query intention of second user B, so the association closeness of second user A is larger than that of second user B.
In a specific implementation, the related information between the first user and the second user may include at least one of the following:
The average number of contacts within a preset time period, the average contact duration within a preset time period, the number of common friends, and the place of residence.
In this example, the related information can indicate the familiarity between the first user and the second user; the more frequently a second user is contacted, the higher the familiarity, and hence the higher the association closeness.
The historical operation information of the second user on the second query intention may include at least one of the following:
The number of searches corresponding to the second query intention, the historical number of clicks on the network information matching the second query intention, the browsing duration of the network information corresponding to the second query intention, and the number of consecutive days of searching corresponding to the second query intention.
In this example, the historical operation information can indicate how well the second user understands the query intention; the more time the second user spends on the query intention, the better the second user understands it, and the higher the association closeness.
The number of searches corresponding to the second query intention can be found in the user-query intention and inquiry word string/feature vector-query intention correspondence list shown in Table 5; for example, the ordering by number of searches for query intention 1 may be user 2 > user 1.
For the historical number of clicks on the network information matching the second query intention, the number of clicks made by the second user on the results of the second inquiry word string can be obtained from the search log. The more clicks there are, the more webpages and content have been browsed, and the higher the familiarity with the second query intention.
For the browsing duration of the network information corresponding to the second query intention, the amount of time the second user spent browsing webpages related to the second inquiry word string can be obtained from search log statistics. The longer the browsing time, the higher the familiarity with the second query intention.
For the number of consecutive days of searching corresponding to the second query intention, the number of consecutive days on which the second user queried the same query intention can be obtained from search log statistics. The more days and the longer the duration, the more familiar the second user is with the second query intention. For example, if second user A searched for "Japan tourism" continuously for a month and second user B searched for "Japan tourism" continuously for three days, second user A can be considered more familiar with the query intention "Japan tourism" than second user B.
For example, for the first inquiry word string "the semi-gods and the semi-devils TV play", there are three second users with the same or a similar query intention, namely second user A, second user B and second user C; the factors influencing their association closeness are shown in Table 6.
Table 6, association closeness comparison table
Wherein, second user A contacts the first user as frequently as second user C does, but is more familiar with the query intention. Compared with second user B, second user C contacts the first user more frequently and is more familiar with the query intention.
According to sub-step S51, the second users having the same or a similar query intention as the first inquiry word strings "the semi-gods and the semi-devils film" and "dragon oath game" can be obtained in the same way.
Sub-step S52: sort the community information corresponding to the second users according to the association closeness;
In this example, the sorting may be from high to low association closeness, i.e. descending order; of course, the sorting may also be from low to high association closeness, i.e. ascending order, and the embodiments of the present invention are not limited in this respect.
For example, with the association closeness shown in Table 6, 155 > 135 > 117.2, the resulting order of the second users is: second user A > second user C > second user B.
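The ordering in this example can be reproduced with a short sort. The closeness values are the ones quoted above (155 for second user A, 135 for second user C, 117.2 for second user B); the function name `rank_users` and its `descending` flag are illustrative assumptions.

```python
from typing import Dict, List

# Association closeness per second user, as quoted in the example above.
closeness: Dict[str, float] = {
    "second user A": 155.0,
    "second user B": 117.2,
    "second user C": 135.0,
}


def rank_users(scores: Dict[str, float], descending: bool = True) -> List[str]:
    """Order users by association closeness (high to low by default)."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=descending)
    return [user for user, _ in ordered]


ranking = rank_users(closeness)
```

Passing `descending=False` gives the low-to-high ordering that the text also permits.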
Sub-step S53: synthesize the network information and the sorted community information corresponding to the second users into a search result.
After the search result is synthesized, the sorted community information of the second users can be presented to the first user in the client together with the network information, for example by showing the avatar of each second user to the right of the network information corresponding to the first inquiry word string, so that the first user can communicate with them.
As shown in Fig. 2, the community information of the second users, including avatar, name, etc., can be displayed by query intention, i.e. TV play needs, game needs and film needs. For example, according to Table 5, when the original query word string is "the semi-gods and the semi-devils", user 1 and user 2 are shown under the corresponding subdivision category "TV play" and user 3 is shown under the subdivision category "game"; when the original query word string is "Tsinghua University", user 4 is shown under the corresponding subdivision category "enrollment" and user 5 is shown under the subdivision category "tourism".
As another example, among the instant messaging friends of the current user, friend A has researched "enrollment" information about Tsinghua University through a search engine, and friend B has researched "tourism" information about Tsinghua University through a search engine. When the current user enters "Tsinghua University", the avatars of friend A and friend B can be attached under the category labels "enrollment" and "tourism" respectively on the right side of the search results page. In this way, when the current user's search need is unclear, the relevant users among the community friends are subdivided, so that the current user can choose a friend in the appropriate subdivision category to communicate with.
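The display grouping described above can be sketched as a simple group-by over saved records. The record layout (original query word string, subdivision category, user) and the function name `group_for_display` are assumptions; the sample rows follow the Table 5 example quoted in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (original query word string, subdivision category, user), following the example in the text.
RECORDS: List[Tuple[str, str, str]] = [
    ("the semi-gods and the semi-devils", "TV play", "user 1"),
    ("the semi-gods and the semi-devils", "TV play", "user 2"),
    ("the semi-gods and the semi-devils", "game", "user 3"),
    ("Tsinghua University", "enrollment", "user 4"),
    ("Tsinghua University", "tourism", "user 5"),
]


def group_for_display(original_query: str,
                      records: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Collect the users to show under each subdivision category label."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for query, category, user in records:
        if query == original_query:
            groups[category].append(user)
    return dict(groups)


panel = group_for_display("Tsinghua University", RECORDS)
```

A client could then render one labelled block per key of `panel`, with the avatars of the listed users beneath it.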
With the embodiment of the present invention, when the search result is synthesized, an entry object of the communication software used to communicate with the second user can be constructed for the community information of the second user; the first user can trigger the entry object, for example by a mouse click, to start instant messaging with the second user directly.
Of course, after obtaining the community information of the second user, the first user can also communicate with the second user by other means.
For example, if the community information of the second user includes an email address, the first user can open an Outlook (an application for sending, receiving, writing and managing email) entry for the second user and send mail to that address.
As another example, if the community information of the second user includes a user name or user ID, the first user can find the second user through the corresponding instant messaging tool or website (such as a forum, discussion bar, portal website, etc.) and communicate with the second user.
In other embodiments, the user can search in a mobile client and submit the original query word string wirelessly. The wireless server obtains the matching wireless network information and, when it judges the original query word string to be an inquiry word string with multiple query intentions, rewrites the original query word string into multiple first inquiry word strings each having one of the query intentions, searches for the second users with the corresponding query intention respectively, synthesizes the network information and the community information of the second users into a wireless search result, and returns it to the mobile client. The user can then directly invoke the corresponding instant communication software in the mobile client to communicate with the selected second user.
A traditional search engine can only search for network information. When the current user enters an inquiry word string in a community website such as a microblog or forum, the community website can return users and microblog posts/forum posts related to the inquiry word string, but the users returned by a search in the community website are obtained by matching the inquiry word string against community information (mainly user names). The search need of the user is not subdivided, and users with similar needs cannot be obtained.
In the embodiment of the present invention, a search is performed with the original query word string submitted by the first user to obtain matching network information. When the original query word string is judged to be an inquiry word string with multiple query intentions, it is rewritten into multiple first inquiry word strings each having one of the query intentions; second users having the same or a similar query intention as each first inquiry word string are searched for; and the network information and the community information of the second users are synthesized into a search result. Thus, when the need of the first user is unclear, the community friends of the user are screened by subject category through analysis of the search logs, according to the various categories of need, to obtain the second users most relevant to each subject category, so that the search need of the first user is subdivided. Contacts whose needs are similar to those of the current user can therefore be recommended even when the search need is unclear. This spares the first user from repeatedly and tediously filtering massive network information by hand, reduces the time and effort spent by the first user, reduces the system resource consumption of the user equipment and the website, reduces the occupation of network bandwidth, and improves the efficiency, quality and capacity of information acquisition.
Referring to Fig. 3, a structural block diagram of an embodiment of a search apparatus of the present invention is shown, which may specifically include the following modules:
Network information search module 301, configured to, upon receiving the original query word string submitted by the first user, search with the original query word string to obtain matching network information;
Multi-query-intention judgment module 302, configured to judge, according to the network information, whether the original query word string is an inquiry word string with multiple query intentions; if so, to call the query word string rewriting module 303;
Query word string rewriting module 303, configured to rewrite the original query word string, according to each query intention, into multiple first inquiry word strings each having that query intention;
User searching module 304, configured to search respectively, according to the network information, for second users having the same or a similar query intention as each first inquiry word string; wherein the second user has community information;
Search result synthesis module 305, configured to synthesize the network information and the community information corresponding to the second user into a search result.
In one preferred embodiment of the invention, the multi-query-intention judgment module 302 may include the following submodules:
First feature network information acquisition submodule, configured to obtain the first feature network information matching the original query word string; the first feature network information includes the top N network information items by ranking and/or the top M network information items by historical number of clicks;
Second feature network information acquisition submodule, configured to obtain the second feature network information matching other inquiry word strings; the second feature network information includes the top A network information items by ranking and/or the top B network information items by historical number of clicks;
Feature network information judging submodule, configured to judge whether the first feature network information includes at least two pieces of second feature network information; if so, to call the first decision submodule;
First decision submodule, configured to determine that the original query word string is an inquiry word string with multiple query intentions; wherein M, N, A and B are positive integers.
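The judgment made by these submodules can be sketched as a set-overlap check. This is one possible reading of "includes at least two pieces of second feature network information", namely that the top results of the original query overlap with the top results of at least two other inquiry word strings. Only the ranking-based variant is shown (historical clicks are omitted), and the function names, default values of n and a, and the placeholder result identifiers are assumptions.

```python
from typing import Dict, List, Set


def top_results(ranked_urls: List[str], n: int) -> Set[str]:
    """Keep the n highest-ranked results as the feature network information."""
    return set(ranked_urls[:n])


def is_multi_intent(original_results: List[str],
                    other_query_results: Dict[str, List[str]],
                    n: int = 10, a: int = 3) -> bool:
    """True when the original query's top-n results contain top-a results
    of at least two other inquiry word strings."""
    first_feature = top_results(original_results, n)
    matched = [query for query, urls in other_query_results.items()
               if first_feature & top_results(urls, a)]
    return len(matched) >= 2


original = ["u1", "u2", "u3", "u4"]  # placeholder result identifiers
others = {
    "the semi-gods and the semi-devils TV play": ["u1", "u9"],
    "dragon oath game": ["u3"],
    "the semi-gods and the semi-devils film": ["u8"],
}
multi = is_multi_intent(original, others)
```

Here the results of the original query overlap with those of two other inquiry word strings, so the query would be treated as having multiple query intentions under this reading.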
In a preferred embodiment of the invention, the multi-intent judgment module 302 may include the following submodules:
Entity category lookup submodule, for looking up, in a preset knowledge base, the entity categories corresponding to the original query string;
Second decision submodule, for determining that the original query string is a query string with multiple query intents when there are more than two entity categories.
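A minimal sketch of the knowledge-base variant, assuming the knowledge base is a plain mapping from query string to a set of entity categories. The mapping contents and function names are hypothetical. The threshold is a parameter because the text reads "more than two" and a deployment might prefer a different cutoff.

```python
def entity_categories(query: str, knowledge_base: dict[str, set[str]]) -> set[str]:
    """Look up the entity categories recorded for a query string (empty if unknown)."""
    return knowledge_base.get(query, set())


def is_multi_intent_by_entity(query: str,
                              knowledge_base: dict[str, set[str]],
                              more_than: int = 2) -> bool:
    """True when the number of entity categories exceeds the threshold."""
    return len(entity_categories(query, knowledge_base)) > more_than


if __name__ == "__main__":
    # Toy knowledge base, invented for illustration.
    kb = {"apple": {"fruit", "company", "film"}, "banana": {"fruit"}}
    print(is_multi_intent_by_entity("apple", kb), is_multi_intent_by_entity("banana", kb))
```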
In a preferred embodiment of the invention, the multi-intent judgment module 302 may include the following submodules:
Feature word lookup submodule, for looking up, in a preset knowledge base, the feature words associated with the original query string;
Quantity judging submodule, for judging whether the quantity of the feature words in web pages across the whole network exceeds a preset quantity threshold; if so, calling the classification submodule;
Classification submodule, for classifying the feature words using the entity categories of the knowledge base;
Third decision submodule, for determining that the original query string is a query string with multiple query intents when at least two classifications are obtained.
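The feature-word variant can be sketched in the same style. The sketch assumes that "quantity" means a per-word occurrence count over crawled web pages and that only words above the threshold are classified. Both points are interpretations, and every name and data structure is hypothetical.

```python
def is_multi_intent_by_feature_words(query: str,
                                     feature_words: dict[str, list[str]],
                                     web_counts: dict[str, int],
                                     word_category: dict[str, str],
                                     threshold: int) -> bool:
    """Classify sufficiently frequent feature words; two or more classes means multi-intent."""
    words = feature_words.get(query, [])
    # Keep only feature words whose web-wide count exceeds the preset threshold.
    frequent = [w for w in words if web_counts.get(w, 0) > threshold]
    # Classify the remaining words by the knowledge base's entity categories.
    categories = {word_category[w] for w in frequent if w in word_category}
    return len(categories) >= 2


if __name__ == "__main__":
    fw = {"apple": ["iphone", "orchard", "rareword"]}
    counts = {"iphone": 1000, "orchard": 500, "rareword": 3}
    cats = {"iphone": "electronics", "orchard": "fruit", "rareword": "film"}
    print(is_multi_intent_by_feature_words("apple", fw, counts, cats, threshold=100))
```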
In a preferred embodiment of the invention, the user lookup module 304 may include the following submodules:
Query intent information acquisition submodule, for obtaining, respectively, first query intent information corresponding to each first query string of the first user and second query intent information of the second user;
Query intent information similarity calculation submodule, for calculating, respectively, the similarity between the first query intent information and the second query intent information;
Judging submodule, for determining that the first query string and the second user have the same or similar query intent when the similarity is greater than a preset similarity threshold.
In a preferred embodiment of the invention, the first query intent information may include a first feature vector, and the first feature vector may be determined according to the first query string;
The second query intent information may include a second feature vector, and the second feature vector may be determined according to a second query string;
Wherein the second query string is a query string previously submitted by the second user.
In a preferred embodiment of the invention, the query intent information similarity calculation submodule may include the following submodule:
Feature vector similarity calculation submodule, for calculating the similarity between the first feature vector and the second feature vector.
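The comparison of two feature vectors against a preset threshold can be sketched as below. The text does not name a similarity measure; cosine similarity over sparse term-weight dictionaries is used here purely as an assumed stand-in, and the threshold value and function names are hypothetical.

```python
import math


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def has_similar_intent(first_vector: dict[str, float],
                       second_vector: dict[str, float],
                       threshold: float = 0.5) -> bool:
    """True when the similarity is greater than the preset threshold."""
    return cosine_similarity(first_vector, second_vector) > threshold


if __name__ == "__main__":
    first = {"apple": 1.0, "phone": 1.0}   # from the first user's rewritten query
    second = {"apple": 1.0, "phone": 1.0}  # from a query the second user submitted earlier
    print(has_similar_intent(first, second))
```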
In a preferred example of an embodiment of the present invention, the first feature vector may include at least one of the following:
The first query string, a feature vector associated with the word segments of the first query string, and a feature vector associated with the network information matching the first query string;
The second feature vector may include at least one of the following:
The second query string, a feature vector associated with the word segments of the second query string, and a feature vector associated with the network information matching the second query string.
In a preferred example of an embodiment of the present invention, the feature vector associated with the word segments of the first query string may include at least one of the following:
A synonymous string of the first query string, the word segments of the first query string, the parts of speech of the word segments of the first query string, synonyms of the word segments of the first query string, and the importance of the word segments of the first query string;
The feature vector associated with the network information matching the first query string may include at least one of the following:
The title of the network information matching the first query string, the webpage identifier of the network information matching the first query string, the historical click information of the network information matching the first query string, and other query strings associated with the first query string;
The feature vector associated with the word segments of the second query string may include at least one of the following:
A synonymous string of the second query string, the word segments of the second query string, the parts of speech of the word segments of the second query string, synonyms of the word segments of the second query string, and the importance of the word segments of the second query string;
The feature vector associated with the network information matching the second query string may include at least one of the following:
The title of the network information matching the second query string, the webpage identifier of the network information matching the second query string, the historical click information of the network information matching the second query string, and other query strings associated with the second query string.
In a preferred embodiment of the invention, the search result synthesis module 305 may include the following submodules:
Association closeness calculation submodule, for calculating the association closeness between the first user and the second user under each query intent;
Community information sorting submodule, for sorting the community information of the second user according to the association closeness;
Synthesis submodule, for synthesizing the network information and the sorted community information of the second user into a search result.
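The sorting and synthesis steps above can be sketched as follows, assuming community information is held per second user and that a closeness score per user is already available. The result layout (a dictionary with two keys) and all names are illustrative assumptions; the text does not specify how the synthesized result is structured.

```python
def synthesize_results(network_info: list[str],
                       community_info: dict[str, str],
                       closeness: dict[str, float]) -> dict[str, list]:
    """Sort second users' community information by closeness and attach it to the web results."""
    ranked_users = sorted(
        community_info,
        key=lambda user: closeness.get(user, 0.0),  # users without a score sort last
        reverse=True,
    )
    return {
        "network_information": list(network_info),
        "community_information": [(user, community_info[user]) for user in ranked_users],
    }


if __name__ == "__main__":
    result = synthesize_results(
        ["page1", "page2"],
        {"u1": "post A", "u2": "post B", "u3": "post C"},
        {"u1": 0.2, "u2": 0.9},
    )
    print(result["community_information"])
```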
In a preferred embodiment of the invention, the association closeness calculation submodule may include the following submodules:
Weight configuration submodule, for configuring corresponding weights for the similarity between the first query intent information and the second query intent information under each query intent, and/or the association information between the first user and the second user, and/or the second user's historical operation information for the second query intent;
Summation submodule, for summing the weighted similarity between the first query intent information and the second query intent information, and/or the weighted association information between the first user and the second user, and/or the weighted historical operation information of the second user for the second query intent, to obtain the association closeness between the first user and the second user under each query intent.
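The weighted summation can be written as a short sketch. It assumes each of the three inputs has already been reduced to a single score in the range 0 to 1. The weight values and the function name are hypothetical, since the text specifies neither.

```python
def association_closeness(intent_similarity: float,
                          relation_score: float,
                          history_score: float,
                          weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of intent similarity, user-to-user association and historical operations."""
    w_similarity, w_relation, w_history = weights
    return (w_similarity * intent_similarity
            + w_relation * relation_score
            + w_history * history_score)


if __name__ == "__main__":
    # Example scores for one (first user, second user, query intent) triple.
    print(association_closeness(0.8, 0.5, 1.0))
```

Because the factors are joined by "and/or", a weight of zero can be used to leave a factor out of the sum.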
In a preferred embodiment of the invention, the association information between the first user and the second user may include at least one of the following:
The average number of contacts within a preset time period, the average contact duration within a preset time period, the number of common friends, and place of residence;
The historical operation information of the second user for the second query intent may include at least one of the following:
The number of searches corresponding to the second query intent, the browsing duration of the network information corresponding to the second query intent, and the number of consecutive days of searching corresponding to the second query intent.
In a preferred embodiment of the invention, a community friend relationship may exist between the first user and the second user.
The search method and the search device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. For those of ordinary skill in the art, changes may be made to the specific implementation and scope of application in accordance with the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.

Claims (22)

1. A search method, characterized by comprising:
When receiving an original query string submitted by a first user, searching with the original query string to obtain matching network information;
Judging, according to the network information, whether the original query string is a query string with multiple query intents; if so, rewriting the original query string, according to each query intent, into multiple first query strings each having the corresponding query intent;
Searching, according to the first query strings, for a second user having the same or similar query intent as each first query string; wherein the second user has community information;
Synthesizing the network information and the community information corresponding to the second user into a search result;
Constructing, according to the second user in the search result and the community information corresponding to the second user, an entry object of communication software for communicating with the second user, whereby the first user carries out instant messaging with the second user directly through the entry object.
2. The method according to claim 1, characterized in that the step of judging whether the original query string is a query string with multiple query intents comprises:
Obtaining first feature network information matching the original query string; the first feature network information includes the top N network information items by ranking and/or the top M network information items by historical click count;
Obtaining second feature network information matching other query strings; the second feature network information includes the top A network information items by ranking and/or the top B network information items by historical click count;
Judging whether the first feature network information includes at least two pieces of second feature network information; if so, determining that the original query string is a query string with multiple query intents; wherein M, N, A and B are positive integers.
3. The method according to claim 1, characterized in that the step of judging whether the original query string is a query string with multiple query intents comprises:
Looking up, in a preset knowledge base, the entity categories corresponding to the original query string;
When there are more than two entity categories, determining that the original query string is a query string with multiple query intents.
4. The method according to claim 1, characterized in that the step of judging whether the original query string is a query string with multiple query intents comprises:
Looking up, in a preset knowledge base, the feature words associated with the original query string;
Judging whether the quantity of the feature words in web pages across the whole network exceeds a preset quantity threshold; if so, classifying the feature words using the entity categories of the knowledge base;
When at least two classifications are obtained, determining that the original query string is a query string with multiple query intents.
5. The method according to claim 1, 2, 3 or 4, characterized in that the step of searching for a second user having the same or similar query intent as the first query string comprises:
Obtaining, respectively, first query intent information corresponding to each first query string of the first user and second query intent information corresponding to each second query string of the second user;
Calculating, respectively, the similarity between the first query intent information and the second query intent information;
When the similarity is greater than a preset similarity threshold, determining that the first query string and the second user have the same or similar query intent.
6. The method according to claim 5, characterized in that the first query intent information includes a first feature vector, and the first feature vector is determined according to the first query string;
The second query intent information includes a second feature vector, and the second feature vector is determined according to the second query string;
Wherein the second query string is a query string previously submitted by the second user.
7. The method according to claim 6, characterized in that the first feature vector comprises at least one of the following:
The first query string, a feature vector associated with the word segments of the first query string, and a feature vector associated with the network information matching the first query string;
The second feature vector comprises at least one of the following:
The second query string, a feature vector associated with the word segments of the second query string, and a feature vector associated with the network information matching the second query string.
8. The method according to claim 5, characterized in that the step of synthesizing the network information and the community information corresponding to the second user into a search result comprises:
Calculating the association closeness between the first user and the second user under each query intent;
Sorting the community information corresponding to the second user according to the association closeness;
Synthesizing the network information and the sorted community information corresponding to the second user into a search result.
9. The method according to claim 8, characterized in that the step of calculating the association closeness between the first user and the second user under each query intent comprises:
Configuring corresponding weights for the similarity between the first query intent information and the second query intent information under each query intent, and/or the association information between the first user and the second user, and/or the second user's historical operation information for the second query intent;
Summing the weighted similarity between the first query intent information and the second query intent information, and/or the weighted association information between the first user and the second user, and/or the weighted historical operation information of the second user for the second query intent, to obtain the association closeness between the first user and the second user under each query intent.
10. The method according to claim 9, characterized in that the association information between the first user and the second user comprises at least one of the following:
The average number of contacts within a preset time period, the average contact duration within a preset time period, the number of common friends, and place of residence;
The historical operation information of the second user for the second query intent comprises at least one of the following:
The number of searches corresponding to the second query intent, the browsing duration of the network information corresponding to the second query intent, and the number of consecutive days of searching corresponding to the second query intent.
11. The method according to claim 1, 2, 3, 4, 6, 7, 9 or 10, characterized in that a community friend relationship exists between the first user and the second user.
12. A search device, characterized by comprising:
Network information search module, for searching with an original query string, when the original query string submitted by a first user is received, to obtain matching network information;
Multi-intent judgment module, for judging, according to the network information, whether the original query string is a query string with multiple query intents; if so, calling the query string rewriting module;
Query string rewriting module, for rewriting the original query string, according to each query intent, into multiple first query strings each having the corresponding query intent;
User lookup module, for searching, according to the first query strings, for a second user having the same or similar query intent as each first query string; wherein the second user has community information;
Search result synthesis module, for synthesizing the network information and the community information corresponding to the second user into a search result;
Entry object construction module, for constructing, according to the second user in the search result and the community information corresponding to the second user, an entry object of communication software for communicating with the second user, whereby the first user carries out instant messaging with the second user directly through the entry object.
13. The device according to claim 12, characterized in that the multi-intent judgment module comprises:
First feature network information acquisition submodule, for obtaining first feature network information matching the original query string; the first feature network information includes the top N network information items by ranking and/or the top M network information items by historical click count;
Second feature network information acquisition submodule, for obtaining second feature network information matching other query strings; the second feature network information includes the top A network information items by ranking and/or the top B network information items by historical click count;
Feature network information judging submodule, for judging whether the first feature network information includes at least two pieces of second feature network information; if so, calling the first decision submodule;
First decision submodule, for determining that the original query string is a query string with multiple query intents; wherein M, N, A and B are positive integers.
14. The device according to claim 12, characterized in that the multi-intent judgment module comprises:
Entity category lookup submodule, for looking up, in a preset knowledge base, the entity categories corresponding to the original query string;
Second decision submodule, for determining that the original query string is a query string with multiple query intents when there are more than two entity categories.
15. The device according to claim 12, characterized in that the multi-intent judgment module comprises:
Feature word lookup submodule, for looking up, in a preset knowledge base, the feature words associated with the original query string;
Quantity judging submodule, for judging whether the quantity of the feature words in web pages across the whole network exceeds a preset quantity threshold; if so, calling the classification submodule;
Classification submodule, for classifying the feature words using the entity categories of the knowledge base;
Third decision submodule, for determining that the original query string is a query string with multiple query intents when at least two classifications are obtained.
16. The device according to claim 12, 13, 14 or 15, characterized in that the user lookup module comprises:
Query intent information acquisition submodule, for obtaining, respectively, first query intent information corresponding to each first query string of the first user and second query intent information corresponding to each second query string of the second user;
Query intent information similarity calculation submodule, for calculating, respectively, the similarity between the first query intent information and the second query intent information;
Judging submodule, for determining that the first query string and the second user have the same or similar query intent when the similarity is greater than a preset similarity threshold.
17. The device according to claim 16, characterized in that the first query intent information includes a first feature vector, and the first feature vector is determined according to the first query string;
The second query intent information includes a second feature vector, and the second feature vector is determined according to the second query string;
Wherein the second query string is a query string previously submitted by the second user.
18. The device according to claim 17, characterized in that the first feature vector comprises at least one of the following:
The first query string, a feature vector associated with the word segments of the first query string, and a feature vector associated with the network information matching the first query string;
The second feature vector comprises at least one of the following:
The second query string, a feature vector associated with the word segments of the second query string, and a feature vector associated with the network information matching the second query string.
19. The device according to claim 16, characterized in that the search result synthesis module comprises:
Association closeness calculation submodule, for calculating the association closeness between the first user and the second user under each query intent;
Community information sorting submodule, for sorting the community information of the second user according to the association closeness;
Synthesis submodule, for synthesizing the network information and the sorted community information of the second user into a search result.
20. The device according to claim 19, characterized in that the association closeness calculation submodule comprises:
Weight configuration submodule, for configuring corresponding weights for the similarity between the first query intent information and the second query intent information under each query intent, and/or the association information between the first user and the second user, and/or the second user's historical operation information for the second query intent;
Summation submodule, for summing the weighted similarity between the first query intent information and the second query intent information, and/or the weighted association information between the first user and the second user, and/or the weighted historical operation information of the second user for the second query intent, to obtain the association closeness between the first user and the second user under each query intent.
21. The device according to claim 20, characterized in that the association information between the first user and the second user comprises at least one of the following:
The average number of contacts within a preset time period, the average contact duration within a preset time period, the number of common friends, and place of residence;
The historical operation information of the second user for the second query intent comprises at least one of the following:
The number of searches corresponding to the second query intent, the browsing duration of the network information corresponding to the second query intent, and the number of consecutive days of searching corresponding to the second query intent.
22. The device according to claim 12, 13, 14, 15, 17, 18, 20 or 21, characterized in that a community friend relationship exists between the first user and the second user.
CN201410262143.XA 2014-06-12 2014-06-12 A kind of method and apparatus of search Active CN105159898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410262143.XA CN105159898B (en) 2014-06-12 2014-06-12 A kind of method and apparatus of search

Publications (2)

Publication Number Publication Date
CN105159898A CN105159898A (en) 2015-12-16
CN105159898B true CN105159898B (en) 2019-11-26

Family

ID=54800755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410262143.XA Active CN105159898B (en) 2014-06-12 2014-06-12 A kind of method and apparatus of search

Country Status (1)

Country Link
CN (1) CN105159898B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951422B (en) * 2016-01-07 2021-05-28 腾讯科技(深圳)有限公司 Webpage training method and device, and search intention identification method and device
CN106021516A (en) * 2016-05-24 2016-10-12 百度在线网络技术(北京)有限公司 Search method and device
CN106971004B (en) * 2017-04-26 2021-04-06 百度在线网络技术(北京)有限公司 Search result providing method and device
CN108182290B (en) * 2018-01-30 2022-03-25 深圳市富途网络科技有限公司 Estimation method for community content hot sequencing
CN109543026A (en) * 2018-12-12 2019-03-29 广东小天才科技有限公司 Analytic content acquisition method of mathematical formula and family education equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136869A (en) * 2006-08-30 2008-03-05 高鹏 Method for generating search intention based contacts group of instant communication system
CN102016845A (en) * 2008-04-29 2011-04-13 微软公司 Social network powered query refinement and recommendations
CN102402589A (en) * 2011-10-26 2012-04-04 北京百度网讯科技有限公司 Method and equipment for providing reference research information related to research request
CN102456054A (en) * 2010-10-28 2012-05-16 腾讯科技(深圳)有限公司 Searching method and system
CN103942198A (en) * 2013-01-18 2014-07-23 佳能株式会社 Method and device for mining intentions

Also Published As

Publication number Publication date
CN105159898A (en) 2015-12-16

Similar Documents

Publication Publication Date Title
Feng et al. An expert recommendation algorithm based on Pearson correlation coefficient and FP-growth
US11663254B2 (en) System and engine for seeded clustering of news events
CN105045875B (en) Personalized search and device
CN110147437A (en) A kind of searching method and device of knowledge based map
Tran et al. Hashtag recommendation approach based on content and user characteristics
CN107256267A (en) Querying method and device
CN105159898B (en) A kind of method and apparatus of search
CN107729336A (en) Data processing method, equipment and system
CN110390094B (en) Method, electronic device and computer program product for classifying documents
CN108664515B (en) A kind of searching method and device, electronic equipment
CN105786810B (en) The method for building up and device of classification mapping relations
CN104881447A (en) Searching method and device
US20130332440A1 (en) Refinements in Document Analysis
KR100557874B1 (en) Method of scientific information analysis and media that can record computer program thereof
CN110175289B (en) Mixed recommendation method based on cosine similarity collaborative filtering
Shi et al. [Retracted] Research on Fast Recommendation Algorithm of Library Personalized Information Based on Density Clustering
CN111753151A (en) Service recommendation method based on internet user behaviors
CN115329078B (en) Text data processing method, device, equipment and storage medium
CN116226533A (en) News associated recommendation method, device and medium based on association prediction model
Shi et al. A hybrid approach for automatic mashup tag recommendation
CN108460131A (en) A kind of tag along sort processing method and processing device
CN112052402B (en) Information recommendation method and device, electronic equipment and storage medium
Bakariya et al. Pattern mining approach for social network services
CN105159899B (en) Searching method and device
KR102041915B1 (en) Database module using artificial intelligence, economic data providing system and method using the same

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant