CN103020164A - Semantic search method based on multi-semantic analysis and personalized sequencing - Google Patents
Semantic search method based on multi-semantic analysis and personalized sequencing
- Publication number: CN103020164A
- Authority: CN (China)
- Prior art keywords: document, vector, user, word, term vector
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a semantic search method based on multi-semantic analysis and personalized sequencing, and belongs to the field of information search. The semantic search method adopts the technical scheme comprising the following steps: firstly, by a crawler technology and other technologies, acquiring webpage documents from the Internet, classifying the webpage documents by using a support vector machine, establishing a word vector library by a multi-semantic analysis method, and writing multi-classification results into an index to form an index library; secondly, based on the word vector library, forming search keywords input by a user into a query vector, performing class matching query with the index library to obtain an initial sequencing result; and finally, according to personalized information and history access information of the user, optimizing the initial sequencing result, and returning the optimized result to the user. By the semantic search method based on the multi-semantic analysis and the personalized sequencing, the word vector library and the index library with rich semantemes are formed; and through the personalized information and the history access information, a search result can meet a search demand of the user better and search satisfaction of the user can be improved.
Description
Technical field
The invention belongs to the field of information retrieval, and in particular relates to a semantic search method based on multi-semantic analysis and personalized sequencing.
Background technology
A search engine is a system that, following a certain strategy, uses specific computer programs to gather information from the Internet, organizes and processes that information, provides a retrieval service to users, and displays the information relevant to a user's search. Search engines emerged to cope with the rapid growth of information on the Internet, and they have become an indispensable way for people to obtain information from the network. However, the current mainstream keyword-based search engines, such as Google, Baidu, Bing and Yahoo, commonly suffer from several thorny problems. Search results often contain a large number of irrelevant links; because users are diverse, a single result list cannot meet each user's specific needs; and the search process does not consider the semantic relatedness between words, nor are the results organized in any effective way, so users must spend time and effort browsing and selecting.
Semantic search is a novel kind of search that differs from keyword-based search. In general, semantic search is not restricted to the keywords of the user's query itself; it tries to capture the underlying intent of the query more accurately, so that it can return the results that best match the user's needs, with higher retrieval precision than traditional search. Ramesh Singh and Myungjin Lee attempted in their research to reorganize search results to improve the user's search experience. Lien-Fu Lai and Huanhuan Cao used hidden Markov trees or other models to calculate the degree of correlation between different results, thereby increasing the coverage of the search results. Fang Liu, Jaime Teevan and others proposed methods that use users' historical access information to personalize search and so improve search precision. All of these studies made useful improvements to semantic search. However, the conditions under which they can personalize a user's query on the basis of categories are relatively strict, and the added time cost is poorly controlled; moreover, they do not consider that different pieces of user-related information carry different weights. The ranking of the final search results therefore remains unsatisfactory.
Summary of the invention
To address the problems of existing information retrieval in retrieval precision and user search experience, the present invention proposes a semantic search method based on multi-semantic analysis and personalized sequencing.
A semantic search method based on multi-semantic analysis and personalized sequencing, characterized in that it comprises the following steps:
Step 1: Web documents are obtained from the Internet with crawler technology, and a part of them is manually classified to serve as the training set. A word vector library is constructed with the multi-semantic analysis (MSA) method, the web documents are represented as vectors, and the document vectors of the training set are fed to a support vector machine (SVM) classifier for training; new web pages are then classified by the SVM using this model. The category information of every web page is written into the index library as an attribute.
Step 2: Based on the word vector library formed in Step 1, a word vector is constructed for each search keyword entered by the user, and these are combined into the final query vector. The query vector is then matched by category against the index library to obtain the initial web search result.
Step 3: According to the user's personalized customization information and historical access information, the initial search result is re-ranked, and the final search result is returned to the user.
In Step 1, the word vector library is constructed with the MSA method and the classification results of the web documents are written into the index to form the index library. This specifically comprises the following steps:
Step 11: Construct the concept space. In the present invention the space is m-dimensional.
The basic dimensions of the concept space are a set of class labels that can represent the information of the whole corpus. In general, m class labels extracted directly from the corpus's classification labels form the m dimensions of the vector; the semantic information of each word in a web document is then described by an m-dimensional vector, called a word vector.
Step 12: Determine the component values of the word vectors.
Words are extracted from the web documents of the training set, and the size of each component of a word vector is determined by all documents of the training set. Each component of a word vector is computed by a formula (not reproduced in this text) whose terms are defined as follows:
t_j denotes the j-th word in the word vector library; w(c_i, t_j) denotes the relation between word t_j and the i-th dimension c_i, that is, the i-th component of the word vector of t_j; |D| is the number of training documents; tf(d_k, t_j) is the frequency with which word t_j occurs in document d_k; H(c_i, d_k) is a discriminant function whose value is 1 if document d_k belongs to the field described by dimension c_i and 0 otherwise; length(d_k) is the length of document d_k, i.e., the number of words obtained after word segmentation and denoising, where a word that occurs several times in the document is counted each time, so that length(d_k) ≥ n; K is the number of documents.
Step 13: Normalize the word vectors and form the word vector library.
Each word vector is normalized to unit form so that its component values lie in the range [0, 1], which gives it better generality. The normalized word vectors together form the word vector library. In the normalization formula (not reproduced in this text), w'(c_i, t_j) denotes the i-th component of the normalized word vector of t_j, and the word vector library is the set of these normalized word vectors.
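Steps 12 and 13 can be illustrated with a small sketch. Because the formula images are missing, the exact expressions here are assumptions chosen only to be consistent with the definitions given: each component accumulates tf(d_k, t_j) / length(d_k) over the documents of category c_i and is divided by |D|, and normalization scales each vector to unit Euclidean length. The names `build_word_vectors` and `normalize` are hypothetical.

```python
import math
from collections import Counter

def build_word_vectors(docs, labels, categories):
    """docs: list of token lists; labels: the category of each document."""
    n_docs = len(docs)                                  # |D|
    index = {c: i for i, c in enumerate(categories)}
    vectors = {}
    for tokens, label in zip(docs, labels):
        if not tokens:
            continue
        i = index[label]                                # H(c_i, d_k) = 1 only for this i
        for word, tf in Counter(tokens).items():
            vec = vectors.setdefault(word, [0.0] * len(categories))
            # ASSUMED formula: tf / length(d_k), averaged over |D|
            vec[i] += tf / len(tokens) / n_docs
    return vectors

def normalize(vec):
    """ASSUMED normalization: scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

docs = [["goal", "match", "goal"], ["stock", "market"], ["match", "stock"]]
labels = ["sport", "finance", "sport"]
categories = ["sport", "finance"]
library = {w: normalize(v)
           for w, v in build_word_vectors(docs, labels, categories).items()}
print(library["stock"])   # equal weight on both categories
```

Since all components are non-negative, unit-length scaling keeps every component within [0, 1], which matches the range stated in Step 13.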
Step 14: Obtain the weight of each word in a document with the TF-IDF method and normalize these weights. TF-IDF has been popular for many years and is one of the weighting methods proven effective. Its weights depend only on the overall statistics of the corpus and not on the categories, so it is highly general and can be applied to determine word weights in multi-class text representation. In the TF-IDF weighting formula (not reproduced in this text):
t_g is the g-th word of document d_k; weight(t_g, d_k) is the weight of word t_g in document d_k; D is the set of training documents; d_k is the k-th document; |D| is the number of training documents; D' is the set of documents containing word t_g; and |D'| is the number of documents in D'.
The weights are normalized in the same way, so that the weight of each word of the segmented document lies in [0, 1]. In the normalized weight formula (not reproduced in this text), weight'(t_g, d_k) is the normalized weight of word t_g in document d_k, and n is the total number of distinct words in the document.
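A minimal sketch of Step 14 follows. It assumes the common TF-IDF form tf × log(|D| / |D'|) and unit-length normalization over the n distinct words of the document; the patent's exact expressions are not available in this text, so both choices, and the function name `tfidf_weights`, are assumptions.

```python
import math
from collections import Counter

def tfidf_weights(doc_tokens, corpus):
    """Unit-normalized TF-IDF weights for the distinct words of one document."""
    n_docs = len(corpus)                                   # |D|
    raw = {}
    for word, tf in Counter(doc_tokens).items():
        # |D'|: number of documents containing the word (at least 1 to avoid division by zero)
        df = max(1, sum(1 for d in corpus if word in d))
        raw[word] = tf * math.log(n_docs / df)             # ASSUMED: tf * log(|D| / |D'|)
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {w: (v / norm if norm > 0 else 0.0) for w, v in raw.items()}

corpus = [["goal", "match", "goal"], ["stock", "market"], ["match", "stock"]]
weights = tfidf_weights(corpus[0], corpus)
print(weights)
```

In this toy corpus "goal" occurs twice and only in the first document, so it receives a larger weight than "match", which also appears in another document.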
Step 15: Form the document vector. Once the weights have been expressed with the TF-IDF method, the document vector of the multi-semantic analysis (MSA) is formed: the i-th component of the document vector of d_k is computed from the normalized weights of its words and their word vectors (the formula and the resulting vector notation are not reproduced in this text).
Each component of this document vector directly represents the degree of correlation between the document and the corresponding dimension (category), so the vector carries strong semantics and is the basis of the matching query. Afterwards, with the m predefined class labels, support vector machine technology is used to classify the document vectors, the resulting model serves as the classification standard for new web pages, and the category of every web page is written into the index library as an attribute.
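The document vector of Step 15 can be sketched as a weighted sum of word vectors. The assumption, based on the surviving definitions (normalized weights, n distinct words, and each word's vector in the library), is that component i equals the sum over words of weight'(t_g, d_k) × w'(c_i, t_g). The function name and the sample values are hypothetical.

```python
def document_vector(weights, library, m):
    """Combine per-word weights with m-dimensional word vectors."""
    vec = [0.0] * m
    for word, weight in weights.items():
        word_vec = library.get(word)
        if word_vec is None:      # ASSUMPTION: words missing from the library are skipped
            continue
        for i in range(m):
            vec[i] += weight * word_vec[i]
    return vec

# Hypothetical two-category library and normalized word weights
library = {"goal": [1.0, 0.0], "stock": [0.6, 0.8]}
weights = {"goal": 0.5, "stock": 0.5, "unseen": 0.7}
doc_vec = document_vector(weights, library, m=2)
print(doc_vec)
```

The resulting vector would then be the input to the SVM classifier described in the text; the classifier itself is omitted here.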
In Step 2, the category matching query between the query vector and the index library comprises:
Step 21: Based on the word vector library, represent the search keywords entered by the user as vectors.
Denote the set of search keywords as KEY = {key_1, key_2, ..., key_n}. The word vector of each keyword is extracted from the word vector library already built, giving the vector form of each keyword key_i; all keywords together form the set of query vectors. When key_i does not exist in the word vector library, its vector takes a default value (the expression is not reproduced in this text).
Step 22: On the basis of Step 21, form the query vector of the search keywords in the m-dimensional space (the query vector formula is not reproduced in this text).
The three largest components of the query vector are denoted α_p, α_q and α_r; their corresponding dimension categories are denoted c_p, c_q and c_r, and a category weight vector is formed for them. This weight vector can be used later when matching user information. Based on these three categories {c_p, c_q, c_r}, a matching query is carried out in the index library, the web pages belonging to these three categories are filtered out, and the basic Lucene ranking algorithm is applied to obtain the initial ranking result.
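The query construction in Steps 21 and 22 might look like the sketch below. Two details are assumptions because the formulas are missing: keywords absent from the library contribute a zero vector, and the query vector is the component-wise sum of the keyword vectors. The category names and vector values are made up for illustration.

```python
def query_vector(keywords, library, m):
    """Sum the word vectors of the keywords (ASSUMED combination rule)."""
    vec = [0.0] * m
    for key in keywords:
        word_vec = library.get(key, [0.0] * m)   # ASSUMED: zero vector for unknown keywords
        vec = [a + b for a, b in zip(vec, word_vec)]
    return vec

def top_categories(vec, categories, k=3):
    """Return the k categories with the largest components, with their weights."""
    ranked = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)[:k]
    return [(categories[i], vec[i]) for i in ranked]

categories = ["sport", "agriculture", "automobile", "IT", "finance"]
library = {
    "engine": [0.1, 0.0, 0.9, 0.3, 0.1],
    "price":  [0.0, 0.2, 0.4, 0.1, 0.8],
}
q = query_vector(["engine", "price", "notinlibrary"], library, m=len(categories))
top3 = top_categories(q, categories)
print([name for name, _ in top3])
```

Only pages whose stored category attribute is one of the three selected categories would then be passed to the Lucene ranking step.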
The basic Lucene scoring formula uses the following factors:
q is the query;
tf(t, d) is the frequency with which term t occurs in document d;
idf(t) is the inverse document frequency of term t;
t.getBoost() is the weight of each term in the query; it allows a term to be marked as more important in the query;
norm(t, d) is the normalization factor, which comprises three parameters: (1) Document boost: the larger this value, the more important the document. (2) Field boost: the larger this value, the more important the field. (3) lengthNorm(field): the more terms a field contains, i.e., the longer the document, the smaller this value; the shorter the document, the larger this value;
coord(q, d) is the coordination factor, computed from the number of query terms that document d contains;
queryNorm(q) is the normalization value of a query, computed from the sum of the squared weights of its query terms.
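The formula itself did not survive in this text. Lucene's classic TF-IDF similarity documents a practical scoring function built from exactly the factors listed above; assuming that is the formula meant here, it reads:

```latex
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
\sum_{t \in q} \Big( \mathrm{tf}(t,d)\cdot \mathrm{idf}(t)^{2}\cdot
t.\mathrm{getBoost}()\cdot \mathrm{norm}(t,d) \Big)
```

Note that idf(t) enters squared, because it appears once in the query weight and once in the document weight of the term.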
In Step 3, optimizing the initial ranking result according to the user's personalized customization information comprises the following steps:
Step 301: Collect the three kinds of personalized customization information with the highest query frequency for the user: the first customized information u, the second customized information v and the third customized information s, and determine the weights of these three kinds of information as A, B and E.
Step 302: Query matching when the user's customized information is definite. In this case the category of every item of the user's personal information is known, so the basic Lucene score of each document in the initial ranking result is modified:
I. If u, v and s are all 0, the document's score is unchanged;
II. If at least one of u, v and s is not 0, the score is modified by a formula (not reproduced in this text) in which:
u = 1 means the web page's category matches the user's first customized information, and 0 means it does not;
v = 1 means the web page's category matches the user's second customized information, and 0 means it does not;
s = 1 means the web page's category matches the user's third customized information, and 0 means it does not;
topscore is the highest score among the result documents, and lastscore is the lowest score.
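The score update of Step 302 is not recoverable from this text, so the sketch below only illustrates how the named quantities could interact. The boost form used, adding (A·u + B·v + E·s) × (topscore − lastscore) to the base score, is purely an assumption and should not be read as the patent's formula; the function name is hypothetical as well.

```python
def personalized_score(score, u, v, s, A, B, E, topscore, lastscore):
    """Adjust a base Lucene score using category-match flags u, v, s (each 0 or 1)."""
    if u == 0 and v == 0 and s == 0:
        return score                      # case I: no match, score unchanged
    # case II, ASSUMED form: weighted boost scaled by the score range of the result list
    boost = A * u + B * v + E * s
    return score + boost * (topscore - lastscore)

base = 2.0
unchanged = personalized_score(base, 0, 0, 0, 0.5, 0.3, 0.2, topscore=5.0, lastscore=1.0)
boosted = personalized_score(base, 1, 0, 0, 0.5, 0.3, 0.2, topscore=5.0, lastscore=1.0)
print(unchanged, boosted)
```

Scaling by the score range is one way to let a matching document move a meaningful distance up the list regardless of the absolute magnitude of Lucene scores.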
Step 303: Query matching when the user's customized information is fuzzy.
When the personalized customization information entered by the user does not fall within the given default categories, the entered information is looked up in the word vector library to find the corresponding categories, giving a corresponding new word vector. In the present invention, the word vector set corresponding to the user's first customized information has an associated weight vector and the corresponding categories c_1, c_2, c_3; the word vector set corresponding to the user's second customized information has an associated weight vector and the corresponding categories c_4, c_5, c_6 (the vector notations are not reproduced in this text). The new category set is denoted C = {c_1, c_2, c_3} ∪ {c_4, c_5, c_6}. For each web document, if document d_k belongs to category c_i and c_i ∈ C, the document score of this web page is updated by a formula (not reproduced in this text) in which wu_i and wv_i are the values of the respective weight vectors in the dimension corresponding to category c_i; if a weight vector has no dimension corresponding to category c_i, that term is 0.
In Step 3, the initial ranking result is also optimized according to the user's historical access information; this analysis refines the initial ranking on the basis of the user's access history. Web pages that were visited many times in the access history play an important role in later searches, and the pages selected most often by all users give strong guidance on the search tendency of an individual user. This method therefore uses the user's historical access information to optimize the initial ranking result and raise the rank of pages highly relevant to the user. The web page re-ranking algorithm proposed by the present invention comprises the following steps:
Step 311: If document d is a history record or a hot link (hotlink), carry out the following algorithm; otherwise skip this step.
Step 312: Let the initial rank be r. The new rank r' of d is given by a formula (not reproduced in this text) in which:
s' is 1 if d is a history record and 0 otherwise;
h is 1 if d is a hot link and 0 otherwise;
n_1 is the number of times the user has clicked this history record;
n_2 is the number of clicks on the hot link.
It follows that the minimum value of r' is 0.
The present invention first optimizes the existing algorithm by adding multi-semantic analysis, producing a word vector library and an index library with richer semantic information. Then, based on the word vector library, it performs semantic analysis of the search keywords entered by the user and matches the query against the index library to form the initial ranking result. Finally, combining the user's personalized customization information and historical access information, it uses semantic analysis to optimize the initial ranking result, thereby obtaining search results that better match the user's tendencies and improving the user's search experience.
Description of drawings
Fig. 1 is the algorithm flow chart of the semantic search method based on multi-semantic analysis and personalized sequencing provided by the present invention;
Fig. 2 is a comparison chart of the retrieval precision of the three search methods (LB, YH and OURS) provided by the embodiment of the present invention.
Embodiment
The preferred embodiment is described in detail below with reference to the accompanying drawings. It should be emphasized that the following description is only exemplary and is not intended to limit the scope or application of the invention.
The detailed process of the present invention is described below through a specific embodiment:
Step 1: Corpus preparation
Web pages are obtained from the Internet with crawler technology. About 6000 recent web pages were crawled from major websites such as Sina (sina.com) and Zhongguancun Online (zol.com); a part of them was selected as the training set and classified with an SVM. According to the sources and actual content of these pages, the class labels were derived directly, and 7 class labels were finally determined: sport, agriculture, automobile, IT, food, lady and finance, normal; the training model is characterized by these 7 class labels. Here normal covers pages that do not belong to the first six class labels. This training model is used to classify the test set. If the method were put into commercial use, class labels at each level could be set according to the classification hierarchy of the ODP (Open Directory Project), because real-world search engines draw on a huge and wide-ranging set of web sources.
Step 2: Selection of comparison methods
In this embodiment, two representative search methods, Lucene and YD (Yahoo Directory), are selected for comparison with the retrieval precision of this method.
2.1: Lucene. The index built for these web pages is searched with Lucene Searcher, as the first comparison test of the search effect of the present invention.
2.2: Yahoo Directory is an online English-language website directory search whose search results all carry class labels. The training model built from these web pages can also be used to classify the pages crawled from Yahoo Directory, which serves as the second comparison test.
2.3: To satisfy most users, for each test keyword the method crawls the top 30 returned results, classifies them with the established training model, and reorganizes them in the search according to the present invention; this serves as the test of the search effect of the present invention.
Lucene and Yahoo Directory are methods of information organization, processing and retrieval that currently receive considerable attention in the industry, so the present invention compares the relevant indicators against these two methods.
Step 3: Setting the goal of the experimental comparison
Statistics show that, for the pages retrieved by a search engine, fewer than 0.1% of users look at more than the first 100 result pages, and more than 80% of users browse only the first 30 result pages. Because the present invention performs some filtering, and in order to give users more room for choice, in the comparison with Lucene search this method takes the first 200 web pages of the initial search result for optimized re-ranking.
In this embodiment, 7 users were randomly selected to try the system. To measure retrieval effectiveness, an evaluation criterion is set: the accuracy rate R. For each query, 10 documents of the search results are taken, and the accuracy rate R of the query is defined by a formula (not reproduced in this text) in which D_r is the number of documents relevant to the query keywords. The average over multiple queries is the retrieval precision of the method.
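The evaluation measure can be sketched as follows, assuming R is the number of relevant documents D_r divided by the number of documents inspected per query (10 in this embodiment). The relevance judgments below are invented for illustration and are not experimental data from the patent.

```python
def accuracy(relevant_flags):
    """R for one query: share of the inspected results judged relevant (D_r / N)."""
    return sum(relevant_flags) / len(relevant_flags)

def retrieval_precision(per_query_flags):
    """Mean accuracy rate over several queries."""
    rates = [accuracy(flags) for flags in per_query_flags]
    return sum(rates) / len(rates)

# Hypothetical judgments for two queries, 10 inspected results each
judgments = [
    [True] * 7 + [False] * 3,
    [True] * 5 + [False] * 5,
]
print(retrieval_precision(judgments))
```

This is the usual precision-at-k measure with k = 10, averaged over queries.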
Step 4: Experimental results and analysis
The comparison of the search accuracy rate and retrieval precision of the three methods is shown in Fig. 2, where:
Lucene: Lucene Searcher
YD: Yahoo Directory
OURS: the method proposed by the present invention
As the figure shows, the proportion of returned web documents that match the user's search tendency is higher for the present invention than for the other two methods. Its accuracy rate and retrieval precision are significantly better than those of the Lucene search method; Yahoo Directory comes closer to the retrieval precision of the present invention but is still slightly weaker. This shows that the semantic search method with multi-semantic analysis and optimized ranking proposed by the present invention can improve retrieval accuracy and precision, has a clear advantage over current search engines, and matches users' search tendencies more closely, improving their search experience.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.
Claims (6)
1. A semantic search method based on multi-semantic analysis and personalized sequencing, characterized in that it comprises the following steps:
Step 1: Web documents are obtained from the Internet with crawler technology, and a part of them is manually classified to serve as the training set; a word vector library is constructed with the multi-semantic analysis method MSA, the web documents are represented as vectors, and the document vectors of the training set are fed to a support vector machine (SVM) classifier for training; new web pages are classified by the SVM using this model; the category information of every web page is written into the index library as an attribute;
Step 2: Based on the word vector library formed in Step 1, a word vector is constructed for each search keyword entered by the user and these are combined into the final query vector; the query vector is matched by category against the index library to obtain the initial web search result;
Step 3: According to the user's personalized customization information and historical access information, the initial search result is re-ranked, and the final search result is returned to the user.
2. The semantic search method based on multi-semantic analysis and personalized sequencing according to claim 1, characterized in that, in said Step 1, the word vector library is constructed with the multi-semantic analysis method MSA and the classification results of the web documents are written into the index to form the index library, which specifically comprises the steps:
Step 11: Construct the concept space; in the present invention the space is m-dimensional;
The basic dimensions of the concept space are a set of class labels that can represent the information of the whole corpus; in general, m class labels extracted directly from the corpus's classification labels form the m dimensions of the vector, and the semantic information of each word in a web document is then described by an m-dimensional vector, called a word vector;
Step 12: Determine the component values of the word vectors:
Words are extracted from the web documents of the training set, and the size of each component of a word vector is determined by all documents of the training set; each component of a word vector is computed by a formula (not reproduced in this text) in which:
t_j denotes the j-th word in the word vector library; w(c_i, t_j) denotes the relation between word t_j and the i-th dimension c_i, that is, the i-th component of the word vector of t_j; |D| is the number of training documents; tf(d_k, t_j) is the frequency with which word t_j occurs in document d_k; H(c_i, d_k) is a discriminant function whose value is 1 if document d_k belongs to the field described by dimension c_i and 0 otherwise; length(d_k) is the length of document d_k, i.e., the number of words obtained after word segmentation and denoising, where a word that occurs several times in the document is counted each time, so that length(d_k) ≥ n; K is the number of documents;
Step 13: Normalize the word vectors and form the word vector library:
Each word vector is normalized to unit form so that its component values lie in the range [0, 1], which gives it better generality; the normalized word vectors together form the word vector library; in the normalization formula (not reproduced in this text), w'(c_i, t_j) denotes the i-th component of the normalized word vector of t_j, and the word vector library is the set of these normalized word vectors;
Step 14: Obtain the weight of each word in a document with the TF-IDF method and normalize these weights; TF-IDF has been popular for many years and is one of the weighting methods proven effective; its weights depend only on the overall statistics of the corpus and not on the categories, so it is highly general and can be applied to determine word weights in multi-class text representation; in the TF-IDF weighting formula (not reproduced in this text):
t_g is the g-th word of document d_k; weight(t_g, d_k) is the weight of word t_g in document d_k; D is the set of training documents; d_k is the k-th document; |D| is the number of training documents; D' is the set of documents containing word t_g; and |D'| is the number of documents in D';
The weights are normalized in the same way, so that the weight of each word of the segmented document lies in [0, 1]; in the normalized weight formula (not reproduced in this text), weight'(t_g, d_k) is the normalized weight of word t_g in document d_k, and n is the total number of distinct words in the document;
Step 15: Form the document vector; once the weights have been expressed with the TF-IDF method, the document vector of the multi-semantic analysis (MSA) is formed; the i-th component of the document vector of d_k is given by a formula (not reproduced in this text) in which n is the total number of distinct words in the document and each word t_g is represented by its vector form in the word vector library;
Each component of this document vector directly represents the degree of correlation between the document and the corresponding dimension (category), so the vector carries strong semantics and is the basis of the matching query; afterwards, with the m predefined class labels, support vector machine technology is used to classify the document vectors, the resulting model serves as the classification standard for new web pages, and the category of every web page is written into the index library as an attribute.
3. The semantic search method based on multi-semantic analysis and personalized sequencing according to claim 1, characterized in that, in said Step 2, the matching analysis of the query vector and the index library comprises the sub-steps:
Step 21: Based on the word vector library, represent the search keywords entered by the user as vectors;
Denote the set of search keywords as KEY = {key_1, key_2, ..., key_n}; the word vector of each keyword is extracted from the word vector library already built, giving the vector form of each keyword key_i; all keywords together form the set of query vectors; when key_i does not exist in the word vector library, its vector takes a default value (the expression is not reproduced in this text);
Step 22: On the basis of Step 21, form the query vector of the search keywords in the m-dimensional space (the query vector formula is not reproduced in this text);
The three largest components of the query vector are denoted α_p, α_q and α_r; their corresponding dimension categories are denoted c_p, c_q and c_r, and a category weight vector is formed for them; this weight vector can be used later when matching user information; based on these three categories {c_p, c_q, c_r}, a matching query is carried out in the index library, the web pages belonging to these three categories are filtered out, and the basic Lucene ranking algorithm is applied to obtain the initial ranking result.
4. The semantic search method based on multi-semantic analysis and personalized sequencing according to claim 3, characterized in that the basic Lucene ranking algorithm uses a scoring formula with the following factors:
q is the query;
tf(t, d) is the frequency with which term t occurs in document d;
idf(t) is the inverse document frequency of term t;
t.getBoost() is the weight of each term in the query;
norm(t, d) is the normalization factor;
coord(q, d) is the coordination factor;
queryNorm(q) is the normalization value of the query, computed from the sum of the squared weights of its query terms.
5. a kind of semantic retrieving method based on multi-semantic meaning analysis and personalized ordering according to claim 1 is characterized in that, according to user's personal customization information initial ranking results is optimized processing and specifically may further comprise the steps:
Step 301: collect three kinds of the highest personal customization information of user query frequency: the first customized information u, the second customized information v and the 3rd customized information s, and determine that the weights of these three kinds of personal customization information are A, B and E;
Step 302: the Query coupling when user customized information is determined; At this moment, because the classification of every personal information of user is all definite, therefore, the Lucene basis score of document in the initial ranking results is made amendment:
If I. u, v and s are 0, then this document scores is constant;
If II. there is one not to be 0 among u, v and the s, then:
Wherein,
U=1 represents this webpage classification and conforms to the first customized information, and 0 representative is not inconsistent;
V=1 represents this webpage classification and conforms to user's the second customized information, and 0 representative is not inconsistent;
S=1 represents this webpage classification and conforms to user's the 3rd customized information, and 0 representative is not inconsistent;
Topscore is the top score in the result document, and lastscore is minimum score;
Step 303: query matching when the user's customized information is fuzzy:
When the personal customization information entered by the user does not fall within the given default category range, the corresponding categories of the entered information are looked up in the word vector library to obtain the corresponding new word vectors. In the present invention, the user's first customized information corresponds to a word vector set, a weight vector, and the categories $c_1$, $c_2$, $c_3$; the user's second customized information corresponds to a word vector set, a weight vector, and the categories $c_4$, $c_5$, $c_6$. The new category set is denoted $C = \{c_1, c_2, c_3\} \cup \{c_4, c_5, c_6\}$. For each web document, if document $d_k$ belongs to category $c_i$ and $c_i \in C$, the document score of this web page becomes:
where $wu_i$ and $wv_i$ are the values of the dimension of the respective weight vector that corresponds to category $c_i$; if a weight vector has no dimension corresponding to category $c_i$, that term is 0.
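The bookkeeping in Step 303 (building the category set C as a union, then reading off the weights for a document's category, with 0 for a missing dimension) can be sketched as follows. This is an illustrative sketch only: representing each weight vector as a dictionary keyed by category name, and the names `build_category_set` and `category_weights`, are assumptions; the score update itself is not reproduced.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def build_category_set(first_categories: Iterable[str],
                       second_categories: Iterable[str]) -> Set[str]:
    """C is the union of the categories found for the first and the second
    customized information."""
    return set(first_categories) | set(second_categories)


def category_weights(category: str,
                     wu: Dict[str, float],
                     wv: Dict[str, float],
                     categories: Set[str]) -> Optional[Tuple[float, float]]:
    """Return (wu_i, wv_i) for a document category c_i in C.

    A weight vector that has no dimension for c_i contributes 0.
    Returns None when the category is outside C, i.e. the document
    score is not affected by this step.
    """
    if category not in categories:
        return None
    return wu.get(category, 0.0), wv.get(category, 0.0)
```

With hypothetical weights `wu = {"c1": 0.5, "c2": 0.3, "c3": 0.2}` and `wv = {"c4": 0.6, "c5": 0.3, "c6": 0.1}`, a document in category `c1` gets the pair `(0.5, 0.0)`, because the second weight vector has no dimension for `c1`.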
6. The semantic search method based on multi-semantic analysis and personalized sequencing according to claim 1, characterized in that optimizing the initial ranking result according to the user's historical access information specifically comprises the following steps:
Step 311: if document d is a history record or a hot link (hotlink), execute the following algorithm; otherwise skip this step;
Step 312: let the initial rank be r; then the new rank of d is:
where
S': 1 if d is a history record, 0 otherwise;
H: 1 if d is a hot link (hotlink), 0 otherwise;
$n_1$: the number of times the user has clicked this history record;
$n_2$: the number of clicks on the hot link (hotlink);
It follows that the minimum value of r' is 0.
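The gating in Step 311 and the inputs named in Step 312 can be sketched in Python as shown below. The sketch only collects S', H, $n_1$ and $n_2$ for a document and skips documents that are neither history records nor hot links; the rank formula itself is not reproduced. Identifying documents by URL, storing click counts in dictionaries, and the names `RankSignals` and `rank_signals` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RankSignals:
    s_prime: int  # S': 1 if the document is a history record, else 0
    h: int        # H: 1 if the document is a hot link, else 0
    n1: int       # number of times the user clicked this history record
    n2: int       # number of clicks on the hot link


def rank_signals(doc_url: str,
                 history_clicks: Dict[str, int],
                 hotlink_clicks: Dict[str, int]) -> Optional[RankSignals]:
    """Collect the Step 312 inputs for one document.

    Returns None when the document is neither a history record nor a
    hot link, which corresponds to skipping the step (Step 311).
    """
    in_history = doc_url in history_clicks
    is_hotlink = doc_url in hotlink_clicks
    if not (in_history or is_hotlink):
        return None
    return RankSignals(
        s_prime=int(in_history),
        h=int(is_hotlink),
        n1=history_clicks.get(doc_url, 0),
        n2=hotlink_clicks.get(doc_url, 0),
    )
```

Returning `None` for unaffected documents keeps the skip condition of Step 311 explicit, so a caller can leave the initial rank r untouched in that case.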
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210488572.XA CN103020164B (en) | 2012-11-26 | 2012-11-26 | Semantic search method based on multi-semantic analysis and personalized sequencing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210488572.XA CN103020164B (en) | 2012-11-26 | 2012-11-26 | Semantic search method based on multi-semantic analysis and personalized sequencing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103020164A (en) | 2013-04-03 |
CN103020164B (en) | 2015-06-10 |
Family ID=47968768
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210488572.XA (Expired - Fee Related) CN103020164B (en) | 2012-11-26 | 2012-11-26 | Semantic search method based on multi-semantic analysis and personalized sequencing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103020164B (en) |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103593336A (en) * | 2013-10-30 | 2014-02-19 | 中国运载火箭技术研究院 | Knowledge pushing system and method based on semantic analysis |
CN103646017A (en) * | 2013-12-11 | 2014-03-19 | 南京大学 | Acronym generating system for naming and working method thereof |
CN104008169A (en) * | 2014-05-30 | 2014-08-27 | 中国测绘科学研究院 | Semanteme based geographical label content safe checking method and device |
CN104050240A (en) * | 2014-05-26 | 2014-09-17 | 北京奇虎科技有限公司 | Method and device for determining categorical attribute of search query word |
CN104408036A (en) * | 2014-12-15 | 2015-03-11 | 北京国双科技有限公司 | Correlated topic recognition method and device |
CN104516897A (en) * | 2013-09-29 | 2015-04-15 | 国际商业机器公司 | Method and device for sorting application objects |
CN105247517A (en) * | 2013-04-23 | 2016-01-13 | 谷歌公司 | Ranking signals in mixed corpora environments |
WO2016082406A1 (en) * | 2014-11-28 | 2016-06-02 | 华为技术有限公司 | Method and apparatus for determining semantic matching degree |
CN105786782A (en) * | 2016-03-25 | 2016-07-20 | 北京搜狗科技发展有限公司 | Word vector training method and device |
CN105893397A (en) * | 2015-06-30 | 2016-08-24 | 北京爱奇艺科技有限公司 | Video recommendation method and apparatus |
CN106095983A (en) * | 2016-06-20 | 2016-11-09 | 北京百度网讯科技有限公司 | A kind of similarity based on personalized deep neural network determines method and device |
CN106156071A (en) * | 2015-03-31 | 2016-11-23 | 北京奇虎科技有限公司 | Intranet Intranet searching method, device and server |
CN106528595A (en) * | 2016-09-23 | 2017-03-22 | 中国农业科学院农业信息研究所 | Website homepage content based field information collection and association method |
CN106610972A (en) * | 2015-10-21 | 2017-05-03 | 阿里巴巴集团控股有限公司 | Query rewriting method and apparatus |
CN106844343A (en) * | 2017-01-20 | 2017-06-13 | 上海傲硕信息科技有限公司 | Instruction results screening plant |
CN106910497A (en) * | 2015-12-22 | 2017-06-30 | 阿里巴巴集团控股有限公司 | A kind of Chinese word pronunciation Forecasting Methodology and device |
CN106951422A (en) * | 2016-01-07 | 2017-07-14 | 腾讯科技(深圳)有限公司 | The method and apparatus of webpage training, the method and apparatus of search intention identification |
CN106980634A (en) * | 2016-01-18 | 2017-07-25 | 维布络有限公司 | System and method for classifying and solving software production accident list |
CN107203526A (en) * | 2016-03-16 | 2017-09-26 | 高德信息技术有限公司 | A kind of query string semantic requirement analysis method and device |
CN103744905B (en) * | 2013-12-25 | 2018-03-30 | 新浪网技术(中国)有限公司 | Method for judging rubbish mail and device |
CN108111478A (en) * | 2017-11-07 | 2018-06-01 | 中国互联网络信息中心 | A kind of phishing recognition methods and device based on semantic understanding |
CN108182229A (en) * | 2017-12-27 | 2018-06-19 | 上海科大讯飞信息科技有限公司 | Information interacting method and device |
CN108460067A (en) * | 2017-10-30 | 2018-08-28 | 上海赛图计算机科技股份有限公司 | Tile index structure, index structuring method and data retrieval method based on data |
WO2018157625A1 (en) * | 2017-02-28 | 2018-09-07 | 华为技术有限公司 | Reinforcement learning-based method for learning to rank and server |
CN109189910A (en) * | 2018-09-18 | 2019-01-11 | 哈尔滨工程大学 | A kind of label auto recommending method towards mobile application problem report |
CN109376288A (en) * | 2018-09-28 | 2019-02-22 | 北京北斗方圆电子科技有限公司 | A kind of cloud computing platform and its equalization methods for realizing semantic search |
CN110019888A (en) * | 2017-12-01 | 2019-07-16 | 北京搜狗科技发展有限公司 | A kind of searching method and device |
CN110287288A (en) * | 2019-06-18 | 2019-09-27 | 北京百度网讯科技有限公司 | Recommend the method and apparatus of document |
CN110705307A (en) * | 2019-08-30 | 2020-01-17 | 深圳壹账通智能科技有限公司 | Information change index monitoring method and device, computer equipment and storage medium |
CN110765368A (en) * | 2018-12-29 | 2020-02-07 | 北京嘀嘀无限科技发展有限公司 | Artificial intelligence system and method for semantic retrieval |
CN110895467A (en) * | 2018-09-13 | 2020-03-20 | 深圳市蓝灯鱼智能科技有限公司 | Method and device for updating search model, storage medium and electronic device |
CN111753527A (en) * | 2020-06-29 | 2020-10-09 | 平安科技(深圳)有限公司 | Data analysis method and device based on natural language processing and computer equipment |
CN113269477A (en) * | 2021-07-14 | 2021-08-17 | 北京邮电大学 | Scientific research project query scoring model training method, query method and device |
CN113569128A (en) * | 2020-04-29 | 2021-10-29 | 北京金山云网络技术有限公司 | Data retrieval method and device and electronic equipment |
CN114021019A (en) * | 2021-11-10 | 2022-02-08 | 中国人民大学 | Retrieval method integrating personalized search and search result diversification |
CN114969310A (en) * | 2022-06-07 | 2022-08-30 | 南京云问网络技术有限公司 | Multi-dimensional data-oriented sectional type retrieval and sorting system design method |
CN115168577A (en) * | 2022-06-30 | 2022-10-11 | 北京百度网讯科技有限公司 | Model updating method and device, electronic equipment and storage medium |
WO2023040808A1 (en) * | 2021-09-18 | 2023-03-23 | 华为技术有限公司 | Webpage retrieval method and related device |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107862004A (en) * | 2017-10-24 | 2018-03-30 | 科大讯飞股份有限公司 | Intelligent sorting method and device, storage medium and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070106659A1 (en) * | 2005-03-18 | 2007-05-10 | Yunshan Lu | Search engine that applies feedback from users to improve search results |
CN101398839A (en) * | 2008-10-23 | 2009-04-01 | 浙江大学 | Personalized push method for vocal web page news |
CN102495872A (en) * | 2011-11-30 | 2012-06-13 | 中国科学技术大学 | Method and device for conducting personalized news recommendation to mobile device users |
Non-Patent Citations (2)
Title |
---|
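YINGLONG MA et al.: "Using multi-categorization semantic analysis and personalization for semantic search", 《HTTP://ARXIV.ORG》 * |
Ma Yinglong et al.: "A semantic retrieval method based on multi-classification semantic analysis and personalization", Journal of Southeast University (Natural Science Edition) * |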
Cited By (59)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105247517A (en) * | 2013-04-23 | 2016-01-13 | 谷歌公司 | Ranking signals in mixed corpora environments |
CN104516897A (en) * | 2013-09-29 | 2015-04-15 | 国际商业机器公司 | Method and device for sorting application objects |
CN104516897B (en) * | 2013-09-29 | 2018-03-02 | 国际商业机器公司 | A kind of method and apparatus being ranked up for application |
CN103593336A (en) * | 2013-10-30 | 2014-02-19 | 中国运载火箭技术研究院 | Knowledge pushing system and method based on semantic analysis |
CN103593336B (en) * | 2013-10-30 | 2017-05-10 | 中国运载火箭技术研究院 | Knowledge pushing system and method based on semantic analysis |
CN103646017B (en) * | 2013-12-11 | 2017-01-04 | 南京大学 | Acronym generating system for naming and working method thereof |
CN103646017A (en) * | 2013-12-11 | 2014-03-19 | 南京大学 | Acronym generating system for naming and working method thereof |
CN103744905B (en) * | 2013-12-25 | 2018-03-30 | 新浪网技术(中国)有限公司 | Method for judging rubbish mail and device |
CN104050240A (en) * | 2014-05-26 | 2014-09-17 | 北京奇虎科技有限公司 | Method and device for determining categorical attribute of search query word |
CN104008169B (en) * | 2014-05-30 | 2017-02-22 | 中国测绘科学研究院 | Semanteme based geographical label content safe checking method and device |
CN104008169A (en) * | 2014-05-30 | 2014-08-27 | 中国测绘科学研究院 | Semanteme based geographical label content safe checking method and device |
US10467342B2 (en) | 2014-11-28 | 2019-11-05 | Huawei Technologies Co., Ltd. | Method and apparatus for determining semantic matching degree |
US11138385B2 (en) | 2014-11-28 | 2021-10-05 | Huawei Technologies Co., Ltd. | Method and apparatus for determining semantic matching degree |
WO2016082406A1 (en) * | 2014-11-28 | 2016-06-02 | 华为技术有限公司 | Method and apparatus for determining semantic matching degree |
CN104408036A (en) * | 2014-12-15 | 2015-03-11 | 北京国双科技有限公司 | Correlated topic recognition method and device |
CN106156071A (en) * | 2015-03-31 | 2016-11-23 | 北京奇虎科技有限公司 | Intranet Intranet searching method, device and server |
CN105893397A (en) * | 2015-06-30 | 2016-08-24 | 北京爱奇艺科技有限公司 | Video recommendation method and apparatus |
CN105893397B (en) * | 2015-06-30 | 2019-03-15 | 北京爱奇艺科技有限公司 | A kind of video recommendation method and device |
CN106610972A (en) * | 2015-10-21 | 2017-05-03 | 阿里巴巴集团控股有限公司 | Query rewriting method and apparatus |
CN106910497A (en) * | 2015-12-22 | 2017-06-30 | 阿里巴巴集团控股有限公司 | A kind of Chinese word pronunciation Forecasting Methodology and device |
CN106951422A (en) * | 2016-01-07 | 2017-07-14 | 腾讯科技(深圳)有限公司 | The method and apparatus of webpage training, the method and apparatus of search intention identification |
CN106980634A (en) * | 2016-01-18 | 2017-07-25 | 维布络有限公司 | System and method for classifying and solving software production accident list |
CN107203526A (en) * | 2016-03-16 | 2017-09-26 | 高德信息技术有限公司 | A kind of query string semantic requirement analysis method and device |
CN105786782B (en) * | 2016-03-25 | 2018-10-19 | 北京搜狗信息服务有限公司 | A kind of training method and device of term vector |
CN105786782A (en) * | 2016-03-25 | 2016-07-20 | 北京搜狗科技发展有限公司 | Word vector training method and device |
CN106095983B (en) * | 2016-06-20 | 2019-11-26 | 北京百度网讯科技有限公司 | A kind of similarity based on personalized deep neural network determines method and device |
CN106095983A (en) * | 2016-06-20 | 2016-11-09 | 北京百度网讯科技有限公司 | A kind of similarity based on personalized deep neural network determines method and device |
CN106528595A (en) * | 2016-09-23 | 2017-03-22 | 中国农业科学院农业信息研究所 | Website homepage content based field information collection and association method |
CN106528595B (en) * | 2016-09-23 | 2019-08-06 | 中国农业科学院农业信息研究所 | Realm information based on website homepage content is collected and correlating method |
CN106844343A (en) * | 2017-01-20 | 2017-06-13 | 上海傲硕信息科技有限公司 | Instruction results screening plant |
CN106844343B (en) * | 2017-01-20 | 2019-11-19 | 上海傲硕信息科技有限公司 | Instruction results screening plant |
WO2018157625A1 (en) * | 2017-02-28 | 2018-09-07 | 华为技术有限公司 | Reinforcement learning-based method for learning to rank and server |
US11500954B2 (en) | 2017-02-28 | 2022-11-15 | Huawei Technologies Co., Ltd. | Learning-to-rank method based on reinforcement learning and server |
CN108460067A (en) * | 2017-10-30 | 2018-08-28 | 上海赛图计算机科技股份有限公司 | Tile index structure, index structuring method and data retrieval method based on data |
CN108460067B (en) * | 2017-10-30 | 2022-08-16 | 上海赛图计算机科技股份有限公司 | Tile index structure based on data, index construction method and data retrieval method |
CN108111478A (en) * | 2017-11-07 | 2018-06-01 | 中国互联网络信息中心 | A kind of phishing recognition methods and device based on semantic understanding |
CN110019888A (en) * | 2017-12-01 | 2019-07-16 | 北京搜狗科技发展有限公司 | A kind of searching method and device |
CN108182229A (en) * | 2017-12-27 | 2018-06-19 | 上海科大讯飞信息科技有限公司 | Information interacting method and device |
CN108182229B (en) * | 2017-12-27 | 2022-10-28 | 上海科大讯飞信息科技有限公司 | Information interaction method and device |
CN110895467A (en) * | 2018-09-13 | 2020-03-20 | 深圳市蓝灯鱼智能科技有限公司 | Method and device for updating search model, storage medium and electronic device |
CN109189910B (en) * | 2018-09-18 | 2019-09-10 | 哈尔滨工程大学 | A kind of label auto recommending method towards mobile application problem report |
CN109189910A (en) * | 2018-09-18 | 2019-01-11 | 哈尔滨工程大学 | A kind of label auto recommending method towards mobile application problem report |
CN109376288B (en) * | 2018-09-28 | 2021-04-23 | 邦道科技有限公司 | Cloud computing platform for realizing semantic search and balancing method thereof |
CN109376288A (en) * | 2018-09-28 | 2019-02-22 | 北京北斗方圆电子科技有限公司 | A kind of cloud computing platform and its equalization methods for realizing semantic search |
CN110765368B (en) * | 2018-12-29 | 2020-10-27 | 滴图(北京)科技有限公司 | Artificial intelligence system and method for semantic retrieval |
CN110765368A (en) * | 2018-12-29 | 2020-02-07 | 北京嘀嘀无限科技发展有限公司 | Artificial intelligence system and method for semantic retrieval |
CN110287288A (en) * | 2019-06-18 | 2019-09-27 | 北京百度网讯科技有限公司 | Recommend the method and apparatus of document |
CN110287288B (en) * | 2019-06-18 | 2022-02-18 | 北京百度网讯科技有限公司 | Method and device for recommending documents |
CN110705307A (en) * | 2019-08-30 | 2020-01-17 | 深圳壹账通智能科技有限公司 | Information change index monitoring method and device, computer equipment and storage medium |
CN113569128A (en) * | 2020-04-29 | 2021-10-29 | 北京金山云网络技术有限公司 | Data retrieval method and device and electronic equipment |
CN111753527A (en) * | 2020-06-29 | 2020-10-09 | 平安科技(深圳)有限公司 | Data analysis method and device based on natural language processing and computer equipment |
CN113269477A (en) * | 2021-07-14 | 2021-08-17 | 北京邮电大学 | Scientific research project query scoring model training method, query method and device |
WO2023040808A1 (en) * | 2021-09-18 | 2023-03-23 | 华为技术有限公司 | Webpage retrieval method and related device |
CN114021019A (en) * | 2021-11-10 | 2022-02-08 | 中国人民大学 | Retrieval method integrating personalized search and search result diversification |
CN114021019B (en) * | 2021-11-10 | 2024-03-29 | 中国人民大学 | Retrieval method integrating personalized search and diversification of search results |
CN114969310A (en) * | 2022-06-07 | 2022-08-30 | 南京云问网络技术有限公司 | Multi-dimensional data-oriented sectional type retrieval and sorting system design method |
CN114969310B (en) * | 2022-06-07 | 2024-04-05 | 南京云问网络技术有限公司 | Multi-dimensional data-oriented sectional search ordering system design method |
CN115168577A (en) * | 2022-06-30 | 2022-10-11 | 北京百度网讯科技有限公司 | Model updating method and device, electronic equipment and storage medium |
CN115168577B (en) * | 2022-06-30 | 2023-03-21 | 北京百度网讯科技有限公司 | Model updating method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN103020164B (en) | 2015-06-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103020164B (en) | Semantic search method based on multi-semantic analysis and personalized sequencing | |
CN101364239B (en) | Method for auto constructing classified catalogue and relevant system | |
US9652537B2 (en) | Identifying terms associated with queries | |
CN102902806B (en) | A kind of method and system utilizing search engine to carry out query expansion | |
CN101520785B (en) | Information retrieval method and system therefor | |
JP5632124B2 (en) | Rating method, search result sorting method, rating system, and search result sorting system | |
US8538989B1 (en) | Assigning weights to parts of a document | |
CN106339502A (en) | Modeling recommendation method based on user behavior data fragmentation cluster | |
CN103455487B (en) | The extracting method and device of a kind of search term | |
US20060155751A1 (en) | System and method for document analysis, processing and information extraction | |
CN104933239A (en) | Hybrid model based personalized position information recommendation system and realization method therefor | |
EP1782278A2 (en) | System and method for document analysis, processing and information extraction | |
CN104484431A (en) | Multi-source individualized news webpage recommending method based on field body | |
CN101283353A (en) | Systems for and methods of finding relevant documents by analyzing tags | |
CN109918563A (en) | A method of the book recommendation based on public data | |
CN1996316A (en) | Search engine searching method based on web page correlation | |
Hao et al. | An Algorithm for Generating a Recommended Rule Set Based on Learner's Browse Interest | |
CN106886577A (en) | A kind of various dimensions web page browsing behavior evaluation method | |
Rajkumar et al. | Users’ click and bookmark based personalization using modified agglomerative clustering for web search engine | |
CN1996280A (en) | Method for co-building search engine | |
KR101448134B1 (en) | an blog prestige ranking method based on weighted indexing of terms | |
Khelghati | Deep web content monitoring | |
Fathy et al. | A Personalized Approach for Re-ranking Search Results Using User Preferences. | |
Peng et al. | A focused web crawler face stock information of financial field | |
Chen et al. | Personalized search based on learning user click history |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20150610; Termination date: 20151126 |