CN112069399B - Personalized search system based on interaction matching - Google Patents

Personalized search system based on interaction matching

Info

Publication number
CN112069399B
CN112069399B (application CN202010861245.9A)
Authority
CN
China
Prior art keywords
matching
vector
document
user
personalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010861245.9A
Other languages
Chinese (zh)
Other versions
CN112069399A (en)
Inventor
窦志成 (Dou Zhicheng)
邴庆禹 (Bing Qingyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China
Priority to CN202010861245.9A
Publication of CN112069399A
Application granted
Publication of CN112069399B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention realizes a personalized search system based on interaction matching using methods from the field of artificial intelligence. The system comprises an input module, a personalized search module based on interaction matching, and an output module. The personalized search module operates in four steps: bottom-level matching modeling of the user's search history, attention weight calculation, generation of the user interest matching vector, and personalized re-ranking. The model interactively matches the user's historical queries with candidate documents at the word level, an attention mechanism reduces the influence of irrelevant information in the search history, and a convolutional neural network fuses the weighted matching signals to generate the final interest matching vector of each document, yielding a more accurate interest matching score. This addresses two problems of existing vector-representation methods: the quality of the ranking result depends heavily on the quality of the vector-construction model, and the vector-construction process may discard useful information.

Description

Personalized search system based on interaction matching
Technical Field
The invention relates to the field of artificial intelligence, in particular to a personalized search system based on interactive matching.
Background
Personalizing search with user history information has proven effective in improving the quality of search rankings. A personalized search algorithm first models the user's interests from the user's historical behavior and other information; when computing the matching score, it considers not only the relevance between the query and the document but also the degree to which the document matches the user's interests, so that a result list tailored to each user's needs can be produced. The user interest model can be built from various information sources, such as the user's location, retrieval patterns, browsing history, and search history; most current personalized search algorithms model interests from the user's historical browsing and search behavior. In recent years, researchers have introduced deep learning into personalized ranking models, strengthening the models' semantic understanding of text and achieving good results in the personalized re-ranking of search results. Ranking algorithms that use deep learning can be categorized into representation-based matching and interaction-based matching. In representation-based ranking algorithms, semantic vector representations of the query and the document are learned separately and then matched against each other. Interaction-based matching algorithms instead let the query and the document interact in advance at the finer-grained word level, capture more complete matching signals, and then combine those signals into a matching score. Almost all existing personalized search algorithms compute an interest representation vector for the user and then interact it with the representation vector of each candidate document to obtain a personalized matching score, i.e., they follow the representation-based matching idea.
Most existing personalized ranking algorithms directly compute a user interest representation vector in various ways from the user's historical behavior, and then interact it with the representation vectors of candidate documents to obtain personalized matching scores. This approach acquires the matching signal between the document and the user's interests at the granularity of the whole document: it converts the document to be matched and the user's interests into representation vectors and then matches the vectors, focusing on the construction of the representation layer. Under such vector-representation methods, the quality of the ranking result depends to a great extent on the quality of the vector-construction model, and the vector-construction process may ignore useful information, such as word-level textual and interaction information between the query and the document, which degrades the personalized ranking result.
Disclosure of Invention
Therefore, the invention provides a personalized search system based on interaction matching, which comprises an input module, a personalized search module based on interaction matching, and an output module;
the input module reads the user's query history and the candidate documents and feeds them, in a standardized format, into the personalized search module based on interaction matching;
the operation of the personalized search module based on interaction matching is divided into four steps:
step one: bottom-level matching modeling of the user search history, in which a bottom-level matching model is built from the user's historical search information and the user's historical queries interact with the candidate document word by word to obtain fine-grained bottom-level matching signals;
step two: attention weight calculation, in which an attention mechanism is introduced and the corresponding matching signals are weighted according to the contribution of different query records in the user's search history to the current query;
step three: user interest matching vector generation, in which a convolutional neural network extracts features from the weighted matching signals to generate the final matching vector between the document and the user's interests;
step four: personalized re-ranking, in which the personalized score of each candidate document is computed from the user interest matching vector obtained in step three, the relevance score of the candidate document is computed from click feature vectors, and the sum of the two scores serves as the final document matching score for personalized re-ranking;
the output module outputs the document matching scores and the personalized re-ranking result.
The bottom-level matching modeling step of the user search history is implemented as follows: define the user's historical query list as $\{q_1, q_2, q_3, \ldots, q_n\}$ (where $n \geq 3$ is an integer) and the current candidate document as $d$. For each historical query–candidate document pair $\langle q_i, d \rangle$, both texts are first mapped to word vectors represented by a word2vec model: $q_i$ is processed into a set of word vectors $\{qw_1, qw_2, qw_3, \ldots, qw_x\}$ and $d$ into $\{dw_1, dw_2, dw_3, \ldots, dw_y\}$. Every vector in one set interacts pairwise with every vector in the other to obtain the word matching matrix $T$ of $\langle q_i, d \rangle$, each element of which is:

$$T_{i,j} = \cos(qw_i, dw_j)$$

where $T_{i,j}$ is the element in row $i$, column $j$ of $T$, $qw_i$ is the word vector of the $i$-th word in the historical query, and $dw_j$ is the word vector of the $j$-th word in the candidate document (with $1 \leq i \leq x$, $1 \leq j \leq y$, and $i, j, x, y$ integers); the matching value is computed by the cosine function. In the K-NRM model, $K$ RBF kernels are applied to each row of the matching matrix to obtain a $K$-dimensional feature vector

$$\vec{K}(T_i) = \left[K_1(T_i), K_2(T_i), \ldots, K_K(T_i)\right]$$

The formula for each RBF kernel is:

$$K_k(T_i) = \sum_{j=1}^{y} \exp\left(-\frac{(T_{i,j} - \mu_k)^2}{2\sigma_k^2}\right)$$

where $K_k(T_i)$ is the value of the $k$-th RBF kernel applied to the $i$-th row of the matching matrix $T$, with values ranging from 0 to $y$; $\mu_k$ and $\sigma_k$ are hyperparameters, with $\mu$ taking uniformly spaced values from −1 to 1. The logarithms of the feature vectors of all rows of the matching matrix are then summed to give the final bottom-level matching result of historical query $q_i$ and the candidate document:

$$v_i = \sum_{r=1}^{x} \log \vec{K}(T_r)$$

The bottom-level matching vectors computed from the user's historical search information are denoted $\{v_1, v_2, v_3, \ldots, v_n\}$; the fine-grained matching vector $v$ of the current query and the candidate document is computed in the same way.
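To make this step concrete, here is a minimal numpy sketch of building the word matching matrix T from word2vec embeddings; the `w2v` lookup table, the 50-dimensional toy vocabulary, and the example words are illustrative assumptions, not details fixed by the patent.

```python
import numpy as np

def matching_matrix(query_words, doc_words, w2v):
    """Word matching matrix T for one <q_i, d> pair: T[i, j] = cos(qw_i, dw_j).

    w2v is assumed to map a token to its embedding (e.g. a dict or a gensim
    KeyedVectors object); any word2vec-style table works here.
    """
    Q = np.stack([w2v[w] for w in query_words])        # (x, dim) query word vectors
    D = np.stack([w2v[w] for w in doc_words])          # (y, dim) document word vectors
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)  # unit-normalise rows so the
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)  # dot product equals cosine
    return Qn @ Dn.T                                   # (x, y), entries in [-1, 1]

# Toy usage with a random embedding table standing in for a trained word2vec model
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=50) for w in ["deep", "learning", "search", "ranking"]}
T = matching_matrix(["deep", "learning"], ["search", "ranking", "deep"], vocab)
print(T.shape)  # (2, 3)
```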
The attention weight calculation step is implemented as follows: using the fine-grained matching vector $v$ of the current query $q$ and candidate document $d$, an attention weight is computed for the bottom-level matching vector of each historical query record:

$$e_i = g(v, v_i)$$

$$\alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{n} \exp(e_j)}$$

where $g$ is a multi-layer perceptron with tanh as its activation function, and $\alpha_i$ is the weight the attention layer assigns to the bottom-level matching vector $v_i$. The weighted bottom-level matching vector is:

$$V_i = \alpha_i v_i$$

The weighted fine-grained matching vectors corresponding to the user's historical queries are $\{V_1, V_2, V_3, \ldots, V_n\}$.
The user interest matching vector generation step is implemented as follows: the weighted fine-grained matching vectors $\{V_1, V_2, V_3, \ldots, V_n\}$ are concatenated column by column into a matching feature matrix $M = [V_1, V_2, V_3, \ldots, V_n] \in \mathbb{R}^{K \times n}$, and $M$ is convolved with 100 convolution kernels to obtain a three-dimensional tensor $A \in \mathbb{R}^{100 \times (K-2) \times (n-2)}$, each element of which is:

$$A_{t,i,j} = \mathrm{ReLU}\left(f_t \odot M_{i-1:i+1,\, j-1:j+1} + b_t\right)$$

where $t$ is an integer from 1 to 100, $b_t$ is the $t$-th element of the bias vector $b \in \mathbb{R}^{100}$, $f_t$ is the $t$-th $3 \times 3$ convolution kernel, $M_{i-1:i+1,\, j-1:j+1}$ is the submatrix of $M$ spanning rows $i-1$ to $i+1$ and columns $j-1$ to $j+1$, and $\odot$ denotes multiplying the elements at corresponding positions of two matrices and summing all the products. The convolution layer uses ReLU as its activation function. After the convolution layer, max pooling is applied to the second and third dimensions of the three-dimensional tensor $A$ in the pooling layer to obtain a 100-dimensional vector $I$, where $I_t$ is the $t$-th element of $I$:

$$I_t = \max_{i,j} A_{t,i,j}$$

The output vector $I$ is the final user interest matching vector.
The convolution kernels are 3×3 in size, and each user has at least 3 historical search records.
The personalized re-ranking step is implemented as follows: the matching score score(d|I) between the candidate document and the user's interests is obtained by feeding the interest matching vector I through a multi-layer perceptron; the relevance score score(d|q) between the candidate document and the current query is computed by a multi-layer perceptron from three click features: the number of clicks, the original click position, and the click entropy. The final score of the candidate document is the sum of the interest matching score score(d|I) and the relevance score score(d|q), and the final personalized ranking result is obtained by re-ordering the original document list by this score.
In computing the relevance score of candidate documents against the current query, the score is trained with the LambdaRank algorithm: the clicked document is taken as a relevant document sample and the remaining documents as irrelevant samples, and one relevant document $d_i$ and one irrelevant document $d_j$ are paired to compute the loss. The loss function also introduces, as a corresponding weight, the degree to which swapping the order of the document pair affects the evaluation metric MAP: pairs with a larger difference (a larger MAP change after swapping) receive a larger weight. The loss function is the cross entropy between the actual probability and the predicted probability multiplied by the change in the MAP evaluation metric:

$$\mathcal{L} = -\Delta \left( \bar{p}_{ij} \log p_{ij} + \bar{p}_{ji} \log p_{ji} \right)$$

where $\Delta$ is the change in the MAP evaluation metric after swapping the positions of document $d_i$ and document $d_j$, $\bar{p}_{ij}$ denotes the actual probability that document $d_i$ is more relevant than document $d_j$, and $p_{ij}$ denotes the predicted probability, computed as:

$$p_{ij} = \frac{1}{1 + e^{-\left(\mathrm{score}(d_i) - \mathrm{score}(d_j)\right)}}$$
the invention has the technical effects that:
(1) The method introduces a model idea based on interactive matching, does not convert the text into a unique integral expression vector, and interacts the historical query of the user with the candidate document at the word level to obtain a more accurate and complete matching signal.
(2) The attention mechanism is introduced, and the corresponding matching signals are weighted according to the contribution degree of different historical queries to the current matching, so that the influence of irrelevant information in the search history is reduced.
(3) The weighted matching signals are subjected to feature extraction by using a convolutional neural network to generate final interest matching vectors of the document, so that more accurate interest matching scores are obtained.
Drawings
FIG. 1 is a framework of an interactive matching based personalized search module;
Detailed Description
A preferred embodiment of the present invention and its technical solution are further described below with reference to the accompanying drawings, but the present invention is not limited to this embodiment.
In order to achieve the above object, the present invention provides a personalized search system based on interactive matching.
The system comprises an input module, a personalized search module based on interaction matching, and an output module. The input module reads the user's query history and the candidate documents and feeds them, in a standardized format, into the personalized search module based on interaction matching; the output module outputs the document matching scores and the personalized re-ranking result.
The personalized search module based on interaction matching processes the bottom-level matching signals with a convolutional neural network to obtain the final interest matching result for each candidate document.
The personalized search module based on interaction matching considers the word-level matching signals between the historical queries in the user's behavior history and the candidate documents. Given the user's historical query list $\{q_1, q_2, q_3, \ldots, q_n\}$ and the current candidate document $d$, the user's search log is first processed by the interaction-based K-NRM model to obtain, for each historical query $q_i$, a fine-grained matching vector $v_i$ with the candidate document $d$ (where $1 \leq i \leq n$), together with the fine-grained matching vector $v$ of the current query $q$ and the candidate document $d$. Then, considering that user interests change dynamically and that user queries are sometimes incidental, different queries in the user's search history contribute differently to the current query. According to each historical query's contribution to the current query, a multi-layer perceptron weights the matching vectors $\{v_1, v_2, v_3, \ldots, v_n\}$ produced by the K-NRM model, giving the weighted matching vector list $\{V_1, V_2, V_3, \ldots, V_n\}$. A convolutional neural network then processes these vectors to obtain the matching vector between the candidate document and the user's interests. Finally, the interest matching score and the relevance score of the current candidate document are computed from the interest matching vector and the click feature vector respectively, and summed to give the final document matching score:

$$\mathrm{score}(d) = \mathrm{score}(d|I) + \mathrm{score}(d|q)$$

where score(d|I) is the matching score between the current candidate document and the user's search interests, and score(d|q) is the relevance score between the current candidate document and the current query.
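As a sketch of how the two scores could be combined, the following assumes small multi-layer perceptron heads; the hidden-layer widths and the class name `ScoringHead` are illustrative, since the patent only states that both scores come from multi-layer perceptrons over the stated inputs.

```python
import torch
import torch.nn as nn

class ScoringHead(nn.Module):
    """score(d) = score(d|I) + score(d|q), each score from a small MLP.

    Hidden sizes are assumptions; the patent fixes only the inputs: the
    100-dim interest matching vector I and three click features (click
    count, original click position, click entropy).
    """
    def __init__(self, interest_dim: int = 100, click_dim: int = 3):
        super().__init__()
        self.interest_mlp = nn.Sequential(   # produces score(d|I)
            nn.Linear(interest_dim, 32), nn.Tanh(), nn.Linear(32, 1))
        self.click_mlp = nn.Sequential(      # produces score(d|q)
            nn.Linear(click_dim, 8), nn.Tanh(), nn.Linear(8, 1))

    def forward(self, interest_vec, click_feats):
        return self.interest_mlp(interest_vec) + self.click_mlp(click_feats)

# Toy usage: one candidate document
score = ScoringHead()(torch.randn(1, 100), torch.randn(1, 3))
print(score.shape)  # torch.Size([1, 1])
```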
The framework of the personalized search module based on interaction matching is shown in FIG. 1 and is divided into the following four parts according to the processing flow:
Step one: bottom-level matching modeling of the user search history. A bottom-level matching model is built from the user's historical search information, and the user's historical queries interact with the candidate document word by word to obtain fine-grained bottom-level matching signals.
Step two: attention weight calculation. An attention mechanism is introduced, and the corresponding matching signals are weighted according to the contribution of different query records in the user's search history to the current query.
Step three: user interest matching vector generation. A convolutional neural network extracts features from the weighted matching signals to generate the final matching vector between the document and the user's interests.
Step four: personalized re-ranking. The personalized score of each candidate document is computed from the obtained interest matching vector, the relevance score is computed from the click feature vector, and the sum of the two serves as the final document matching score for personalized re-ranking.
The bottom layer matching modeling step of the user search history:
the user's search history can provide rich information for the acquisition of the user's search interests. In the past, most algorithms model the interests of a user based on the historical behavior information of the user to obtain an interest vector representing the search preference of the user, and then the interest vector is interacted with the document vector. The method comprises the steps of adopting a K-NRM framework, establishing a bottom layer matching model by utilizing historical search information of each user U, and carrying out interactive matching on each historical query in the historical search of the user with candidate documents at the bottom layer.
The user's historical query list is $\{q_1, q_2, q_3, \ldots, q_n\}$ and the current candidate document is $d$. For each historical query–candidate document pair $\langle q_i, d \rangle$, both texts are first mapped to word vectors represented by a word2vec model: $q_i$ is processed into a set of word vectors $\{qw_1, qw_2, qw_3, \ldots, qw_x\}$ and $d$ into $\{dw_1, dw_2, dw_3, \ldots, dw_y\}$. Every vector in one set interacts pairwise with every vector in the other to obtain the word matching matrix $T$ of $\langle q_i, d \rangle$. Each element of the matching matrix $T$ is given by:

$$T_{i,j} = \cos(qw_i, dw_j)$$

where $T_{i,j}$ is the element in row $i$, column $j$ of $T$, $qw_i$ is the word vector of the $i$-th word in the historical query, and $dw_j$ is the word vector of the $j$-th word in the candidate document (with $1 \leq i \leq x$, $1 \leq j \leq y$); the matching value is computed by the cosine function.

From the above, the $i$-th row of the matching matrix represents the matching signals between the $i$-th word of the historical query and the candidate document. In the K-NRM model, $K$ RBF kernels are applied to each row of the matching matrix to obtain a $K$-dimensional feature vector

$$\vec{K}(T_i) = \left[K_1(T_i), K_2(T_i), \ldots, K_K(T_i)\right]$$

The formula for each RBF kernel is:

$$K_k(T_i) = \sum_{j=1}^{y} \exp\left(-\frac{(T_{i,j} - \mu_k)^2}{2\sigma_k^2}\right)$$

where $K_k(T_i)$ is the value of the $k$-th RBF kernel applied to the $i$-th row of the matching matrix $T$, with values ranging from 0 to $y$; $\mu_k$ and $\sigma_k$ are hyperparameters. In the K-NRM model used here, $\mu$ takes uniformly spaced values from −1 to 1, because the cosine similarity of vectors lies between −1 and 1. The logarithms of the feature vectors of all rows of the matching matrix are then summed to give the final bottom-level matching result of historical query $q_i$ and the candidate document:

$$v_i = \sum_{r=1}^{x} \log \vec{K}(T_r)$$

Each historical query $q_i$ thus has a $K$-dimensional matching vector with the current candidate document: the fine-grained matching vector $v_i$ of $q_i$ and candidate document $d$. The fine-grained matching vector $v$ of the current query $q$ and the candidate document $d$ is computed by the same procedure. We have thus obtained the bottom-level matching vectors computed from the user's historical search information, denoted $\{v_1, v_2, v_3, \ldots, v_n\}$.
The attention weight calculation step:

Because the user's search interests and search patterns change dynamically and user queries carry a certain randomness, different query records in the user's search history influence the current query to different degrees. Based on this consideration, the method introduces an attention mechanism and further refines each bottom-level matching vector according to the contribution of different historical queries to the current matching.

The previous step produced the bottom-level matching vectors $\{v_1, v_2, v_3, \ldots, v_n\}$ computed from the user's historical search information. Based on the fine-grained matching vector $v$ of the current query $q$ and candidate document $d$, an attention weight is computed for the bottom-level matching vector of each historical query record. The input to the attention layer is the bottom-level matching vectors $\{v_1, v_2, v_3, \ldots, v_n\}$ from the previous step together with $v$; the calculation is:

$$e_i = g(v, v_i)$$

$$\alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{n} \exp(e_j)}$$

where $g(\cdot)$ is a multi-layer perceptron with tanh as its activation function, and $\alpha_i$ is the weight the attention layer assigns to the bottom-level matching vector $v_i$. The weighted bottom-level matching vector is given by:

$$V_i = \alpha_i v_i$$

According to how much information each historical query in the user's search history contributes to the current matching, the attention layer pays more attention to the bottom-level matching vectors of the historical queries with larger contributions, producing bottom-level matching information weighted by contribution. We thus obtain the weighted fine-grained matching vectors $\{V_1, V_2, V_3, \ldots, V_n\}$ corresponding to the user's historical queries.
The user interest matching vector generation step:

The weighted fine-grained matching vectors $\{V_1, V_2, V_3, \ldots, V_n\}$ are concatenated column by column into a matching feature matrix $M = [V_1, V_2, V_3, \ldots, V_n] \in \mathbb{R}^{K \times n}$. The traditional approach applies max pooling or average pooling directly on the matching feature matrix to obtain the user interest matching vector. However, given that a user's search history may contain a large number of records, pooling directly on the matching feature matrix may discard useful information, such as the relationships between the bottom-level matching vectors of neighboring historical queries.

To compensate for this deficiency, this step convolves the matching feature matrix $M$ with 100 convolution kernels $f_1, f_2, \ldots, f_{100}$ of size $3 \times 3$ to obtain a three-dimensional tensor $A \in \mathbb{R}^{100 \times (K-2) \times (n-2)}$. Each element of tensor $A$ is given by:

$$A_{t,i,j} = \mathrm{ReLU}\left(f_t \odot M_{i-1:i+1,\, j-1:j+1} + b_t\right)$$

where $1 \leq t \leq 100$, $b_t$ is the $t$-th element of the bias vector $b \in \mathbb{R}^{100}$, $f_t$ is the $t$-th $3 \times 3$ convolution kernel, $M_{i-1:i+1,\, j-1:j+1}$ is the submatrix of $M$ spanning rows $i-1$ to $i+1$ and columns $j-1$ to $j+1$, and $\odot$ denotes multiplying the elements at corresponding positions of two matrices and summing all the products. Since the convolution layer uses $3 \times 3$ kernels, each user's search history must contain at least 3 historical queries. In other words, the model does not support users with fewer than three historical queries: too few historical queries cannot provide enough information for extracting the user's search interests, and in that case personalized re-ranking of the documents would instead interfere with the accurate calculation of document scores. In addition, the convolution layer uses ReLU as its activation function; compared with other activation functions such as sigmoid, ReLU is computationally cheap and avoids the vanishing gradient problem.

After the convolution layer, max pooling is applied to the second and third dimensions of the three-dimensional tensor $A$ in the pooling layer to obtain a 100-dimensional vector $I$, where $I_t$ is the $t$-th element of $I$:

$$I_t = \max_{i,j} A_{t,i,j}$$

The purpose of the pooling layer is to further extract features from the feature tensor $A$; the output vector $I$ is the final user interest matching vector.
The personalized re-ranking step:

The score of a candidate document consists of two parts: the matching score between the candidate document and the user's interests, and the relevance score against the current query. The matching score score(d|I) between the candidate document and the user's interests is obtained by feeding the interest matching vector I through a multi-layer perceptron; the relevance score score(d|q) between the candidate document and the current query is computed by a multi-layer perceptron from three click features: the number of clicks, the original click position, and the click entropy. The final score of the candidate document is the sum of the interest matching score score(d|I) and the relevance score score(d|q), and the final personalized ranking result is obtained by re-ordering the original document list by this score.

The method trains with the LambdaRank algorithm: the clicked document is taken as a relevant document sample and the remaining documents as irrelevant samples, and one relevant document $d_i$ and one irrelevant document $d_j$ are paired to compute the loss. The loss function is the cross entropy between the actual probability and the predicted probability multiplied by the change in the MAP evaluation metric:

$$\mathcal{L} = -\Delta \left( \bar{p}_{ij} \log p_{ij} + \bar{p}_{ji} \log p_{ji} \right)$$

where $\Delta$ is the change in the MAP evaluation metric, $\bar{p}_{ij}$ denotes the actual probability that document $d_i$ is more relevant than document $d_j$ and $p_{ij}$ its predicted counterpart, and $\bar{p}_{ji}$ denotes the actual probability that document $d_j$ is more relevant than document $d_i$ and $p_{ji}$ its predicted counterpart. The predicted probability $p_{ij}$ is computed by:

$$p_{ij} = \frac{1}{1 + e^{-\left(\mathrm{score}(d_i) - \mathrm{score}(d_j)\right)}}$$
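For a clicked d_i and an unclicked d_j the actual probabilities are 1 and 0, so the objective reduces to a weighted negative log likelihood; the sketch below assumes the MAP change Δ for the pair is precomputed.

```python
import torch

def lambdarank_pair_loss(score_i, score_j, delta_map):
    """Loss for one (relevant d_i, irrelevant d_j) pair: cross entropy between
    actual and predicted order probabilities, scaled by the MAP change from
    swapping the pair. With p_bar_ij = 1 and p_bar_ji = 0 this is
    -|delta| * log p_ij."""
    p_ij = torch.sigmoid(score_i - score_j)  # predicted P(d_i ranked above d_j)
    return -delta_map.abs() * torch.log(p_ij.clamp_min(1e-10))

# Toy usage: model scores for the two documents and a precomputed MAP delta
loss = lambdarank_pair_loss(torch.tensor(0.8), torch.tensor(0.3), torch.tensor(0.05))
print(float(loss))  # ~0.024
```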
and outputting the finally obtained personalized sequencing result to an output module for outputting.

Claims (4)

1. A personalized search system based on interaction matching, characterized in that: the system comprises an input module, a personalized search module based on interaction matching, and an output module;
the input module reads the user's query history and the candidate documents and feeds them, in a standardized format, into the personalized search module based on interaction matching;
the operation of the personalized search module based on interaction matching is divided into four steps:
step one: bottom-level matching modeling of the user search history, in which a bottom-level matching model is built from the user's historical search information and the user's historical queries interact with the candidate document word by word to obtain fine-grained bottom-level matching signals;
step two: attention weight calculation, in which an attention mechanism is introduced and the corresponding matching signals are weighted according to the contribution of different query records in the user's search history to the current query;
step three: user interest matching vector generation, in which a convolutional neural network extracts features from the weighted matching signals to generate the final matching vector between the document and the user's interests;
step four: personalized re-ranking, in which the personalized score of each candidate document is computed from the user interest matching vector obtained in step three, the relevance score of the candidate document is computed from click feature vectors, and the sum of the two scores serves as the final document matching score for personalized re-ranking;
the output module outputs the document matching scores and the personalized re-ranking result;
the bottom-level matching modeling step of the user search history is implemented as follows: define the user's historical query list as $\{q_1, q_2, q_3, \ldots, q_n\}$, where $n \geq 3$ is an integer, and the current candidate document as $d$; for each historical query–candidate document pair $\langle q_i, d \rangle$, both texts are first mapped to word vectors represented by a word2vec model: $q_i$ is processed into a set of word vectors $\{qw_1, qw_2, qw_3, \ldots, qw_x\}$ and $d$ into $\{dw_1, dw_2, dw_3, \ldots, dw_y\}$; every vector in one set interacts pairwise with every vector in the other to obtain the word matching matrix $T$ of $\langle q_i, d \rangle$, each element of which is:

$$T_{i,j} = \cos(qw_i, dw_j)$$

where $T_{i,j}$ is the element in row $i$, column $j$ of $T$, $qw_i$ is the word vector of the $i$-th word in the historical query, and $dw_j$ is the word vector of the $j$-th word in the candidate document, with $1 \leq i \leq x$, $1 \leq j \leq y$, and $i, j, x, y$ integers; the matching value is computed by the cosine function; in the K-NRM model, $K$ RBF kernels are applied to each row of the matching matrix to obtain a $K$-dimensional feature vector

$$\vec{K}(T_i) = \left[K_1(T_i), K_2(T_i), \ldots, K_K(T_i)\right]$$

the formula for each RBF kernel being:

$$K_k(T_i) = \sum_{j=1}^{y} \exp\left(-\frac{(T_{i,j} - \mu_k)^2}{2\sigma_k^2}\right)$$

where $K_k(T_i)$ is the value of the $k$-th RBF kernel applied to the $i$-th row of the matching matrix $T$, with values ranging from 0 to $y$; $\mu_k$ and $\sigma_k$ are hyperparameters, with $\mu$ taking uniformly spaced values from −1 to 1; the logarithms of the feature vectors of all rows of the matching matrix are then summed to give the final bottom-level matching result of historical query $q_i$ and the candidate document:

$$v_i = \sum_{r=1}^{x} \log \vec{K}(T_r)$$

the bottom-level matching vectors computed from the user's historical search information are denoted $\{v_1, v_2, v_3, \ldots, v_n\}$, and the fine-grained matching vector $v$ of the current query and the candidate document is computed in the same way;
using the fine-grained matching vector $v$ of the current query $q$ and candidate document $d$, an attention weight is computed for the bottom-level matching vector of each historical query record:

$$e_i = g(v, v_i)$$

$$\alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{n} \exp(e_j)}$$

where $g$ is a multi-layer perceptron with tanh as its activation function, and $\alpha_i$ is the weight the attention layer assigns to the bottom-level matching vector $v_i$; the weighted bottom-level matching vector is:

$$V_i = \alpha_i v_i$$

the weighted fine-grained matching vectors corresponding to the user's historical queries are $\{V_1, V_2, V_3, \ldots, V_n\}$;
the user interest matching vector generation step is implemented as follows: the weighted fine-grained matching vectors $\{V_1, V_2, V_3, \ldots, V_n\}$ are concatenated column by column into a matching feature matrix $M = [V_1, V_2, V_3, \ldots, V_n] \in \mathbb{R}^{K \times n}$, and $M$ is convolved with 100 convolution kernels to obtain a three-dimensional tensor $A \in \mathbb{R}^{100 \times (K-2) \times (n-2)}$, each element of which is:

$$A_{t,i,j} = \mathrm{ReLU}\left(f_t \odot M_{i-1:i+1,\, j-1:j+1} + b_t\right)$$

where $t$ is an integer from 1 to 100, $b_t$ is the $t$-th element of the bias vector $b \in \mathbb{R}^{100}$, $f_t$ is the $t$-th $3 \times 3$ convolution kernel, $M_{i-1:i+1,\, j-1:j+1}$ is the submatrix of $M$ spanning rows $i-1$ to $i+1$ and columns $j-1$ to $j+1$, and $\odot$ denotes multiplying the elements at corresponding positions of two matrices and summing all the products; the convolution layer uses ReLU as its activation function; after the convolution layer, max pooling is applied to the second and third dimensions of the three-dimensional tensor $A$ in the pooling layer to obtain a 100-dimensional vector $I$, where $I_t$ is the $t$-th element of $I$:

$$I_t = \max_{i,j} A_{t,i,j}$$

the output vector $I$ is the final user interest matching vector.
2. The personalized search system based on interaction matching according to claim 1, wherein: the convolution kernels are 3×3 in size, and each user has at least 3 historical search records.
3. The personalized search system based on interaction matching according to claim 2, wherein: the personalized re-ranking step is implemented as follows: the matching score score(d|I) between the candidate document and the user's interests is obtained by feeding the interest matching vector I through a multi-layer perceptron; the relevance score score(d|q) between the candidate document and the current query is computed by a multi-layer perceptron from three click features: the number of clicks, the original click position, and the click entropy; the final score of the candidate document is the sum of the interest matching score score(d|I) and the relevance score score(d|q), and the final personalized ranking result is obtained by re-ordering the original document list by this score.
4. The personalized search system based on interaction matching according to claim 3, wherein: in computing the relevance score of the candidate documents against the current query, the score is trained with the LambdaRank algorithm: the clicked document is taken as a relevant document sample and the remaining documents as irrelevant samples, and one relevant document $d_i$ and one irrelevant document $d_j$ are paired to compute the loss; the loss function also introduces, as a corresponding weight, the degree to which swapping the order of the document pair affects the evaluation metric MAP, i.e., the larger the MAP change after swapping, the larger the difference between the documents and the larger the weight given; the loss function is the cross entropy between the actual probability and the predicted probability multiplied by the change in the MAP evaluation metric:

$$\mathcal{L} = -\Delta \left( \bar{p}_{ij} \log p_{ij} + \bar{p}_{ji} \log p_{ji} \right)$$

$$p_{ij} = \frac{1}{1 + e^{-\left(\mathrm{score}(d_i) - \mathrm{score}(d_j)\right)}}$$

where $\Delta$ is the change in the MAP evaluation metric after swapping the positions of document $d_i$ and document $d_j$, $\bar{p}_{ij}$ denotes the actual probability that document $d_i$ is more relevant than document $d_j$, and $p_{ij}$ denotes the predicted probability.
CN202010861245.9A 2020-08-25 2020-08-25 Personalized search system based on interaction matching Active CN112069399B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010861245.9A CN112069399B (en) 2020-08-25 2020-08-25 Personalized search system based on interaction matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010861245.9A CN112069399B (en) 2020-08-25 2020-08-25 Personalized search system based on interaction matching

Publications (2)

Publication Number Publication Date
CN112069399A CN112069399A (en) 2020-12-11
CN112069399B true CN112069399B (en) 2023-06-02

Family

ID=73658899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010861245.9A Active CN112069399B (en) 2020-08-25 2020-08-25 Personalized search system based on interaction matching

Country Status (1)

Country Link
CN (1) CN112069399B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113987155B (en) * 2021-11-25 2024-03-26 中国人民大学 Conversational retrieval method integrating knowledge graph and large-scale user log
CN114357231B (en) * 2022-03-09 2022-06-28 城云科技(中国)有限公司 Text-based image retrieval method and device and readable storage medium
CN117851444A (en) * 2024-03-07 2024-04-09 北京谷器数据科技有限公司 Advanced searching method based on semantic understanding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291871A (en) * 2017-06-15 2017-10-24 北京百度网讯科技有限公司 Matching degree appraisal procedure, equipment and the medium of many domain informations based on artificial intelligence
CN107957993A (en) * 2017-12-13 2018-04-24 北京邮电大学 The computational methods and device of english sentence similarity
CN111125538A (en) * 2019-12-31 2020-05-08 中国人民大学 Searching method for enhancing personalized retrieval effect by using entity information
CN111177357A (en) * 2019-12-31 2020-05-19 中国人民大学 Memory neural network-based conversational information retrieval method
CN111310023A (en) * 2020-01-15 2020-06-19 中国人民大学 Personalized search method and system based on memory network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10268646B2 (en) * 2017-06-06 2019-04-23 Facebook, Inc. Tensor-based deep relevance model for search on online social networks
SG10202108020VA (en) * 2017-10-16 2021-09-29 Illumina Inc Deep learning-based techniques for training deep convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291871A (en) * 2017-06-15 2017-10-24 北京百度网讯科技有限公司 Matching degree appraisal procedure, equipment and the medium of many domain informations based on artificial intelligence
CN107957993A (en) * 2017-12-13 2018-04-24 北京邮电大学 The computational methods and device of english sentence similarity
CN111125538A (en) * 2019-12-31 2020-05-08 中国人民大学 Searching method for enhancing personalized retrieval effect by using entity information
CN111177357A (en) * 2019-12-31 2020-05-19 中国人民大学 Memory neural network-based conversational information retrieval method
CN111310023A (en) * 2020-01-15 2020-06-19 中国人民大学 Personalized search method and system based on memory network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
End-to-End Neural Ad-hoc Ranking with Kernel Pooling; Chenyan Xiong et al.; Research and Development in Information Retrieval; 55-64 *
Dynamic personalized search algorithm based on recurrent neural networks and attention mechanisms (基于递归神经网络与注意力机制的动态个性化搜索算法); Zhou Yujia et al.; Chinese Journal of Computers (计算机学报); 812-826 *

Also Published As

Publication number Publication date
CN112069399A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN111667884B (en) Convolutional neural network model for predicting protein interactions using protein primary sequences based on attention mechanism
CN112069399B (en) Personalized search system based on interaction matching
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN110188358B (en) Training method and device for natural language processing model
CN110929164A (en) Interest point recommendation method based on user dynamic preference and attention mechanism
Hofmann The cluster-abstraction model: Unsupervised learning of topic hierarchies from text data
Khrulkov et al. Tensorized embedding layers for efficient model compression
KR102203065B1 (en) Triple verification device and method
Chitty-Venkata et al. Neural architecture search for transformers: A survey
CN111782961B (en) Answer recommendation method oriented to machine reading understanding
CN112328900A (en) Deep learning recommendation method integrating scoring matrix and comment text
CN111737578A (en) Recommendation method and system
CN111723914A (en) Neural network architecture searching method based on convolution kernel prediction
Sadr et al. Convolutional neural network equipped with attention mechanism and transfer learning for enhancing performance of sentiment analysis
CN112527993A (en) Cross-media hierarchical deep video question-answer reasoning framework
CN112115371A (en) Neural attention mechanism mobile phone application recommendation model based on factorization machine
Sokkhey et al. Development and optimization of deep belief networks applied for academic performance prediction with larger datasets
CN116976505A (en) Click rate prediction method of decoupling attention network based on information sharing
Dinov et al. Black box machine-learning methods: Neural networks and support vector machines
Paul et al. Non-iterative online sequential learning strategy for autoencoder and classifier
CN116561314A (en) Text classification method for selecting self-attention based on self-adaptive threshold
Zhou et al. Gan-based recommendation with positive-unlabeled sampling
Ganguly et al. Evaluating CNN architectures using attention mechanisms: Convolutional Block Attention Module, Squeeze, and Excitation for image classification on CIFAR10 dataset
CN115422369B (en) Knowledge graph completion method and device based on improved TextRank
Pourbahman et al. Deep neural ranking model using distributed smoothing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant