CN111274366A - Search recommendation method and device, equipment and storage medium - Google Patents

Search recommendation method and device, equipment and storage medium

Info

Publication number
CN111274366A
CN111274366A
Authority
CN
China
Prior art keywords
candidate text
character
searched
information
word vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010220548.2A
Other languages
Chinese (zh)
Inventor
沈强
谭松波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202010220548.2A priority Critical patent/CN111274366A/en
Publication of CN111274366A publication Critical patent/CN111274366A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses a search recommendation method, a search recommendation device, equipment and a storage medium, wherein the method comprises the following steps: determining the word vector correlation degree between the information to be searched and each candidate text in the candidate text set; determining the character relevancy between the information to be searched and each candidate text; fusing the word vector relevancy and the character relevancy corresponding to each candidate text to obtain an evaluation value corresponding to the candidate text; and recommending candidate texts corresponding to the evaluation values meeting the specific conditions.

Description

Search recommendation method and device, equipment and storage medium
Technical Field
The embodiment of the application relates to the internet technology, and relates to but is not limited to a search recommendation method, a search recommendation device, search recommendation equipment and a storage medium.
Background
Users often search for needed information in the mass of information on the internet, and search engines have become indispensable tools in users' life and work. A search engine is a retrieval technology that retrieves relevant texts from the internet by using a specific strategy according to user requirements and a certain algorithm, and then feeds the texts back to users. A key step is determining the relevance between the information to be searched input by the user and the candidate texts, and then recommending the candidate texts that are highly relevant to the information to be searched to the user.
Therefore, how to make the candidate texts recommended to the user more accurate has certain significance for better meeting the user requirements.
Disclosure of Invention
In view of this, embodiments of the present application provide a search recommendation method and apparatus, a device, and a storage medium.
The technical scheme of the embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a search recommendation method, where the method includes: determining the word vector correlation degree between the information to be searched and each candidate text in the candidate text set; determining the character relevancy between the information to be searched and each candidate text; fusing the word vector relevancy and the character relevancy corresponding to each candidate text to obtain an evaluation value corresponding to the candidate text; and recommending candidate texts corresponding to the evaluation values meeting the specific conditions.
In a second aspect, an embodiment of the present application provides a search recommendation apparatus, including: the first determination module is used for determining the word vector relevancy between the information to be searched and each candidate text in the candidate text set; the second determination module is used for determining the character relevancy between the information to be searched and each candidate text; the fusion module is used for fusing the word vector relevancy and the character relevancy corresponding to each candidate text to obtain an evaluation value corresponding to the candidate text; and the recommending module is used for recommending the candidate texts corresponding to the evaluation values meeting the specific conditions.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor executes the computer program to implement steps in any search recommendation method according to the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in any one of the search recommendation methods in the embodiment of the present application.
In the method, the word vector relevance and the character relevance between the information to be searched and each candidate text are determined; then the word vector relevance and the character relevance corresponding to each candidate text are fused to obtain an evaluation value corresponding to the candidate text; in this way, the similarity between the information to be searched and the candidate text can be determined more accurately, so that the candidate texts recommended to the user are more accurate and better meet the user's requirements.
Drawings
FIG. 1 is a schematic diagram illustrating an implementation flow of a search recommendation method according to an embodiment of the present application;
FIG. 2 is a general flow chart of a model for determining an evaluation value of a candidate text answer according to an embodiment of the present application;
FIG. 3 is a diagram illustrating search recommendation results returned by a BERT model only in an embodiment of the present application;
FIG. 4 is a diagram illustrating search recommendation results returned by fusing a BERT model and an N-GRAM according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a search recommendation apparatus according to an embodiment of the present application;
fig. 6 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, specific technical solutions of the present application will be described in further detail below with reference to the accompanying drawings in the embodiments of the present application. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
It should be noted that the terms "first/second/third" in the embodiments of the present application merely distinguish similar or different objects and do not represent a specific ordering of the objects; it should be understood that "first/second/third" may be interchanged in a particular order or sequence where permissible, so that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein.
The search recommendation method provided by the embodiment of the application can be applied to electronic equipment, and the electronic equipment can be various types of equipment with a search function in the implementation process, for example, the electronic equipment can include an intelligent mobile terminal (e.g., a mobile phone), a tablet computer, an electronic book, a notebook computer, a desktop computer, and the like. The functions implemented by the method can be implemented by calling program code by a processor in the electronic device, and the program code can be stored in a computer storage medium.
An embodiment of the present application provides a search recommendation method, and fig. 1 is a schematic view of an implementation flow of the search recommendation method according to the embodiment of the present application, and as shown in fig. 1, the method may include the following steps 101 to 104:
step 101, determining word vector correlation between information to be searched and each candidate text in the candidate text set.
It can be understood that the information to be searched is a keyword or sentence, also called a query statement (query), that the user inputs into the search engine by voice or text. The candidate text set may be an existing corpus or another database. A candidate text may be a web page title, the web page content corresponding to a web page title, or both the web page title and its corresponding content. The larger the word vector relevance, the closer the semantics between the information to be searched and the candidate text are.
The electronic device may determine the word vector correlation degree between the information to be searched and the candidate text through steps 201 to 203 of the following embodiments.
Step 102, determining the character relevancy between the information to be searched and each candidate text.
It is known that a word vector is a semantic representation of text, and characters refer to words, etc. contained in the text. Therefore, the character relevance and the word vector relevance are measures representing the closeness degree between two texts from different angles. Word vector relatedness characterizes how close the semantics are, while character relatedness characterizes how similar the words are.
The electronic device may determine the character correlation between the information to be searched and the candidate text through steps 204 to 206 of the following embodiments.
And 103, fusing the word vector relevancy and the character relevancy corresponding to each candidate text to obtain an evaluation value corresponding to the candidate text.
It can be understood that the evaluation value of the candidate text is actually a comprehensive evaluation result of the corresponding word vector relevance and the character relevance, the degree of similarity between the candidate text and the information to be searched is evaluated, and the larger the evaluation value is, the closer the semantic meaning and the word meaning between the candidate text and the information to be searched are represented. The electronic device may determine the evaluation value of the corresponding candidate text through step 207 and step 208 of the following embodiments.
And 104, recommending candidate texts corresponding to the evaluation values meeting the specific conditions.
The specific condition may take various forms. For example, if the specific condition is that the evaluation value is greater than a specific threshold, the candidate texts whose evaluation values are greater than the specific threshold are recommended accordingly. The specific condition may also be the K largest evaluation values, where K is an integer greater than 0; the candidate texts corresponding to the K largest evaluation values are then recommended accordingly.
When the candidate texts are recommended for the user, the candidate texts can be ranked according to the order of the evaluation values from large to small, so that the candidate texts with higher evaluation values are preferentially displayed, and the user can conveniently select the candidate texts.
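As a minimal sketch (all names are illustrative, not from the patent), ranking by evaluation value and keeping the top K amounts to a descending sort:

```python
def recommend_top_k(candidates, scores, k):
    # sort candidate texts by evaluation value, largest first, and keep the top K
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:k]]
```

For example, with evaluation values [0.1, 0.9, 0.5, 0.7] over four candidates and K=2, the second and fourth candidates are recommended, in that order.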
In the embodiment of the application, a search recommendation method is provided, in the method, word vector relevancy and character relevancy between information to be searched and each candidate text are determined; then, fusing the word vector correlation degree and the character correlation degree corresponding to each candidate text to obtain an evaluation value corresponding to the candidate text; therefore, the similarity between the information to be searched and the candidate text can be more accurately determined, so that the candidate text recommended to the user is more accurate, and the user requirements are better met.
An embodiment of the present application further provides a search recommendation method, where the method may include the following steps 201 to 209:
step 201, respectively performing feature extraction on the information to be searched and each candidate text to obtain a first word vector of the information to be searched and a second word vector of the corresponding candidate text.
In some embodiments, the feature extraction performed on the information to be searched to obtain the first word vector may be implemented by the following steps 2011 and 2012: step 2011, determining a text vector, a position vector and an initial word vector of the information to be searched; step 2012, processing the text vector, the position vector and the initial word vector by using a BERT (Bidirectional Encoder Representations from Transformers) model to obtain the first word vector.
It should be noted that the primary input of the BERT model is the original word vector of each character or word in the text, i.e., the initial word vector in step 2011, which may be initialized randomly or pre-trained with an algorithm such as Word2Vec to serve as an initial value; the output is a vector representation of each character or word in the text after full-text semantic information has been fused in. That is, the first word vector is a vector representation that fuses the full-text semantic information of the information to be searched, and likewise for the second word vector.
The value of the text vector is learned automatically during training of the BERT model; it depicts the global semantic information of the text and is fused with the semantic information of each single character or word. As for the position vector: because the semantic information carried by characters or words appearing at different positions of the text differs (for example, "I love you" versus "you love me"), the BERT model adds a different vector to the characters or words at different positions to distinguish them. Finally, the BERT model takes the sum of the text vector, the position vector, and the initial word vector as the model input to obtain the first word vector.
Of course, the determination method of the second word vector is similar to the determination method of the first word vector, except that the processed text content is different, the former processed text is the candidate text, and the latter processed text is the information to be searched.
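The three-way sum described above can be sketched as a simple element-wise addition. This is a toy illustration with made-up low-dimensional integer vectors; real BERT embeddings are learned and typically 768-dimensional:

```python
def bert_input_embedding(token_vec, text_vec, position_vec):
    # BERT's per-token input is the element-wise sum of the initial word
    # (token) vector, the text/segment vector, and the position vector
    return [t + s + p for t, s, p in zip(token_vec, text_vec, position_vec)]

# toy 4-dimensional vectors, values illustrative only
combined = bert_input_embedding([1, 2, 0, 1], [3, 0, 1, 1], [5, 1, 1, 0])
```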
Step 202, processing the first word vector and a second word vector of the ith candidate text by using a classifier model obtained by training to obtain the probability that the information to be searched and the ith candidate text belong to similar categories; wherein i is a positive integer less than or equal to the total number of candidate texts in the candidate set;
step 203, determining the probability as the word vector correlation degree between the information to be searched and the ith candidate text.
In some embodiments, an initial classifier of the classifier model may be trained using a set of sample data to derive the classifier model. Each sample data of the sample data set comprises a word vector (the word vector is a result after feature extraction) corresponding to each of the two texts and a label indicating whether the two texts are similar, for example, if the two texts are similar, the label is 1, otherwise, the label is 0; thus, the classifier model with the binary classification function is obtained through training. Further, the classifier model can be used to determine the word vector correlation degree between the information to be searched and the candidate text.
It should be noted that, for each candidate text, the electronic device may determine the word vector correlation degree with the information to be searched through the above steps 202 and 203.
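A minimal sketch of such a binary classifier follows. In the patent the classifier's weights would be learned from the labeled similar/dissimilar sample pairs; the hand-crafted pair features (cosine similarity and L1 distance) and the fixed logistic weights here are illustrative assumptions only:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pair_features(u, v):
    # simple features for a pair of word vectors: cosine similarity and L1 distance
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos = dot / (nu * nv) if nu and nv else 0.0
    l1 = sum(abs(a - b) for a, b in zip(u, v))
    return [cos, l1]

def similarity_probability(u, v, weights=(4.0, -0.5), bias=-1.0):
    # logistic "classifier": probability that the two texts belong to the
    # similar category; weights/bias are illustrative, not learned values
    cos, l1 = pair_features(u, v)
    return sigmoid(weights[0] * cos + weights[1] * l1 + bias)
```

The probability returned for a pair of identical vectors is close to 1, and for orthogonal vectors it is close to 0, matching the 1/0 labels used in training.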
Step 204, according to M segmentation grammars, respectively performing character string segmentation on the information to be searched to obtain M first character string sets; wherein M is an integer greater than 0;
step 205, according to the M segmentation grammars, respectively performing character string segmentation on the ith candidate text to obtain M second character string sets.
In the embodiment of the present application, the value of M is not limited; that is, the electronic device may perform character string segmentation on the text according to one or more segmentation grammars. In some embodiments, the segmentation grammar may be an N-GRAM. For example, the N-GRAM may be a 1-GRAM, a 2-GRAM, or a 3-GRAM. The M segmentation grammars may comprise any one N-GRAM or a variety of different N-GRAMs. For example, the M segmentation grammars may include a 1-GRAM and a 2-GRAM; or a 1-GRAM and a 3-GRAM; or a 2-GRAM and a 3-GRAM; or a 1-GRAM, a 2-GRAM, and a 3-GRAM.
For example, if the information to be searched is "腾讯云" (Tencent Cloud), segmenting it with a 1-GRAM yields the first character string set {腾/讯/云}, where "/" is a separator added for readability. Segmenting the same information with a 2-GRAM yields the first character string set {腾讯/讯云}.
It should be noted that a character is a broad description, i.e. a character may include words, letters, numbers, etc. One character string may include only one character or may include a plurality of characters. The character string segmentation method of each candidate text is the same.
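Character string segmentation under an N-GRAM can be sketched in a few lines (a hypothetical helper, not from the patent):

```python
def ngrams(text, n):
    # split text into the set of overlapping character strings of length n
    return {text[i:i + n] for i in range(len(text) - n + 1)}
```

For example, `ngrams("腾讯云", 1)` yields {"腾", "讯", "云"} and `ngrams("腾讯云", 2)` yields {"腾讯", "讯云"}, matching the example above.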
Step 206, determining the character correlation degree between the information to be searched and the ith candidate text according to the M first character string sets and the M second character string sets.
For each candidate text, the electronic device determines the character relevance with the information to be searched in the same way. For example, the electronic device may determine the character relevance between each candidate text and the information to be searched through step 304 and step 305 of the following embodiments.
It should be noted that, in the embodiment of the present application, the determination order of the word vector relevancy and the character relevancy is not limited. The electronic equipment can determine the word vector relevancy first and then determine the character relevancy, and also can determine the character relevancy first and then determine the word vector relevancy; the electronic device may also determine character relevancy and word vector relevancy between the information to be searched and the candidate text in parallel.
Step 207, obtaining a penalty coefficient, wherein the penalty coefficient is used for characterizing the accuracy of the determination method of the character relevancy or the determination method of the word vector relevancy.
In some embodiments, the penalty factor is a confidence level of a determination method of character relevance or a confidence level of a determination method of word vector relevance.
And 208, determining an evaluation value of the corresponding candidate text according to the penalty coefficient, the character relevance corresponding to each candidate text and the word vector relevance.
The penalty factor may be preconfigured. In some embodiments, where the penalty factor is used to characterize a confidence level of a determination method of character relevance, the electronic device may determine a first product between the penalty factor and the character relevance; and determining the evaluation value of the corresponding candidate text according to the word vector correlation degree corresponding to each candidate text and the first product.
For example, the evaluation value Score1 of the candidate text may be determined according to the following formula (1):
Score1 = ScoreBert1 + α1 * F1    (1);
in formula (1), ScoreBert1 represents the word vector relevance between the candidate text and the information to be searched, α1 denotes the penalty coefficient, and F1 represents the character relevance between the candidate text and the information to be searched.
In other embodiments, when the penalty factor is used to characterize the confidence of the determination method of the word vector relevance, the electronic device may further determine a second product between the penalty factor and the word vector relevance; and determining the evaluation value of the corresponding candidate text according to the character relevancy corresponding to each candidate text and the second product.
For example, the electronic device may also determine the evaluation value Score2 of the candidate text according to the following formula (2):
Score2 = α2 * ScoreBert1 + F1    (2);
Understandably, the penalty coefficient α1 characterizes the accuracy of the determination method of the character relevance, and α2 characterizes the accuracy of the determination method of the word vector relevance. Taking formula (1) as an example, when the value of α1 is greater than 1, the determination method of the character relevance is more accurate than that of the word vector relevance, so the character relevance accounts for a larger proportion of the evaluation value; this can reduce the evaluation value of a candidate text that has a large word vector relevance but is actually far from the content of the information to be searched, so that the candidate texts recommended to the user better match the user's actual requirements. When the value of α1 is less than 1, the determination method of the character relevance is less accurate than that of the word vector relevance. α1 and α2 may be the same or different.
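Formulas (1) and (2) reduce to one-liners; the alpha values passed in below are illustrative, not values given in the patent:

```python
def score_formula_1(bert_rel, char_rel, alpha1):
    # formula (1): word vector relevance plus penalty-weighted character relevance
    return bert_rel + alpha1 * char_rel

def score_formula_2(bert_rel, char_rel, alpha2):
    # formula (2): penalty-weighted word vector relevance plus character relevance
    return alpha2 * bert_rel + char_rel
```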
In step 209, candidate texts corresponding to evaluation values satisfying specific conditions are recommended.
An embodiment of the present application further provides a search recommendation method, where the method may include the following steps 301 to 307:
step 301, determining word vector correlation between information to be searched and each candidate text in a candidate text set;
step 302, according to M segmentation grammars, respectively performing character string segmentation on the information to be searched to obtain M first character string sets; wherein M is an integer greater than 0;
step 303, according to the M segmentation grammars, performing character string segmentation on the ith candidate text to obtain M second character string sets; wherein i is a positive integer less than or equal to the total number of candidate texts in the candidate set;
and step 304, determining the distance between the first character string set and the second character string set obtained by adopting the same segmentation grammar.
In some embodiments, the distance may be the number of character strings shared by the two sets. For example, the M segmentation grammars are a 1-GRAM and a 2-GRAM, the information to be searched is "腾讯云" (Tencent Cloud), and the candidate text is "腾讯是游戏公司" (Tencent is a gaming company). Segmenting the information to be searched with the 1-GRAM yields the first character string set {腾/讯/云}, where "/" is a separator added for readability. Segmenting the candidate text with the 1-GRAM yields the second character string set {腾/讯/是/游/戏/公/司}; the character strings shared by the two sets are {腾/讯}, so the distance between the two sets is easily obtained as 2.
Segmenting the information to be searched with the 2-GRAM yields the first character string set {腾讯/讯云}, and segmenting the candidate text yields the second character string set {腾讯/讯是/是游/游戏/戏公/公司}; the only character string shared by the two sets is {腾讯}, so the distance is 1.
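The set-overlap distance in this example can be computed directly (a sketch built on a hypothetical n-gram splitter):

```python
def ngram_set(text, n):
    # overlapping character strings of length n
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def set_distance(query, candidate, n):
    # the "distance" here is the number of character strings the two sets share
    return len(ngram_set(query, n) & ngram_set(candidate, n))
```

For the example above, `set_distance("腾讯云", "腾讯是游戏公司", 1)` gives 2 and `set_distance("腾讯云", "腾讯是游戏公司", 2)` gives 1.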
Step 305, determining the character correlation degree between the information to be searched and the ith candidate text according to each distance.
It should be noted that, for each candidate text, the determination method of the character relevance is the same, i.e., it can be realized through the above steps 302 to 305. Taking the ith candidate text as an example, in some embodiments the electronic device may determine a match score according to the jth distance, the total number of character strings included in the jth first character string set, and the total number of character strings included in the jth second character string set of the ith candidate text, where j is an integer greater than 0 and less than or equal to M; each match score is then weighted to obtain the character relevance between the information to be searched and the ith candidate text.
That is, each distance corresponds to a match score, and the determination method for each match score is the same. Continuing the above example, the M segmentation grammars are a 1-GRAM and a 2-GRAM, the information to be searched is "腾讯云", and the candidate text is "腾讯是游戏公司". Segmenting the two texts with the 1-GRAM yields the sets {腾/讯/云} and {腾/讯/是/游/戏/公/司}; the first character string set contains 3 character strings in total, the second character string set contains 7, and the distance between the two sets is 2. Based on this distance, the determined precision value is Precision = 2/3 and the determined recall value is Recall = 2/7. Substituting these two values into the following formula (3) yields the matching score Match_Score corresponding to the distance:
Match_Score = 2 * Precision * Recall / (Precision + Recall)    (3);
in formula (3), Precision represents the precision value and Recall represents the recall value.
Similarly, based on the first and second character string sets determined with the 2-GRAM, the corresponding matching score can be obtained according to formula (3).
The weight for each match score is related to the granularity of the corresponding segmentation grammar. For example, if the M segmentation grammars are a 1-GRAM and a 2-GRAM, the weight of the 2-GRAM-based match score may be configured to be greater than that of the 1-GRAM-based match score.
Of course, in some embodiments, the electronic device may directly add each of the matching scores corresponding to the ith candidate text, and determine the added result as the character relevancy corresponding to the candidate text.
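Assuming formula (3) is the harmonic mean of precision and recall (as the worked example suggests), the per-grammar match score and its weighted fusion can be sketched as follows; the weight values are illustrative only:

```python
def match_score(query, candidate, n):
    q = {query[i:i + n] for i in range(len(query) - n + 1)}
    c = {candidate[i:i + n] for i in range(len(candidate) - n + 1)}
    shared = len(q & c)          # the "distance" between the two sets
    if shared == 0:
        return 0.0
    precision = shared / len(q)  # shared strings over query-set size
    recall = shared / len(c)     # shared strings over candidate-set size
    return 2 * precision * recall / (precision + recall)   # formula (3)

def character_relevance(query, candidate, weights):
    # weighted sum of per-grammar match scores; e.g. weights = {1: 0.4, 2: 0.6}
    # gives the 2-GRAM score more weight than the 1-GRAM score, as suggested above
    return sum(w * match_score(query, candidate, n) for n, w in weights.items())
```

For the running example, `match_score("腾讯云", "腾讯是游戏公司", 1)` evaluates to 2 * (2/3) * (2/7) / ((2/3) + (2/7)) = 0.4.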
And step 306, fusing the word vector relevancy and the character relevancy corresponding to each candidate text to obtain an evaluation value corresponding to the candidate text.
In some embodiments, the electronic device determines the evaluation value of the corresponding candidate text according to a penalty coefficient, the character relevance corresponding to each candidate text, and the word vector relevance. For example, the electronic device may determine the evaluation value Score1 of the candidate text according to the following formula (4):
Score1 = ScoreBert1 + α1 * F1    (4);
in formula (4), ScoreBert1 represents the word vector relevance between the candidate text and the information to be searched, α1 denotes the penalty coefficient, and F1 represents the character relevance between the candidate text and the information to be searched.
Step 307 is to recommend a candidate text corresponding to an evaluation value satisfying a specific condition.
The internet has many search engines, such as Baidu and Sogou; a search engine is a retrieval technology that retrieves relevant information from the internet by using a specific strategy according to user requirements and a certain algorithm, and then feeds it back to the user. One key technology in a search engine is matching the relevance between the search query statement and the result titles, and then returning the results that are highly relevant to the query statement.
For this key technology, the related solutions are mainly based on statistics, deep learning, language models, etc. Statistical methods, such as Term Frequency-Inverse Document Frequency (TF-IDF) and Best Match 25 (BM25), need to calculate a large amount of statistical information, such as the number of documents and term frequencies, and cannot capture the semantic information of the user's query. Deep-learning-based methods, such as Deep Reconstruction Classification Networks (DRCN), ARC-I, and ARC-II, require a large number of labeled relevance corpora, and the quality of the corpora greatly affects the model effect. Language-model methods include Word to Vector (Word2Vec), Doc2Vec, and related models; the word vectors they produce contain semantics but have certain limitations and are not accurate enough.
Based on this, an exemplary application of the embodiment of the present application in a practical application scenario will be described below.
In the embodiment of the application, the BERT model is selected to obtain word vectors; compared with related methods, the word vectors obtained by this model contain richer semantic features, but the BERT model also introduces many erroneous terms. Based on this, in the embodiment of the application, a character string distance calculation model based on the N-GRAM is added: the more identical characters two sentences share, the higher the sentence score, which can suppress the deviation caused by BERT to a certain extent.
In the embodiment of the application, the core points of the solution are as follows:
firstly, word vectors are obtained through a BERT model; these word vectors are better than those of Word2Vec and Doc2Vec and carry more semantic information than TF-IDF and BM25;
secondly, the BERT model may produce some results with great deviation, so in the embodiment of the present application an N-GRAM-based string distance calculation model is also designed and added. This model automatically analyzes the string matching situation and returns a score (i.e., the character relevancy described in the above embodiments): the higher the matching degree, the higher the score. This score is then fused with the BERT score.
Fig. 2 is a general flow chart of the model for determining the evaluation value of a candidate text (answer). The model takes a query sentence and an answer as input and returns a score through model calculation; the higher the score, the higher the relevancy between the query sentence and the answer.
It should be noted that:
1, the candidate text set is mainly crawled from the internet (e.g., microblogs).
2, the model shown in fig. 2 is mainly divided into two parts, a BERT model and an N-GRAM model; the word vector relevancy and the character relevancy of the two input texts can be obtained through these two models respectively. For example, as shown in fig. 2, the query sentence "james" and an answer statement about "james" are input into the BERT model and the N-GRAM model; the obtained word vector relevancy is 0.73415, and the obtained character relevancy is 0.5751.
3, the BERT model may be a published pre-trained model that is then fine-tuned with a similar data set.
4, the N-GRAM-based string distance model first counts all possible 1-GRAM and 2-GRAM sets of the query statement and the answer statement, and then calculates the corresponding precision (Precision) and recall (Recall) values. Finally, the character relevancy F1 value is calculated according to the following formula (5); a higher value indicates a higher degree of matching.
F1 = [2 × Precision1 × Recall1/(Precision1 + Recall1) + 2 × Precision2 × Recall2/(Precision2 + Recall2)]/2 (5)
In the formula, Precision1 and Recall1 are the precision and recall values determined based on the 1-GRAM, and Precision2 and Recall2 are the precision and recall values determined based on the 2-GRAM.
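The computation in note 4 can be sketched in Python. Because the image of formula (5) did not survive extraction, averaging the two per-grammar F1 values below is one plausible reading, not necessarily the patent's exact definition:

```python
def ngram_set(text, n):
    """All contiguous character n-grams of `text` (the N-GRAM string sets)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def precision_recall(query, answer, n):
    """Precision: share of the answer's n-grams also found in the query.
    Recall: share of the query's n-grams found in the answer."""
    q, a = ngram_set(query, n), ngram_set(answer, n)
    if not q or not a:
        return 0.0, 0.0
    common = len(q & a)
    return common / len(a), common / len(q)

def f1_character_relevancy(query, answer):
    """Average of the per-grammar (1-GRAM and 2-GRAM) F1 scores."""
    f1s = []
    for n in (1, 2):
        p, r = precision_recall(query, answer, n)
        f1s.append(0.0 if p + r == 0 else 2 * p * r / (p + r))
    return sum(f1s) / len(f1s)
```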
The score ScoreBert of the BERT model is combined with F1 to obtain the final score Score, as shown in the following formula (6), where α is a penalty coefficient:
Score = ScoreBert + α × F1 (6).
In evaluating the model effect, the test data is a set of titles selected from different fields on the network. When searching for "james sports", the results returned using the BERT model alone are shown in fig. 3. It can be seen that the BERT model learns some shared semantics: terms such as "exercisers" and "high speed" all relate to sports. However, the model also produces many bad results, such as "stock market".
When the N-GRAM model was added to the BERT model and "james sports" was searched again, the returned results are shown in fig. 4. Compared with fig. 3, the results in fig. 4 show a qualitative change: the first 10 results are basically related to the query "james sports", and the model additionally surfaces information on the team of "james", his nickname, and "sports".
Based on the foregoing embodiments, the present application provides a search recommendation apparatus, where the apparatus includes modules and units included in the modules, and may be implemented by a processor in an electronic device; of course, the implementation can also be realized through a specific logic circuit; in implementation, the processor may be a Central Processing Unit (CPU), a Microprocessor (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 5 is a schematic structural diagram of a search recommendation apparatus according to an embodiment of the present application. As shown in fig. 5, the apparatus 500 includes a first determining module 501, a second determining module 502, a fusing module 503, and a recommending module 504, wherein:
a first determining module 501, configured to determine a word vector relevancy between information to be searched and each candidate text in the candidate text set;
a second determining module 502, configured to determine a character relevancy between the information to be searched and each of the candidate texts;
a fusion module 503, configured to fuse the word vector relevancy and the character relevancy corresponding to each candidate text to obtain an evaluation value of the corresponding candidate text;
a recommending module 504, configured to recommend a candidate text corresponding to an evaluation value that satisfies a specific condition.
In some embodiments, the second determining module 502 is configured to: according to the M segmentation grammars, respectively carrying out character string segmentation on the information to be searched to obtain M first character string sets; wherein M is an integer greater than 0; according to the M segmentation grammars, respectively carrying out character string segmentation on the ith candidate text to obtain M second character string sets; wherein i is a positive integer less than or equal to the total number of candidate texts in the candidate set; and determining the character correlation degree between the information to be searched and the ith candidate text according to the M first character string sets and the M second character string sets.
In some embodiments, the second determining module 502 is configured to: determining the distance between a first character string set and a second character string set obtained by adopting the same segmentation grammar; and determining the character relevancy between the information to be searched and the ith candidate text according to each distance.
In some embodiments, M is an integer greater than 1, and the second determining module 502 is configured to: determining a matching score between the first character string set and the ith second character string set of the candidate text according to the jth distance, the total number of character strings included in the first character string set and the total number of character strings included in the ith second character string set of the candidate text; wherein j is an integer greater than 0 and less than or equal to M; and weighting each matching score to obtain the character relevancy between the information to be searched and the ith candidate text.
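The exact matching-score formula (referred to above as formula (3)) is not reproduced in this excerpt. The sketch below uses a Dice-style measure as one plausible instantiation, treating the "distance" as the number of character strings the two sets share; both choices are assumptions:

```python
def matching_score(common, total_first, total_second):
    """One plausible matching score from the j-th distance (assumed here to be
    the number of shared character strings) and the two set sizes.

    common: number of strings present in both the first and second string set.
    total_first: total number of strings in the first string set.
    total_second: total number of strings in the second string set.
    """
    if total_first + total_second == 0:
        return 0.0
    # Dice-style normalization: 1.0 when the sets are identical, 0.0 when disjoint.
    return 2 * common / (total_first + total_second)
```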
In some embodiments, the fusion module 503 is configured to: obtaining a penalty coefficient, wherein the penalty coefficient is used for representing the accuracy of a determination method of character relevancy or a determination method of word vector relevancy; and determining an evaluation value of the corresponding candidate text according to the penalty coefficient, the character relevance corresponding to each candidate text and the word vector relevance.
In some embodiments, the first determining module 501 is configured to: respectively extracting the characteristics of the information to be searched and each candidate text to obtain a first word vector of the information to be searched and a second word vector of the corresponding candidate text; processing the first word vector and a second word vector of the ith candidate text by using a classifier model obtained by training to obtain the probability that the information to be searched and the ith candidate text belong to similar categories; determining the probability as the word vector correlation degree between the information to be searched and the ith candidate text; wherein i is a positive integer less than or equal to the total number of candidate texts of the candidate set.
In some embodiments, the first determining module 501 is configured to: determining a text vector, a position vector and an initial word vector of the information to be searched; and processing the text vector, the position vector and the initial word vector by using a BERT model to obtain the first word vector.
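The step above mirrors how BERT builds its input representation: the text (segment) vector, position vector, and initial word vector are summed element-wise before entering the encoder. The toy NumPy sketch below illustrates only that summation with made-up sizes and random embedding tables; a real BERT model uses learned embeddings and a transformer encoder on top:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, max_len, n_seg, dim = 100, 16, 2, 8   # toy sizes, not BERT's real ones

tok_emb = rng.normal(size=(vocab, dim))      # initial word vectors
pos_emb = rng.normal(size=(max_len, dim))    # position vectors
seg_emb = rng.normal(size=(n_seg, dim))      # text (segment) vectors

def bert_input_vectors(token_ids, segment_ids):
    """Element-wise sum of the three embeddings for each token position."""
    positions = np.arange(len(token_ids))
    return tok_emb[token_ids] + pos_emb[positions] + seg_emb[segment_ids]

x = bert_input_vectors([5, 7, 9], [0, 0, 0])   # shape (3, dim)
```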
The above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be noted that, in the embodiment of the present application, if the search recommendation method is implemented in the form of a software functional module and sold or used as a standalone product, the search recommendation method may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling an electronic device (which may be an intelligent mobile terminal (e.g., a mobile phone), a tablet computer, an electronic book, a notebook computer, a desktop computer, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present application provides an electronic device. Fig. 6 is a schematic diagram of a hardware entity of the electronic device according to the embodiment of the present application. As shown in fig. 6, the hardware entity of the electronic device 600 includes a memory 601 and a processor 602, the memory 601 storing a computer program operable on the processor 602, and the processor 602 implementing the steps in the search recommendation method provided in the above embodiments when executing the program.
The memory 601 is configured to store instructions and applications executable by the processor 602, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 602 and modules in the electronic device 600, and may be implemented by a FLASH memory (FLASH) or a Random Access Memory (RAM).
Correspondingly, the embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps in the search recommendation method provided in the above embodiment.
Here, it should be noted that: the above description of the storage medium and device embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling an electronic device (which may be an intelligent mobile terminal (e.g., a mobile phone), a tablet computer, an electronic book, a notebook computer, a desktop computer, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description is only for the embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A search recommendation method, the method comprising:
determining the word vector correlation degree between the information to be searched and each candidate text in the candidate text set;
determining the character relevancy between the information to be searched and each candidate text;
fusing the word vector relevancy and the character relevancy corresponding to each candidate text to obtain an evaluation value corresponding to the candidate text;
and recommending candidate texts corresponding to the evaluation values meeting the specific conditions.
2. The method of claim 1, wherein the determining the character relevance between the information to be searched and each candidate text comprises:
according to the M segmentation grammars, respectively carrying out character string segmentation on the information to be searched to obtain M first character string sets; wherein M is an integer greater than 0;
according to the M segmentation grammars, respectively carrying out character string segmentation on the ith candidate text to obtain M second character string sets; wherein i is a positive integer less than or equal to the total number of candidate texts in the candidate set;
and determining the character correlation degree between the information to be searched and the ith candidate text according to the M first character string sets and the M second character string sets.
3. The method according to claim 2, wherein the determining the character correlation between the information to be searched and the i-th candidate text according to the M first character string sets and the M second character string sets comprises:
determining the distance between a first character string set and a second character string set obtained by adopting the same segmentation grammar;
and determining the character relevancy between the information to be searched and the ith candidate text according to each distance.
4. The method according to claim 3, wherein M is an integer greater than 1, and said determining the character correlation between the information to be searched and the i-th candidate text according to each of the distances comprises:
determining a matching score between the first character string set and the ith second character string set of the candidate text according to the jth distance, the total number of character strings included in the first character string set and the total number of character strings included in the ith second character string set of the candidate text; wherein j is an integer greater than 0 and less than or equal to M;
and weighting each matching score to obtain the character relevancy between the information to be searched and the ith candidate text.
5. The method according to claim 1, wherein the fusing the word vector relevancy and the character relevancy corresponding to each of the candidate texts to obtain the evaluation value of the corresponding candidate text comprises:
obtaining a penalty coefficient, wherein the penalty coefficient is used for representing the accuracy of a determination method of character relevancy or a determination method of word vector relevancy;
and determining an evaluation value of the corresponding candidate text according to the penalty coefficient, the character relevance corresponding to each candidate text and the word vector relevance.
6. The method of claim 1, wherein determining a word vector relevance between the information to be searched and each candidate text in the candidate text set comprises:
respectively extracting the characteristics of the information to be searched and each candidate text to obtain a first word vector of the information to be searched and a second word vector of the corresponding candidate text;
processing the first word vector and a second word vector of the ith candidate text by using a classifier model obtained by training to obtain the probability that the information to be searched and the ith candidate text belong to similar categories;
determining the probability as the word vector correlation degree between the information to be searched and the ith candidate text; wherein i is a positive integer less than or equal to the total number of candidate texts of the candidate set.
7. The method of claim 6, wherein the extracting the features of the information to be searched to obtain a first word vector comprises:
determining a text vector, a position vector and an initial word vector of the information to be searched;
and processing the text vector, the position vector and the initial word vector by using a BERT model to obtain the first word vector.
8. A search recommendation apparatus, comprising:
the first determination module is used for determining the word vector relevancy between the information to be searched and each candidate text in the candidate text set;
the second determination module is used for determining the character relevancy between the information to be searched and each candidate text;
the fusion module is used for fusing the word vector relevancy and the character relevancy corresponding to each candidate text to obtain an evaluation value corresponding to the candidate text;
and the recommending module is used for recommending the candidate texts corresponding to the evaluation values meeting the specific conditions.
9. An electronic device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the steps of the search recommendation method of any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the search recommendation method according to any one of claims 1 to 7.
CN202010220548.2A 2020-03-25 2020-03-25 Search recommendation method and device, equipment and storage medium Pending CN111274366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010220548.2A CN111274366A (en) 2020-03-25 2020-03-25 Search recommendation method and device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111274366A true CN111274366A (en) 2020-06-12

Family

ID=71002548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010220548.2A Pending CN111274366A (en) 2020-03-25 2020-03-25 Search recommendation method and device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111274366A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080126335A1 (en) * 2006-11-29 2008-05-29 Oracle International Corporation Efficient computation of document similarity
WO2019136993A1 (en) * 2018-01-12 2019-07-18 深圳壹账通智能科技有限公司 Text similarity calculation method and device, computer apparatus, and storage medium
WO2019210820A1 (en) * 2018-05-03 2019-11-07 华为技术有限公司 Information output method and apparatus
CN110427463A (en) * 2019-08-08 2019-11-08 腾讯科技(深圳)有限公司 Search statement response method, device and server and storage medium


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821587A (en) * 2021-06-02 2021-12-21 腾讯科技(深圳)有限公司 Text relevance determination method, model training method, device and storage medium
CN113821587B (en) * 2021-06-02 2024-05-17 腾讯科技(深圳)有限公司 Text relevance determining method, model training method, device and storage medium
CN113312523A (en) * 2021-07-30 2021-08-27 北京达佳互联信息技术有限公司 Dictionary generation and search keyword recommendation method and device and server
CN113312523B (en) * 2021-07-30 2021-12-14 北京达佳互联信息技术有限公司 Dictionary generation and search keyword recommendation method and device and server
CN114117046A (en) * 2021-11-26 2022-03-01 北京百度网讯科技有限公司 Data processing method, device, electronic equipment and medium
CN114117046B (en) * 2021-11-26 2023-08-11 北京百度网讯科技有限公司 Data processing method, device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
US10997370B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
CN108647205B (en) Fine-grained emotion analysis model construction method and device and readable storage medium
CN106709040B (en) Application search method and server
JP6007088B2 (en) Question answering program, server and method using a large amount of comment text
CN110019658B (en) Method and related device for generating search term
CA2774278C (en) Methods and systems for extracting keyphrases from natural text for search engine indexing
US20130060769A1 (en) System and method for identifying social media interactions
Kanwal et al. A review of text-based recommendation systems
CN111753167B (en) Search processing method, device, computer equipment and medium
JP5710581B2 (en) Question answering apparatus, method, and program
CN111274366A (en) Search recommendation method and device, equipment and storage medium
CN105653562A (en) Calculation method and apparatus for correlation between text content and query request
Gu et al. Service package recommendation for mashup creation via mashup textual description mining
WO2021112984A1 (en) Feature and context based search result generation
CN111400584A (en) Association word recommendation method and device, computer equipment and storage medium
CN111325018A (en) Domain dictionary construction method based on web retrieval and new word discovery
CN113204953A (en) Text matching method and device based on semantic recognition and device readable storage medium
Figueroa et al. Contextual language models for ranking answers to natural language definition questions
CN111460177B (en) Video expression search method and device, storage medium and computer equipment
JP6173958B2 (en) Program, apparatus and method for searching using a plurality of hash tables
Lu et al. Entity identification on microblogs by CRF model with adaptive dependency
Gupta et al. Document summarisation based on sentence ranking using vector space model
CN112214511A (en) API recommendation method based on WTP-WCD algorithm
JP2010282403A (en) Document retrieval method
CN107423298B (en) Searching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination