CN111814058A

CN111814058A - Pushing method and device based on user intention, electronic equipment and storage medium

Info

Publication number: CN111814058A
Application number: CN202010844662.2A
Authority: CN
Inventors: 刘曙铭
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Priority date: 2020-08-20
Filing date: 2020-08-20
Publication date: 2020-10-23

Abstract

The embodiment of the application discloses a pushing method and device based on user intention, electronic equipment and a computer readable medium, and relates to the technical field of computer application. The method comprises the following steps: acquiring a search word input by a user and a mapping relation set of contents to be pushed, wherein the mapping relation set of the contents to be pushed comprises: mapping relation between the content to be pushed and the keywords; respectively acquiring a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword based on a pre-trained semantic understanding model; calculating the similarity of the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vector according to the similarity; and acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector, and pushing. Therefore, the search intention of the user can be mined according to the search word of the user, thereby effectively pushing the content.

Description

Pushing method and device based on user intention, electronic equipment and storage medium

Technical Field

The embodiment of the application relates to the technical field of computer application, in particular to a pushing method and device based on user intention, an electronic device and a storage medium.

Background

With the rapid development of the mobile era, the amount of information on the network keeps growing. Users often rely on search engines to find the information they need within this mass of information, and search engine promotion is also the most effective internet advertising channel at present. However, the current advertisement delivery mode depends on a manually constructed tag system to select suitable advertisements from a large number of candidates. This mode depends on domain experience, is highly subjective, and cannot meet complex and changing advertisement delivery requirements. Therefore, how to accurately acquire the intention information of the user and push advertisements according to that intention, so as to improve pushing efficiency, is a problem that urgently needs to be solved.

Disclosure of Invention

In view of the foregoing problems, embodiments of the present application provide a push method and apparatus based on user intention, an electronic device, and a storage medium, which can effectively perform content push.

In a first aspect, an embodiment of the present application provides a push method based on a user intention, where the method includes: acquiring a search word input by a user and a mapping relation set of contents to be pushed, wherein the mapping relation set of the contents to be pushed comprises: mapping relation between the content to be pushed and the keywords; respectively acquiring a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword based on a pre-trained semantic understanding model; calculating the similarity of the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vector according to the similarity; and acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector, and pushing.

In a second aspect, an embodiment of the present application further provides a pushing apparatus based on a user intention, where the apparatus includes: the information acquisition module is used for acquiring a search word input by a user and a mapping relation set of contents to be pushed, wherein the mapping relation set of the contents to be pushed comprises: mapping relation between the content to be pushed and the keywords; the vector acquisition module is used for respectively acquiring a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword based on a pre-trained semantic understanding model; the determining module is used for calculating the similarity between the search semantic vector and the content semantic vector and determining a target content semantic vector from the content semantic vector according to the similarity; and the processing module is used for acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector and pushing the content.

In a third aspect, an embodiment of the present application further provides an electronic device, including: one or more processors; a memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the above-described method.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where a program code is stored in the computer-readable storage medium, and the program code can be called by a processor to execute the method.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description are only some embodiments, not all embodiments, of the present application. All other embodiments and drawings obtained by a person skilled in the art based on the embodiments of the present application without any inventive step are within the scope of the present invention.

Fig. 1 shows a schematic diagram of an application environment suitable for the embodiment of the present application.

Fig. 2 shows a flowchart of a push method based on user's intention according to an embodiment of the present application.

Fig. 3 shows a flowchart of a push method based on user's intention according to another embodiment of the present application.

Fig. 4 shows a flowchart of a push method based on user's intention according to another embodiment of the present application.

Fig. 5 is a flowchart illustrating a push method based on user's intention according to still another embodiment of the present application.

Fig. 6 shows a block diagram of a pushing device based on user's intention according to an embodiment of the present application.

Fig. 7 shows a block diagram of an electronic device for executing a push method based on user intention according to an embodiment of the present application.

Fig. 8 shows a block diagram of a computer-readable storage medium for executing a push method based on user intention according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

At present, with the continuous development of the internet, the number of internet users has grown explosively. Users often search for the information they need within the mass of information on the internet, search engines have gradually become indispensable tools, and advertisements are also increasingly spread with the internet as a carrier. Currently, a search engine generally performs retrieval according to the search text input by a user, obtains search results related to the search text, and provides them to the user for viewing. It is therefore very important to obtain the user's search intention from the user's search text in order to deliver advertisements effectively.

However, the user's search intention is currently usually understood by classifying the user's search text with a text classification model to obtain a tag for the search text, then obtaining the correlation between the search text and an advertisement according to the tag of the advertisement to be delivered, and delivering the advertisement accordingly. The inventor studied the difficulties of this advertisement delivery method and found that, because advertisers' requirements are complex and changeable, the corresponding tag system is extremely complex and usually needs hundreds of classification tags. Moreover, the text classification model is a supervised model, and each class must be labeled from massive user search data to obtain training sample data. A large amount of manpower is therefore needed to label the training sample data, and the accuracy of the classification model depends on the quality and quantity of that data.

The manual construction of the tag system depends on personal domain knowledge, and the resulting human errors can impair the recognition capability of the final model. A batch of keywords corresponding to each tag can be obtained by data analysis, and training sample data can be obtained quickly by labeling user search texts through keyword matching, but this approach has two disadvantages. On one hand, words are ambiguous, and the same word may express completely different user intentions in different scenes, so training sample data obtained by keyword matching may contain much noise, and a text classification model trained on it has poor recognition capability. On the other hand, the distribution space of training sample data obtained by keyword matching is relatively narrow: only part of the text representations of the current tag can be found, and a large part of the corpus space of the current tag is easily lost. As a result, the corpus space of search texts that the trained model can recognize is relatively narrow, the recognition effect of the model is poor, and the final advertisement delivery is not ideal.

Having studied the difficulties of the existing advertisement delivery method and considered the advertisement delivery requirements of actual scenes more comprehensively, the inventor provides the pushing method, pushing device, electronic equipment and storage medium based on user intention, which mine the search intention of the user from the user's search word and thereby push content effectively.

In order to better understand the push method, device, electronic device and storage medium based on the user intention provided in the embodiments of the present application, an application environment suitable for the embodiments of the present application is described below.

Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment suitable for the embodiment of the present application. The push method based on user intention provided by the embodiment of the application can be applied to the polymorphic interaction system 10 shown in fig. 1. The polymorphic interaction system 10 includes a terminal device 100 and a server 200, the server 200 being communicatively coupled to the terminal device 100. The server 200 may be a conventional server or a cloud server, and is not limited herein.

In some embodiments, the user logs in through an account at the user terminal, and all information corresponding to the account can be stored in the storage space of the server 200. The server 200 may be an individual server, a server cluster, a local server, or a cloud server. The server 200 may be configured to push content to the user terminal. Specifically, the content may be pushed to an application program on the user terminal, and the application program displays the content, so that the content is pushed to the user corresponding to the user terminal.

The server 200 may be connected to a plurality of user terminals, and may push the content to be pushed to all of the user terminals, or may select one of the user terminals according to a policy and push the content to the selected user terminal. The specific policy may be determined according to the content to be pushed and the user corresponding to each user terminal. In the embodiment of the present application, the content to be pushed may be advertisement information, for example, product discount information of an e-commerce application program.

The above application environments are only examples for facilitating understanding, and it is to be understood that the embodiments of the present application are not limited to the above application environments.

The push method, device, terminal device and storage medium based on user intention provided by the embodiments of the present application will be described in detail below with specific embodiments.

Referring to fig. 2, fig. 2 is a schematic flowchart of a pushing method based on user intention according to an embodiment of the present application. The method of this embodiment is applicable to the server in the above system, that is, the execution subject of the method may be the server, and the method is used to improve the accuracy of content pushed to the user. Specifically, as shown in fig. 2, the method includes S110 to S140.

S110: acquiring a search word input by a user and a mapping relation set of contents to be pushed.

Specifically, if the input form of the user is a text, the search word in the text form input by the user is obtained, and if the input form of the user is a voice, the voice information is converted into the search word in the text form.

The mapping relation set of the content to be pushed comprises mapping relations between the content to be pushed and keywords. The content to be pushed is a physical or virtual item waiting to be pushed, and may be in text form, video form or picture form, which is not limited herein. For example, the content to be pushed may be news, a list, a video, and the like, or may be recommended goods, stores, services, and the like. In the mapping relation between content to be pushed and keywords, one keyword may correspond to one piece of content or to a plurality of pieces of content, which is not limited herein. As one way, the mapping relation set of the content to be pushed may be stored in various data structures, for example in the form of a list, a table, a hash table, an array, a tree, and so on.
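As an illustration of one possible storage form, the following minimal sketch keeps the mapping relation set in a hash table (a Python dict) in which one keyword may map to one or several pieces of content. The sample keywords, the content identifiers and the helper name `build_mapping_set` are hypothetical and are not taken from the application.

```python
from collections import defaultdict

# Hypothetical sample data: (keyword, content id) pairs.
pairs = [
    ("running shoes discount", "ad_001"),
    ("running shoes discount", "ad_002"),
    ("phone trade-in", "ad_003"),
]


def build_mapping_set(pairs):
    """Group content ids under each keyword (one keyword -> one or more contents)."""
    mapping = defaultdict(list)
    for keyword, content_id in pairs:
        mapping[keyword].append(content_id)
    return dict(mapping)


mapping_set = build_mapping_set(pairs)
```

A dict gives constant-time lookup from a keyword to its contents; a list of pairs or a tree would serve equally well if ordering or prefix search mattered.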

In some embodiments, the keyword in the mapping relationship set of the content to be pushed may be a title corresponding to the content to be pushed, because the title is usually an abstraction and generalization of the content to be pushed, and may be used to characterize the content to be pushed, and the mapping relationship set between the content to be pushed and the title may be preset and stored locally in the terminal device or in the server.

In other embodiments, the keyword in the mapping relation set of the content to be pushed may also be a piece of key text of the content to be pushed. Specifically, when the content to be pushed contains text content and includes an abstract, the keyword may be the abstract of the content to be pushed. If the content to be pushed does not include an abstract, the keyword may be the first or last paragraph of its text, because the first paragraph generally introduces the content of the page and the last paragraph generally summarizes it.
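The choice of keyword described above can be sketched as a simple fallback chain. Combining the title, abstract and first or last paragraph options into a single function is an assumption made for illustration, as are the dict keys `title`, `abstract` and `paragraphs` and the `prefer_last` parameter.

```python
def select_keyword(content, prefer_last=False):
    """Pick the text that stands in for a piece of content to be pushed.

    `content` is a hypothetical dict with optional "title", "abstract"
    and "paragraphs" entries.
    """
    if content.get("title"):
        return content["title"]
    if content.get("abstract"):
        return content["abstract"]
    paragraphs = content.get("paragraphs") or []
    if paragraphs:
        # The first paragraph usually introduces, the last one usually summarizes.
        return paragraphs[-1] if prefer_last else paragraphs[0]
    return ""
```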

S120: respectively acquiring a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword based on a pre-trained semantic understanding model.

A semantic vector is a vector obtained by mapping text information into a preset vector space so as to represent the semantics of the text information. The search semantic vector corresponding to the search word can be obtained by taking the search word as the input of the pre-trained semantic understanding model, and the content semantic vector corresponding to the keyword can be obtained by taking the keyword as the input of the pre-trained semantic understanding model. The semantic understanding model can be a supervised model or an unsupervised model, and is obtained by pre-training on text data.

Specifically, semantic understanding models include, but are not limited to: Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), and Transformers. In some embodiments, the semantic understanding model may be the Transformer-based bidirectional encoder representation model (BERT), a document vector model (Doc2Vec), and so forth.

As a mode, the search semantic vector corresponding to the search word and the content semantic vector corresponding to the keyword may be obtained according to a specified time frequency, so as to obtain a change situation of the search intention of the user, and more effectively perform information pushing.

S130: calculating the similarity of the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vectors according to the similarity.

After the search semantic vector corresponding to the search word and the content semantic vector corresponding to the keyword are obtained, the similarity between the search semantic vector and the content semantic vector can be calculated. The similarity represents the correlation between the search semantic vector and the content semantic vector, that is, the correlation between the search word and the keyword. Specifically, the higher the similarity, the better the keyword expresses the user intention reflected by the search word.

In some embodiments, the search semantic vector corresponding to the search word and the content semantic vector corresponding to the keyword may be obtained by a server based on the pre-trained semantic understanding model. The obtained vectors may be stored in a local database of an offline server or of a terminal device, and a corresponding pushing apparatus in the offline server, or the terminal device, may calculate the similarity between the search semantic vector and the content semantic vector in real time from the stored vector data.
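The split between computing vectors ahead of time and reading them back at push time can be sketched with a small in-memory cache. The class `VectorStore` and the function `toy_encode` are hypothetical; `toy_encode` is only a stand-in for the semantic understanding model and does not produce meaningful semantic vectors.

```python
class VectorStore:
    """Caches semantic vectors so each text is encoded at most once."""

    def __init__(self, encode):
        self._encode = encode  # hypothetical encoder: text -> list of floats
        self._vectors = {}

    def precompute(self, texts):
        """Offline step: encode and store vectors ahead of time."""
        for text in texts:
            self.get(text)

    def get(self, text):
        """Online step: return the stored vector, encoding only on a miss."""
        if text not in self._vectors:
            self._vectors[text] = self._encode(text)
        return self._vectors[text]


calls = []


def toy_encode(text):
    """Stand-in for the semantic understanding model (NOT a real encoder)."""
    calls.append(text)
    return [float(len(text)), float(text.count(" ") + 1)]


store = VectorStore(toy_encode)
store.precompute(["running shoes", "phone trade-in"])  # offline
vec = store.get("running shoes")  # online lookup, no re-encoding
```

In a deployment the dict would typically be replaced by the local database mentioned above; the lookup-before-encode pattern stays the same.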

Specifically, to determine the target content semantic vector from the content semantic vectors according to the similarity, a content semantic vector whose similarity satisfies a specified condition is used as the target content semantic vector. The specified condition is related to the actual pushing scene, and the target content semantic vector may be one vector or a plurality of vectors. As one way, the specified condition may be that the similarity is greater than a specific threshold, in which case the content semantic vectors whose similarity is greater than the threshold are taken as target content semantic vectors. The specified condition may also be that the N content semantic vectors with the highest similarity are taken, where N is a preset integer greater than 0, in which case those N content semantic vectors are used as the target content semantic vectors.
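The two specified conditions can be sketched as follows. The application does not name a particular similarity measure, so cosine similarity is used here as an assumption; the function names and the `threshold` and `top_n` parameters are likewise illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_targets(search_vec, content_vecs, threshold=None, top_n=None):
    """Return (keyword, similarity) pairs meeting the specified condition.

    content_vecs maps each keyword to its content semantic vector.
    Results are ordered from highest to lowest similarity.
    """
    scored = [(kw, cosine_similarity(search_vec, vec))
              for kw, vec in content_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    if threshold is not None:
        scored = [item for item in scored if item[1] > threshold]
    if top_n is not None:
        scored = scored[:top_n]
    return scored
```

For example, `select_targets(query_vec, vecs, threshold=0.5)` applies the threshold condition, while `select_targets(query_vec, vecs, top_n=3)` keeps the three most similar content semantic vectors.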

As an implementation mode, keywords of the content to be pushed can be optimized according to the similarity between the search semantic vector and the content semantic vector, so that users can be attracted better, and the conversion rate of the content to be pushed is improved. For example, when the content to be pushed is advertisement information, the advertiser can obtain a user group that the advertisement information wants to reach, and by analyzing the content semantic vector with higher similarity to the search semantic vector of the user group, the keyword corresponding to the content semantic vector that the user is interested in can be obtained, so that the advertiser is guided to optimize the keyword in the advertisement information according to the keyword, and the advertisement conversion effect is improved.

S140: acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector, and pushing.

After the target content semantic vector is obtained, the corresponding keyword can be obtained from it, the content to be pushed corresponding to the keyword is obtained based on the mapping relation set of the content to be pushed, and the content is pushed.

As one mode, the similarity corresponding to each target content semantic vector can be obtained, and the contents to be pushed corresponding to the target content semantic vectors can be sorted in descending order of similarity, so that contents corresponding to target content semantic vectors with higher similarity are placed first, which makes selection convenient for the user.
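A minimal sketch of the lookup and ordering step, assuming the target content semantic vectors are represented by their keywords together with their similarity, and that the mapping relation set is a dict from keyword to a list of content identifiers. The function name `rank_contents` and the de-duplication of contents reachable through several keywords are illustrative choices, not details from the application.

```python
def rank_contents(targets, mapping_set):
    """Order contents to be pushed by descending keyword similarity.

    targets: list of (keyword, similarity) pairs.
    mapping_set: dict mapping keyword -> list of content ids.
    A content reachable through several keywords is listed once,
    at the position of its most similar keyword.
    """
    ranked = []
    seen = set()
    for keyword, _similarity in sorted(targets, key=lambda t: t[1], reverse=True):
        for content_id in mapping_set.get(keyword, []):
            if content_id not in seen:
                seen.add(content_id)
                ranked.append(content_id)
    return ranked
```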

As one mode, a content semantic vector may also be specified in advance. The similarity between this content semantic vector and each search semantic vector is calculated, and the search semantic vectors whose similarity satisfies the specified condition are obtained. The content to be pushed corresponding to the specified content semantic vector is then pushed to the users who input the search words corresponding to those search semantic vectors.

According to the information pushing method based on user intention provided by this embodiment, a search word input by a user and a mapping relation set of contents to be pushed are acquired, where the mapping relation set comprises mapping relations between the contents to be pushed and keywords. A search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword are then respectively obtained based on a pre-trained semantic understanding model, the similarity between the search semantic vector and the content semantic vector is calculated, and a target content semantic vector is determined from the content semantic vectors according to the similarity. The content to be pushed corresponding to the target content semantic vector is then acquired and pushed. Because the content to be pushed is obtained from the similarity between the search semantic vector and the content semantic vector, the pushed content better matches the user's search intention, which further improves the accuracy of the pushed content.
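This reverse direction, from one specified content to the users who should receive it, can be sketched as below. Cosine similarity and a threshold condition are assumptions, and the names `select_users` and `user_search_vecs` are hypothetical.

```python
import math


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_users(content_vec, user_search_vecs, threshold):
    """Return ids of users whose search semantic vector is similar enough.

    user_search_vecs maps a user id to the semantic vector of that
    user's search word.
    """
    return sorted(
        user_id
        for user_id, search_vec in user_search_vecs.items()
        if _cosine(content_vec, search_vec) > threshold
    )
```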

Referring to fig. 3, fig. 3 is a flowchart of a push method based on user intention according to another embodiment of the present application. The method is applied to the server in the above system, that is, the execution subject of the method may be the server, and the method is used to improve the accuracy of content pushed to the user. Specifically, as shown in fig. 3, the method includes S210 to S260.

S210: acquiring a search word input by a user and a mapping relation set of contents to be pushed.

Step S210 may refer to step S110.

In some embodiments, the keywords of the content to be pushed may be keywords included in the title of the content to be pushed, and user history search data may be obtained through a buried-point (event-tracking) log of the terminal, where the user history search data may include the user's historical search words and the content to be pushed that the user browsed for each historical search word. As one mode, the mapping relation between the content to be pushed and its title can be obtained from the content browsed by the user and stored in the mapping relation set of the content to be pushed. As another mode, the mapping relation between the content to be pushed and its title may be stored when the content to be pushed is acquired.

S220: respectively obtaining the feature vector of the search word and the feature vector of the keyword.

In this embodiment of the application, the semantic understanding model used to obtain the semantic vectors is a BERT model, and the feature vector of the search word and the feature vector of the keyword are respectively obtained and used as inputs of the BERT model. A feature vector is generated by an encoder from text information. Encoding the text reduces the dimensionality of the data, but the feature vector alone cannot provide different semantic representations for different contexts. By inputting the feature vector into the BERT model, the corresponding semantic vector is obtained, and this semantic vector represents semantics according to the context of the sentence.

In some embodiments, the obtaining of the feature vector of the search term and the feature vector of the keyword respectively includes obtaining a text vector, a position vector, and an initial term vector of the search term, and fusing the text vector, the position vector, and the initial term vector of the search term to form the feature vector of the search term; and acquiring a text vector, a position vector and an initial word vector of the keyword, and fusing the text vector, the position vector and the initial word vector of the keyword to form a feature vector of the content to be pushed.

The text vector (Token Embeddings) comprises the preset identifiers [CLS] and [SEP] and a hidden-layer vector corresponding to each word. For each search word or keyword, the [CLS] identifier is placed at the beginning of the sentence to mark its start, and the [SEP] identifier separates two adjacent or parallel sentences; a [SEP] symbol may also be placed at the end of the sentence. The initial word vector (Segment Embeddings) is used to distinguish different sentences, and the position vector (Position Embeddings) is used to represent the position of each word within the sentence.
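The placement of the [CLS] and [SEP] identifiers and the segment and position indices can be illustrated with a short sketch. Whitespace splitting is used here purely as a simplifying assumption in place of a real BERT tokenizer, and the function name `build_bert_inputs` is hypothetical; the indices produced are what the token, segment and position embedding tables would be looked up with.

```python
def build_bert_inputs(sentence_a, sentence_b=None):
    """Build tokens, segment ids and position ids for one or two sentences.

    Whitespace splitting stands in for a real subword tokenizer.
    """
    tokens = ["[CLS]"] + sentence_a.split() + ["[SEP]"]
    segment_ids = [0] * len(tokens)  # first sentence -> segment 0
    if sentence_b is not None:
        b_tokens = sentence_b.split() + ["[SEP]"]
        tokens += b_tokens
        segment_ids += [1] * len(b_tokens)  # second sentence -> segment 1
    position_ids = list(range(len(tokens)))  # position of each token
    return tokens, segment_ids, position_ids
```

A single search word or keyword uses only segment 0; the two-sentence form applies when a search word and a clicked title are fed to the model as a pair.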

Vector fusion means converting a plurality of vectors into one vector and, depending on the fusion mode, may comprise vector splicing (concatenation), vector addition, and the like. As one mode, in the BERT model and its variants, the position vector is learned during training on sentences, and the text vector, the position vector and the initial word vector may be spliced and then output through a fully connected layer. Because the feature vector is obtained by fusing several vectors, the semantics of the sentence can be better perceived in combination with the context.

S230: taking the feature vector of the search word as the input of a bidirectional encoding representation network, and obtaining a search semantic vector corresponding to the search word through the bidirectional encoding representation network.

The bidirectional encoding representation network is the Transformer-based Bidirectional Encoder Representations from Transformers (BERT) model. The BERT model uses the context on both sides at the same time and represents text with a bidirectional Transformer. The Transformer consists of an encoder and a decoder. The encoder mainly consists of a plurality of encoding modules, each comprising a self-attention layer and a feedforward neural network layer. The decoder is similar to the encoder and comprises a plurality of decoding modules, each comprising a self-attention layer and a feedforward neural network layer plus an additional encoder-decoder attention layer.

The BERT model is a typical two-stage model and can be divided into a pre-training stage and a fine-tuning stage. The pre-training stage mainly uses the Transformer as a feature extractor to learn from massive unlabeled text so as to acquire linguistic knowledge, and finally obtains a representation of the text, namely the semantic vector corresponding to the text. In the fine-tuning stage, based on the text semantic knowledge learned in the pre-training stage, the model is fine-tuned according to the downstream business requirement so as to adapt to it. In this embodiment, in the pre-training stage of the BERT model, the feature vector obtained by fusing the text vector, the position vector and the initial word vector of the search word is used as the input of the BERT model, so as to obtain a search semantic vector that represents semantics according to the context of the sentence.
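The two fusion modes named above can be sketched on plain Python lists. The function names are hypothetical, and the fully connected layer that may follow splicing is omitted.

```python
def fuse_by_addition(*vectors):
    """Element-wise sum; all vectors must have the same length."""
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("vectors must have the same length for addition")
    return [sum(values) for values in zip(*vectors)]


def fuse_by_concatenation(*vectors):
    """Splice the vectors end to end into one longer vector."""
    fused = []
    for v in vectors:
        fused.extend(v)
    return fused


token_vec = [1.0, 2.0]
segment_vec = [3.0, 4.0]
position_vec = [5.0, 6.0]
added = fuse_by_addition(token_vec, segment_vec, position_vec)
spliced = fuse_by_concatenation(token_vec, segment_vec, position_vec)
```

Addition keeps the dimensionality unchanged, while splicing triples it here, which is why a projection layer usually follows concatenation.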

As an implementation, user search behavior data can be acquired from the buried-point (event-tracking) log data of a search engine application or web page. The user search behavior data comprises the search word input by the user and the content to be pushed that the user clicked in the search results for that search word. Since the content clicked by the user can represent the user's search intention, the ALBERT model can be trained with the search word and the clicked content as weakly supervised data, so that the association between the user's search word and the content to be pushed is learned. The two can thus be fully matched interactively during model training, and a better text representation can be obtained.
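Turning click logs into weakly supervised training pairs can be sketched as follows. The record layout with `query` and `clicked_title` keys and the function name `build_weak_pairs` are assumptions; how negative examples are formed is not described in the application and is left out.

```python
def build_weak_pairs(log_records):
    """Turn click-log records into (search word, clicked title, label) triples.

    A click is treated as a weak positive signal (label 1). Records
    without a search word or without a click are skipped.
    """
    pairs = []
    for record in log_records:
        query = record.get("query")
        title = record.get("clicked_title")
        if query and title:
            pairs.append((query, title, 1))
    return pairs


# Hypothetical log records for illustration.
logs = [
    {"query": "cheap running shoes", "clicked_title": "Running shoes sale"},
    {"query": "phone", "clicked_title": None},  # no click, skipped
]
training_pairs = build_weak_pairs(logs)
```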

As an embodiment, a variant model ALBERT model of the BERT model may be employed. Unlike the BERT model that employs 12-layer transformers, the ALBERT model uses only 4-layer transformers, reduces the number of model parameters by factoring the embedded parameterization and cross-layer parameter sharing, with training parameters of about 400 tens of thousands and a model size of only 14M. In addition, the ALBERT model replaces Next Sentence Prediction (NOP) of the BERT model with Sequence Order Prediction (SOP), so that the continuity capability of the model learning sentences is enhanced, the capability of an automatic supervision learning task is improved, a plurality of temporary variables can be saved by removing dropouts, the utilization rate of memory in the model training process is effectively improved, the efficiency of the model is improved, and the scale of required training data is reduced. Compared with the BERT model, although the accuracy of the ALBERT model can be slightly reduced by 1 to 2 percent, the training speed and the prediction speed of the model can be improved by 2 to 3 times.

Because the user searching behavior has real-time performance, namely for some events, a search engine needs to meet the searching requirement of the user within a limited time, on one hand, the ALBERT model can be used for training according to the user searching behavior data more quickly so as to obtain the searching semantic vector corresponding to the searching word, and on the other hand, the ALBERT model can be used for taking the user searching behavior data closer to the current date as training data so as to better meet the real-time performance of the user searching behavior. For example, for the same sample number, the training time of 1 day is needed by using the BERT model, so that only the buried-point log data before 1 day can be used for training when the BERT model is trained, while only the training time of 2 hours is needed by using the ALBERT model, the buried-point log data before 2 hours can be obtained for training, and the training data closer to the current time can better reflect the searching intention of the user with timeliness.
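The effect of training time on data freshness can be illustrated with a small sketch that keeps only log records old enough to have been available when a training run ending now was started. The function name `usable_logs`, the record layout and the timestamps are hypothetical; the 1-day and 2-hour durations are the example figures from the text.

```python
from datetime import datetime, timedelta


def usable_logs(log_records, now, training_duration):
    """Keep records available when a run finishing at `now` had to start."""
    cutoff = now - training_duration
    return [record for record in log_records if record["time"] <= cutoff]


now = datetime(2020, 8, 20, 12, 0)
logs = [
    {"query": "q1", "time": datetime(2020, 8, 19, 9, 0)},
    {"query": "q2", "time": datetime(2020, 8, 20, 9, 0)},
    {"query": "q3", "time": datetime(2020, 8, 20, 11, 0)},
]
slow = usable_logs(logs, now, timedelta(days=1))   # 1-day training run
fast = usable_logs(logs, now, timedelta(hours=2))  # 2-hour training run
```

With the shorter training time, the record from three hours ago becomes usable, while the 1-day run can only see the record from the previous morning.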

S240: taking the feature vector of the keyword as the input of the bidirectional coding representation network, and obtaining a content semantic vector corresponding to the keyword through the bidirectional coding representation network.

It can be understood that the method for obtaining the content semantic vector corresponding to the keyword is the same as the method for obtaining the search semantic vector corresponding to the search word; see step S230 for details.

S250: calculating the similarity of the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vectors according to the similarity.

S260: acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector, and pushing the content.

It should be noted that, for parts not described in detail in this embodiment, reference may be made to the foregoing embodiments, and details are not described herein again.

In the information pushing method based on user intention provided by this embodiment of the application, the semantic understanding model is a Transformer-based bidirectional coding representation network, namely the BERT model. A search word input by a user and a mapping relation set of contents to be pushed are obtained, where the mapping relation set includes the mapping relation between the contents to be pushed and keywords. A feature vector of the search word and a feature vector of the keyword are then obtained respectively. The feature vector of the search word is used as the input of the bidirectional coding representation network to obtain a search semantic vector corresponding to the search word, and the feature vector of the keyword is used as the input of the bidirectional coding representation network to obtain a content semantic vector corresponding to the keyword. The similarity of the search semantic vector and the content semantic vector is calculated, a target content semantic vector is determined from the content semantic vectors according to the similarity, and the content to be pushed corresponding to the target content semantic vector is obtained and pushed. Through the BERT model, semantic vectors that represent the search word and the keyword according to sentence context can be obtained respectively, so that the user's search intention is derived from the semantic vectors for pushing, without manual labeling of a training corpus. On the one hand, this saves labeling labor; on the other hand, it avoids the inaccurate pushing caused by label dimensions that are too coarse.

Referring to fig. 4, a flowchart of a pushing method based on user intention according to another embodiment of the present application is shown. The method is applied to the server in the above system, that is, the execution subject of the method may be the server, and the method is used to improve the accuracy of content pushed to a user. Specifically, as shown in fig. 4, the method includes S310 to S360.

S310: acquiring a search word input by a user and a mapping relation set of contents to be pushed.

S320: respectively acquiring a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword based on a pre-trained semantic understanding model.

S330: calculating a vector distance between the search semantic vector and the content semantic vector.

The vector distance may be the Euclidean distance, the Manhattan distance, the Chebyshev distance, the normalized Euclidean distance, the cosine of the included angle, etc., which is not limited herein.
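Three of the distances listed above can be sketched as follows. This is a minimal illustration using plain Python lists as vectors; the function names are hypothetical and do not come from this application.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))
```

For the vectors `[1, 2, 3]` and `[4, 6, 3]`, the coordinate differences are 3, 4 and 0, so the three functions return 5.0, 7 and 4 respectively.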

As an embodiment, the search semantic vector and the content semantic vector may be stored in an offline server, and the information push device in the online server may obtain the search semantic vector and the content semantic vector of the offline server through a network, and calculate a vector distance between the search semantic vector and the content semantic vector.

In some embodiments, the vector distance may be the cosine distance between the search semantic vector and the content semantic vector. Specifically, the vector length of the search semantic vector and the vector length of the content semantic vector are calculated; the vector inner product of the search semantic vector and the content semantic vector is calculated; and the cosine distance between the search semantic vector and the content semantic vector is calculated based on the vector inner product and the vector lengths, and is used as the vector distance between the search semantic vector and the content semantic vector.
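The three sub-steps above (vector lengths, inner product, cosine) can be sketched as follows. The function names are hypothetical, and defining the cosine distance as one minus the cosine of the included angle is a common convention assumed here, not something this application specifies.

```python
import math
from typing import Sequence


def vector_length(v: Sequence[float]) -> float:
    """Euclidean length (L2 norm) of a vector."""
    return math.sqrt(sum(x * x for x in v))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the included angle: inner product over product of lengths."""
    len_a, len_b = vector_length(a), vector_length(b)
    if len_a == 0 or len_b == 0:
        raise ValueError("cosine is undefined for a zero-length vector")
    return inner_product(a, b) / (len_a * len_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed convention: distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)
```

Under this convention, two vectors pointing in the same direction have a distance close to 0, and two orthogonal vectors have a distance of 1.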

S340: determining the similarity of the search semantic vector and the content semantic vector according to the vector distance.

The similarity of the search semantic vector and the content semantic vector can be determined according to the vector distance. The larger the vector distance, the larger the difference between the search semantic vector and the content semantic vector, and correspondingly, the lower the similarity of the two vectors.

As one way, the normalized vector distance may be used as the similarity of the search semantic vector and the content semantic vector. As another way, a loss function for calculating similarity according to vector distance can be integrated on a pre-trained semantic understanding model, and the vector distance is converted into the vector similarity expressed in a probability form through a softmax function.
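The softmax conversion mentioned above can be sketched as follows. This is a minimal illustration for one search semantic vector compared against several content semantic vectors. Negating the distances before the softmax, so that a smaller distance yields a larger probability, is an assumption made here; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def distances_to_probabilities(distances: Sequence[float]) -> List[float]:
    """Convert vector distances into similarities expressed as probabilities."""
    # Assumption: a smaller distance should map to a higher score.
    scores = [-d for d in distances]
    # Subtract the maximum score for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The returned values sum to 1, and the content semantic vector with the smallest distance receives the largest probability.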

S350: taking the content semantic vector whose similarity meets a specified condition as the target content semantic vector.

The specified condition may be set according to the information pushing policy, and the target content semantic vector may be one vector or a plurality of vectors. As one way, the specified condition may be that the similarity is greater than a specific threshold, and accordingly, each content semantic vector whose similarity is greater than the specific threshold is taken as a target content semantic vector. The specified condition may also be that the N content semantic vectors with the highest similarity are selected, where N is a preset integer greater than 0, and accordingly, the N content semantic vectors with the highest similarity are taken as the target content semantic vectors.
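The two specified conditions described above can be sketched as follows. The similarities are assumed to be held in a dictionary keyed by a keyword that identifies each content semantic vector; this data layout and the function names are illustrative assumptions.

```python
import heapq
from typing import Dict, List


def select_by_threshold(similarities: Dict[str, float],
                        threshold: float) -> List[str]:
    """Keep every key whose similarity is greater than the threshold."""
    return [key for key, sim in similarities.items() if sim > threshold]


def select_top_n(similarities: Dict[str, float], n: int) -> List[str]:
    """Keep the n keys with the highest similarity, best first."""
    if n <= 0:
        raise ValueError("n must be an integer greater than 0")
    return heapq.nlargest(n, similarities, key=similarities.get)
```

For example, with similarities `{"a": 0.9, "b": 0.4, "c": 0.7}`, a threshold of 0.5 and a top-N selection with N equal to 2 both keep `"a"` and `"c"`.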

S360: acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector, and pushing the content.
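Looking up the content to be pushed from the mapping relation set can be sketched as follows. The sketch assumes that each target content semantic vector is identified by its keyword and that the mapping relation set is a dictionary from keyword to content identifiers; both the layout and the names are hypothetical.

```python
from typing import Dict, List


def contents_for_targets(target_keywords: List[str],
                         mapping: Dict[str, List[str]]) -> List[str]:
    """Collect the contents mapped to the target keywords, without duplicates."""
    seen = set()
    result: List[str] = []
    for keyword in target_keywords:
        # A keyword with no mapped content contributes nothing.
        for content_id in mapping.get(keyword, []):
            if content_id not in seen:
                seen.add(content_id)
                result.append(content_id)
    return result
```

The order of the returned contents follows the order of the target keywords, so a list sorted by similarity yields contents in ranked order.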

In the information pushing method based on user intention provided by this embodiment of the application, a search word input by a user and a mapping relation set of contents to be pushed are obtained, where the mapping relation set includes the mapping relation between the contents to be pushed and keywords. A search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword are then obtained respectively based on a pre-trained semantic understanding model. A vector distance between the search semantic vector and the content semantic vector is calculated, the similarity between the two vectors is determined according to the vector distance, and the content semantic vector whose similarity meets a specified condition is taken as the target content semantic vector. The content to be pushed corresponding to the target content semantic vector is then obtained and pushed. The vector distance is simple to calculate and easy to implement, and the similarity between the search semantic vector and the content semantic vector can be obtained quickly and in real time, so that content can be pushed efficiently.

Referring to fig. 5, a flowchart of a pushing method based on user intention according to still another embodiment of the present application is shown. The method is applied to the server in the above system, that is, the execution subject of the method may be the server, and the method is used to improve the accuracy of content pushed to a user. Specifically, as shown in fig. 5, the method includes S410 to S450.

S410: acquiring page text information of a page browsed by a user.

As one way, the page browsed by the user may also be a page in an information-flow scenario that is viewed mainly for browsing purposes and is not generated by an explicit retrieval behavior, which is not limited herein. By acquiring the page text information of the page browsed by the user, the mapping relation between the user and the page text information can be obtained.

As one way, the page text information of the page browsed by the user can be obtained from the search history behavior data generated when the user searches. The page text information may include the content of the page browsed by the user and may also include context information from when the user browsed the page. Each page that the user clicks and browses may reflect the user's tendency, that is, each piece of page text information browsed by the user may reflect intention information of the user. For example, if most of the pages browsed by the user are related to a mother-and-baby forum, it can be presumed that the user may be a new parent who pays attention to the infant field, and the user may therefore be interested in pushed content about infant products. For another example, if analysis of the pages browsed by the user shows that their text content contains few words, the user may prefer short text content to be pushed.

S420: acquiring a page semantic vector corresponding to the page text information based on a pre-trained semantic understanding model.

The pre-trained semantic understanding model may be a BERT model, an ALBERT model, or another machine learning model, which is not limited herein. The page semantic vector corresponding to the page text information can be obtained by inputting the page text information into the pre-trained semantic understanding model. The page semantic vector is a vector that represents the semantic content of the page text information of a page browsed by the user, and reflects intention information of the user's browsing behavior after searching.

S430: calculating the cosine distance between the search semantic vector and the content semantic vector to obtain a first similarity.

The first similarity represents the similarity between the search semantic vector and the content semantic vector. For the process of calculating the cosine distance between the search semantic vector and the content semantic vector and obtaining the first similarity, refer to steps S330 to S340.

S440: calculating the cosine distance between the page semantic vector and the content semantic vector to obtain a second similarity.

The second similarity represents the similarity between the page semantic vector and the content semantic vector. It can be understood that the process of calculating the cosine distance between the page semantic vector and the content semantic vector to obtain the second similarity is similar to step S430; refer to steps S330 to S340.

S450: determining the target content semantic vector according to the first similarity and the second similarity.

As an implementation manner, different weight values may be respectively assigned to the first similarity and the second similarity corresponding to the content semantic vector, a value obtained by weighted summation of the first similarity and the second similarity is used as a comprehensive similarity, and the content semantic vector corresponding to the comprehensive similarity meeting a specified condition is used as the target content semantic vector.
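The weighted summation described above can be sketched as follows. The weight values, the dictionary layout keyed by keyword, and the function names are illustrative assumptions; this application does not fix particular weights.

```python
from typing import Dict, List, Tuple


def combined_similarity(first: float, second: float,
                        w_first: float = 0.5, w_second: float = 0.5) -> float:
    """Weighted sum of the first and second similarity."""
    return w_first * first + w_second * second


def rank_by_combined(first_sims: Dict[str, float],
                     second_sims: Dict[str, float],
                     w_first: float = 0.5,
                     w_second: float = 0.5) -> List[Tuple[str, float]]:
    """Score every content semantic vector and sort by comprehensive similarity."""
    scored = [
        (key, combined_similarity(first_sims[key], second_sims[key],
                                  w_first, w_second))
        for key in first_sims
    ]
    # Highest comprehensive similarity first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The ranked list can then be filtered with whichever specified condition applies, for example a threshold or a top-N cut.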

As another implementation, a target content semantic vector may be specified. Search semantic vectors whose first similarity meets a first specified condition are obtained, and a first user group corresponding to those search semantic vectors is further obtained; page semantic vectors whose second similarity meets a second specified condition are obtained, and a second user group corresponding to those page semantic vectors is further obtained. The intersection of the first user group and the second user group is taken as the target users for information pushing, and the content to be pushed corresponding to the target content semantic vector is pushed to the target users.
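The user-group intersection described above can be sketched as follows. The sketch assumes that, for one specified target content semantic vector, the first and second similarities are available per user, and that both specified conditions are thresholds; the data layout and the names are hypothetical.

```python
from typing import Dict, Set


def users_meeting(similarity_by_user: Dict[str, float],
                  threshold: float) -> Set[str]:
    """Users whose similarity to the target content exceeds the threshold."""
    return {user for user, sim in similarity_by_user.items() if sim > threshold}


def target_users(first_by_user: Dict[str, float],
                 second_by_user: Dict[str, float],
                 first_threshold: float,
                 second_threshold: float) -> Set[str]:
    """Intersection of the first user group and the second user group."""
    first_group = users_meeting(first_by_user, first_threshold)
    second_group = users_meeting(second_by_user, second_threshold)
    return first_group & second_group
```

Only users who satisfy both conditions are returned, so a user with a matching search history but unrelated browsing pages is excluded.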

In the information pushing method based on user intention provided by this embodiment of the application, a search word input by a user is obtained, and a search semantic vector corresponding to the search word and a content semantic vector corresponding to a keyword are obtained respectively based on a pre-trained semantic understanding model. Page text information of a page browsed by the user is obtained, and a page semantic vector corresponding to the page text information is obtained based on the pre-trained semantic understanding model. The cosine distance between the search semantic vector and the content semantic vector is calculated to obtain a first similarity, and the cosine distance between the page semantic vector and the content semantic vector is calculated to obtain a second similarity. A target content semantic vector is determined according to the first similarity and the second similarity, and the content to be pushed corresponding to the target content semantic vector is pushed. By acquiring the page text information of the page browsed by the user, the user's intention can be understood from more dimensions, so that a more effective information pushing method is realized.

Referring to fig. 6, a block diagram of a pushing apparatus 600 based on user's intention according to an embodiment of the present application is shown, where the apparatus is applied to a server in the above system, and the apparatus may include: an information acquisition module 610, a vector acquisition module 620, a determination module 630, and a processing module 640.

The information acquisition module 610 is used for acquiring a search word input by a user and a mapping relation set of contents to be pushed, where the mapping relation set of contents to be pushed includes the mapping relation between the content to be pushed and the keywords.

Further, the information acquisition module 610 includes a search data acquisition submodule, a data storage submodule and a video display submodule, wherein:

The search data acquisition submodule is used for acquiring historical search data of the user, where the historical search data of the user includes historical search words of the user and the content to be pushed that the user browsed in correspondence with the historical search words.

The data storage submodule is used for storing the mapping relation between the content to be pushed and the title of the content to be pushed in the mapping relation set of the content to be pushed.

The vector obtaining module 620 obtains a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword, respectively, based on a pre-trained semantic understanding model.

Further, the vector acquisition module 620 includes a feature vector acquisition submodule, a search semantic vector acquisition submodule and a content semantic vector acquisition submodule, wherein:

The feature vector acquisition submodule is used for respectively acquiring the feature vector of the search word and the feature vector of the keyword.

Further, the feature vector acquisition submodule includes a search word vector fusion unit and a keyword vector fusion unit, wherein:

The search word vector fusion unit is used for acquiring the text vector, the position vector and the initial word vector of the search word, and fusing the text vector, the position vector and the initial word vector of the search word to form the feature vector of the search word.

The keyword vector fusion unit is used for acquiring the text vector, the position vector and the initial word vector of the keyword, and fusing the text vector, the position vector and the initial word vector of the keyword to form the feature vector of the keyword.

The search semantic vector acquisition submodule is used for taking the feature vector of the search word as the input of the bidirectional coding representation network and obtaining the search semantic vector corresponding to the search word through the bidirectional coding representation network.

The content semantic vector acquisition submodule is used for taking the feature vector of the keyword as the input of the bidirectional coding representation network and obtaining the content semantic vector corresponding to the keyword through the bidirectional coding representation network.

The determining module 630 calculates the similarity between the search semantic vector and the content semantic vector, and determines a target content semantic vector from the content semantic vectors according to the similarity.

Further, the determining module 630 includes a distance calculation submodule, a similarity determination submodule and a target vector determination submodule, wherein:

The distance calculation submodule is used for calculating a vector distance between the search semantic vector and the content semantic vector.

Further, the distance calculation submodule includes a length calculation unit, an inner product calculation unit and a cosine distance calculation unit, wherein:

The length calculation unit is used for calculating the vector length of the search semantic vector and the vector length of the content semantic vector.

The inner product calculation unit is used for calculating the vector inner product of the search semantic vector and the content semantic vector.

The cosine distance calculation unit is used for calculating a cosine distance between the search semantic vector and the content semantic vector based on the vector inner product and the vector lengths, as the vector distance between the search semantic vector and the content semantic vector.

The similarity determination submodule is used for determining the similarity of the search semantic vector and the content semantic vector according to the vector distance.

The target vector determination submodule is used for taking the content semantic vector whose similarity meets the specified condition as the target content semantic vector.

The processing module 640 is used for acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector, and pushing the content.

Further, the apparatus may further include a text information acquisition module, a page semantic acquisition module, a first calculation module, a second calculation module and a comprehensive determination module, wherein:

The text information acquisition module is used for acquiring page text information of a page browsed by a user.

The page semantic acquisition module is used for acquiring a page semantic vector corresponding to the page text information based on a pre-trained semantic understanding model.

The first calculation module is used for calculating the cosine distance between the search semantic vector and the content semantic vector to obtain a first similarity.

The second calculation module is used for calculating the cosine distance between the page semantic vector and the content semantic vector to obtain a second similarity.

The comprehensive determination module is used for determining the target content semantic vector according to the first similarity and the second similarity.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical or other type of coupling.

In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.

Referring to fig. 7, a block diagram of an electronic device according to an embodiment of the present application is shown. The electronic device may be the server 100 described above. The electronic device 700 in the present application may include one or more of the following components: a processor 710, a memory 720, and one or more applications, wherein the one or more applications may be stored in the memory 720 and configured to be executed by the one or more processors 710, the one or more applications being configured to perform the method described in the foregoing method embodiments.

The processor 710 may include one or more processing cores. The processor 710 connects various parts throughout the electronic device 700 using various interfaces and lines, and performs various functions of the electronic device 700 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 720 and calling data stored in the memory 720. Alternatively, the processor 710 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 710 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is used for rendering and drawing display content; the modem is used to handle wireless communications. It is understood that the modem may also not be integrated into the processor 710 and may instead be implemented by a separate communication chip.

The memory 720 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 720 may be used to store instructions, programs, code sets, or instruction sets. The memory 720 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the method embodiments described above, and the like. The stored data area may also store data created by the electronic device 700 in use, such as a phonebook, audio-video data, chat log data, and the like.

Referring to fig. 8, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable medium 800 has stored therein a program code that can be called by a processor to execute the method described in the above-described method embodiments.

The computer-readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Alternatively, the computer-readable storage medium 800 includes a non-volatile computer-readable storage medium. The computer readable storage medium 800 has storage space for program code 810 to perform any of the method steps of the method described above. The program code can be read from or written to one or more computer program products. The program code 810 may be compressed, for example, in a suitable form.

To sum up, the pushing method, device, electronic device and computer-readable medium based on the user intention, provided by the application, obtain a search term input by a user and a mapping relationship set of contents to be pushed, where the mapping relationship set of contents to be pushed includes: mapping relation between the content to be pushed and the keywords; respectively acquiring a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword based on a pre-trained semantic understanding model; calculating the similarity of the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vector according to the similarity; and acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector, and pushing. Therefore, the search intention of the user can be mined according to the search word of the user, thereby effectively pushing the content.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A pushing method based on user intention is characterized by comprising the following steps:

acquiring a search word input by a user and a mapping relation set of contents to be pushed, wherein the mapping relation set of the contents to be pushed comprises: mapping relation between the content to be pushed and the keywords;

respectively acquiring a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword based on a pre-trained semantic understanding model;

calculating the similarity of the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vector according to the similarity;

and acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector, and pushing.

2. The method according to claim 1, wherein the semantic understanding model is a Transformer-based bidirectional coding representation network, and the obtaining of the search semantic vector corresponding to the search word and the content semantic vector corresponding to the keyword respectively based on the pre-trained semantic understanding model comprises:

respectively obtaining the feature vector of the search word and the feature vector of the keyword;

taking the feature vector of the search word as the input of the bidirectional coding representation network, and obtaining a search semantic vector corresponding to the search word through the bidirectional coding representation network;

and taking the feature vector of the keyword as the input of the bidirectional coding representation network, and obtaining a content semantic vector corresponding to the keyword through the bidirectional coding representation network.

3. The method according to claim 2, wherein the obtaining the feature vector of the search term and the feature vector of the keyword respectively comprises:

acquiring a text vector, a position vector and an initial word vector of the search word, and fusing the text vector, the position vector and the initial word vector of the search word to form a feature vector of the search word;

and acquiring a text vector, a position vector and an initial word vector of the keyword, and fusing the text vector, the position vector and the initial word vector of the keyword to form a feature vector of the content to be pushed.

4. The method of claim 1, wherein the calculating a similarity between the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vectors according to the similarity comprises:

calculating a vector distance between the search semantic vector and the content semantic vector;

determining the similarity of the search semantic vector and the content semantic vector according to the vector distance;

and taking the content semantic vector with the similarity meeting a specified condition as the target content semantic vector.

5. The method of claim 4, wherein the calculating a vector distance between the search semantic vector and the content semantic vector comprises:

calculating the vector length of the search semantic vector and the content semantic vector;

calculating a vector inner product of the search semantic vector and the content semantic vector;

calculating a cosine distance between the search semantic vector and the content semantic vector based on the vector inner product and the vector length as a vector distance between the search semantic vector and the content semantic vector.

6. The method of claim 1, further comprising:

acquiring page text information of a user browsing page;

acquiring a page semantic vector corresponding to the page text information based on a pre-trained semantic understanding model;

calculating the cosine distance between the search semantic vector and the content semantic vector to obtain a first similarity;

calculating the cosine distance between the page semantic vector and the content semantic vector to obtain a second similarity;

and determining the semantic vector of the target content according to the first similarity and the second similarity.

7. The method according to any one of claims 1 to 6, wherein the keywords of the content to be pushed are keywords included in a title of the content to be pushed, and the obtaining of the search term input by the user and the mapping relationship set of the content to be pushed comprises:

acquiring historical search data of a user, wherein the historical search data of the user comprises historical search words of the user and the content to be pushed browsed by the user corresponding to the historical search words;

and storing the mapping relation between the content to be pushed and the title of the content to be pushed in the mapping relation set of the content to be pushed.

8. A push device based on user intent, comprising:

the information acquisition module is used for acquiring a search word input by a user and a mapping relation set of contents to be pushed, wherein the mapping relation set of the contents to be pushed comprises: mapping relation between the content to be pushed and the keywords;

the vector acquisition module is used for respectively acquiring a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword based on a pre-trained semantic understanding model;

the determining module is used for calculating the similarity between the search semantic vector and the content semantic vector and determining a target content semantic vector from the content semantic vector according to the similarity;

and the processing module is used for acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector and pushing the content.

9. An electronic device, comprising:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-7.

10. A computer-readable medium having stored program code executable by a processor, the program code causing the processor to perform the method of any one of claims 1-7 when executed by the processor.